The lookout algorithm (Kandanaarachchi & Hyndman, 2022) computes
leave-one-out surprisal probabilities from a kernel density estimate using a
Generalized Pareto distribution. The kernel density estimate uses a
bandwidth based on topological data analysis and a quadratic kernel. It is
therefore similar, but not identical, to using surprisals with loo = TRUE and
approximation = "gpd". A low probability indicates a likely anomaly.
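The leave-one-out idea in the description can be illustrated with a minimal base R sketch. It is not the lookout implementation: the bandwidth `h` is simply fixed by hand (lookout derives its bandwidth from topological data analysis), no Generalized Pareto distribution is fitted, and the helper names `epanechnikov` and `loo_kde` are hypothetical. It only shows why leaving each point out of its own density estimate makes an isolated observation stand out.

```r
# Epanechnikov (quadratic) kernel, supported on [-1, 1] and integrating to 1
epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Leave-one-out kernel density estimate evaluated at each observation.
# `h` is a hand-picked bandwidth (an assumption made for illustration only).
loo_kde <- function(y, h) {
  n <- length(y)
  u <- outer(y, y, "-") / h   # pairwise scaled differences
  k <- epanechnikov(u)
  diag(k) <- 0                # remove each point's own contribution
  rowSums(k) / ((n - 1) * h)
}

set.seed(1)
y <- c(5, rnorm(49))
f_loo <- loo_kde(y, h = 1)

# Surprisal is the negative log density; it is Inf where the
# leave-one-out density is exactly zero.
surprisal <- -log(f_loo)
which.max(surprisal)
```

In this sketch the point at 5 has no neighbours within one bandwidth, so its leave-one-out density is zero and its surprisal is infinite. A density estimate that included the point itself would never reach zero there, which is what blunts ordinary kernel density estimates as anomaly scores.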
Arguments
- object
A numerical data set.
- ...
Other arguments are passed to lookout.
References
Sevvandi Kandanaarachchi & Rob J Hyndman (2022) "Leave-one-out kernel density estimates for outlier detection", Journal of Computational and Graphical Statistics, 31(2), 586-599. https://robjhyndman.com/publications/lookout/
Examples
# Packages used by the examples
library(weird)
library(dplyr)

# Univariate data
tibble(
  y = c(5, rnorm(49)),
  lookout = lookout_prob(y)
)
#> # A tibble: 50 × 2
#>          y lookout
#>      <dbl>   <dbl>
#>  1  5        0
#>  2  0.550    1
#>  3 -0.697    1
#>  4  0.391    1
#>  5  0.381    1
#>  6 -0.0124   1
#>  7 -0.124    1
#>  8  1.47     1
#>  9  0.674    1
#> 10  1.96     0.192
#> # ℹ 40 more rows
# Bivariate data
tibble(
  x = rnorm(50),
  y = c(5, rnorm(49)),
  lookout = lookout_prob(cbind(x, y))
)
#> # A tibble: 50 × 3
#>          x       y lookout
#>      <dbl>   <dbl>   <dbl>
#>  1 -0.186   5            0
#>  2  1.40   -0.547        1
#>  3  0.0185 -1.69         1
#>  4  0.249  -1.57         1
#>  5  0.149  -0.405        1
#>  6 -0.963   0.319        1
#>  7 -0.0665  0.0404       1
#>  8  1.29   -0.390        1
#>  9  0.458  -1.82         1
#> 10 -1.45    0.659        1
#> # ℹ 40 more rows
# Using a regression model
of <- oldfaithful |>
  filter(duration < 7200, waiting < 7200)
fit_of <- lm(waiting ~ duration, data = of)
broom::augment(fit_of) |>
  mutate(lookout = lookout_prob(.std.resid)) |>
  arrange(lookout)
#> # A tibble: 2,197 × 9
#>    waiting duration .fitted .resid     .hat .sigma .cooksd .std.resid lookout
#>      <dbl>    <dbl>   <dbl>  <dbl>    <dbl>  <dbl>   <dbl>      <dbl>   <dbl>
#>  1    5700        1   2837.  2863. 0.0138     424. 0.316         6.73 0
#>  2    6060      120   4274.  1786. 0.00348    427. 0.0304        4.17 0.0194
#>  3    6971      210   5360.  1611. 0.000541   427. 0.00383       3.76 0.0265
#>  4    7080      220   5481.  1599. 0.000473   427. 0.00329       3.73 0.0271
#>  5    3600      170   4877. -1277. 0.00133    428. 0.00593      -2.98 0.0285
#>  6    4500      241   5735. -1235. 0.000497   428. 0.00206      -2.88 0.0340
#>  7    6480      180   4998.  1482. 0.00106    428. 0.00633       3.46 0.0351
#>  8    6618      192   5143.  1475. 0.000795   428. 0.00471       3.44 0.0358
#>  9    6720      201   5252.  1468. 0.000647   428. 0.00380       3.43 0.0364
#> 10    3420      150   4636. -1216. 0.00204    428. 0.00823      -2.84 0.0368
#> # ℹ 2,187 more rows