The lookout algorithm (Kandanaarachchi & Hyndman, 2022) computes leave-one-out surprisal probabilities from a kernel density estimate, with the probabilities obtained from a Generalized Pareto distribution (GPD). The kernel density estimate uses a quadratic kernel and a bandwidth based on topological data analysis. The result is therefore similar, but not identical, to using surprisals with loo = TRUE and approximation = "gpd". A low probability indicates a likely anomaly.
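
As a rough illustration of the leave-one-out step described above, the sketch below computes leave-one-out surprisals (negative log densities) for univariate data in base R. It is not the lookout implementation: the helper names quad_kernel and loo_surprisals are hypothetical, the bandwidth comes from stats::bw.nrd0 as a stand-in for the topological data analysis bandwidth, and the Generalized Pareto step that converts surprisals into probabilities is omitted.

```r
# Quadratic (Epanechnikov) kernel: nonzero only on [-1, 1]
quad_kernel <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Leave-one-out surprisals for a numeric vector y.
# ASSUMPTION: h defaults to bw.nrd0(), not the bandwidth lookout would choose.
loo_surprisals <- function(y, h = stats::bw.nrd0(y)) {
  n <- length(y)
  # n x n matrix of kernel weights K((y_i - y_j) / h)
  k <- quad_kernel(outer(y, y, "-") / h)
  # Drop each observation's contribution to its own density estimate
  diag(k) <- 0
  dens <- rowSums(k) / ((n - 1) * h)
  # Surprisal is Inf when no other observation lies within one bandwidth
  -log(dens)
}

set.seed(1)
y <- c(5, rnorm(49))
s <- loo_surprisals(y)
head(order(s, decreasing = TRUE))  # indices of the most surprising observations
```

In this sketch the isolated value 5 has no neighbours within one bandwidth, so its leave-one-out density is zero and its surprisal is infinite. Larger surprisals correspond to smaller probabilities once a tail model is applied.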

Usage

lookout_prob(object, ...)

Arguments

object

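A numerical data set, such as a numeric vector (univariate data) or a matrix with one column per variable (multivariate data).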

...

Other arguments are passed to lookout.

Value

A numerical vector containing the lookout probabilities, one for each observation.

References

Sevvandi Kandanaarachchi & Rob J Hyndman (2022) "Leave-one-out kernel density estimates for outlier detection", Journal of Computational and Graphical Statistics, 31(2), 586-599. https://robjhyndman.com/publications/lookout/

Author

Rob J Hyndman

Examples

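# Assumes lookout_prob() and oldfaithful are provided by the weird package;
# dplyr supplies tibble(), filter(), mutate() and arrange()
library(weird)
library(dplyr)

# Univariate data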
tibble(
  y = c(5, rnorm(49)),
  lookout = lookout_prob(y)
)
#> # A tibble: 50 × 2
#>          y lookout
#>      <dbl>   <dbl>
#>  1  5        0    
#>  2  0.550    1    
#>  3 -0.697    1    
#>  4  0.391    1    
#>  5  0.381    1    
#>  6 -0.0124   1    
#>  7 -0.124    1    
#>  8  1.47     1    
#>  9  0.674    1    
#> 10  1.96     0.192
#> # ℹ 40 more rows

# Bivariate data
tibble(
  x = rnorm(50),
  y = c(5, rnorm(49)),
  lookout = lookout_prob(cbind(x, y))
)
#> # A tibble: 50 × 3
#>          x       y lookout
#>      <dbl>   <dbl>   <dbl>
#>  1 -0.186   5            0
#>  2  1.40   -0.547        1
#>  3  0.0185 -1.69         1
#>  4  0.249  -1.57         1
#>  5  0.149  -0.405        1
#>  6 -0.963   0.319        1
#>  7 -0.0665  0.0404       1
#>  8  1.29   -0.390        1
#>  9  0.458  -1.82         1
#> 10 -1.45    0.659        1
#> # ℹ 40 more rows

# Using a regression model: lookout probabilities of the standardized residuals
of <- oldfaithful |> filter(duration < 7200, waiting < 7200)
fit_of <- lm(waiting ~ duration, data = of)
broom::augment(fit_of) |>
  mutate(lookout = lookout_prob(.std.resid)) |>
  arrange(lookout)
#> # A tibble: 2,197 × 9
#>    waiting duration .fitted .resid     .hat .sigma .cooksd .std.resid lookout
#>      <dbl>    <dbl>   <dbl>  <dbl>    <dbl>  <dbl>   <dbl>      <dbl>   <dbl>
#>  1    5700        1   2837.  2863. 0.0138     424. 0.316         6.73  0     
#>  2    6060      120   4274.  1786. 0.00348    427. 0.0304        4.17  0.0194
#>  3    6971      210   5360.  1611. 0.000541   427. 0.00383       3.76  0.0265
#>  4    7080      220   5481.  1599. 0.000473   427. 0.00329       3.73  0.0271
#>  5    3600      170   4877. -1277. 0.00133    428. 0.00593      -2.98  0.0285
#>  6    4500      241   5735. -1235. 0.000497   428. 0.00206      -2.88  0.0340
#>  7    6480      180   4998.  1482. 0.00106    428. 0.00633       3.46  0.0351
#>  8    6618      192   5143.  1475. 0.000795   428. 0.00471       3.44  0.0358
#>  9    6720      201   5252.  1468. 0.000647   428. 0.00380       3.43  0.0364
#> 10    3420      150   4636. -1216. 0.00204    428. 0.00823      -2.84  0.0368
#> # ℹ 2,187 more rows
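
Because a low lookout probability indicates a likely anomaly, one simple way to use the output is to flag observations that fall below a chosen cut-off. The sketch below assumes that lookout_prob() is provided by the weird package and that dplyr is installed. The 0.05 cut-off and the names flagged and anomaly are illustrative choices, not package defaults.

```r
library(weird)  # ASSUMPTION: the package that provides lookout_prob()
library(dplyr)

set.seed(2024)
flagged <- tibble(y = c(5, rnorm(49))) |>
  mutate(
    lookout = lookout_prob(y),
    # Illustrative cut-off: smaller values flag fewer observations
    anomaly = lookout < 0.05
  )

# Show only the observations flagged as likely anomalies
flagged |> filter(anomaly)
```

The cut-off trades false alarms against missed anomalies, so it is worth choosing with the application in mind.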