Surprisals

Compute surprisals or surprisal probabilities from a model or a data set. A surprisal is given by \(s = -\log f(y)\) where \(f\) is the density or probability mass function of the estimated or assumed distribution, and \(y\) is an observation. A surprisal probability is the probability of a surprisal at least as extreme as \(s\).
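
As a minimal sketch of these definitions in base R, assume \(f\) is the standard normal density. The observations in y are made up for illustration, and the code does not call this package.

```r
# Illustrative observations (hypothetical values)
y <- c(-0.5, 0, 1.2, 3.5)

# Surprisal: s = -log f(y), with f the standard normal density
s <- -dnorm(y, log = TRUE)

# For a symmetric unimodal density, a surprisal at least as large as s
# occurs exactly when |Y| >= |y|, so the surprisal probability is a
# two-sided tail probability
p <- 2 * pnorm(-abs(y))

round(cbind(y, surprisal = s, probability = p), 4)
```

The observation furthest from the mode has the largest surprisal and the smallest surprisal probability.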

The surprisal probabilities may be computed in three different ways.

1. Given the same distribution that was used to compute the surprisal values. This option is used when approximation = "none". Under this option, surprisal probabilities are equal to 1 minus the coverage probability of the largest HDR that contains each value. Surprisal probabilities smaller than 1e-6 are returned as 1e-6.
2. Using a Generalized Pareto Distribution (GPD) fitted to the most extreme surprisal values (those with probability less than threshold_probability). This option is used when approximation = "gpd". For surprisal probabilities greater than threshold_probability, the value of threshold_probability is returned. Under this option, the distribution is used for computing the surprisal values but not for determining their probabilities. By extreme value theory, the resulting probabilities should be relatively insensitive to the distribution used in computing the surprisal values.
3. Empirically, as the proportion of observations with greater surprisal values. This option is used when approximation = "empirical". The resulting probabilities are also insensitive to the distribution used in computing the surprisal values.
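
The empirical option can be sketched in base R as follows. This illustrates the idea rather than the package's implementation. The simulated data and the standard normal density used for the surprisal values are assumptions made for the example.

```r
set.seed(1)
# Simulated data with one planted anomaly (hypothetical example data)
y <- c(rnorm(999), 6)

# Surprisal values under an assumed standard normal distribution
s <- -dnorm(y, log = TRUE)

# Empirical surprisal probability: proportion of observations
# with a strictly greater surprisal value
n <- length(s)
p_emp <- (n - rank(s, ties.method = "max")) / n

# Probability for the planted anomaly (the last observation)
p_emp[n]
```

Because only the ordering of the surprisal values matters here, any distribution that ranks the observations the same way gives the same empirical probabilities.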

Usage

surprisals(
  object,
  probability = TRUE,
  approximation = c("none", "gpd", "empirical"),
  threshold_probability = 0.1,
  ...
)

# Default S3 method
surprisals(
  object,
  probability = TRUE,
  approximation = c("none", "gpd", "empirical"),
  threshold_probability = 0.1,
  distribution = dist_kde(object, multiplier = 2, ...),
  loo = FALSE,
  ...
)

Arguments

object: A model or numerical data set
probability: Should surprisal probabilities be computed, or the surprisal values?
approximation: Character string specifying the approximation to use in computing the surprisal probabilities. Ignored if probability = FALSE. "none" specifies that no approximation is to be used; "gpd" specifies that the Generalized Pareto Distribution should be used; "empirical" specifies that the probabilities should be estimated empirically.
threshold_probability: Probability threshold when computing the GPD approximation. This is the probability below which the GPD is fitted. Only used if approximation = "gpd".
...: Other arguments are passed to the appropriate method.
distribution: A distribution object. If not provided, a kernel density estimate is computed from the data object.
loo: Should leave-one-out surprisals be computed?

Value

A numerical vector containing the surprisals or surprisal probabilities.

Details

If no distribution is provided, a kernel density estimate is computed. The leave-one-out surprisals (or LOO surprisals) are obtained by evaluating, at each observation, a kernel density estimate computed from all the other observations.
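
A minimal base R sketch of the leave-one-out idea for univariate data is given below. It assumes a Gaussian kernel and a bw.nrd0() bandwidth computed once from the full sample. These choices, the helper kde_at(), and the simulated data are illustrative and need not match the defaults of dist_kde().

```r
set.seed(2)
# Hypothetical data with one planted anomaly
y <- c(rnorm(50), 5)
h <- bw.nrd0(y)  # assumed bandwidth, held fixed across all leave-one-out fits

# Gaussian kernel density estimate at x, built from the points in `data`
kde_at <- function(x, data, h) mean(dnorm((x - data) / h)) / h

# Ordinary surprisals: each point contributes to its own density estimate
s_full <- -log(sapply(y, function(yi) kde_at(yi, y, h)))

# Leave-one-out surprisals: density at y[i] estimated from y[-i]
s_loo <- -log(sapply(seq_along(y), function(i) kde_at(y[i], y[-i], h)))

# Compare the two surprisals for the planted anomaly
round(c(full = s_full[51], loo = s_loo[51]), 3)
```

Leaving a point out removes its own kernel from the estimate, so with a fixed bandwidth the LOO surprisal is never smaller than the ordinary one. The difference is largest for isolated points, which is why LOO surprisals can make anomalies stand out more clearly.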

Author

Rob J Hyndman

Examples