CVar computes the errors obtained by applying an autoregressive modelling function to subsets of the time series y using k-fold cross-validation, as described in Bergmeir, Hyndman and Koo (2018). It also applies a Ljung-Box test to the residuals. If this test is significant (see the returned pvalue), there is serial correlation in the residuals and the model can be considered to be underfitting the data. In this case, the cross-validated errors can underestimate the generalization error and should not be used.
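
The cross-validation idea that CVar automates can be sketched directly: fit the model on k-1 folds of the observations (here via nnetar's subset argument, as in the example call shown further down) and measure the error on the held-out fold. This is only a simplified illustration of the technique, not CVar's exact implementation; the fold construction and error aggregation here are assumptions made for the sketch.

library(forecast)

k <- 5
set.seed(1)
# Random fold labels, one per observation
folds <- sample(rep(1:k, length.out = length(lynx)))

rmse <- numeric(k)
for (i in 1:k) {
  trainset <- which(folds != i)   # observations used for fitting
  testset  <- which(folds == i)   # held-out observations
  fit  <- nnetar(lynx, lambda = 0.15, subset = trainset)
  fits <- fitted(fit)             # one-step fits over the whole series
  rmse[i] <- sqrt(mean((lynx[testset] - fits[testset])^2, na.rm = TRUE))
}
mean(rmse)   # cross-validated RMSE, comparable to CVar's summary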

CVar(y, k = 10, FUN = nnetar, cvtrace = FALSE, blocked = FALSE,
  LBlags = 24, ...)

Arguments

y

Univariate time series

k

Number of folds to use for cross-validation.

FUN

Function to fit an autoregressive model. Currently, it only works with the nnetar function.

cvtrace

If TRUE, progress information is printed during cross-validation.

blocked

If TRUE, the folds are formed as contiguous blocks of observations; otherwise they are chosen at random (see the sketch after this list).

LBlags

Number of lags used in the Ljung-Box test; defaults to 24. For yearly series it can be set to 20.

...

Other arguments are passed to FUN.
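
The sketch below illustrates the difference between random and block folds for a short series of length 12 with k = 3. It is only an illustration of the idea; CVar's internal fold construction may differ in detail.

n <- 12; k <- 3
set.seed(1)
# Random folds: each fold is a scattered subset of time indices
random_folds  <- split(1:n, sample(rep(1:k, length.out = n)))
# Block folds: each fold is a contiguous stretch of the series
blocked_folds <- split(1:n, rep(1:k, each = ceiling(n / k), length.out = n))
random_folds
blocked_folds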

Value

A list containing information about the model and accuracy for each fold, plus other summary information computed across folds.
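
For instance, the per-fold test-set accuracy can be collected into a single vector. The sketch below assumes the fold elements are named fold1, ..., foldk and that each contains an accuracy matrix, as in the example output further down.

library(forecast)
modelcv <- CVar(lynx, k = 5, lambda = 0.15)
# RMSE on the held-out observations of each fold
sapply(1:5, function(i) modelcv[[paste0("fold", i)]]$accuracy[, "RMSE"])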

References

Bergmeir, C., Hyndman, R.J., Koo, B. (2018) A note on the validity of cross-validation for evaluating time series prediction. Computational Statistics & Data Analysis, 120, 70-83. https://robjhyndman.com/publications/cv-time-series/.

Examples

modelcv <- CVar(lynx, k=5, lambda=0.15)
print(modelcv)
#> Series: lynx
#> Call: CVar(y = lynx, k = 5, lambda = 0.15)
#> 
#> 5-fold cross-validation
#>                   Mean           SD
#> ME        -47.94024811 168.45264852
#> RMSE      941.32651587 312.32265027
#> MAE       590.53751580 186.42613185
#> MPE       -18.38857601  11.25029762
#> MAPE       53.09867080  13.27799759
#> ACF1        0.03351767   0.18600230
#> Theil's U   0.74429498   0.09406432
#> 
#> p-value of Ljung-Box test of residuals is 0.7000251
#> if this value is significant (<0.05),
#> the result of the cross-validation should not be used
#> as the model is underfitting the data.
print(modelcv$fold1)
#> $model
#> Series: y 
#> Model:  NNAR(11,6) 
#> Call:   FUN(y = y, lambda = 0.15, subset = trainset)
#> 
#> Average of 20 networks, each of which is
#> a 11-6-1 network with 79 weights
#> options were - linear output units 
#> 
#> sigma^2 estimated as 0.09849
#> 
#> $accuracy
#>               ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
#> Test set 54.5283 669.1654 393.0634 -28.47463 53.24511 0.1040563 0.6553205
#> 
#> $testfit
#> Time Series:
#> Start = 1821 
#> End = 1934 
#> Frequency = 1 
#>   [1]         NA         NA         NA         NA         NA         NA
#>   [7]         NA         NA         NA         NA         NA  105.21504
#>  [13]  173.53080  286.81383  427.76973 2237.13163 2757.61403 3238.53147
#>  [19] 3444.50705  513.36004  156.14849   46.60305   69.77331  205.89523
#>  [25]  534.67536 1045.54882 2166.50062 2809.28424  964.44677  352.69546
#>  [31]  355.51756  242.29041  348.68280  745.43335 1287.35967 2906.52296
#>  [37] 3047.12261 1845.34837  715.91621  194.66933  257.27823  217.94619
#>  [43]  445.70225 1513.70832 3199.27674 6176.38809 4333.43277  719.76326
#>  [49]  176.42534  418.40068  360.08275  487.32443 1591.68267 1767.98777
#>  [55] 2327.85152 1477.34294  739.16119  294.98107  203.57957  226.52017
#>  [61]  180.21405  941.25544 1943.38268 3014.77741 4223.73641 4575.36955
#>  [67]  419.77833  180.02903   39.17359   52.26742   63.24813  184.93638
#>  [73]  405.95237  614.39117 3742.22197 3384.20758  587.25797  104.19192
#>  [79]  158.04398  141.19296  751.69124 1316.49408 3356.31448 6748.68018
#>  [85] 6171.41779 3721.28521 1737.52499  365.17075  376.97026  815.86490
#>  [91] 1354.86611 2745.67193 3713.46794 3042.56180 3087.87837 3692.32603
#>  [97]  689.80358   81.77819  120.49390  107.47938  220.57929  431.06549
#> [103] 1133.61505 2435.64993 3405.33786 3039.67200  743.28852  520.01698
#> [109]  452.49698  663.33892 1028.01775 1588.04980 2667.86157 3455.24028
#> 
#> $testset
#>  [1]   3   5   8  10  19  20  35  38  39  40  43  49  52  55  61  62  66  68  74
#> [20]  80  99 107 110
#> 
library(ggplot2)
autoplot(lynx, series="Data") +
  autolayer(modelcv$testfit, series="Fits") +
  autolayer(modelcv$residuals, series="Residuals")
#> Warning: Removed 11 rows containing missing values (geom_path).
#> Warning: Removed 11 rows containing missing values (geom_path).
ggAcf(modelcv$residuals)
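
For comparison, the cross-validation can be repeated with contiguous block folds by setting blocked = TRUE; the call below is illustrative and its output is not shown.

modelcv_blocked <- CVar(lynx, k=5, lambda=0.15, blocked=TRUE)
print(modelcv_blocked)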