Skip to contents

The M3 competition results are publicly available in an IJF paper published in 2000. The best performing methods overall were the Theta method, ForecastPro and ForecastX, as measured by the symmetric MAPE (sMAPE) that was favoured by Makridakis and Hibon. The following table shows some of the results from the original competition including results from the main commercial software vendors. The first sMAPE column is taken from the original paper. My own recalculation of the sMAPE results usually gives values slightly less than those published (I don’t know why). The MAPE column shows the mean absolute percentage error and the MASE column shows the mean absolute scaled errors.

# Original M3 forecasts
nseries <- length(M3)
theta <- as.matrix(M3Forecast$THETA)
fpro <- as.matrix(M3Forecast$ForecastPro)
fcx <- as.matrix(M3Forecast$ForcX)
bjauto <- as.matrix(M3Forecast$`B-J auto`)
ab1 <- as.matrix(M3Forecast$AutoBox1)
ab2 <- as.matrix(M3Forecast$AutoBox2)
ab3 <- as.matrix(M3Forecast$AutoBox3)
ets1 <- aarima <- hybrid <- matrix(NA, nrow=nseries, ncol=18)
# ETS and ARIMA forecasts, and combination
for (i in seq(nseries)) {
  ets1[i, ] <- forecast(ets(M3[[i]]$x), h=18, PI=FALSE)$mean
  aarima[i, ] <- forecast(auto.arima(M3[[i]]$x), h=18)$mean
  hybrid[i, ] <- 0.5 * (aarima[i, ] + ets1[i, ])
# Compute accuracy of all methods
mase <- mape <- smape <- matrix(NA, nrow=10, ncol=nseries)
f <- matrix(NA, nrow=10, ncol=18)
for (i in seq(nseries)) {
  x <- M3[[i]]$xx
  n <- length(x)
  f[1, seq(n)] <- theta[i, seq(n)]
  f[2, seq(n)] <- fpro[i, seq(n)]
  f[3, seq(n)] <- fcx[i, seq(n)]
  f[4, seq(n)] <- bjauto[i, seq(n)]
  f[5, seq(n)] <- ab1[i, seq(n)]
  f[6, seq(n)] <- ab2[i, seq(n)]
  f[7, seq(n)] <- ab3[i, seq(n)]
  f[8, seq(n)] <- ets1[i, seq(n)]
  f[9, seq(n)] <- aarima[i, seq(n)]
  f[10, seq(n)] <- hybrid[i, seq(n)]
  scale <- mean(abs(diff(M3[[i]]$x, lag = frequency(x))))
  for (j in seq(10)) {
    e <- abs((x - f[j, seq(n)]))
    mape[j, i] <- mean(e / x) * 100
    smape[j, i] <- mean(e / (abs(x) + abs(f[j, seq(n)]))) * 200
    mase[j, i] <- mean(e / scale)
m3table <- matrix(NA, nrow = 10, ncol = 3)
m3table[, 1] <- rowMeans(mape, na.rm = TRUE)
m3table[, 2] <- rowMeans(smape)
m3table[, 3] <- rowMeans(mase)
rownames(m3table) <- c("Theta", "ForecastPro", "ForecastX", "BJauto",
            "Autobox1", "Autobox2", "Autobox3", "ETS", "AutoARIMA", "Hybrid")
colnames(m3table) <- c("MAPE", "Average_sMAPE_recalculated", "MASE")
j <- order(m3table[, 3])
m3table <- cbind(m3table,
  Average_sMAPE = c(13.01, 13.19, 13.49, 14.01, 15.23, 14.41, 15.33, NA, NA, NA))
  c("Theta", "ForecastPro", "ForecastX", "BJauto", "Autobox2", "Autobox1", "Autobox3"),
  c("Average_sMAPE", "Average_sMAPE_recalculated", "MAPE", "MASE")], 2))
Average_sMAPE Average_sMAPE_recalculated MAPE MASE
Theta 13.01 12.76 17.42 1.39
ForecastPro 13.19 13.06 18.00 1.47
ForecastX 13.49 13.09 17.35 1.42
BJauto 14.01 13.72 19.13 1.54
Autobox2 14.41 13.82 18.23 1.51
Autobox1 15.23 15.20 20.36 1.69
Autobox3 15.33 15.46 19.31 1.57

BJ automatic was produced by ForecastPro but with the forecasts restricted to ARIMA models. For some reason, Autobox was allowed three separate submissions (a practice normally not allowed as it leads to over-fitting on the test set).

Any good forecasting software should be aiming to get close to (or better than) Theta on this test. After all, the M3 competition was held nearly 20 years ago. Presumably all of the software companies have tried to improve their results since then. Unfortunately, none of them to my knowledge has published any updated figures. I wish they would, preferably independently verified. It would provide some evidence that they are improving their algorithms.

My aim with the forecast package for R is to make freely available state-of-the-art algorithms for some forecasting models. I do not attempt to offer a comprehensive suite of algorithms, but what I do provide gives forecasts that are in the same ballpark as the best methods in the M3 competition. Here is the evidence.

kable(round(m3table[c("ETS", "AutoARIMA", "Hybrid"),
                    c("Average_sMAPE_recalculated", "MAPE", "MASE")], 2))
Average_sMAPE_recalculated MAPE MASE
ETS 13.07 17.69 1.43
AutoARIMA 13.57 18.72 1.45
Hybrid 12.77 17.68 1.40

The last method is a simple average of the forecasts from ets() and auto.arima(). If you only want point forecasts for annual, monthly or quarterly data, that is the best approach currently available in the forecast package. It is also better than any of the commercial software (at least as far as they have been prepared to subject their algorithms to independent testing).

Unlike the commercial vendors, you don’t have to take my word for it. My algorithms are open source.