RDP 2015-12: Modelling the Australian Dollar
Appendix A: Out-of-sample Forecasting
October 2015 – ISSN 1448-5109 (Online)
In assessing the models' out-of-sample performance, actual realised values of the explanatory variables are used to construct forecasts of the RTWI at different horizons. Test statistics are then constructed by comparing these forecasts to the realised values of the RTWI.
Consistent with a large portion of the literature, the structural models are compared to a naïve forecast model, namely a random walk. This is done using the DM (Diebold and Mariano 1995; West 1996) and CW statistics (Clark and West 2006, 2007), both of which assess models based on their mean squared forecast errors (MSFE).
The CW statistic is widely used in assessing the out-of-sample performance of nested models.^{[47]} The CW statistic compares the MSFE of the two models but, unlike a number of other test statistics, it accounts for a bias in the MSFE that arises when comparing nested models. The intuition behind this adjustment is that, if the true data-generating process is a random walk, the structural model is over-fitted, which can reduce its forecast accuracy and lead to a higher MSFE. This issue would be ameliorated if the sample were sufficiently large, as the estimate of the structural model should approach the true random walk model. However, in most cases the sample will not be sufficiently large and the small-sample bias will remain (Clark and West 2006).
The CW statistic therefore compares the MSFE of the random walk model to the adjusted MSFE of the structural model.^{[48]} If the forecasts tend not to be biased, the null hypothesis is that the two models have equivalent forecast performance and the test can be considered to be a minimum MSFE test.^{[49]} A CW statistic of zero would then indicate equivalent forecast performance, while a CW statistic above zero would indicate that the structural model's forecasts are ‘better’ than those from the random walk model (Rogoff and Stavrakeva 2008). However, if the forecasts are biased, the null hypothesis is that the exchange rate is a random walk and the test can no longer be considered a test of minimum MSFE (Rogoff and Stavrakeva 2008).
Comparing the MSFE of the random walk to that of the structural models remains a valid question even if the ‘true’ model is something other than a random walk. For this reason the DM statistic is also considered, as it compares the ‘raw’ – or unadjusted – root MSFE from the structural model directly to that of the naïve random walk model. A DM statistic greater than zero (less than zero) indicates that the structural model has a lower (higher) MSFE and so produces superior (inferior) forecasts, compared to the random walk model.
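The two statistics can be sketched as follows. This is a minimal illustration, assuming the realised values and the two sets of forecasts are already in hand; the function name is ours, and the simple i.i.d. variance estimate stands in for the HAC variance that would be needed at overlapping multi-quarter horizons, where forecast errors are serially correlated.

```python
import numpy as np

def dm_and_cw(y, f_rw, f_m):
    """Compare a structural model's forecasts to a random walk benchmark.

    y    : realised values of the (log) exchange rate
    f_rw : random walk (no-change) forecasts
    f_m  : structural model forecasts

    Returns (DM, CW). Positive values favour the structural model.
    Illustrative only: uses a simple i.i.d. variance; the paper's
    statistics at overlapping horizons would use a HAC estimator.
    """
    e_rw = y - f_rw            # random walk forecast errors
    e_m = y - f_m              # structural model forecast errors
    n = len(y)

    # DM: mean difference in squared forecast errors (raw MSFE comparison)
    d = e_rw**2 - e_m**2
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)

    # CW: subtract the (f_rw - f_m)^2 term, which corrects for the
    # over-fitting bias of the larger model under a random walk null
    f = e_rw**2 - (e_m**2 - (f_rw - f_m)**2)
    cw = f.mean() / np.sqrt(f.var(ddof=1) / n)
    return dm, cw
```

Because the adjustment term is non-negative, the CW numerator is always at least as large as the DM numerator, which is why the CW test is the more generous one towards the structural model.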
While both the CW and DM statistics have standard normal asymptotic distributions, bootstrapped distributions are also constructed given the small sample size. Distributions for the CW and DM statistics are constructed using a semi-parametric residual bootstrapping technique. The technique closely follows that of Mark and Sul (2001), which is also employed in a number of other papers, including Rogoff and Stavrakeva (2008). The p-values for both the CW and DM statistics are defined as the proportion of the distribution above the ‘observed’ statistic.
The out-of-sample assessment is conducted on the baseline model, the four decomposed ToT specifications, the two I/GDP models, and the two models that include short- and long-term RIRD.^{[50]} Rolling windows are used, consistent with much of the literature, with a window length of 70 quarters.^{[51]} The statistics are calculated for forecast horizons of one, four and sixteen quarters, and are reported in Tables A1 and A2.
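The rolling-window design can be sketched as follows. This is illustrative only: a plain OLS regression stands in for the paper's error correction model, `X` and `y` are hypothetical data, and, as described above, the realised values of the explanatory variables at the forecast date are used to construct each forecast.

```python
import numpy as np

def rolling_forecasts(y, X, window=70, h=1):
    """Out-of-sample forecasts from rolling-window regressions.

    At each forecast origin the model is re-estimated on only the
    most recent `window` observations, then combined with the
    realised explanatory variables at the forecast date to predict
    y at horizon h. Returns (actuals, forecasts) for comparison
    against a no-change random walk benchmark.
    """
    n = len(y)
    fcasts, actuals = [], []
    for t in range(window, n - h + 1):
        # estimate on the trailing window only (no look-ahead)
        Xw = np.column_stack([np.ones(window), X[t - window:t]])
        beta, *_ = np.linalg.lstsq(Xw, y[t - window:t], rcond=None)
        # forecast using realised explanatory variables at t + h - 1
        x_future = np.concatenate([[1.0], np.atleast_1d(X[t + h - 1])])
        fcasts.append(x_future @ beta)
        actuals.append(y[t + h - 1])
    return np.array(actuals), np.array(fcasts)
```

The resulting error series feed directly into the DM and CW calculations; with a 70-quarter window, each re-estimation discards the oldest observation as a new one arrives.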
Table A1: CW Statistics

| | One-quarter CW | p-value^{(a)} | p-value^{(b)} | Four-quarter CW | p-value^{(a)} | p-value^{(b)} | Sixteen-quarter CW | p-value^{(a)} | p-value^{(b)} |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Baseline | 1.96 | 0.03 | 0.04 | 2.23 | 0.01 | 0.09 | 3.10 | 0.00 | 0.20 |
| Unweighted narrow | 1.97 | 0.02 | 0.04 | 2.03 | 0.02 | 0.11 | 1.93 | 0.03 | 0.33 |
| Weighted narrow | 1.98 | 0.02 | 0.05 | 2.14 | 0.02 | 0.10 | 2.49 | 0.01 | 0.27 |
| Unweighted broad | 1.96 | 0.02 | 0.05 | 2.20 | 0.01 | 0.09 | 3.19 | 0.00 | 0.20 |
| Weighted broad | 1.96 | 0.03 | 0.04 | 2.32 | 0.01 | 0.08 | 3.28 | 0.00 | 0.19 |
| Investment | 1.97 | 0.02 | 0.04 | 2.11 | 0.02 | 0.10 | 2.89 | 0.00 | 0.24 |
| WYTBD | 1.80 | 0.04 | 0.05 | 1.94 | 0.03 | 0.11 | 1.27 | 0.10 | 0.37 |
| Backward-looking RIRD | 2.02 | 0.02 | 0.05 | 2.19 | 0.01 | 0.12 | 3.14 | 0.00 | 0.21 |
| Forward-looking RIRD | 1.81 | 0.03 | 0.07 | 1.98 | 0.02 | 0.10 | 2.48 | 0.01 | 0.28 |

Notes: (a) Using standard normal distribution; (b) Using bootstrapped distribution
Table A2: DM Statistics

| | One-quarter DM | p-value^{(a)} | p-value^{(b)} | Four-quarter DM | p-value^{(a)} | p-value^{(b)} | Sixteen-quarter DM | p-value^{(a)} | p-value^{(b)} |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Baseline | 1.70 | 0.04 | 0.00 | 1.90 | 0.03 | 0.01 | 2.51 | 0.01 | 0.12 |
| Unweighted narrow | 1.65 | 0.05 | 0.00 | 1.59 | 0.06 | 0.02 | 1.52 | 0.06 | 0.18 |
| Weighted narrow | 1.69 | 0.05 | 0.00 | 1.66 | 0.05 | 0.02 | 1.96 | 0.03 | 0.15 |
| Unweighted broad | 1.68 | 0.05 | 0.00 | 1.77 | 0.04 | 0.01 | 2.41 | 0.01 | 0.13 |
| Weighted broad | 1.68 | 0.05 | 0.00 | 1.79 | 0.04 | 0.02 | 2.06 | 0.02 | 0.14 |
| Investment | 1.70 | 0.04 | 0.00 | 2.01 | 0.02 | 0.01 | 2.83 | 0.00 | 0.10 |
| WYTBD | 1.53 | 0.06 | 0.00 | 1.32 | 0.09 | 0.01 | −0.08 | 0.53 | 0.24 |
| Backward-looking RIRD | 1.72 | 0.04 | 0.00 | 1.84 | 0.03 | 0.03 | 3.01 | 0.00 | 0.10 |
| Forward-looking RIRD | 1.60 | 0.06 | 0.00 | 1.65 | 0.05 | 0.03 | 1.89 | 0.03 | 0.19 |

Notes: (a) Using standard normal distribution; (b) Using bootstrapped distribution
One drawback of this bootstrapping technique, and of the related method for calculating the forecast statistics, is that the cointegrating relationship is estimated over the full sample period. As a result, information from the full sample is used in constructing the forecasts, which could give the structural models an ‘unfair’ advantage. As a robustness check, bootstrapped distributions are therefore also constructed using a fairly standard residual bootstrap, carried out under the null hypothesis of no predictability.^{[52]} This approach estimates the cointegrating relationship over rolling windows, rather than over the full sample, when constructing the forecast statistics and their bootstrapped distributions. Alquist and Chinn (2008) contend that estimating the cointegrating relationship over rolling windows should ensure that the forecasts are true ex ante predictions and should make the hypothesis tests more stringent.
At both the one- and four-quarter horizons, the results of this exercise are broadly similar to those obtained when estimating the cointegrating relationship over the full sample (Tables A3 and A4). In contrast, the models' forecast performance at the sixteen-quarter horizon appears slightly better when examined using the rolling-window bootstrap methodology. However, it is difficult to draw any strong conclusions given that the results differ based on both the choice of model and test statistic.
Table A3: CW Statistics – Rolling-window Cointegrating Relationship

| | One-quarter CW | p-value^{(a)} | p-value^{(b)} | Four-quarter CW | p-value^{(a)} | p-value^{(b)} | Sixteen-quarter CW | p-value^{(a)} | p-value^{(b)} |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Baseline | 2.07 | 0.02 | 0.03 | 2.17 | 0.01 | 0.09 | 2.74 | 0.00 | 0.18 |
| Unweighted narrow | 2.09 | 0.02 | 0.03 | 2.12 | 0.02 | 0.09 | 3.20 | 0.00 | 0.08 |
| Weighted narrow | 2.03 | 0.02 | 0.03 | 1.91 | 0.03 | 0.11 | 2.19 | 0.01 | 0.25 |
| Unweighted broad | 1.98 | 0.04 | 0.04 | 2.43 | 0.01 | 0.04 | 3.49 | 0.00 | 0.03 |
| Weighted broad | 1.79 | 0.04 | 0.05 | 2.16 | 0.02 | 0.05 | 1.91 | 0.03 | 0.23 |
| Investment | 2.15 | 0.02 | 0.03 | 2.33 | 0.01 | 0.06 | 2.78 | 0.00 | 0.17 |
| WYTBD | 2.38 | 0.01 | 0.02 | 2.20 | 0.01 | 0.05 | 3.45 | 0.00 | 0.06 |
| Backward-looking RIRD | 2.10 | 0.02 | 0.02 | 2.05 | 0.02 | 0.08 | 2.30 | 0.01 | 0.15 |
| Forward-looking RIRD | 2.08 | 0.02 | 0.02 | 2.11 | 0.02 | 0.07 | 3.00 | 0.00 | 0.12 |

Notes: (a) Using standard normal distribution; (b) Using bootstrapped distribution
Table A4: DM Statistics – Rolling-window Cointegrating Relationship

| | One-quarter DM | p-value^{(a)} | p-value^{(b)} | Four-quarter DM | p-value^{(a)} | p-value^{(b)} | Sixteen-quarter DM | p-value^{(a)} | p-value^{(b)} |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Baseline | 1.69 | 0.05 | 0.00 | 2.02 | 0.02 | 0.00 | 2.59 | 0.00 | 0.04 |
| Unweighted narrow | 1.59 | 0.06 | 0.00 | 1.58 | 0.06 | 0.00 | 2.48 | 0.01 | 0.02 |
| Weighted narrow | 1.37 | 0.09 | 0.00 | 1.18 | 0.12 | 0.00 | 1.16 | 0.12 | 0.09 |
| Unweighted broad | 1.53 | 0.06 | 0.00 | 1.11 | 0.13 | 0.01 | 1.43 | 0.08 | 0.03 |
| Weighted broad | 1.57 | 0.06 | 0.00 | 1.81 | 0.04 | 0.00 | 1.33 | 0.09 | 0.06 |
| Investment | 1.87 | 0.04 | 0.00 | 2.06 | 0.02 | 0.00 | 2.88 | 0.00 | 0.03 |
| WYTBD | 1.19 | 0.12 | 0.00 | −0.12 | 0.55 | 0.02 | −2.31 | 0.99 | 0.41 |
| Backward-looking RIRD | 1.65 | 0.05 | 0.00 | 1.33 | 0.09 | 0.00 | 0.03 | 0.49 | 0.12 |
| Forward-looking RIRD | 1.59 | 0.06 | 0.00 | 2.01 | 0.02 | 0.00 | 2.68 | 0.00 | 0.02 |

Notes: (a) Using standard normal distribution; (b) Using bootstrapped distribution
Footnotes
A model is nested in another model if it can be seen as a special case of the more general model (i.e. if it can be obtained by applying restrictions to the parameters of the general model). [47]
Clark and West (2007) extend this to the case where the nested model is not a random walk. [48]
‘Bias’ refers to scale bias, not location bias. See Rogoff and Stavrakeva (2008) for details. [49]
Out-of-sample testing could not be carried out on the FToT model due to the short sample. [50]
As a robustness check, the out-of-sample forecast testing was also carried out using recursive regressions. The results were broadly similar and are therefore not reported. [51]
This method does not impose cointegration. Therefore, it could also be useful in identifying whether the imposition of cointegration as part of the first bootstrapping methodology leads to a bias in the results. [52]