RDP 2005-07: The Australian Business Cycle: A Coincident Indicator Approach

4. Data and Estimation

The composition of the data panel is crucial when estimating a factor model. If the panel contains a disproportionate number of variables from a particular part of the economy, for example the traded goods sector or the labour market, then the factors are likely to bear a closer resemblance to that part of the economy than to the overall economy. In compiling the panel of data used in this study, we take care to avoid having too many similar series and to ensure that, as far as possible, a wide range of variables (for example, from the expenditure, production and income sides of the economy) is included.

The coincident indices are estimated over two sample periods. For the period September 1960 to December 2004 we estimate the indices with quarterly data using a balanced panel containing 25 series (for brevity, we refer to this as the 1960–2004 sample). We estimate monthly coincident indices over a shorter period, January 1980 to December 2004, as too few monthly series are available over the longer sample period. The monthly coincident indices are estimated using a balanced panel of 29 series. Table 1 shows the number of series of each type in the quarterly and monthly panels. We also undertake robustness analysis in which we estimate the indices using broader panels that are either unbalanced or have a shorter time span, and that include up to 111 series. All series are transformed to make them stationary; for most series, this involves taking log differences. Appendix A contains a full list of the series in each panel and their sources, and indicates how they are transformed.
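As a minimal sketch of the two data requirements described above, the following Python fragment takes log differences of a series in levels and checks that a panel is balanced, that is, that every series covers the same dates with no missing observations. The function names, the use of None to mark a missing observation, and the numbers are hypothetical and chosen for illustration only; they are not the series or the code used in this paper.

```python
import math

def log_difference(levels):
    """Return ln(x_t) - ln(x_{t-1}); the result is one observation shorter."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(levels, levels[1:])]

def is_balanced(panel):
    """True if every series has the same length and no missing (None) values."""
    lengths = {len(series) for series in panel.values()}
    complete = all(v is not None for series in panel.values() for v in series)
    return len(lengths) == 1 and complete

# Hypothetical levels, for illustration only (not data from the paper).
levels = {
    "output": [100.0, 101.0, 103.0, 102.0],
    "employment": [50.0, 50.5, 50.4, 51.0],
}

# Transform each series to (approximate) growth rates before estimation.
panel = {name: log_difference(series) for name, series in levels.items()}
print(is_balanced(panel))
```

Series that are already stationary in levels, or that can take non-positive values, would need a different transformation; Appendix A records the transformation applied to each series.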

Table 1: Composition of Data Panels
Number of series in each category of economic series

Category                 Quarterly 1960–2004    Monthly 1980–2004
National accounts                  6                    0
Employment                         2                    6
Industrial production              4                    0
Building and CAPEX                 2                    3
Internal trade                     1                    2
Overseas transactions              4                    7
Prices                             4                    2
Private finance                    2                    7
Government finance                 0                    2
Total                             25                   29

Most earlier studies that estimate approximate factor models have used data for the US or Europe, where there are literally hundreds of suitable data series; these studies have typically used more than 100 series, and in some cases as many as 450. While there are many hundreds (if not thousands) of economic time series in Australia, many of these are not suitable for this study because their histories are too short, they have too many missing observations, or they duplicate other available series. Some other series are excluded to ensure that the panel has a reasonable balance across different categories of economic variables.

However, using a smaller panel need not lead to less accurate estimates of the business cycle. Boivin and Ng (forthcoming) argue that adding series to a panel need not improve the factor estimates if the additional series are noisy or have correlated errors. In previous applications, larger panels have typically been obtained by disaggregating series into their sectoral or regional components (for example, employment in different industries, or housing approvals in particular areas). Such series are likely to contain more idiosyncratic noise, and are likely to have correlated idiosyncratic components. Indeed, Boivin and Ng find that the factors from a panel with as few as 40 series sometimes produce more accurate forecasts than those derived from a panel of 147 series. Watson (2001) also finds that the marginal improvement in forecasting performance from using more than 50 series is very small. And Inklaar et al (2003) find that they can produce an index that closely matches the EuroCOIN index using a subset of just 38 of the 246 series that are used in constructing the EuroCOIN index.
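The intuition behind this argument can be illustrated with a stylised simulation. The sketch below is not taken from any of the studies cited. It assumes a single common factor, estimates it as the first principal component of the standardised panel (computed by power iteration, which may differ from the estimators used in this paper), and compares a panel of 20 broad series with a panel that adds 40 series whose errors are correlated with one another. All parameter values and names are hypothetical.

```python
import math
import random

def standardise_columns(X):
    """Demean each column of a T x N panel and scale it to unit variance."""
    T, N = len(X), len(X[0])
    out = [[0.0] * N for _ in range(T)]
    for j in range(N):
        col = [X[t][j] for t in range(T)]
        mean = sum(col) / T
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / T)
        for t in range(T):
            out[t][j] = (col[t] - mean) / sd
    return out

def first_principal_component(X, iterations=100):
    """Power iteration on X'X; returns the T-vector X w for the leading eigenvector w."""
    T, N = len(X), len(X[0])
    w = [1.0 / math.sqrt(N)] * N
    for _ in range(iterations):
        f = [sum(row[j] * w[j] for j in range(N)) for row in X]        # f = X w
        w = [sum(X[t][j] * f[t] for t in range(T)) for j in range(N)]  # w = X'f
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(row[j] * w[j] for j in range(N)) for row in X]

def abs_correlation(a, b):
    """Absolute correlation (the sign of a principal component is arbitrary)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb)

random.seed(0)
T = 200
factor = [random.gauss(0, 1) for _ in range(T)]        # true common factor
shared_noise = [random.gauss(0, 1) for _ in range(T)]  # error common to the added series

# 20 broad series: load on the factor, with independent idiosyncratic noise.
small = [[factor[t] + random.gauss(0, 1) for _ in range(20)] for t in range(T)]
# 40 added series: weak loading on the factor, errors correlated via shared_noise.
extra = [[0.2 * factor[t] + shared_noise[t] + random.gauss(0, 1) for _ in range(40)]
         for t in range(T)]
large = [small[t] + extra[t] for t in range(T)]

corr_small = abs_correlation(first_principal_component(standardise_columns(small)), factor)
corr_large = abs_correlation(first_principal_component(standardise_columns(large)), factor)
print(f"20 series: {corr_small:.2f}   60 series: {corr_large:.2f}")
```

Under these assumptions the first principal component of the larger panel is pulled towards the error shared by the added series, so it tracks the true factor less closely than the estimate from the smaller panel. The design is deliberately stark; with weaker cross-correlation in the added series the difference would be smaller.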