RDP 2021-02: Star Wars at Central Banks
Appendix A: Credibility Safeguards

This project had the potential to produce controversial results. To avoid straining professional relationships between central banks, we included only those central banks with which co-authors on the project were affiliated. A downside of this approach is that it created a conflict of interest. The conflict could have fostered bias of our own, which would have been deeply hypocritical.

To support the credibility of the work, we have done the following:

  • We have released the data we collected and the statistical programs we used.
  • We have swapped central bank assignments in the data collection, so that none of the collectors gathered data from the central banks with which they were affiliated.
  • We have identified who collected each data point in our dataset.
  • We have declared the conflict of interest at the start of this paper.
  • We have publicly registered a pre-analysis plan on the Open Science Framework website www.osf.io (available at <https://doi.org/10.17605/OSF.IO/K3G2F>). We note that pre-analysis plans are rarely used for observational studies because it is difficult to prove that the plans were written before the data were analysed. Burlig (2018) lists scenarios in which credibility is still achievable, but we fit poorly into those because our data are freely available. Credibility in our case rests on our having registered the plan before we undertook the large task of compiling the data into a useful format.

We have deviated from our plan in several respects:

  • Our language has changed in some important places, in response to feedback. Firstly, our definition of researcher bias is now more precise. Our plan had used the definition ‘a tendency to present the results of key hypothesis tests as having statistical significance when a fair presentation would not have’. Secondly, our plan used the names exploratory and confirmatory for what we now label reverse causal and forward causal. The new language is less loaded and consistent with other work on the topic. Our descriptions of the concepts are nonetheless unchanged.
  • In our plan, we intended to cleanse the top journals dataset of data-driven model selection and reverse causal research in an automated way, using keywords that we had identified as sensible indicators while collecting the central bank data. When collecting the data, however, it became clear that even our best keywords would produce too many false signals, so we abandoned the initiative, judging that the results would not be credible. This saved the research team a lot of time. (A minimal sketch of the kind of screening we intended appears after this list.)
  • We did not include the placebo test in our pre-analysis plan. We have the necessary data on hand only because, following a suggestion in footnote 19 of Brodeur et al (2016), we had planned to use the controls as a sensible candidate for a bias-free P[z]. In the end, the distribution of controls turned out to have too much mass in the tails to meet the informal criteria in Step 2 of the z-curve method. Many aspects of the formal bias estimate were nonsensical, including a maximum excess of results at low absolute z. The online appendix contains more detail. (A sketch of the kind of tail-mass check involved appears after this list.)
  • In Table 1, we provided more summary statistics than we had planned. We had not planned to show institution-level breakdowns of anything other than the number of papers and test statistics in our sample. Nor had we planned to include the detail about co-authorship. These changes were a response to feedback.
  • When writing our plan, we failed to anticipate that neither of our research question categories – reverse causal and forward causal – would neatly fit hypothesis tests in straight forecasting work. Our categories are also a bit awkward for general equilibrium macroeconometric modelling. There were very few papers of these kinds, and to be conservative, we put them into the reverse causal category (they are thus dropped from the cleansed sample).
  • Though we had planned to show estimated levels of dissemination bias in central bank research, we had not planned to compare them with those of the top journals, as we did in Figure 7. Again, this was a response to feedback.
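To illustrate the automated screening we abandoned, the sketch below shows the general approach of flagging papers by keyword. The keyword lists and the flag_paper helper are hypothetical illustrations; they are not the keywords we trialled, and our actual data structures differed.

```python
# Hypothetical sketch of the abandoned keyword-based screening.
# The keywords below are illustrative only; they are not the candidate
# keywords we identified while collecting the central bank data.

# Assumed indicators of data-driven model selection.
MODEL_SELECTION_KEYWORDS = ["stepwise", "general-to-specific", "best-fitting model"]

# Assumed indicators of reverse causal research questions.
REVERSE_CAUSAL_KEYWORDS = ["determinants of", "drivers of", "what explains"]


def flag_paper(abstract: str) -> dict:
    """Flag an abstract for each screening category via substring matching.

    In practice, crude matching like this produced too many false signals,
    which is why we abandoned the approach.
    """
    text = abstract.lower()
    return {
        "data_driven_selection": any(k in text for k in MODEL_SELECTION_KEYWORDS),
        "reverse_causal": any(k in text for k in REVERSE_CAUSAL_KEYWORDS),
    }


# Example of a false signal: a forward causal paper that merely mentions
# stepwise methods would still be flagged.
print(flag_paper("We estimate the effect of policy X, avoiding stepwise selection."))
# {'data_driven_selection': True, 'reverse_causal': False}
```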
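The placebo-test deviation turned on a simple property of the controls' z-statistics: too much mass in the tails. The sketch below illustrates a tail-mass check of that general kind; the 2.5 cutoff and the 5 per cent ceiling are assumed values for illustration, not the informal criteria from Step 2 of the z-curve method.

```python
import numpy as np

# Illustrative tail-mass check. The cutoff and ceiling are assumed values,
# not the informal criteria from Step 2 of the z-curve method.
def tail_mass_ok(z_stats, cutoff=2.5, max_share=0.05):
    """Return True if the share of |z| beyond `cutoff` is at most `max_share`."""
    z = np.abs(np.asarray(z_stats, dtype=float))
    return float(np.mean(z > cutoff)) <= max_share

# A heavy-tailed distribution of z-statistics fails the check.
rng = np.random.default_rng(0)
heavy_tailed_z = rng.standard_t(df=2, size=10_000)
print(tail_mass_ok(heavy_tailed_z))  # False: roughly 13 per cent of |z| exceed 2.5
```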