1 The Debate in Brief
Psychology's replication crisis, in which a coordinated effort to replicate 100 published studies reproduced only about 36% of the original findings at conventional significance levels [Open Science Collaboration, 2015], sent shockwaves through the social sciences. Economists have largely responded with a mixture of concern and reassurance: concern because many of the same practices that produced the crisis (selective reporting, flexible analysis, publication bias) are present in economics; reassurance because the design-based approach of applied microeconomics, with its emphasis on transparent identification strategies, may provide some protection.
The debate has two intertwined components. First, does economics suffer from the same kind of false-discovery inflation as psychology? Second, is pre-registration (publicly committing to a research design and analysis plan before seeing the data) the right solution?
2 The Case for Concern: P-hacking and Specification Searching
2.1 The Anatomy of P-hacking
In any dataset, a researcher with flexible discretion over model specification, sample restrictions, variable definitions, and outcome choices can find a statistically significant result even when none exists. Simmons et al. [2011] famously demonstrated that undisclosed researcher degrees of freedom (the range of analysis choices available to the researcher) dramatically inflate false-positive rates. A researcher who tries 20 slightly different specifications and reports only the one that yields $p<0.05$ has not found a true effect; they have found noise.
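To make the arithmetic concrete, the following is a minimal simulation sketch (illustrative numbers of our own, not the design in Simmons et al.): with 20 moderately correlated specifications and no true effect, the chance that at least one comes out significant at the 5% level is several times the nominal rate.

```python
# Monte Carlo sketch: how trying many specifications inflates the false-positive
# rate when the true effect is zero. The 20 "specifications" are approximated as
# 20 moderately correlated outcome measures; all numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_specs, n_sims, rho = 200, 20, 2000, 0.5

false_positive_any = 0
for _ in range(n_sims):
    treat = rng.integers(0, 2, n)                  # random "treatment", no true effect
    common = rng.normal(size=n)                    # shared noise across specifications
    hits = 0
    for _ in range(n_specs):
        y = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.normal(size=n)
        _, p = stats.ttest_ind(y[treat == 1], y[treat == 0])
        hits += p < 0.05
    false_positive_any += hits > 0                 # "publish" if any spec is significant

print(f"Nominal per-test rate: 5%. Share of datasets where at least one of "
      f"{n_specs} specifications is significant: {false_positive_any / n_sims:.0%}")
```

The correlation parameter matters: perfectly correlated specifications would keep the family-wise rate at 5%, while independent ones push it towards $1 - 0.95^{20} \approx 64\%$; realistic specification searches sit somewhere in between.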
2.2 Evidence of P-hacking in Economics
Brodeur et al. [2016] examined roughly 50,000 hypothesis tests from papers published in top economics journals and found a suspicious excess of t-statistics just above conventional thresholds (1.64 for 10%, 1.96 for 5%, 2.57 for 1%). This "bunching" just above significance thresholds is the statistical signature of selective reporting: researchers whose results fall just below a threshold search for specifications that push them over it.
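A simple way to see what bunching-based tests look for is a caliper-style comparison: count reported t-statistics just below and just above 1.96 and test whether the split departs from roughly 50/50. The sketch below uses simulated t-statistics, not the Brodeur et al. data, and a deliberately simplified version of their approach.

```python
# Caliper-style sketch of detecting bunching at the 5% threshold.
# Simulated "reported" t-statistics: half of the near-misses are respecified
# until they clear 1.96, producing the below/above asymmetry the test detects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
t = np.abs(rng.normal(loc=1.0, scale=1.2, size=20000))
push = (t > 1.66) & (t <= 1.96) & (rng.random(t.size) < 0.5)
t[push] = rng.uniform(1.97, 2.26, push.sum())       # "specification search" in one line

caliper = 0.30
below = np.sum((t > 1.96 - caliper) & (t <= 1.96))
above = np.sum((t > 1.96) & (t <= 1.96 + caliper))

# Under no selective reporting, counts in a narrow caliper should be roughly
# equal on either side (the falling density of |t| makes this test conservative).
p = stats.binomtest(above, above + below, 0.5, alternative="greater").pvalue
print(f"just below: {below}, just above: {above}, binomial p = {p:.2g}")
```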
Ioannidis et al. [2017] surveyed 159 empirical economics literatures drawing on more than 64,000 estimates and found that median statistical power is roughly 18%, and that reported effects are typically exaggerated by a factor of two or more relative to adequately powered estimates. This "winner's curse" or publication bias effect is consistent with a literature that systematically over-estimates effect sizes because small-sample studies with large (lucky) estimates are more likely to be published.
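The mechanics of the winner's curse are easy to reproduce by simulation. In the sketch below (illustrative parameters, unrelated to the studies surveyed), the true effect is small, power is low, and the average estimate among significant results is several times the truth.

```python
# Winner's curse sketch: conditioning on p < 0.05 in underpowered studies
# systematically inflates published effect sizes. All numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_effect, sigma, n, n_studies = 0.1, 1.0, 100, 5000

estimates, pvals = [], []
for _ in range(n_studies):
    treated = rng.normal(true_effect, sigma, n)
    control = rng.normal(0.0, sigma, n)
    _, p = stats.ttest_ind(treated, control)
    estimates.append(treated.mean() - control.mean())
    pvals.append(p)

estimates, pvals = np.array(estimates), np.array(pvals)
sig = pvals < 0.05
print(f"power (share of studies significant): {sig.mean():.0%}")
print(f"true effect:                          {true_effect:.2f}")
print(f"mean estimate, all studies:           {estimates.mean():.2f}")
print(f"mean estimate, significant only:      {estimates[sig].mean():.2f}")
```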
Christensen and Miguel [2018] survey the evidence on transparency and reproducibility in economics and document several patterns: published results are often difficult to replicate computationally (data or code unavailable), regression tables often cannot be reproduced from the underlying data, and reported significance levels are sensitive to sample restrictions and control variable choices that are never justified in the text.
3 The Case for Reassurance: Design-Based Inference Is Different
Defenders of economics' credibility argue that the design-based approach provides natural protection against the worst forms of p-hacking.
Identification strategies are visible and constraining. When a researcher exploits a specific natural experiment (a draft lottery, a geographic boundary, a policy cutoff), the identification strategy determines which tests are the primary ones. The first-stage F-statistic, the reduced form, and the 2SLS estimate are the natural objects of interest. There is less scope for arbitrary specification searching than in a study that simply regresses an outcome on a treatment with flexible controls.
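As a concrete illustration of how the design pins down the objects of interest, the following sketch simulates a lottery-style instrument and computes the first stage, the reduced form, and the 2SLS (Wald) estimate; the data-generating process is invented purely for illustration.

```python
# IV sketch with one binary instrument: the first stage (D on Z), the reduced
# form (Y on Z), and the 2SLS estimate, which here is reduced form / first stage.
# Simulated data with an unobserved confounder that biases naive OLS upward.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
z = rng.integers(0, 2, n).astype(float)    # lottery-style instrument
u = rng.normal(size=n)                     # unobserved confounder
d = (0.4 * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)  # treatment take-up
y = 1.0 * d + u + rng.normal(size=n)       # true effect of d on y is 1.0

Z = sm.add_constant(z)
first_stage = sm.OLS(d, Z).fit()           # D on Z
reduced_form = sm.OLS(y, Z).fit()          # Y on Z

pi = first_stage.params[1]                 # first-stage coefficient
rho = reduced_form.params[1]               # reduced-form coefficient
ols = sm.OLS(y, sm.add_constant(d)).fit().params[1]

print(f"first-stage F:        {first_stage.fvalue:.1f}")
print(f"naive OLS of y on d:  {ols:.2f}   (biased by the confounder)")
print(f"2SLS / Wald estimate: {rho / pi:.2f}")
```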
Pre-trends and placebo tests are routine. Best practice in DiD, IV, and RD routinely includes falsification tests: pre-trend tests in DiD, first-stage validity checks in IV, density tests and covariate balance in RD. These tests are often required by journals and reviewers, creating a discipline that limits post-hoc data mining.
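The simplest version of such a falsification exercise is a placebo DiD estimated on pre-treatment data alone, as in the sketch below (simulated panel, fake treatment date chosen for illustration; a full event-study with leads and lags would be the usual published version).

```python
# Placebo pre-trend check: re-run the 2x2 DiD using only pre-treatment years,
# pretending treatment started earlier. A "significant" placebo interaction
# would signal differential pre-trends; in this simulation it is truly zero.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
units, years, treat_year = 200, range(2000, 2010), 2005

rows = []
for i in range(units):
    treated = int(i < units // 2)
    unit_fe = rng.normal()
    for t in years:
        common_trend = 0.05 * (t - 2000)                    # shared by both groups
        effect = 0.5 if (treated and t >= treat_year) else 0.0
        rows.append((i, t, treated, unit_fe + common_trend + effect
                     + rng.normal(scale=0.5)))
df = pd.DataFrame(rows, columns=["unit", "year", "treated", "y"])

# Actual DiD on the full sample, then the placebo on pre-period data only.
df["post"] = (df.year >= treat_year).astype(int)
did = smf.ols("y ~ treated * post", data=df).fit()

pre = df[df.year < treat_year].copy()
pre["placebo_post"] = (pre.year >= 2003).astype(int)
placebo = smf.ols("y ~ treated * placebo_post", data=pre).fit()

print("DiD estimate:              ", round(did.params["treated:post"], 3))
print("placebo (pre-period) effect:", round(placebo.params["treated:placebo_post"], 3),
      " p =", round(placebo.pvalues["treated:placebo_post"], 3))
```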
Robustness to key specification choices. Applied microeconomics papers routinely present results across a range of specifications, bandwidths, and control sets. While this does not eliminate p-hacking, it makes it more transparent and easier for readers to assess.
4 Pre-Analysis Plans: The Proposed Solution
Pre-analysis plans (PAPs), documents registered with a public registry before data collection or analysis begins, have emerged as the primary institutional response to researcher degrees of freedom. A PAP specifies in advance the primary hypothesis, the estimator, the sample, the outcome variables, and how deviations from the plan will be reported.
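As a purely illustrative sketch, the core fields of a PAP can be thought of as a small structured record; the field names and example values below are hypothetical and do not correspond to any registry's schema.

```python
# Hypothetical, minimal representation of what a PAP commits to in advance.
from dataclasses import dataclass
from typing import List

@dataclass
class PreAnalysisPlan:
    primary_hypothesis: str
    estimator: str
    sample: str
    primary_outcomes: List[str]
    deviation_policy: str            # how departures from the plan will be reported

pap = PreAnalysisPlan(
    primary_hypothesis="Treatment increases household consumption",
    estimator="OLS with strata fixed effects, SEs clustered by village",
    sample="All households surveyed at baseline, intention-to-treat",
    primary_outcomes=["log_consumption", "asset_index"],
    deviation_policy="Any departure reported in a dedicated appendix table",
)
print(pap)
```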
Evidence from development economics. Development economists have led the adoption of PAPs, partly because RCTs in development allow registration before the intervention. Casey et al. [2012] provide one of the first rigorous assessments: they pre-registered the analysis of a community-driven development programme in Sierra Leone and show that, without the discipline of the PAP, selective choice of outcomes could have supported sharply different conclusions, suggesting that analytical flexibility can inflate published estimates elsewhere.
Olken's assessment. Olken [2015] offers a balanced view: PAPs are valuable for con- firming that a result was not data-mined after the fact, but they cannot solve all problems. Researchers can still p-hack within a PAP if the plan is vague enough. The PAP is most valuable when it specifies a small number of pre-registered primary outcomes and commits to reporting all of them, regardless of significance.
Registered reports. A stronger form of pre-registration is the registered report, in which a journal provides in-principle acceptance of a paper before data collection, based on the quality of the design. This eliminates publication bias for designs that receive in-principle acceptance. Registered reports have been adopted by several psychology journals and are beginning to appear in economics.
5 What the Pre-registration Debate Cannot Resolve
Exploratory research is valuable. Pre-registration is designed for confirmatory research, which tests a specific, pre-stated hypothesis. Exploratory research, which discovers new patterns in data, is also valuable and should not be suppressed. The solution is to label results as exploratory rather than confirmatory, not to insist that all research be pre-registered.
Natural experiments are often unexpected. A researcher who discovers an unexpected policy change or natural experiment cannot always pre-register before data analysis. The equivalent safeguard is transparency: publishing the full analysis plan as part of the paper, even if registered after the data were seen.
Specification searching may sometimes be desirable. Looking at data before finalising analysis is not always bad science: it can reveal coding errors, unexpected patterns, and important heterogeneity. The problem is selective reporting, not exploration. Requiring transparent reporting of the full specification search (rather than just the selected result) may be a better solution than banning exploration.
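One way to operationalise such transparent reporting is a specification-curve-style summary: estimate the same effect under every combination of a few declared analysis choices and publish the full set of estimates. The sketch below uses simulated data and an invented set of choices.

```python
# Specification-curve-style sketch: report the treatment estimate from every
# combination of control sets and sample restrictions, not just one preferred
# specification. Data and analysis choices are simulated for illustration.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 1000
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),
    "age": rng.integers(18, 65, n),
    "female": rng.integers(0, 2, n),
    "region": rng.integers(0, 4, n),
})
df["y"] = 0.1 * df.treat + 0.01 * df.age + rng.normal(size=n)   # true effect 0.1

controls = ["age", "female", "C(region)"]
results = []
for k in range(len(controls) + 1):
    for subset in itertools.combinations(controls, k):
        for label, data in [("all", df), ("age<50", df[df.age < 50])]:
            formula = "y ~ treat" + "".join(f" + {c}" for c in subset)
            fit = smf.ols(formula, data=data).fit()
            results.append({"sample": label, "controls": " + ".join(subset) or "none",
                            "estimate": fit.params["treat"],
                            "pvalue": fit.pvalues["treat"]})

spec_curve = pd.DataFrame(results).sort_values("estimate")
print(spec_curve.to_string(index=False))          # every specification, not one
```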
6 What Would Resolve the Debate?
- Large-scale replication studies. Systematic efforts to replicate results from top economics journals, similar to the Open Science Collaboration in psychology, would provide direct evidence on false discovery rates.
- Comparison of registered vs. non-registered results. As PAPs become more common, meta-analyses comparing effect sizes from pre-registered and non-pre-registered studies in the same domain will reveal the magnitude of researcher degrees of freedom.
- Improved data availability. Requiring data and code availability (as the American Economic Review now does) allows post-hoc replication and specification analysis, providing an indirect check on selective reporting.
7 Conclusion
The evidence suggests that economics is not immune to the problems that produced psychology's replication crisis: p-hacking, selective reporting, and publication bias are present and measurable. The design-based approach provides some protection but is not a complete solution.
Pre-registration is a valuable tool for confirmatory research, and its adoption in development economics has already improved credibility. But PAPs are not a panacea: vague plans still leave room for post-hoc flexibility, and pre-registration alone cannot address the fundamental issue of publication bias.
The most promising path forward combines pre-registration for confirmatory studies, routine code and data availability, transparent reporting of specification searches, and a culture that rewards honest null results as much as surprising positive findings.
References
- Brodeur, A., Lé, M., Sangnier, M., and Zylberberg, Y. (2016). Star wars: The empirics strike back. American Economic Journal: Applied Economics, 8(1):1-32.
- Casey, K., Glennerster, R., and Miguel, E. (2012). Reshaping institutions: Evidence on aid impacts using a preanalysis plan. Quarterly Journal of Economics, 127(4):1755-1812.
- Christensen, G. and Miguel, E. (2018). Transparency, reproducibility, and the credibility of economics research. Journal of Economic Literature, 56(3):920-980.
- Ioannidis, J. P. A., Stanley, T. D., and Doucouliagos, H. (2017). The power of bias in economics research. Economic Journal, 127(605):F236-F265.
- Nosek, B. A., Ebersole, C. R., DeHaven, A. C., and Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11):2600-2606.
- Olken, B. A. (2015). Promises and perils of pre-analysis plans. Journal of Economic Perspectives, 29(3):61-80.
- Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251):aac4716.
- Simmons, J. P., Nelson, L. D., and Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11):1359-1366.