The Causal Review

The Rise of Causal Inference in Economics: A Meta-Science Perspective

1 Introduction

Economics has undergone a quiet revolution. Over the past four decades, the discipline's empirical practice has shifted from descriptive regressions—where identification was assumed rather than demonstrated—toward research designs that isolate causal variation in a way that would satisfy a hard-nosed sceptic. We know this from experience, from textbooks, and from the steady accumulation of Nobel prizes awarded to practitioners of the "credibility revolution." But until recently, we lacked systematic, quantitative evidence on just how profound and widespread the transformation has been.

A new study by Britto et al. [2025] fills that gap. Analysing over 44,000 NBER and CEPR working papers published between 1980 and 2023, they construct a knowledge graph that maps economic concepts and their causal relationships as asserted in the papers themselves. The results are striking: the share of empirical economics papers making explicit causal claims, and the methods they use to support those claims, have changed dramatically. This article unpacks those findings, situates them in the broader intellectual history of econometrics, and asks what the data reveal about the discipline's remaining blind spots.

2 What the Numbers Show

The headline finding of Britto et al. [2025] is that difference-in-differences has become the dominant empirical tool in economics. Its prevalence in NBER working papers rose from approximately 10.5 per cent in 1980 to 20.3 per cent in 2023. Instrumental variables usage roughly doubled over the same period, and regression discontinuity designs—virtually absent before 2000—now appear in around 8 per cent of empirical papers.

Perhaps equally telling is the adoption pattern across subfields. Econometrics (unsurprisingly) and labour economics (the cradle of the credibility revolution) led the charge. But the shift has spread to finance, health economics, industrial organisation, and development economics. Even macroeconomics, long a holdout dominated by structural models and VARs, has seen rising use of identification-based methods—a trend documented separately by Nakamura and Steinsson [2018].

The story is not simply one of diffusion from a methodological vanguard. It is also a story of what those methods displaced. Ordinary least squares regressions with limited discussion of identification have declined as a share of the empirical literature, even as the raw number of empirical papers has grown. The profession has collectively raised the bar for what counts as credible causal evidence.

3 The Credibility Revolution in Context

To understand what the data capture, it is worth recalling the intellectual origins of the shift. Angrist and Pischke [2010] date the modern era to a cluster of developments in the late 1980s and 1990s: the LATE theorem of Imbens and Angrist [1994], the revival of the regression discontinuity design by Hahn et al. [2001], and a series of celebrated natural experiments—Card and Krueger on minimum wages, Angrist on the Vietnam draft, Card on the Mariel boatlift—that demonstrated the power of quasi-experimental variation.

These papers shared a common epistemological stance: the question of what identifies the parameter of interest should be front and centre, not buried in a footnote. The credibility revolution was not primarily about new estimators: instrumental variables estimation dates back to Wright [1928]. It was about a new norm of transparency regarding identification assumptions.
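The gap between estimator and assumptions is easy to see in code. The sketch below computes the Wald ratio, the simplest IV estimator for a binary instrument: the reduced form divided by the first stage. The data and the names `z`, `d`, `y` and `wald_estimate` are invented for illustration. The arithmetic takes a few lines; what turns the ratio into a causal effect for compliers is the set of assumptions (independence, exclusion, monotonicity) that no code can verify.

```python
from statistics import mean


def wald_estimate(z, d, y):
    """Wald ratio for a binary instrument z: reduced form / first stage."""
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    first_stage = d1 - d0  # effect of the instrument on treatment take-up
    if first_stage == 0:
        raise ValueError("instrument does not move treatment")
    return (y1 - y0) / first_stage


# Hypothetical toy data: instrument, treatment received, outcome.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [5.0, 6.0, 5.0, 3.0, 5.0, 3.0, 2.0, 3.0]

estimate = wald_estimate(z, d, y)
print(f"Wald estimate: {estimate:.2f}")
```

In this toy sample the instrument raises the outcome by 1.5 and take-up by 0.5, so the ratio is 3.0. Nothing in the calculation tells us whether the instrument affects the outcome only through treatment, which is precisely the question the credibility revolution moved to the foreground.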

What the Britto et al. data add is a quantitative portrait of how that norm diffused. The diffusion was neither instantaneous nor uniform. The leading general-interest journals, among them the American Economic Review, the Journal of Political Economy, and the Quarterly Journal of Economics, adopted the new norms early and enforced them through editorial selection. Top-five publication pressure then transmitted the norms downward through hiring and graduate training.

4 Three Frameworks, One Revolution

One of the most intellectually interesting aspects of the meta-science data is what they reveal about the relationship between the three main frameworks for causal inference: the potential outcomes (Rubin causal model) framework, Pearl's directed acyclic graphs (DAGs), and the classical econometric structural approach.

Imbens [2020] observed that economists and computer scientists use fundamentally different languages for causal reasoning: the former prefer potential outcomes and instrumental variables, the latter DAGs and d-separation. The meta-science data largely confirm this divide: DAG-based language is rare in economics papers, even as it is prominent in epidemiology and computer science. The credibility revolution, in economics at least, was won by the potential outcomes camp.

Yet the three frameworks are not fundamentally incompatible, as Imbens [2025] argues. Each framework has comparative advantages: the potential outcomes (PO) framework is natural for treatment-effect questions with clean randomisation; DAGs are powerful for identifying adjustment sets and detecting collider bias; the structural approach is essential whenever the question involves counterfactual policies that change the economic environment itself, the territory of the Lucas critique. The meta-science data suggest economists have largely settled on the PO approach for reduced-form work, while retaining structural methods for macro and industrial organisation applications.
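Collider bias, the problem DAGs are especially good at flagging, can be shown with a short simulation. This is an illustrative sketch only: the variables `x`, `y`, `c`, the cutoff of 1.0 and the helper `corr` are all invented for the example. Two independent variables both cause a third (the collider), and selecting the sample on that third variable manufactures a correlation that does not exist in the population.

```python
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]   # generated independently of x
c = [xi + yi for xi, yi in zip(x, y)]     # collider: caused by both x and y

corr_all = corr(x, y)

# Condition on the collider by keeping only observations with c above a cutoff.
selected = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > 1.0]
corr_selected = corr([s[0] for s in selected], [s[1] for s in selected])

print(f"full sample correlation:     {corr_all:+.3f}")
print(f"selected sample correlation: {corr_selected:+.3f}")
```

In the full sample the correlation is close to zero; in the selected sample it is clearly negative. A regression that "controls for" or selects on `c` would report a spurious relationship between `x` and `y`, which is the kind of mistake a drawn-out causal graph makes visible before any estimation is done.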

5 What the Data Cannot Tell Us: The Quality Problem

The Britto et al. data count papers using causal methods, but they cannot easily assess the quality of identification in each paper. This is an important limitation. The proliferation of DiD papers does not imply that all of them have valid parallel trends assumptions; the rise in IV papers does not mean every instrument satisfies exclusion.

In fact, there are reasons to worry that the success of the credibility revolution has bred new problems. The credentialing of causal language—if you can label your paper a "natural experiment," you pass a lower editorial bar than if you run an OLS regression—may have created incentives to oversell identification. Brodeur et al. [2016] document a suspicious excess of p-values just below 0.05 in economics papers, consistent with specification searching. Andrews and Stock [2019] show that standard IV inference is unreliable when instruments are weak, yet weak instruments remain common in published work.
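The weak-instrument concern can be made concrete with the first-stage F-statistic. The following is a minimal sketch under simplifying assumptions: one instrument, one endogenous regressor, homoskedastic errors, and simulated data. The function `first_stage_f` and the coefficients 0.5 and 0.02 are invented for the example. With a single instrument, the F-statistic is the squared t-statistic on the instrument in the first-stage regression.

```python
import random


def first_stage_f(z, d):
    """F-statistic for the slope in an OLS regression of d on z (with intercept).

    With one instrument this equals the squared t-statistic on z.
    Assumes homoskedastic errors.
    """
    n = len(z)
    z_bar, d_bar = sum(z) / n, sum(d) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))
    slope = s_zd / s_zz
    intercept = d_bar - slope * z_bar
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    sigma2 = rss / (n - 2)           # residual variance estimate
    se_slope = (sigma2 / s_zz) ** 0.5
    return (slope / se_slope) ** 2


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical first stages: same noise, different instrument strength.
d_strong = [0.5 * zi + e for zi, e in zip(z, noise)]
d_weak = [0.02 * zi + e for zi, e in zip(z, noise)]

f_strong = first_stage_f(z, d_strong)
f_weak = first_stage_f(z, d_weak)
print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

A reader can compare the two values with the conventional rule-of-thumb threshold of 10. The comparison is only a screening device, and it is the kind of diagnostic that a paper can report in a single line but that is easy to omit when the label "instrumental variables" is doing the persuasive work.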

The deeper issue is that the credibility revolution changed the vocabulary of identification without always changing the underlying practice. A DiD paper that reports pre-trend tests and uses a heterogeneity-robust estimator such as that of Callaway and Sant'Anna [2021] is considerably more credible than one that runs a single two-way fixed effects (TWFE) regression and notes that "parallel trends hold approximately".
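Why a single TWFE regression can mislead is easiest to see in a stylised example. The sketch below uses an invented two-unit, three-period panel with staggered adoption and treatment effects that grow with time since treatment. It is not the Callaway-Sant'Anna estimator, only an illustration of the problem such estimators address. The function `twfe_coefficient` and all numbers are assumptions made for the example; the coefficient is computed by double-demeaning the treatment indicator, which is valid for a balanced panel.

```python
def twfe_coefficient(y, d):
    """Coefficient on d from a TWFE regression in a balanced panel.

    y and d are lists of rows: one row per unit, one column per period.
    Uses the Frisch-Waugh-Lovell result: residualise d on unit and time
    fixed effects by double-demeaning, then regress y on the residual.
    """
    n_units, n_periods = len(d), len(d[0])
    grand_mean = sum(map(sum, d)) / (n_units * n_periods)
    unit_mean = [sum(row) / n_periods for row in d]
    time_mean = [sum(d[i][t] for i in range(n_units)) / n_units
                 for t in range(n_periods)]
    num = den = 0.0
    for i in range(n_units):
        for t in range(n_periods):
            d_tilde = d[i][t] - unit_mean[i] - time_mean[t] + grand_mean
            num += d_tilde * y[i][t]
            den += d_tilde ** 2
    return num / den


# Hypothetical panel: unit 0 is treated from period 1, unit 1 from period 2.
d = [[0, 1, 1],
     [0, 0, 1]]
# Treatment effects grow with exposure: 1 on impact, 2 one period later.
effect = [[0.0, 1.0, 2.0],
          [0.0, 0.0, 1.0]]
unit_fe = [10.0, 20.0]
time_fe = [0.0, 1.0, 2.0]  # common trend, so parallel trends holds exactly
y = [[unit_fe[i] + time_fe[t] + effect[i][t] for t in range(3)]
     for i in range(2)]

treated = [effect[i][t] for i in range(2) for t in range(3) if d[i][t] == 1]
true_att = sum(treated) / len(treated)
twfe = twfe_coefficient(y, d)

print(f"average effect on treated cells: {true_att:.3f}")
print(f"TWFE coefficient:                {twfe:.3f}")
```

Every treated observation has an effect of at least 1, and parallel trends holds by construction, yet the TWFE coefficient is 0.5 against a true average of about 1.33. The reason is that the early-treated unit's final period effectively serves as a control for the late-treated unit, so its (larger) effect enters with a negative weight.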

6 What Has Not Changed

Despite the revolution, several important features of economic empirics have proved resistant to change. Britto et al. [2025] note that:

External validity receives far less attention than internal validity. Papers identify local average treatment effects for specific complier populations but rarely ask whether those effects generalise to the populations that policy actually targets.

General equilibrium effects are rarely estimated. Most quasi-experimental designs are local: they identify what happens to one firm, one individual, one county when something changes. They cannot, in general, estimate the economy-wide effects that would arise if a policy were scaled.

Mechanisms are underexplored. Identifying that X causes Y is a first step; understanding why requires a structural model or a design that can isolate channels.

These are not criticisms of the credibility revolution—they are reminders of what it was never designed to do. The revolution made economics better at answering "does this policy have an effect here?" It did not claim to answer "would this policy work everywhere, and through what channels?"

7 The Road Ahead

What does the future of causal econometrics look like, based on current trajectories? Several developments stand out:

Staggered DiD is being rationalised. The proliferation of papers identifying problems with two-way fixed effects estimators suggests that the field is working through the implications of treatment effect heterogeneity in panel data. The next decade will likely see a consolidation around a smaller set of heterogeneity-robust estimators.

Causal machine learning is maturing. The DML approach of Chernozhukov et al. [2018] and the causal forest of Wager and Athey [2018] are being adopted in applied work with increasing frequency. The meta-science data do not yet capture this trend fully, but the trajectory is clear.

Sensitivity analysis is becoming standard. Tools like the Rambachan-Roth honest DiD and Rosenbaum bounds for matching studies are beginning to appear routinely in applied papers. This is a healthy development: it shifts the question from "does identification hold?" (untestable) to "how much would identification need to fail for our conclusions to change?" (answerable).
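The reframing in the sensitivity-analysis point can be illustrated with a deliberately simplified calculation. The sketch below is not the Rambachan-Roth procedure, which restricts post-treatment violations relative to observed pre-trends and uses more careful inference. It assumes only that the violation of parallel trends is bounded in absolute value by some number M, builds a conservative interval, and reports the largest M at which the interval still excludes zero. The function names and the estimate and standard error are hypothetical.

```python
def robust_interval(estimate, se, max_violation, z=1.96):
    """Conservative interval if parallel trends may fail by up to max_violation."""
    half_width = max_violation + z * se
    return estimate - half_width, estimate + half_width


def breakdown_violation(estimate, se, z=1.96):
    """Largest bound on the violation for which the interval excludes zero."""
    return max(0.0, abs(estimate) - z * se)


# Hypothetical DiD point estimate and standard error.
estimate, se = 2.0, 0.5

for m in (0.0, 0.5, 1.0, 1.5):
    lower, upper = robust_interval(estimate, se, m)
    excludes_zero = lower > 0 or upper < 0
    print(f"M = {m:.1f}: [{lower:+.2f}, {upper:+.2f}]  excludes zero: {excludes_zero}")

m_star = breakdown_violation(estimate, se)
print(f"breakdown value: {m_star:.2f}")
```

With these numbers the conclusion survives a violation of up to 1.02 outcome units. Whether a violation of that size is plausible is a substantive judgement, but it is a judgement readers can actually argue about, unlike the bare assertion that parallel trends holds.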

8 Conclusion

The Britto et al. meta-science study provides the first comprehensive, quantitative portrait of the credibility revolution in economics. The numbers confirm what practitioners sensed: the discipline has been transformed. DiD is now the most common identification strategy; IV and RDD have also grown substantially; and the norm of explicit identification has spread from labour economics to virtually every empirical subfield. But the numbers also counsel humility. Counting causal claims is not the same as validating them. The credibility revolution succeeded in changing the vocabulary of economics; the harder task—ensuring that the vocabulary reliably maps onto genuine knowledge—is ongoing.

References

Andrews, I. and Stock, J. H. Identification, weak instruments, and statistical inference in econometrics. Canadian Journal of Economics, 52(2):379–407, 2019.
Angrist, J. D. and Pischke, J.-S. The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives, 24(2):3–30, 2010.
Britto, D. G. C., Fenizia, A., and Kline, P. Causal claims in economics. arXiv preprint arXiv:2501.06873, 2025.
Brodeur, A., Lé, M., Sangnier, M., and Zylberberg, Y. Star wars: The empirics strike back. American Economic Journal: Applied Economics, 8(1):1–32, 2016.
Callaway, B. and Sant'Anna, P. H. C. Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200–230, 2021.