Feature Stories

Natural Experiments: Finding Causal Evidence Without Randomisation

1 Introduction

In 1854, a physician named John Snow did something remarkable. Faced with a cholera epidemic in London's Soho district, he could not randomly assign people to drink from different water pumps. Instead, he exploited the fact that two water companies happened to supply different parts of the city: one drawing from a heavily contaminated section of the Thames, the other from a cleaner source upstream. The geography of supply was, in important respects, as-good-as-random with respect to other determinants of cholera mortality. By comparing death rates across customers of the two companies, Snow was able to conclude that contaminated water caused cholera [Snow, 1855].

Snow never used the term "natural experiment." But he pioneered the idea: when nature, history, or policy creates variation in a treatment of interest that is plausibly exogenous, that is, independent of the confounding factors that plague observational comparisons, researchers can extract credible causal estimates without ever running a controlled trial.

This feature explores what natural experiments are, why they are so valuable, how they work in practice, and where their limitations lie. The proliferation of natural experiments since the 1990s has transformed empirical social science. Understanding them is essential for anyone who reads or produces causal research.

2 The Identification Problem

To understand why natural experiments matter, we must first appreciate the fundamental obstacle they are designed to circumvent. Suppose we wish to estimate the causal effect of military service on lifetime earnings. We could compare veterans' wages to non-veterans' wages. But this comparison is likely to be confounded: men who enlisted may differ systematically from those who did not in health, patriotism, economic opportunity, education, and a dozen other ways. A naive OLS regression of earnings on a veteran indicator estimates

$\hat{\tau}^{OLS}=\mathbb{E}[Y_{i}|D_{i}=1]-\mathbb{E}[Y_{i}|D_{i}=0]$, (1)

which conflates the average treatment effect on the treated, $\mathbb{E}[Y_{i}(1)-Y_{i}(0)|D_{i}=1]$, with the selection bias $\mathbb{E}[Y_{i}(0)|D_{i}=1]-\mathbb{E}[Y_{i}(0)|D_{i}=0]$. The second term reflects how veterans' potential non-service earnings would have differed from non-veterans', a quantity we cannot directly observe.

Randomisation solves this by making $D_{i}$ independent of potential outcomes. A natural experiment seeks an approximation: an external force $Z_{i}$ that shifts exposure to the treatment but is itself independent (or as-good-as-independent) of confounders. This is the essence of an instrumental variable [Angrist and Pischke, 2009].
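The selection-bias problem and the instrumental-variable fix can be illustrated with a small simulation. All numbers below are invented for illustration: an unobserved confounder drives both treatment take-up and the outcome, so the naive comparison of means is biased, while a lottery-style instrument recovers the true effect via the Wald estimator.

```python
# Illustrative simulation (invented parameters): a confounder u raises or
# lowers both treatment take-up and earnings, biasing the naive comparison,
# while an exogenous instrument z isolates the causal effect.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
tau = -0.15                        # true causal effect of treatment on log earnings

u = rng.normal(size=n)             # unobserved confounder (e.g. civilian opportunity)
z = rng.integers(0, 2, size=n)     # exogenous instrument, assigned like a lottery
# Treatment responds to both the instrument and the confounder.
d = ((0.8 * z - 0.5 * u + rng.normal(size=n)) > 0).astype(float)
y = tau * d - 0.5 * u + rng.normal(size=n)   # the confounder also lowers earnings

naive = y[d == 1].mean() - y[d == 0].mean()  # conflates tau with selection bias
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

print(f"true effect {tau:+.3f}  naive {naive:+.3f}  IV (Wald) {wald:+.3f}")
```

The naive difference is badly biased away from the true effect, while the Wald ratio, the simplest instrumental-variable estimator, lands close to it.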

3 What Makes Something a Natural Experiment?

Dunning [2012] defines natural experiments as "studies in which some process outside the control of the investigator assigns subjects to treatment and control conditions in a manner that is as-good-as-random." Three features distinguish strong natural experiments:

  • (i) Exogenous variation. The assignment mechanism must be plausibly independent of potential outcomes. Lottery-based assignment (draft lotteries, randomised audits, judicial assignments) offers the strongest guarantee. Policy discontinuities, geographic boundaries, and administrative cutoffs can also provide credible quasi-randomisation.
  • (ii) Relevance. The exogenous variation must actually affect the treatment. A weak instrument, one that shifts treatment only slightly, produces estimates that are imprecise or, worse, highly sensitive to small departures from the exclusion restriction [Staiger and Stock, 1997].
  • (iii) Exclusion. The instrument must affect the outcome only through the treatment of interest, not through other channels. This assumption, the exclusion restriction, is untestable and must be defended on substantive grounds.
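Exogeneity itself cannot be verified directly, but a standard supporting diagnostic is a balance check: under as-good-as-random assignment, predetermined covariates should not differ systematically across instrument values. A minimal sketch with simulated data (all numbers illustrative):

```python
# Balance check on a predetermined covariate across instrument values.
# Under random assignment of z, the difference should be statistically
# indistinguishable from zero. Simulated data; numbers are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
z = rng.integers(0, 2, size=n)       # lottery-style instrument
age = rng.normal(25, 3, size=n)      # covariate determined before assignment

diff = age[z == 1].mean() - age[z == 0].mean()
se = np.sqrt(age[z == 1].var() / (z == 1).sum() + age[z == 0].var() / (z == 0).sum())
print(f"balance check: difference {diff:+.3f} years (t = {diff / se:+.2f})")
```

A large imbalance would cast doubt on the as-good-as-random claim; a small one is supportive but, because exclusion concerns unobservables, never conclusive.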

3.1 A Taxonomy

Natural experiments come in several varieties. Rosenzweig and Wolpin [2000] and Meyer [1995] provide useful taxonomies:

  • Lottery-based. The Vietnam-era draft lottery assigned men to risk of induction by randomly assigning birth-date lottery numbers. Angrist [1990] used lottery number as an instrument for veteran status, estimating that military service reduced civilian earnings by approximately 15% for white men. The randomisation makes the exclusion restriction highly credible.
  • Policy discontinuities and thresholds. Many policies operate through sharp cutoffs: students above a test-score threshold qualify for gifted programmes; firms above an employment threshold face stricter regulations; municipalities just above a population cutoff receive different intergovernmental transfers. Near the cutoff, assignment is approximately random. Imbens and Lemieux [2008] formalise this as the regression discontinuity (RD) design.
  • Geographic and institutional boundaries. Card's [1990] study of the Mariel boatlift exploits the fact that 125,000 Cuban refugees arrived in Miami in 1980, creating a sudden, unexpected labour supply shock. Because the timing and destination were determined by Cuban political events rather than by Miami's labour market conditions, the variation is plausibly exogenous.
  • Staggered policy rollouts. When a law or programme is adopted at different times by different states or firms, the staggered adoption can be exploited in a difference-in-differences (DiD) framework. Later adopters, before their own adoption, serve as controls for early adopters, provided the parallel trends assumption holds.
  • Weather, natural disasters, and other acts of nature. Rainfall affects agricultural income; hurricanes destroy infrastructure; frost dates constrain growing seasons. When these natural shocks are orthogonal to counterfactual trends, they can identify causal effects in settings where designed experiments are impossible.
(Figure 1: The structure of a natural experiment as an instrumental variable. The instrument Z affects the outcome Y only through the treatment D; the absence of a direct path from Z to Y is the exclusion restriction. Confounders U are unrelated to Z.)

4 Landmark Examples

4.1 The Draft Lottery and Returns to Military Service

Angrist's [1990] study of the Vietnam-era draft lottery is a textbook example of a lottery-based natural experiment. The Selective Service System assigned draft eligibility by randomly drawing birth dates; men with low lottery numbers (high risk of induction) were far more likely to serve. Crucially, because lottery numbers were assigned randomly, they are independent of potential earnings.

Using lottery number as an instrument for veteran status in a two-stage least squares (2SLS) regression, Angrist estimated the effect of military service on earnings. The 2SLS estimator identifies the local average treatment effect (LATE): the average effect for compliers, those induced into service by a low lottery number. Angrist found a 15% reduction in white male veterans' earnings relative to comparable non-veterans, suggesting that military service disrupted civilian career trajectories.
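The Wald logic behind this design can be sketched in a few lines. The simulation below uses invented strata shares and an assumed complier effect, not Angrist's data; it shows how the ratio of the reduced form (the intent-to-treat effect) to the first stage recovers the LATE for compliers.

```python
# Stylised sketch (invented numbers, not Angrist's data) of 2SLS with a
# binary instrument, which reduces to the Wald estimator and identifies the
# LATE for compliers: men who serve iff they draw a low lottery number.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.integers(0, 2, size=n)             # 1 = low lottery number (draft-eligible)

# Principal strata: always-takers serve regardless of the lottery,
# never-takers never serve, compliers serve only if draft-eligible.
stratum = rng.choice(["always", "never", "complier"], size=n, p=[0.1, 0.6, 0.3])
d = ((stratum == "always") | ((stratum == "complier") & (z == 1))).astype(float)

late = -0.15                                # assumed true effect for compliers
y0 = rng.normal(3.0, 0.5, size=n)          # potential log earnings without service
# For simplicity, the effect for always-takers is set to zero; this does not
# matter for the Wald ratio, since their treatment status never varies with z.
y = y0 + np.where(stratum == "complier", late, 0.0) * d

itt = y[z == 1].mean() - y[z == 0].mean()             # reduced form
first_stage = d[z == 1].mean() - d[z == 0].mean()     # share of compliers
wald = itt / first_stage                               # = LATE for compliers
print(f"first stage {first_stage:.3f}  ITT {itt:+.4f}  Wald/LATE {wald:+.3f}")
```

The first stage equals the complier share (about 0.3 here), and the Wald ratio recovers the assumed complier effect, even though the ITT alone understates it.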

4.2 The Mariel Boatlift and Immigration

Card [1990] studied the effect of immigration on wages by exploiting the sudden arrival of 125,000 Cubans in Miami in 1980. Card compared wages and unemployment in Miami before and after the boatlift to trends in a set of comparison cities (Atlanta, Los Angeles, Houston, Tampa-St. Petersburg), chosen because their demographic composition and economic trends were similar to Miami's prior to 1980.

The identification relies on the boatlift being exogenous to Miami's labour market: Fidel Castro's decision to allow emigration was a political event unrelated to Miami wage trends. Card found remarkably small effects on wages and unemployment, even among low-skilled workers, a result that challenged simple supply-and-demand predictions and sparked decades of debate (including a reanalysis by Borjas [2017] that drew different conclusions by changing the comparison group and skill classification).
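The comparison-city logic is a difference-in-differences: Miami's pre-to-post change minus the comparison cities' change nets out shocks common to both. A minimal sketch with invented wage numbers (not Card's estimates):

```python
# Difference-in-differences with invented log-wage numbers (not Card's data):
# subtracting the comparison cities' pre/post change from Miami's removes
# common shocks, leaving the effect attributable to the boatlift.
log_wage = {
    ("miami", "pre"): 1.85, ("miami", "post"): 1.83,
    ("comparison", "pre"): 1.80, ("comparison", "post"): 1.79,
}

miami_change = log_wage[("miami", "post")] - log_wage[("miami", "pre")]
comparison_change = log_wage[("comparison", "post")] - log_wage[("comparison", "pre")]
did = miami_change - comparison_change
print(f"DiD estimate: {did:+.3f} log points")   # ≈ -0.010
```

The estimate is valid only under parallel trends: absent the boatlift, Miami's wages would have moved like the comparison cities', which is exactly what the Borjas reanalysis contests by changing the comparison group.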

4.3 Electoral Regression Discontinuity

Lee [2008] studied whether winning an election gives incumbents a causal advantage in future elections. The key challenge is that better candidates win elections and also win future elections, creating a spurious correlation. Lee's insight: in very close elections, the winner is essentially chosen by chance, since a few hundred votes separate candidates who might otherwise be nearly identical. By comparing the electoral futures of candidates who barely won to those who barely lost, he found a substantial incumbency advantage. This is a regression discontinuity design, with vote share as the running variable and 50% as the cutoff.
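A sharp RD estimate at the cutoff can be approximated by comparing bare winners and bare losers within a narrow bandwidth. The simulation below uses an assumed data-generating process, not Lee's data; applied work would typically use local linear regression rather than a raw difference in means.

```python
# Minimal sharp-RD sketch with simulated close elections (not Lee's data):
# candidate quality drives both vote share and future success, but within a
# narrow bandwidth around the 50% cutoff, bare winners and bare losers are
# comparable, so their difference estimates the incumbency effect at the cutoff.
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
quality = rng.normal(size=n)
vote_share = 0.5 + 0.05 * quality + rng.normal(0, 0.08, size=n)  # running variable
won = (vote_share >= 0.5).astype(float)
incumbency_effect = 0.08                    # assumed true effect at the cutoff
future_win = 0.3 + 0.1 * quality + incumbency_effect * won + rng.normal(0, 0.1, n)

h = 0.005                                   # bandwidth around the cutoff
near = np.abs(vote_share - 0.5) < h
rd = future_win[near & (won == 1)].mean() - future_win[near & (won == 0)].mean()
print(f"RD estimate near cutoff: {rd:+.3f} (true {incumbency_effect:+.3f})")
```

A naive winner-minus-loser comparison over the full sample would absorb the quality confound; restricting to the narrow window around 50% removes almost all of it, at the cost of discarding most of the data.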

5 Current Debates

5.1 Instrument Validity: How Credible Is "As-Good-As-Random"?

A central critique of natural experiments is that the exclusion restriction is often untestable and sometimes implausible. The instrument may affect the outcome through multiple channels.

Consider using distance to college as an instrument for educational attainment: distance also affects peer networks, family migration patterns, and local labour markets, all of which may independently affect earnings [Card, 1995]. Defenders argue that the plausibility of exclusion can be assessed from institutional knowledge and falsification tests, but critics argue that such defences are often post-hoc rationalisations.

5.2 Weak Instruments

Even if exclusion holds, weak instruments create severe problems. When the first-stage F-statistic is low, 2SLS estimates are biased toward OLS and confidence intervals are unreliable. Staiger and Stock [1997] proposed the rule-of-thumb threshold $F>10$ for flagging settings where standard inference breaks down, and Stock and Yogo [2005] formalise this with critical values for the Cragg-Donald F-statistic. Recent work advocates weak-instrument-robust inference methods such as the Anderson-Rubin test [Anderson and Rubin, 1949].
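The first-stage F-statistic is straightforward to compute directly. The sketch below simulates one strong and one weak instrument; with a single instrument and no controls, the F-statistic is simply the squared t-statistic from the first-stage regression. All numbers are illustrative.

```python
# First-stage strength diagnostic with simulated data (illustrative numbers).
# With one binary instrument and no controls, the first-stage F-statistic is
# the squared t-statistic from regressing treatment on the instrument.
import numpy as np

def first_stage_F(z, d):
    """F-statistic for the OLS regression of d on z (one instrument, no controls)."""
    z_c = z - z.mean()
    beta = (z_c @ d) / (z_c @ z_c)                    # OLS slope
    resid = d - d.mean() - beta * z_c
    se = np.sqrt(resid @ resid / (len(d) - 2) / (z_c @ z_c))
    return (beta / se) ** 2

rng = np.random.default_rng(3)
n = 5_000
u = rng.normal(size=n)
z = rng.integers(0, 2, size=n)

strong_d = 0.5 * z + 0.5 * u + rng.normal(size=n)    # instrument moves treatment
weak_d = 0.01 * z + 0.5 * u + rng.normal(size=n)     # instrument barely matters

print(f"strong instrument F: {first_stage_F(z, strong_d):.1f}")
print(f"weak instrument F:   {first_stage_F(z, weak_d):.1f}")
```

The strong instrument clears the conventional $F>10$ threshold by a wide margin; the weak one does not, signalling that 2SLS inference in the second case should not be trusted without weak-instrument-robust methods.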

5.3 Local vs. General Equilibrium

Natural experiments typically identify effects for a specific population at a specific time: the LATE for compliers near the margin of treatment. These estimates may not generalise to other populations, time periods, or policy scales. Scaling up a programme identified from a small natural experiment may trigger general equilibrium effects (wage responses, crowd-out, behavioural changes) that are absent in the original study. Heckman [1997] argues that this "local" character of natural experiment estimates limits their policy relevance. Imbens [2010] offers a defence: local effects are still informative, and extrapolation requires additional assumptions that structural models also require.

5.4 The Reanalysis Problem

A troubling feature of natural experiments is sensitivity to researcher choices. The Mariel boatlift controversy illustrates this starkly: Borjas [2017] found large negative wage effects by restricting the comparison group and redefining skill categories, while Peri and Yasenov [2019] and others defended Card's original conclusions. When results hinge on undisclosed or arbitrary choices, the credibility advantage of natural experiments over OLS may be overstated. Pre-registration of analysis plans is one proposed solution.

6 Implications for Applied Research

Natural experiments have transformed social science by making credible causal inference feasible in settings where experiments are impossible. They have produced some of the most important findings in economics: returns to education, effects of immigration, consequences of incarceration, impacts of healthcare coverage.

But they require sustained institutional knowledge and creativity to find. Not every question has a natural experiment lurking nearby. And when researchers search hard enough for suitable instruments, the risk of finding spurious or weak instruments increases, a form of publication bias specific to the design-based approach.

The best natural experiment papers share common features: they describe the assignment mechanism in detail, they provide rich evidence that the exclusion restriction is plausible, they check whether the instrument is actually strong, and they are transparent about the population for which the LATE is defined. Done well, a natural experiment can provide some of the most convincing causal evidence available short of a randomised trial.

7 Conclusion

From John Snow's cholera map to the Vietnam draft lottery to electoral regression discontinuities, natural experiments have demonstrated that rigorous causal inference is possible even when randomisation is ethically or logistically impossible. The key is finding variation that nature, history, or policy creates and that is plausibly independent of confounding factors.

This is an art as much as a science. It requires deep knowledge of the institutional context, careful scrutiny of the exclusion restriction, and honest acknowledgement of the population for which results are valid. Natural experiments do not eliminate all inferential problems, but they have pushed empirical social science closer to the credibility it needs to inform policy and practice.

References

  1. Anderson, T. W. and Rubin, H. (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 20(1):46-63.
  2. Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from Social Security administrative records. American Economic Review, 80(3):313-336.
  3. Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, Princeton, NJ.
  4. Borjas, G. J. (2017). The wage impact of the Marielitos: A reappraisal. ILR Review, 70(5):1077-1110.
  5. Card, D. (1990). The impact of the Mariel boatlift on the Miami labor market. ILR Review, 43(2):245-257.
  6. Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In Christofides, L., Grant, E., and Swidinsky, R., editors, Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp. University of Toronto Press, Toronto.
  7. Dunning, T. (2012). Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press, Cambridge.
  8. Heckman, J. J. (1997). Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources, 32(3):441-462.
  9. Imbens, G. W. (2010). Better LATE than nothing: Some comments on Deaton (2009) and Heckman and Urzua (2009). Journal of Economic Literature, 48(2):399-423.
  10. Imbens, G. W. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615-635.
  11. Lee, D. S. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142(2):675-697.
  12. Meyer, B. D. (1995). Natural and quasi-experiments in economics. Journal of Business & Economic Statistics, 13(2):151-161.
  13. Peri, G. and Yasenov, V. (2019). The labor market effects of a refugee wave: Applying the synthetic control method to the Mariel boatlift. Journal of Human Resources, 54(2):267-309.
  14. Rosenzweig, M. R. and Wolpin, K. I. (2000). Natural "natural experiments" in economics. Journal of Economic Literature, 38(4):827-874.
  15. Snow, J. (1855). On the Mode of Communication of Cholera, 2nd edition. John Churchill, London.
  16. Staiger, D. and Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65(3):557-586.
  17. Stock, J. H. and Yogo, M. (2005). Testing for weak instruments in linear IV regression. In Andrews, D. and Stock, J., editors, Identification and Inference for Econometric Models. Cambridge University Press, Cambridge.
