The Causal Review

Selection Bias and Why OLS Can Lie: The Omitted Variable Problem

1 The Question

Consider a simple, important question: does going to university increase earnings? We observe that university graduates earn about 60% more per hour than workers with only a high-school diploma. Can we conclude that university education causes a 60% earnings premium?

Most economists would say no, and the reason is selection bias. People who go to university are not a random sample of the population. They tend to be more academically able, come from wealthier families, and have stronger networks—all of which would boost their earnings even without the degree. The raw earnings gap conflates the causal effect of education with these pre-existing advantages. This article explains precisely what goes wrong and introduces the omitted variable bias formula.

2 The Population Model

Suppose the true structural equation for log earnings Yᵢ is:

$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 A_i + u_i, \tag{1}$$

where Sᵢ is years of schooling, Aᵢ is ability (intelligence, work ethic, family connections, and other unobserved traits that affect earnings), and uᵢ is a mean-zero residual independent of both Sᵢ and Aᵢ.

The parameter β₁ is the causal return to schooling: the approximate proportional increase in earnings from one additional year of education, holding ability fixed. This is what we want to estimate.

The problem: we do not observe Aᵢ. We fit the short regression:

$$Y_i = \tilde{\beta}_0 + \tilde{\beta}_1 S_i + \tilde{u}_i. \tag{2}$$

The OLS estimate β̃₁ from this short regression converges in probability to:

$$\text{plim } \tilde{\beta}_1 = \beta_1 + \beta_2 \cdot \frac{\text{Cov}(S_i, A_i)}{\text{Var}(S_i)} . \tag{3}$$

This is the omitted variable bias (OVB) formula. The short regression estimate is biased by an amount equal to the coefficient on the omitted variable (β₂) multiplied by the coefficient from an auxiliary regression of the omitted variable on the included variable (Cov(Sᵢ, Aᵢ) / Var(Sᵢ) ≡ δ).
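Because OLS residuals are orthogonal to the regressors in any sample, the same decomposition also holds exactly in-sample: the short-regression slope equals the long-regression coefficient on Sᵢ plus the long-regression coefficient on Aᵢ times the auxiliary slope. The sketch below checks this in plain Python using only the standard library. The data-generating process, the parameter values and the helper names (`cov`, `slope`, `long_regression`) are illustrative assumptions, not taken from any study.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with divisor n (the divisor cancels in every ratio below)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)

def long_regression(y, s, a):
    """Coefficients on s and a from OLS of y on (1, s, a), via the 2x2 normal equations."""
    vs, va, csa = cov(s, s), cov(a, a), cov(s, a)
    cys, cya = cov(s, y), cov(a, y)
    det = vs * va - csa ** 2
    return (va * cys - csa * cya) / det, (vs * cya - csa * cys) / det

# Hypothetical data-generating process: ability raises both schooling and log earnings.
rng = random.Random(0)
n = 5_000
a = [rng.gauss(0, 1) for _ in range(n)]
s = [12 + 2 * ai + rng.gauss(0, 2) for ai in a]
y = [1.0 + 0.06 * si + 0.10 * ai + rng.gauss(0, 0.5) for si, ai in zip(s, a)]

b_s, b_a = long_regression(y, s, a)   # long regression: estimates of beta1 and beta2
short = slope(y, s)                   # short regression of Y on S only
delta_hat = slope(a, s)               # auxiliary regression of A on S

print(f"short = {short:.4f}")
print(f"long + bias = {b_s:.4f} + {b_a:.4f} * {delta_hat:.4f} = {b_s + b_a * delta_hat:.4f}")
```

The two printed numbers agree up to floating-point rounding, which is why the formula is so useful: the bias is fully determined by two regressions one could run if Aᵢ were observed.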

3 Decomposing the Bias

The OVB formula in equation (3) can be written as:

$$\text{Bias} = \underbrace{\beta_2}_{\text{effect of omitted on outcome}} \times \underbrace{\delta}_{\text{slope of omitted on included}} . \tag{4}$$

Two conditions must hold simultaneously for OVB to be non-zero:

The omitted variable must affect the outcome: β₂ ≠ 0.
The omitted variable must be correlated with the included variable: δ = Cov(Sᵢ, Aᵢ) / Var(Sᵢ) ≠ 0.

If either condition fails, there is no bias. For the education-earnings example:

Ability clearly affects earnings: β₂ > 0.

Ability is positively correlated with education: δ > 0 (more able individuals tend to stay in school longer).

Therefore the bias is positive: the raw OLS estimate β̃₁ overstates the causal return to education. This is the "ability bias" in returns-to-schooling estimates.
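To see that the bias vanishes when either condition fails, the sketch below simulates equation (1) under three hypothetical settings and reports the short-regression slope. It assumes a schooling equation Sᵢ = 12 + γAᵢ + vᵢ of our own choosing, with Aᵢ and vᵢ standard normal, so that δ = γ / (γ² + 1); the function name `short_slope` is illustrative.

```python
import random

def short_slope(beta1, beta2, gamma, n=20_000, seed=1):
    """Simulate Y = 1 + beta1*S + beta2*A + u with S = 12 + gamma*A + v,
    then return the OLS slope from regressing Y on S alone."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    s = [12 + gamma * ai + rng.gauss(0, 1) for ai in a]
    y = [1 + beta1 * si + beta2 * ai + rng.gauss(0, 0.5) for si, ai in zip(s, a)]
    ms, my = sum(s) / n, sum(y) / n
    cov_sy = sum((si - ms) * (yi - my) for si, yi in zip(s, y))
    var_s = sum((si - ms) ** 2 for si in s)
    return cov_sy / var_s

# Both conditions hold (delta = 2/5 = 0.4): plim is 0.06 + 0.10 * 0.4 = 0.10.
both = short_slope(beta1=0.06, beta2=0.10, gamma=2.0)
# Condition 1 fails: ability drives schooling but not earnings (beta2 = 0).
no_effect = short_slope(beta1=0.06, beta2=0.0, gamma=2.0)
# Condition 2 fails: ability is unrelated to schooling (gamma = 0, so delta = 0).
no_corr = short_slope(beta1=0.06, beta2=0.10, gamma=0.0)
print(f"both={both:.3f}  beta2=0: {no_effect:.3f}  delta=0: {no_corr:.3f}")
```

Only the first setting should drift away from the true 0.06; the other two stay close to it apart from sampling noise.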

A Numerical Illustration

Suppose the true return to education is β₁ = 0.06 (about 6% per year), ability has effect β₂ = 0.10 on log earnings, and the slope from the auxiliary regression of ability on schooling is δ = 0.5 units of ability per year of schooling. Then the short-regression bias is 0.10 × 0.5 = 0.05, and:

plim(β̃₁) = 0.06 + 0.05 = 0.11

The raw return appears to be 11% per year—nearly double the true causal effect of 6%.
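The numbers above can be reproduced by simulation. The sketch assumes a data-generating process that is not part of the original example: Aᵢ ~ N(0, 1) and Sᵢ = 12 + Aᵢ + vᵢ with vᵢ ~ N(0, 1), which implies δ = Cov(Sᵢ, Aᵢ)/Var(Sᵢ) = 1/2. With a large sample, the short regression should land near 0.11 and the long regression, which includes Aᵢ, near the true 0.06.

```python
import random

BETA1, BETA2 = 0.06, 0.10   # true return to schooling and effect of ability (log points)
rng = random.Random(42)
n = 50_000

# Assumed data-generating process (illustrative): A ~ N(0, 1) and
# S = 12 + A + v with v ~ N(0, 1), so delta = Cov(S, A) / Var(S) = 1 / 2.
a = [rng.gauss(0, 1) for _ in range(n)]
s = [12 + ai + rng.gauss(0, 1) for ai in a]
y = [1.5 + BETA1 * si + BETA2 * ai + rng.gauss(0, 0.4) for si, ai in zip(s, a)]

def cov(x, z):
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    return sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / len(x)

delta_hat = cov(s, a) / cov(s, s)   # auxiliary regression of A on S
short = cov(s, y) / cov(s, s)       # short regression: Y on S only
# Long regression of Y on (S, A): solve the 2x2 normal equations in deviations.
det = cov(s, s) * cov(a, a) - cov(s, a) ** 2
long_s = (cov(a, a) * cov(s, y) - cov(s, a) * cov(a, y)) / det

print(f"delta_hat={delta_hat:.3f} short={short:.3f} long={long_s:.3f}")
print(f"formula: {BETA1} + {BETA2} * 0.5 = {BETA1 + BETA2 * 0.5:.2f}")
```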

4 Selection Bias as a Special Case of OVB

Selection bias is the OVB that arises when individuals self-select into treatment in a way that is correlated with their potential outcomes. Let Dᵢ ∈ {0, 1} be a binary treatment indicator, with potential outcomes Yᵢ(1) if treated and Yᵢ(0) if not. The naïve OLS estimator from regressing Yᵢ on Dᵢ (equivalently, the difference in mean outcomes between treated and untreated units) converges to:

$$\text{plim } \hat{\beta}_D^{\text{OLS}} = \underbrace{\mathbb{E}[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATT}} + \underbrace{\mathbb{E}[Y_i(0) \mid D_i = 1] - \mathbb{E}[Y_i(0) \mid D_i = 0]}_{\text{Selection bias}} . \tag{5}$$

The first term is the average treatment effect on the treated (ATT); it equals the average treatment effect (ATE) when the effect is the same for everyone. The second term is selection bias: the difference in untreated outcomes between those who select into treatment and those who do not. In a job-training programme, for example, if those who choose training are more employable even without it (positive selection), OLS overstates the programme effect. If they are "hard cases", least likely to find employment on their own (negative selection, as in some remediation programmes), OLS understates it.
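A small simulation makes the decomposition concrete. In any finite sample, the difference in mean outcomes between treated and untreated units splits exactly into the average effect among the treated plus the selection-bias term of equation (5), and a regression of Yᵢ on Dᵢ with an intercept reproduces that difference in means. The job-training set-up, the variable `employability` and all parameter values are hypothetical, chosen so that selection is positive.

```python
import random

rng = random.Random(7)
n = 20_000
y0, y1, d = [], [], []
for _ in range(n):
    employability = rng.gauss(0, 1)              # hypothetical unobserved trait
    effect = 0.5 + 0.2 * rng.gauss(0, 1)         # heterogeneous treatment effect
    untreated = 2.0 + employability + rng.gauss(0, 1)
    y0.append(untreated)
    y1.append(untreated + effect)
    # Positive selection: more employable people are more likely to enrol.
    d.append(1 if employability + rng.gauss(0, 1) > 0 else 0)

def mean(xs):
    return sum(xs) / len(xs)

treated = [i for i in range(n) if d[i] == 1]
control = [i for i in range(n) if d[i] == 0]
# Observed outcome: Y = Y(1) for the treated, Y(0) for everyone else.
y = [y1[i] if d[i] else y0[i] for i in range(n)]

naive = mean([y[i] for i in treated]) - mean([y[i] for i in control])
att = mean([y1[i] - y0[i] for i in treated])
selection_bias = mean([y0[i] for i in treated]) - mean([y0[i] for i in control])
ate = mean([y1[i] - y0[i] for i in range(n)])
print(f"naive={naive:.3f} ATT={att:.3f} selection bias={selection_bias:.3f} ATE={ate:.3f}")
```

Because the simulated effect is unrelated to enrolment, ATT and ATE are close here; almost all of the gap between the naïve estimate and the truth comes from the selection-bias term.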

5 Graphical Illustration: The DAG Perspective

Figure 1 shows the problem as a directed acyclic graph (DAG). Ability (A) is a confounder: it causes both the treatment (schooling) and the outcome (earnings). To estimate the causal effect S → Y, we must either block the back-door path S ← A → Y by controlling for A, or use an instrument that moves S without operating through that path.

Figure 1: DAG for the education-earnings problem. The causal path runs from schooling (S) to earnings (Y) with coefficient β₁. Ability (A) creates a back-door path S ← A → Y. OLS conflates the causal path with the back-door path, overstating β₁.
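The back-door logic of Figure 1 can be illustrated with a discrete version of the problem. In this sketch, which is our own simplification, ability and schooling are both binary (high or low ability, degree or no degree) and every parameter value is invented. Conditioning on A blocks S ← A → Y: the adjustment compares degree holders with non-holders within each ability stratum and then averages those contrasts using the marginal distribution of A.

```python
import random

rng = random.Random(3)
n = 40_000
rows = []
for _ in range(n):
    a = 1 if rng.random() < 0.5 else 0                    # high ability?
    s = 1 if rng.random() < (0.7 if a else 0.3) else 0    # A -> S (back-door arm)
    y = 2.0 + 0.3 * s + 0.4 * a + rng.gauss(0, 0.5)      # S -> Y and A -> Y
    rows.append((a, s, y))

def mean_y(cond):
    """Mean outcome over rows where cond(ability, schooling) is true."""
    ys = [y for a, s, y in rows if cond(a, s)]
    return sum(ys) / len(ys)

# Naive comparison: picks up the causal path plus the back-door path through A.
naive = mean_y(lambda a, s: s == 1) - mean_y(lambda a, s: s == 0)

# Back-door adjustment: within-stratum contrasts weighted by P(A = level).
adjusted = 0.0
for level in (0, 1):
    share = sum(1 for a, _, _ in rows if a == level) / n
    contrast = (mean_y(lambda a, s: a == level and s == 1)
                - mean_y(lambda a, s: a == level and s == 0))
    adjusted += share * contrast
print(f"naive={naive:.3f} adjusted={adjusted:.3f} (true effect 0.3)")
```

With these values the naïve contrast converges to 0.3 + 0.4 × (0.7 − 0.3) = 0.46, while the adjusted contrast converges to the causal 0.3.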

6 Solutions to the OVB Problem

The OVB formula points to two ways to eliminate bias: take the omitted variable out of the error term by measuring and controlling for it, or use variation in schooling that is uncorrelated with it (so that δ = 0). In practice this leads to three broad strategies:

Control for the confounder. If Aᵢ were observed, including it in the regression (the "long regression") would give an unbiased estimate of β₁. For this to work, Aᵢ must capture all confounding; in practice, some confounders usually remain unmeasured.
Use an instrument. An instrumental variable Zᵢ that shifts schooling Sᵢ, is unrelated to ability Aᵢ, and affects earnings only through schooling removes the bias. Card (1995) used proximity to a college as an instrument for education, on the assumption that living near a college increases schooling but does not directly affect earnings.
Exploit experiments and natural experiments. Randomised experiments, difference-in-differences, and regression discontinuity designs each isolate, under their own identifying assumptions, variation in treatment that is unrelated to potential confounders.
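The instrumental-variable idea can be sketched with simulated data in the spirit of the college-proximity design. This is not Card's data or specification: the binary instrument `near`, its effect on schooling and every other parameter are assumptions, and the instrument is valid by construction (independent of ability and excluded from the earnings equation). With a binary instrument, Cov(Zᵢ, Yᵢ)/Cov(Zᵢ, Sᵢ) is the Wald estimator.

```python
import random

rng = random.Random(11)
n = 50_000
z, s, y = [], [], []
for _ in range(n):
    ability = rng.gauss(0, 1)                  # unobserved confounder
    near = 1 if rng.random() < 0.5 else 0      # hypothetical instrument: lives near a college
    school = 12 + 1.0 * near + ability + rng.gauss(0, 1)
    z.append(near)
    s.append(school)
    # Z enters earnings only through schooling (exclusion restriction, by construction).
    y.append(1 + 0.06 * school + 0.10 * ability + rng.gauss(0, 0.5))

def cov(x, w):
    mx, mw = sum(x) / len(x), sum(w) / len(w)
    return sum((xi - mx) * (wi - mw) for xi, wi in zip(x, w)) / len(x)

ols = cov(s, y) / cov(s, s)   # short regression: biased upward by omitted ability
iv = cov(z, y) / cov(z, s)    # IV (Wald) estimator using the instrument
print(f"OLS={ols:.3f}  IV={iv:.3f}  (true beta1 = 0.06)")
```

In real data, the exclusion restriction cannot be imposed by construction; it has to be argued, which is where most debates about instruments take place.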

7 The Direction and Magnitude of Bias

The OVB formula allows us to reason about the direction of bias even when we cannot estimate it precisely. Some useful cases:

Setting	$\beta_2$ sign	$\delta$ sign	Bias sign	OLS vs truth
Education returns	$+$	$+$	$+$	Overestimate
Drug treatment (sick take drugs)	$-$	$+$	$-$	Underestimate
Job training (motivated enrol)	$+$	$+$	$+$	Overestimate
Remediation (struggling enrol)	$+$	$-$	$-$	Underestimate
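The sign logic in the table is simply the sign of the product β₂ × δ, which a few lines of code can encode. The function name `bias_direction` and the integer sign convention (−1, 0, +1) are our own choices for illustration.

```python
def bias_direction(beta2_sign: int, delta_sign: int) -> str:
    """Direction of plim(OLS) - truth = beta2 * delta, given signs in {-1, 0, +1}."""
    product = beta2_sign * delta_sign
    if product > 0:
        return "Overestimate"
    if product < 0:
        return "Underestimate"
    return "No bias"

# Sign pairs (beta2, delta) taken from the rows of the table.
cases = {
    "Education returns": (+1, +1),
    "Drug treatment (sick take drugs)": (-1, +1),
    "Job training (motivated enrol)": (+1, +1),
    "Remediation (struggling enrol)": (+1, -1),
}
for setting, (b2, d) in cases.items():
    print(f"{setting}: {bias_direction(b2, d)}")
```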

8 Common Mistakes

"I controlled for everything important." This claim is almost never defensible. Measurement error in controls and genuine omitted variables are ubiquitous.
"My R-squared is high, so omitted variable bias is small." Wrong. A high R-squared means the included variables explain much of the variance in the outcome. It says nothing about whether the treatment is correlated with omitted variables.
Controlling for a mediator. If a variable Mᵢ lies on the causal path from Dᵢ to Yᵢ (a mediator), controlling for it removes part of the causal effect rather than correcting bias. To estimate the total effect, control for confounders (variables that cause both treatment and outcome), not for variables the treatment itself affects.
Collider bias. Controlling for a variable that is caused by both treatment and outcome (a collider) opens a non-causal path between them and creates a spurious association. This is a less obvious but equally dangerous form of bias [Pearl, 2009].
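The mediator and collider pitfalls can be seen in one simulation with a randomised treatment, where the simple difference in means is already unbiased for the total effect. The structure (D → M → Y, a direct effect D → Y, and a collider D → C ← Y) and all coefficients are illustrative assumptions. Adding M as a control recovers only the direct effect, about 0.2 here, and adding C distorts the estimate badly; with these values it even turns negative.

```python
import random

rng = random.Random(5)
n = 40_000
d, m, y, c = [], [], [], []
for _ in range(n):
    di = 1 if rng.random() < 0.5 else 0          # randomised treatment
    mi = di + rng.gauss(0, 1)                    # mediator: D -> M
    yi = 0.2 * di + 0.5 * mi + rng.gauss(0, 1)   # D -> Y directly and via M
    ci = di + yi + rng.gauss(0, 1)               # collider: D -> C <- Y
    d.append(di)
    m.append(mi)
    y.append(yi)
    c.append(ci)

def cov(x, w):
    mx, mw = sum(x) / len(x), sum(w) / len(w)
    return sum((xi - mx) * (wi - mw) for xi, wi in zip(x, w)) / len(x)

def coef_on_treat(outcome, treat, control):
    """Coefficient on treat from OLS of outcome on (1, treat, control)."""
    vt, vc, ctc = cov(treat, treat), cov(control, control), cov(treat, control)
    det = vt * vc - ctc ** 2
    return (vc * cov(treat, outcome) - ctc * cov(control, outcome)) / det

total = cov(d, y) / cov(d, d)              # no controls: total effect 0.2 + 0.5 * 1 = 0.7
with_mediator = coef_on_treat(y, d, m)     # direct effect only
with_collider = coef_on_treat(y, d, c)     # distorted by conditioning on a collider
print(f"total={total:.3f} mediator-adjusted={with_mediator:.3f} collider-adjusted={with_collider:.3f}")
```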

9 Where to Learn More

Angrist and Pischke [2009] Chapter 3 derives the OVB formula clearly and applies it to returns to schooling.

Pearl [2009] develops the DAG framework for understanding confounding and collider bias.

Imbens [2015] gives a practical guide to estimating average causal effects under unconfoundedness with matching and related methods, illustrated with three empirical examples.

10 Conclusion

Ordinary least squares gives a consistent estimate of the causal effect only when the treatment variable is uncorrelated with the error term, that is, when no omitted variable both affects the outcome and is correlated with the treatment. In observational data, this condition almost always fails. The omitted variable bias formula makes the problem concrete and quantifiable: the bias equals the effect of the omitted variable on the outcome multiplied by the slope from regressing the omitted variable on the treatment. Correcting for OVB through randomisation, instruments, regression discontinuity and related designs is a central goal of modern empirical economics.

References

Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In Christofides, L. N., Grant, E. K., and Swidinsky, R., editors, Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp. University of Toronto Press.
Imbens, G. W. (2015). Matching methods in practice: three examples. Journal of Human Resources, 50(2):373-419.
Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). Cambridge University Press.
