The Causal Review

The Regression Anatomy Formula: What Multiple Regression Really Estimates

1 The Puzzle of Multiple Regression

Most empirical papers in economics involve a multiple regression: regress an outcome Y on a treatment variable D and a vector of controls X. The estimated coefficient on D is interpreted as the effect of D on Y, "controlling for X."

But what exactly does "controlling for X" mean? What variation in D is the regression using? And what happens when we add or remove controls?

The Frisch-Waugh-Lovell (FWL) theorem [Frisch and Waugh, 1933; Lovell, 1963] provides precise answers to these questions. Angrist and Pischke [2009] call the resulting expression the regression anatomy formula, and it is one of the most illuminating results in econometrics.

2 The Setup

Suppose we want to estimate β in the long regression:

Y_i = α + βD_i + γ'X_i + ε_i, (1)

where Dᵢ is our variable of interest and Xᵢ is a vector of controls (e.g., age, gender, education, and so on).

The short regression with no controls would simply be:

Y_i = α̃ + β̃D_i + ε̃_i.

The short regression coefficient β̃ is the slope of the best linear approximation to Y as a function of D alone, ignoring X. Is β̂, the OLS estimate from the long regression, just β̃ with some adjustment for X? Yes, and the form of that adjustment is what makes the result illuminating.
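The adjustment can be written out explicitly. The block below is a sketch of the standard omitted-variable-bias identity, which holds exactly in the sample for OLS. It uses one symbol that does not appear in the text and is introduced here only for illustration: δ̂, the vector of slopes from auxiliary regressions of each control in X on D (with a constant).

```latex
\tilde{\beta} \;=\; \hat{\beta} \;+\; \hat{\gamma}'\hat{\delta},
\qquad
\hat{\delta} \;=\; \frac{\sum_i (D_i - \bar{D})(X_i - \bar{X})}{\sum_i (D_i - \bar{D})^2}.
```

Read this way, the short and long coefficients coincide whenever the product γ̂′δ̂ is zero, for example when the controls do not predict Y given D (γ̂ = 0) or are uncorrelated with D in the sample (δ̂ = 0).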

3 The Frisch-Waugh-Lovell Theorem

The FWL theorem states: the OLS estimate β̂ from the long regression (1) is identical to the OLS coefficient from the partialled-out regression:

Ỹ_i = βD̃_i + error_i, (2)

where D̃ᵢ and Ỹᵢ are the residuals from regressing Dᵢ and Yᵢ on Xᵢ, respectively. In other words:

Step 1. Regress Dᵢ on Xᵢ using OLS. Save the residuals: D̃ᵢ = Dᵢ − D̂ᵢ, where D̂ᵢ = â + b̂′Xᵢ.

Step 2. Regress Yᵢ on Xᵢ using OLS. Save the residuals: Ỹᵢ = Yᵢ − Ŷᵢ, where Ŷᵢ is the fitted value from that regression.

Step 3. Regress Ỹᵢ on D̃ᵢ with no constant. The coefficient on D̃ᵢ equals β̂ from the long regression exactly.

The formula for β̂ from step 3 is:

β̂ = Σ_i D̃_i Ỹ_i / Σ_i D̃_i² = Cov(D̃_i, Ỹ_i) / Var(D̃_i). (3)

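Equation (3) is the regression anatomy formula: β̂ is the coefficient from a simple regression of Y on the part of D that is orthogonal to (linearly unpredictable from) X. Using Y rather than Ỹ gives the same number, because D̃ is orthogonal to X by construction and so Cov(D̃_i, Ỹ_i) = Cov(D̃_i, Y_i).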
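Why the three steps reproduce the long-regression coefficient can be seen in a few lines. The sketch below uses matrix notation that is not in the text and is assumed here for illustration: W = [1, X] stacks the constant and the controls, M_W = I − W(W′W)⁻¹W′ is the matrix that produces residuals from a regression on W, and e is the vector of long-regression residuals. It also assumes D is not perfectly collinear with W, so that D̃′D̃ > 0.

```latex
\begin{aligned}
Y &= \hat{\alpha}\mathbf{1} + D\hat{\beta} + X\hat{\gamma} + e
  && \text{(fitted long regression)} \\
M_W Y &= M_W D\,\hat{\beta} + e
  && (M_W W = 0,\; M_W e = e \text{ because } W'e = 0) \\
\tilde{Y} &= \tilde{D}\,\hat{\beta} + e
  && (\tilde{Y} = M_W Y,\; \tilde{D} = M_W D) \\
\tilde{D}'\tilde{Y} &= \tilde{D}'\tilde{D}\,\hat{\beta}
  && (\tilde{D}'e = D'M_W e = D'e = 0) \\
\hat{\beta} &= \frac{\tilde{D}'\tilde{Y}}{\tilde{D}'\tilde{D}}
  = \frac{\sum_i \tilde{D}_i \tilde{Y}_i}{\sum_i \tilde{D}_i^2}.
\end{aligned}
```

The third line also shows that the partialled-out regression leaves the same residuals as the long regression, not just the same coefficient.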

4 What Does "Controlling for X" Really Mean?

The FWL theorem gives a precise answer: adding X to the regression is equivalent to using only the variation in D that cannot be explained by X.

This has several important implications:

4.1 Controls Remove Variation

When we add controls X, we discard the variation in D that is attributable to X. If X explains a lot of the variation in D (high R² in the first-step regression of D on X), then D̃ has very little variance, and the regression anatomy estimate of β will be noisy.

This is why adding many controls can decrease precision: we are using a smaller and smaller fraction of the variation in D.
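The precision cost can be made explicit. The expression below is a sketch under an assumption the text does not state, namely homoskedastic errors with variance σ²; R²_{D|X} is shorthand introduced here for the R² of the first-step regression of D on X.

```latex
\operatorname{Var}(\hat{\beta} \mid D, X)
  = \frac{\sigma^2}{\sum_i \tilde{D}_i^2}
  = \frac{\sigma^2}{\bigl(1 - R^2_{D \mid X}\bigr)\sum_i (D_i - \bar{D})^2}.
```

As R²_{D|X} approaches 1 the denominator shrinks and the variance grows. The numerator can move the other way: a control that also predicts Y lowers σ², so the net effect of a control on precision depends on which force dominates.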

4.2 A Numerical Example

Suppose we have 5 observations:

i	D_i	X_i	Y_i
1	1	1	5
2	2	3	7
3	3	2	10
4	4	4	12
5	5	5	14

Here D is years of education, X is a control (e.g., parental education), and Y is earnings. Education and parental education are correlated but not perfectly so.

Step 1: Regress Dᵢ on Xᵢ. With D̄ = 3, X̄ = 3, Cov(D,X) = 1.8, Var(X) = 2.0, the slope is b̂ = 0.9 and intercept â = 0.3. So D̂ᵢ = 0.3 + 0.9Xᵢ.

The partialled-out residuals D̃ᵢ = Dᵢ − D̂ᵢ are:

i	D_i	D̂_i	D̃_i
1	1	1.2	−0.2
2	2	3.0	−1.0
3	3	2.1	+0.9
4	4	3.9	+0.1
5	5	4.8	+0.2

These residuals are the part of education not explained by parental education. Units 3 and 5, for instance, have more education than their parental background would predict (D̃ > 0); unit 2 has less (D̃ < 0).

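Step 2: Similarly regress Yᵢ on Xᵢ. With Ȳ = 9.6 and Cov(Y,X) = 4.0, the slope is 2.0 and the intercept is 3.6, so Ŷᵢ = 3.6 + 2.0Xᵢ and the residuals Ỹᵢ are −0.6, −2.6, +2.4, +0.4, +0.4.

Step 3: The slope of Ỹᵢ on D̃ᵢ is β̂ = 5.0/1.9 ≈ 2.63, exactly the coefficient on D from the full long regression of Y on D and X (the short regression of Y on D alone gives 2.3). The residuals D̃ᵢ capture the variation in D that is uncorrelated with X by construction, and this is the only variation the regression uses to identify β.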
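The arithmetic of the three steps can be checked mechanically. The following is a minimal pure-Python sketch using the five observations from the table; the helper names (`mean`, `cross`, `residuals`) are chosen here for illustration, and the long regression is solved directly from its normal equations so that the comparison does not itself rely on FWL.

```python
# Data from the five-observation table in Section 4.2.
D = [1, 2, 3, 4, 5]      # years of education
X = [1, 3, 2, 4, 5]      # control (e.g., parental education)
Y = [5, 7, 10, 12, 14]   # earnings


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))


def residuals(y, x):
    """Residuals from an OLS regression of y on a constant and x."""
    slope = cross(x, y) / cross(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - (intercept + slope * xi) for yi, xi in zip(y, x)]


# Steps 1 and 2: partial X out of D and out of Y.
D_tilde = residuals(D, X)
Y_tilde = residuals(Y, X)

# Step 3: regress Y_tilde on D_tilde with no constant.
beta_fwl = (sum(d * y for d, y in zip(D_tilde, Y_tilde))
            / sum(d * d for d in D_tilde))

# Long regression of Y on a constant, D and X: solve the 2x2 normal
# equations in deviation-from-mean form with Cramer's rule.
s_dd, s_xx, s_dx = cross(D, D), cross(X, X), cross(D, X)
s_dy, s_xy = cross(D, Y), cross(X, Y)
beta_long = (s_xx * s_dy - s_dx * s_xy) / (s_dd * s_xx - s_dx ** 2)

# Short regression of Y on a constant and D only, for comparison.
beta_short = s_dy / s_dd

print(f"FWL:   {beta_fwl:.4f}")
print(f"long:  {beta_long:.4f}")
print(f"short: {beta_short:.4f}")
```

The first two printed values agree, which is the content of the theorem; the third differs because the short regression also uses the variation in D that X predicts.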

If parental education (X) is the only confounder, this partialled-out variation is uncorrelated with the error, and β̂ identifies the causal return to education. If there are other confounders not in X (e.g., unobserved ability), then even D̃ is contaminated and the estimate remains biased.

4.3 Controls Reduce (but Don't Eliminate) Bias

If X contains all the confounders between D and Y, then D̃ᵢ is uncorrelated with the error εᵢ in the true model, and the anatomy coefficient identifies the causal effect β. This is the selection-on-observables or conditional independence assumption: conditional on X, treatment D is as good as randomly assigned.

But if some confounders are not in X (the unobservables), then D̃ᵢ is still correlated with the omitted variables, and the anatomy coefficient remains biased. Adding observed controls does not solve the problem of unobserved confounders.

5 Fixed Effects as a Special Case

One of the most common uses of FWL in applied economics is the interpretation of fixed effects regression. Suppose we run the panel regression:

Y_it = βD_it + α_i + λ_t + ε_it,

with unit fixed effects αᵢ and time fixed effects λₜ. By FWL, β̂ is the coefficient from regressing the within-unit, within-time residuals of Yᵢₜ on the within-unit, within-time residuals of Dᵢₜ.

This means the fixed effects estimator only uses variation in Dᵢₜ that is neither a unit-level constant nor a time-level constant: it uses the "within" variation. Cross-sectional variation (which units are more treated on average) and macro time variation (all units becoming more treated over time) are absorbed by the fixed effects and do not contribute to β̂.
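A small sketch can make the "within" variation concrete. It assumes a balanced panel, where the residual from regressing a variable on unit and time dummies equals the double-demeaned value x_it − x̄_i − x̄_t + x̄ (in unbalanced panels this shortcut does not apply). The panel, the effect sizes, and the function names are hypothetical, and Y is generated without noise so that the slope can be checked exactly.

```python
def two_way_demean(panel):
    """Double-demean a balanced panel stored as panel[i][t].

    Returns x_it - unit_mean_i - time_mean_t + grand_mean, which in a
    balanced panel is the residual from a regression of x on unit and
    time dummies.
    """
    n, T = len(panel), len(panel[0])
    unit_means = [sum(row) / T for row in panel]
    time_means = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(unit_means) / n
    return [[panel[i][t] - unit_means[i] - time_means[t] + grand
             for t in range(T)] for i in range(n)]


def twfe_slope(Y, D):
    """FWL version of two-way fixed effects: slope of demeaned Y on demeaned D."""
    Yd, Dd = two_way_demean(Y), two_way_demean(D)
    num = sum(y * d for ry, rd in zip(Yd, Dd) for y, d in zip(ry, rd))
    den = sum(d * d for rd in Dd for d in rd)
    return num / den


# Hypothetical balanced panel: 3 units, 4 periods, true beta = 2.
D = [[0, 0, 1, 1],
     [0, 1, 1, 1],
     [0, 0, 0, 0]]
alpha = [5.0, -1.0, 3.0]      # unit effects (illustrative values)
lam = [0.0, 1.0, 2.5, 4.0]    # time effects (illustrative values)
beta_true = 2.0
Y = [[beta_true * D[i][t] + alpha[i] + lam[t] for t in range(4)]
     for i in range(3)]

beta_hat = twfe_slope(Y, D)
print(round(beta_hat, 6))

# A variable that is constant within each unit is fully absorbed.
unit_constant = [[a] * 4 for a in alpha]
absorbed = two_way_demean(unit_constant)
```

Because the unit and time effects drop out under double-demeaning, the slope recovers the coefficient used to generate Y, and any unit-level constant is reduced to zeros.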

6 Event Studies and Coefficient Interpretation

The anatomy formula also clarifies event study regressions. In:

Y_it = Σ_{k≠−1} β_k 1[t − G_i = k] + α_i + λ_t + ε_it,

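each βₖ is identified from the within-unit, within-time variation in the relative-time dummy 1[t − Gᵢ = k], where Gᵢ is the period in which unit i is first treated and k = −1 is the omitted reference period. Never-treated units have every relative-time dummy equal to zero: they contribute to estimating the fixed effects, in particular the time effects λₜ, rather than supplying treated leads and lags of their own.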
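The relative-time dummies themselves are simple to construct. The sketch below is illustrative only: it reads Gᵢ as the period of first treatment (an assumption, since the text does not define it), represents never-treated units with `None`, and uses a hypothetical function name and a fixed window of leads and lags with no binning of endpoints.

```python
def event_time_dummies(t, g, leads, lags, reference=-1):
    """Relative-time indicators 1[t - g = k] for one unit-period.

    t         : calendar period of the observation
    g         : period of first treatment, or None if never treated
    leads     : number of pre-treatment periods to include (k >= -leads)
    lags      : number of post-treatment periods to include (k <= lags)
    reference : the omitted k (k = -1 in the event study equation)
    """
    ks = [k for k in range(-leads, lags + 1) if k != reference]
    if g is None:
        # Never-treated units have no event time: every dummy is zero.
        return {k: 0 for k in ks}
    return {k: int(t - g == k) for k in ks}


# A unit first treated in period 3, observed in period 5 (k = 2).
treated_row = event_time_dummies(t=5, g=3, leads=3, lags=3)

# The same unit one period before treatment (k = -1, the reference).
reference_row = event_time_dummies(t=2, g=3, leads=3, lags=3)

# A never-treated unit in any period.
never_row = event_time_dummies(t=5, g=None, leads=3, lags=3)
```

In the reference period all included dummies are zero, which is what makes each βₖ a comparison against k = −1.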

7 Common Mistakes Clarified by FWL

(1) "Including more controls always improves the estimate." Not necessarily: if controls absorb variation in D that is exogenous, adding them reduces precision without eliminating bias from other sources.

(2) "Adding a control for a mediator removes confounding." By the anatomy formula, adding a mediator to X discards the variation in D that moves together with the mediator, which is exactly the variation through which D affects Y along that channel. The coefficient on D then no longer estimates the total effect; at best it captures the direct effect that bypasses the mediator.

(3) "R² rising when controls are added means the estimate is more accurate." R² is a measure of fit, not of identification. A high R² with many controls can coexist with severe omitted variable bias from uncontrolled confounders.
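Mistake (2) can be illustrated with a deliberately stylised example. The data below are hypothetical and constructed so that D affects Y only through a mediator M, with every variable mean zero (so regressions without a constant are harmless) and with the extra variation in M exactly orthogonal to D in the sample. The helper names are chosen here for illustration.

```python
def slope_no_constant(y, x):
    """OLS slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(b * b for b in x)


def residualize(y, x):
    """Residuals from regressing y on x with no intercept."""
    b = slope_no_constant(y, x)
    return [yi - b * xi for yi, xi in zip(y, x)]


# Hypothetical mean-zero data in which D works only through M.
D = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]            # variation in M unrelated to D
M = [d + ui for d, ui in zip(D, u)]   # mediator: M = D + u
Y = [2.0 * m for m in M]              # outcome: Y = 2 * M

# Regression of Y on D alone: recovers the total effect of D.
total_effect = slope_no_constant(Y, D)

# Regression of Y on D controlling for M, computed via FWL:
# partial M out of both D and Y, then regress residual on residual.
D_tilde = residualize(D, M)
Y_tilde = residualize(Y, M)
controlled = slope_no_constant(Y_tilde, D_tilde)

print(total_effect, controlled)
```

Here the total effect of D is 2, but once M is partialled out nothing of Y is left for D̃ to explain, so the coefficient on D is zero. That is the right answer for the direct effect in this construction and the wrong answer for the total effect.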

8 Where to Learn More

The regression anatomy formula is presented accessibly in Angrist and Pischke [2009], Chapter 3. The original FWL result is in Frisch and Waugh [1933] and Lovell [1963]. For the panel fixed effects application, see any graduate econometrics textbook, such as Wooldridge [2010].

References

Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

Frisch, R. and Waugh, F. V. (1933). Partial time regressions as compared with individual trends. Econometrica, 1(4), 387-401.

Lovell, M. C. (1963). Seasonal adjustment of economic time series and multiple regression analysis. Journal of the American Statistical Association, 58(304), 993-1010.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd ed. MIT Press.
