The Causal Review

Instrumental Variables from Scratch: The LATE Theorem and Complier Analysis

1 The Problem: When Treatment Is Not Random

Imagine you want to know whether going to university causes higher wages. The obvious approach is to compare the wages of university graduates with those of non-graduates. But this comparison is misleading. People who attend university tend to be smarter, more motivated and from wealthier families, and all of these factors would raise wages even without the degree. When we say that education "causes" higher wages, we need to separate the effect of the degree itself from the pre-existing differences between those who get one and those who do not. This is the selection bias problem.

OLS regression can control for some observed characteristics, but unmeasured factors (ability, ambition, family connections) remain in the error term. Because they correlate with both treatment (university) and outcome (wages), they bias our estimate.

Instrumental variables (IV) offer a way out. The central idea is elegantly simple: find something that influences whether someone gets the treatment, but does not directly affect the outcome.
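The selection bias described above is easy to reproduce in a small simulation. The Python sketch below is entirely hypothetical. `simulate_selection_bias` is an illustrative helper, and the wage equation, the true effect of 10,000 and the way unobserved ability drives enrolment are assumptions chosen for the example. None of them are estimates from real data.

```python
import random
import statistics


def simulate_selection_bias(n=10_000, true_effect=10_000.0, seed=0):
    """Naive graduate vs non-graduate wage gap when unobserved ability drives both.

    Hypothetical data-generating process, for illustration only.
    """
    rng = random.Random(seed)
    grad_wages, nongrad_wages = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)                 # unobserved confounder
        degree = ability + rng.gauss(0.0, 1.0) > 0.0  # abler people enrol more often
        wage = 30_000 + true_effect * degree + 8_000 * ability + rng.gauss(0.0, 5_000)
        (grad_wages if degree else nongrad_wages).append(wage)
    return statistics.mean(grad_wages) - statistics.mean(nongrad_wages)


naive_gap = simulate_selection_bias()
print(f"true effect: 10,000   naive gap: {naive_gap:,.0f}")
```

Under these assumptions the naive gap comes out at roughly double the true effect. Graduates would have earned more anyway, and the simple comparison attributes that advantage to the degree.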

2 What Is an Instrument?

Formally, a variable $Z_{i}$ is a valid instrument for treatment $D_{i}$ in the model $Y_{i}=\alpha+\tau D_{i}+\epsilon_{i}$ if: (1) Relevance: $Z_{i}$ is correlated with $D_{i}$, meaning it actually influences whether someone gets treated. (2) Exclusion: $Z_{i}$ affects $Y_{i}$ only through $D_{i}$ and has no direct effect on the outcome. (3) Independence: $Z_{i}$ is "as good as randomly assigned", i.e. uncorrelated with the unobserved confounders in $\epsilon_{i}$. Think of it as a lever that moves treatment without touching the outcome directly.

The college proximity example. Card [1995] used distance to the nearest college as an instrument for educational attainment. The logic:

Relevance: People who grew up near a college are more likely to attend (lower cost, easier access). ✓

Exclusion: Does distance to college directly affect wages, other than through education? Arguably not: living near a college as a child does not independently make you more productive. (Though this is debated!)

Independence: Distance to college is determined by where your parents happened to live, which is arguably unrelated to your unobserved ability.

3 The IV Estimator

Given a valid instrument $Z_{i}$ (here, binary: 1 if near college, 0 if not), the IV estimate of the effect of education on wages is:

$$ \hat{\tau}^{\text{IV}}=\frac{\operatorname{Cov}(Z_{i},Y_{i})}{\operatorname{Cov}(Z_{i},D_{i})} \tag{1} $$
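When the instrument is binary, equation (1) reduces to a ratio of differences in group means. This is the form used in the numerical example. Write $p=\Pr(Z_{i}=1)$ and note that $\mathbb{E}[Z_{i}Y_{i}]=p\,\mathbb{E}[Y_{i}\mid Z_{i}=1]$. Then:

$$
\begin{aligned}
\operatorname{Cov}(Z_{i},Y_{i}) &= \mathbb{E}[Z_{i}Y_{i}]-\mathbb{E}[Z_{i}]\,\mathbb{E}[Y_{i}] \\
&= p\,\mathbb{E}[Y_{i}\mid Z_{i}=1]-p\big(p\,\mathbb{E}[Y_{i}\mid Z_{i}=1]+(1-p)\,\mathbb{E}[Y_{i}\mid Z_{i}=0]\big) \\
&= p(1-p)\big(\mathbb{E}[Y_{i}\mid Z_{i}=1]-\mathbb{E}[Y_{i}\mid Z_{i}=0]\big).
\end{aligned}
$$

Replacing $Y_{i}$ with $D_{i}$ gives the same expression for $\operatorname{Cov}(Z_{i},D_{i})$, so the factor $p(1-p)$ cancels in the ratio:

$$
\frac{\operatorname{Cov}(Z_{i},Y_{i})}{\operatorname{Cov}(Z_{i},D_{i})}=\frac{\mathbb{E}[Y_{i}\mid Z_{i}=1]-\mathbb{E}[Y_{i}\mid Z_{i}=0]}{\mathbb{E}[D_{i}\mid Z_{i}=1]-\mathbb{E}[D_{i}\mid Z_{i}=0]}
$$

The same algebra goes through with sample covariances and sample group means. With a binary instrument, $\hat{\tau}^{\text{IV}}$ is therefore exactly the difference in mean outcomes divided by the difference in treatment rates, often called the Wald estimator.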

What this formula says, intuitively: take the relationship between the instrument and the outcome (numerator), and divide by the relationship between the instrument and the treatment (denominator). We are essentially asking: "When the instrument switches on, how much does treatment change? And how much does the outcome change?" Dividing the outcome change by the treatment change gives the effect per unit of treatment.
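Equation (1) can be computed directly from sample covariances. The sketch below assumes a hypothetical data-generating process with three features:

- a randomly assigned binary "near college" dummy that shifts the probability of a degree;
- unobserved ability that raises both enrolment and wages;
- a degree effect that is the same for everyone (10,000).

The helpers `cov`, `iv_estimate` and `simulate` are illustrative names, not a library API.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def iv_estimate(z, d, y):
    """Equation (1): Cov(Z, Y) / Cov(Z, D)."""
    return cov(z, y) / cov(z, d)


def simulate(n=40_000, true_effect=10_000.0, seed=1):
    """Hypothetical data: ability is unobserved, Z is a randomly assigned 'near college' dummy."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        near = 1 if rng.random() < 0.5 else 0
        # Relevance: living near a college raises the chance of a degree.
        degree = 1 if ability + 0.8 * near + rng.gauss(0.0, 1.0) > 0.4 else 0
        # Exclusion: `near` does not appear in the wage equation.
        wage = 30_000 + true_effect * degree + 8_000 * ability + rng.gauss(0.0, 5_000)
        z.append(near)
        d.append(degree)
        y.append(wage)
    return z, d, y


z, d, y = simulate()
ols = cov(d, y) / cov(d, d)  # OLS slope of Y on D, pushed up by ability
print(f"OLS: {ols:,.0f}   IV: {iv_estimate(z, d, y):,.0f}   true effect: 10,000")
```

The effect is homogeneous in this setup, so the IV estimate should land near 10,000 while the OLS slope is inflated by ability. Section 4 explains what changes when effects differ across people.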

In practice, with additional controls $X_{i}$, we implement IV via two-stage least squares (2SLS):

First stage: Regress $D_{i}$ on $Z_{i}$ (and controls $X_{i}$). Obtain fitted values $\hat{D}_{i}$.

Second stage: Regress $Y_{i}$ on $\hat{D}_{i}$ (and $X_{i}$). The coefficient on $\hat{D}_{i}$ is $\hat{\tau}^{2SLS}$.

The first stage "purifies" $D_{i}$ by keeping only the variation driven by the instrument, which is, by assumption, exogenous.
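The two stages can be run by hand. This sketch makes three simplifying assumptions:

- there is one binary instrument;
- there are no controls $X_{i}$;
- the data are hypothetical and simulated.

The function names `ols_slope_intercept`, `first_stage_f` and `two_stage_least_squares` are illustrative. The code also computes the first-stage F-statistic behind the weak-instruments warning in Section 6. With one instrument and no controls, that (homoskedastic) F equals the squared t-statistic on $Z_{i}$.

```python
import random


def ols_slope_intercept(x, y):
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def first_stage_f(z, d):
    """Homoskedastic first-stage F for one instrument and no controls (equals t**2)."""
    intercept, pi = ols_slope_intercept(z, d)
    n = len(z)
    mean_z = sum(z) / n
    ssr = sum((di - intercept - pi * zi) ** 2 for zi, di in zip(z, d))
    se_pi = (ssr / (n - 2) / sum((zi - mean_z) ** 2 for zi in z)) ** 0.5
    return (pi / se_pi) ** 2


def two_stage_least_squares(z, d, y):
    """Point estimate only; second-stage OLS standard errors would be wrong."""
    # First stage: keep only the instrument-driven variation in D.
    intercept, pi = ols_slope_intercept(z, d)
    d_hat = [intercept + pi * zi for zi in z]
    # Second stage: regress Y on the fitted values.
    _, tau = ols_slope_intercept(d_hat, y)
    return tau


# Hypothetical simulated data with a randomly assigned 'near college' instrument.
rng = random.Random(2)
z, d, y = [], [], []
for _ in range(5_000):
    ability = rng.gauss(0.0, 1.0)          # unobserved confounder
    near = 1 if rng.random() < 0.5 else 0  # instrument
    degree = 1 if ability + 0.8 * near + rng.gauss(0.0, 1.0) > 0.4 else 0
    z.append(near)
    d.append(degree)
    y.append(30_000 + 10_000 * degree + 8_000 * ability + rng.gauss(0.0, 5_000))

print(f"first-stage F: {first_stage_f(z, d):.1f}")
print(f"2SLS estimate: {two_stage_least_squares(z, d, y):,.0f}")
```

With a single instrument and no controls, the second-stage slope equals the covariance ratio in equation (1) exactly. One caveat: standard errors from a hand-run second stage are incorrect, because its residuals are computed with $\hat{D}_{i}$ instead of $D_{i}$. For inference, use a dedicated IV/2SLS routine.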

4 But Who Does the Instrument Affect?

The LATE Theorem

Here is the key insight that took econometricians years to fully appreciate: when treatment effects differ across people, IV does not estimate the effect for everyone in the sample. It estimates the effect for a specific subgroup: the compliers.

4.1 The Four Types of People

With a binary instrument $Z_{i}\in\{0,1\}$ and binary treatment $D_{i}\in\{0,1\}$, each person falls into one of four categories based on how they would behave under each value of the instrument:

Table 1: The four principal strata in IV analysis
Type	D_i(0)	D_i(1)	Interpretation
Always-taker	1	1	Gets treated no matter what
Never-taker	0	0	Never gets treated
Complier	0	1	Follows the instrument
Defier	1	0	Does the opposite (ruled out)

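Where $D_{i}(z)$ denotes the treatment status individual $i$ would take when the instrument takes value $z$. Ruling out defiers is known as the monotonicity assumption: the instrument may push people into treatment, but never out of it.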

In the college proximity example:

Always-takers would go to university regardless of proximity (wealthy, highly motivated students).

Never-takers would not go regardless (those with no interest in higher education).

Compliers go to university precisely because they live near one and would not have otherwise.

Defiers are implausible here (no one goes to university because they live far from one).
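We only ever see one of $D_{i}(0)$ and $D_{i}(1)$, so an observed pair $(Z_{i}, D_{i})$ rarely pins down a person's type. The sketch below uses a hypothetical helper, `possible_types`, to list which strata from Table 1 are consistent with each observed cell. It can be run with or without allowing defiers.

```python
from itertools import product

# (D_i(0), D_i(1)) -> principal stratum, as in Table 1.
STRATA = {
    (1, 1): "always-taker",
    (0, 0): "never-taker",
    (0, 1): "complier",
    (1, 0): "defier",
}


def possible_types(z, d, allow_defiers=False):
    """Strata consistent with one observed (Z_i, D_i) pair.

    Only D_i(z) for the realised z is observed; the other potential
    treatment could be 0 or 1, so every stratum is checked.
    """
    consistent = set()
    for (d0, d1), stratum in STRATA.items():
        observed = d1 if z == 1 else d0
        if observed == d and (allow_defiers or stratum != "defier"):
            consistent.add(stratum)
    return consistent


for z, d in product((0, 1), repeat=2):
    print(f"Z={z}, D={d}: {sorted(possible_types(z, d))}")
```

Without defiers, the $(Z_{i}=0, D_{i}=1)$ cell contains only always-takers and the $(Z_{i}=1, D_{i}=0)$ cell only never-takers. Compliers are always mixed in with another type, which is why no individual can be labelled a complier from the data.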

4.2 The LATE

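Under the three instrument conditions and with no defiers, the IV estimator identifies the treatment effect only for compliers, the Local Average Treatment Effect (LATE):

$$ \tau^{LATE}=\mathbb{E}[Y_{i}(1)-Y_{i}(0)\mid \text{complier}] \tag{2} $$

where $Y_{i}(1)$ and $Y_{i}(0)$ are individual $i$'s potential outcomes (here, wages) with and without treatment.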

This result, proved by Imbens and Angrist [1994] and Angrist et al. [1996], is both liberating and sobering:

Liberating: IV gives us a real causal effect for compliers.

Sobering: It may not generalise. The average effect for compliers need not equal the average effect for always-takers, never-takers, or the full population.
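A simulation makes the LATE result concrete. The sketch assumes three hypothetical strata with no defiers, each with its own baseline wage and return to a degree:

- always-takers: 30% of the population, return of 5,000;
- never-takers: 50%, return of 2,000;
- compliers: 20%, return of 25,000.

The instrument is assigned independently of stratum. `wald_estimate` computes the IV estimate as the ratio of differences in means, which equals equation (1) for a binary instrument. The average treatment effect over everyone in this setup is 0.3 × 5,000 + 0.5 × 2,000 + 0.2 × 25,000 = 7,500.

```python
import random

# Hypothetical strata: (population share, baseline wage, effect of a degree).
STRATA = {
    "always-taker": (0.3, 38_000, 5_000),
    "never-taker": (0.5, 32_000, 2_000),
    "complier": (0.2, 30_000, 25_000),
}


def simulate(n=50_000, seed=3):
    rng = random.Random(seed)
    names = list(STRATA)
    shares = [STRATA[name][0] for name in names]
    z, d, y = [], [], []
    for _ in range(n):
        stratum = rng.choices(names, weights=shares)[0]
        _, base, effect = STRATA[stratum]
        zi = 1 if rng.random() < 0.5 else 0  # independent of stratum
        # No defiers: only compliers respond to the instrument.
        di = {"always-taker": 1, "never-taker": 0, "complier": zi}[stratum]
        z.append(zi)
        d.append(di)
        y.append(base + effect * di + rng.gauss(0.0, 5_000))
    return z, d, y


def wald_estimate(z, d, y):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]) using sample means."""
    def mean_given(values, z_value):
        picked = [v for v, zi in zip(values, z) if zi == z_value]
        return sum(picked) / len(picked)

    return (mean_given(y, 1) - mean_given(y, 0)) / (mean_given(d, 1) - mean_given(d, 0))


ate = sum(share * effect for share, _, effect in STRATA.values())
z, d, y = simulate()
print(f"IV: {wald_estimate(z, d, y):,.0f}   complier effect: 25,000   ATE: {ate:,.0f}")
```

The IV estimate should sit near 25,000, the complier effect, and far from the 7,500 average. The returns for always-takers and never-takers never enter the estimate, because the instrument does not change their treatment.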

5 A Numerical Example

Suppose we have a small dataset (Table 2) of 1,000 individuals. The instrument $Z_{i}$ indicates living near a college.

Table 2: Hypothetical summary statistics for IV example
Group	N	Fraction with degree (D_i)	Mean wage (Y_i)
Z_i = 0 (far from college)	500	0.30	$35,000
Z_i = 1 (near college)	500	0.50	$40,000
Difference		0.20	$5,000

The IV estimate is:

$$ \hat{\tau}^{\text{IV}} = \frac{\$40,000 - \$35,000}{0.50 - 0.30} = \frac{\$5,000}{0.20} = \$25,000. \tag{3} $$

This says: among the 20% of people moved by proximity from no-degree to degree (the compliers), getting a degree raises wages by $25,000. Compare this to the naive OLS comparison of degree-holders vs. non-holders, which might give a much smaller or larger number depending on the direction of selection bias.
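The arithmetic behind equation (3) takes only a few lines. Assume no defiers and an instrument independent of type. Table 2 then also reveals the size of each stratum:

- people with a degree despite living far from college must be always-takers;
- people without a degree despite living nearby must be never-takers;
- the first-stage difference is the complier share.

The variable names below are illustrative.

```python
# Table 2 summary statistics (hypothetical data), indexed by Z_i.
share_with_degree = {0: 0.30, 1: 0.50}
mean_wage = {0: 35_000, 1: 40_000}

first_stage = share_with_degree[1] - share_with_degree[0]  # effect of Z on D
reduced_form = mean_wage[1] - mean_wage[0]                 # effect of Z on Y
tau_iv = reduced_form / first_stage                        # equation (3)

# Strata shares, assuming no defiers and Z independent of type:
always_takers = share_with_degree[0]     # degree even when far (Z=0)
never_takers = 1 - share_with_degree[1]  # no degree even when near (Z=1)
compliers = first_stage                  # everyone else

print(f"IV estimate: ${tau_iv:,.0f}")
# IV estimate: $25,000
print(f"always-takers {always_takers:.0%}, never-takers {never_takers:.0%}, "
      f"compliers {compliers:.0%}")
# always-takers 30%, never-takers 50%, compliers 20%
```

With 1,000 people, that is roughly 300 always-takers, 500 never-takers and 200 compliers. The 25,000 estimate describes the return for the last group only.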

6 Common Mistakes

Weak instruments. If the first-stage relationship is weak (the instrument barely moves treatment), IV estimates become unstable and biased towards the OLS estimate. Always check the first-stage F-statistic. As a rule of thumb, $F<10$ is a red flag [Staiger and Stock, 1997].

Treating LATE as ATE. The LATE is the effect for compliers, not the average treatment effect (ATE) over the whole population. If you are trying to inform a policy that would affect all types, not just those on the margin, the LATE may not be what you need.

Assuming exclusion is obvious. The exclusion restriction is never guaranteed. Think hard about whether your instrument could affect the outcome through other channels. College proximity might affect wages not through education but through networks or local labour markets.

Ignoring the complier population. Since you never observe who is a complier, it takes extra work to characterise them. Angrist and Pischke [2009] show how to estimate the share of compliers and their characteristics (relative to always-takers and never-takers), providing important context for the LATE.
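One way to characterise compliers rests on a simple identity. Under independence and no defiers, $\mathbb{E}[X_{i}D_{i}\mid Z_{i}=1]-\mathbb{E}[X_{i}D_{i}\mid Z_{i}=0]$ equals the complier share times the complier mean of a covariate $X_{i}$. Dividing by the first-stage difference therefore gives the complier mean. Always-takers and never-takers can be profiled directly from the $(Z_{i}=0, D_{i}=1)$ and $(Z_{i}=1, D_{i}=0)$ cells.

This is a sketch of the general idea, not necessarily the exact presentation in Angrist and Pischke [2009]. The covariate (whether a parent holds a degree), the stratum probabilities and the helper names are all hypothetical. In this setup the true complier mean is 0.08 / 0.23 ≈ 0.35, against a population mean of 0.40.

```python
import random


def mean(values):
    return sum(values) / len(values)


def complier_profile(x, z, d):
    """Mean of covariate X for compliers, always-takers and never-takers.

    Assumes Z is independent of (X, stratum) and that there are no defiers.
    """
    rows = {v: [(xi, di) for xi, zi, di in zip(x, z, d) if zi == v] for v in (0, 1)}
    mean_xd = {v: mean([xi * di for xi, di in rows[v]]) for v in (0, 1)}
    mean_d = {v: mean([di for _, di in rows[v]]) for v in (0, 1)}
    complier_share = mean_d[1] - mean_d[0]
    return {
        "compliers": (mean_xd[1] - mean_xd[0]) / complier_share,
        # With no defiers, Z=0 & D=1 holds only always-takers,
        # and Z=1 & D=0 holds only never-takers.
        "always-takers": mean([xi for xi, di in rows[0] if di == 1]),
        "never-takers": mean([xi for xi, di in rows[1] if di == 0]),
    }


# Hypothetical DGP: X = 1 if a parent holds a degree; stratum odds depend on X.
STRATUM_PROBS = {1: (0.50, 0.20, 0.30), 0: (0.15, 0.25, 0.60)}  # always, complier, never


def simulate(n=50_000, seed=4):
    rng = random.Random(seed)
    x, z, d = [], [], []
    for _ in range(n):
        xi = 1 if rng.random() < 0.4 else 0
        stratum = rng.choices(("always", "complier", "never"), weights=STRATUM_PROBS[xi])[0]
        zi = 1 if rng.random() < 0.5 else 0  # instrument independent of X and stratum
        x.append(xi)
        z.append(zi)
        d.append({"always": 1, "never": 0, "complier": zi}[stratum])
    return x, z, d


x, z, d = simulate()
profile = complier_profile(x, z, d)
print({k: round(v, 3) for k, v in profile.items()}, "population:", round(mean(x), 3))
```

Here compliers come disproportionately from families without a graduate parent, while always-takers come mostly from families with one. This is exactly the context a reader needs before generalising a LATE.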

7 Where to Learn More

Angrist and Pischke [2009], Mostly Harmless Econometrics: the applied econometrics bible. Chapter 4 covers IV and LATE.

Imbens and Rubin [2015], Causal Inference for Statistics, Social, and Biomedical Sciences: a rigorous treatment from the potential outcomes perspective.

Angrist et al. [1996], which recasts the LATE theorem of Imbens and Angrist [1994] in potential outcomes terms: shows exactly what IV identifies under heterogeneous effects.

8 Conclusion

Instrumental variables solve one of the hardest problems in empirical research: how to recover a causal effect when treatment is not randomly assigned. The LATE theorem tells us precisely what IV identifies: the treatment effect for compliers, the subpopulation whose treatment status is determined by the instrument. This is often exactly the policy-relevant group, namely those at the margin of treatment who could be affected by interventions that change the cost or availability of treatment. Understanding IV well means understanding both its power and its limits.

References

Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444-455.
Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, Princeton, NJ.
Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In Christofides, L., Grant, E., and Swidinsky, R., editors, Aspects of Labour Market Behaviour. University of Toronto Press, Toronto.
Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2):467-475.
Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge.
Staiger, D. and Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65(3):557-586.
