The Causal Review

1 The Causal Question

Does having a high value-added teacher (one who raises students' test scores by more than average) cause lasting improvements in students' lives? The conventional wisdom in education research, following Hanushek [1997], held that teacher effects on long-run outcomes are modest and that test scores are poor proxies for what schools ultimately produce. Chetty et al. [2014] challenge this directly. Using administrative data on 2.5 million students matched to tax records, they find that replacing a teacher in the bottom 5% of the value-added distribution with an average teacher would raise the present value of the affected students' lifetime earnings by approximately $250,000 per classroom.

This case study examines the identification strategy (a quasi-experimental design exploiting teacher mobility across schools), the data, the key findings, and the limitations of this influential paper.

2 What is Teacher Value-Added?

Value-added models (VAMs) estimate the component of student test score growth attributable to the teacher, after controlling for student and classroom characteristics. For student i assigned to teacher j in year t, a standard value-added model takes the form:

$$A_{ijt} = \mu_j + \lambda A_{i,t-1} + X_{it}'\beta + \varepsilon_{ijt}, \tag{1}$$

where $A_{ijt}$ is the student's test score in year $t$, $A_{i,t-1}$ is the lagged score, $X_{it}$ includes student demographics, classroom peer characteristics, and school fixed effects, and $\mu_j$ is the teacher fixed effect, that is, the value-added estimate. A higher $\hat{\mu}_j$ indicates a teacher who raises scores by more than the model predicts.
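As a minimal sketch of how equation (1) can be estimated, the simulation below generates scores under random assignment and recovers the teacher effects by least squares with one dummy per teacher. Everything numeric here (the number of teachers, class size, the values of lambda and beta, the standard deviations) is a hypothetical choice for illustration, not a value from Chetty et al. [2014], and the single control `x` stands in for the full vector of controls.

```python
import numpy as np

def simulate_and_estimate_va(n_teachers=50, class_size=100, seed=0):
    """Simulate equation (1) under random assignment and estimate it by OLS."""
    rng = np.random.default_rng(seed)
    n = n_teachers * class_size
    teacher = np.repeat(np.arange(n_teachers), class_size)  # teacher j of each student
    mu = rng.normal(0.0, 0.15, n_teachers)  # true teacher effects (hypothetical SD)
    lagged = rng.normal(0.0, 1.0, n)        # lagged score A_{i,t-1}
    x = rng.normal(0.0, 1.0, n)             # one stand-in control for X_it
    lam, beta = 0.7, 0.1                    # hypothetical lambda and beta
    score = mu[teacher] + lam * lagged + beta * x + rng.normal(0.0, 0.5, n)

    # Design matrix: one dummy per teacher (no intercept), lagged score, control.
    dummies = np.zeros((n, n_teachers))
    dummies[np.arange(n), teacher] = 1.0
    design = np.column_stack([dummies, lagged, x])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)

    mu_hat = coef[:n_teachers]              # estimated teacher fixed effects
    return mu, mu_hat, coef[n_teachers], coef[n_teachers + 1]

mu, mu_hat, lam_hat, beta_hat = simulate_and_estimate_va()
print("corr(true VA, estimated VA):", np.corrcoef(mu, mu_hat)[0, 1])
print("lambda_hat:", lam_hat, "beta_hat:", beta_hat)
```

Because assignment is random in this simulation, the estimated fixed effects track the true ones up to sampling noise. Applied work typically also shrinks the raw estimates toward the mean to account for that noise, which this sketch omits.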

The key concern with using $\hat{\mu}_j$ for causal inference is bias from non-random assignment: if high-VA teachers are systematically assigned students who are stronger in ways the controls do not capture, $\hat{\mu}_j$ partly reflects student quality rather than teacher quality. Chetty et al. [2014] develop a quasi-experimental validation that directly addresses this concern.
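The sorting concern can be illustrated with a small simulation. The sketch below compares random assignment with an extreme case in which students with a higher unobserved trait are placed with higher-VA teachers. The function name `va_slope_under_sorting`, the `sort_strength` parameter, and all numeric values are hypothetical and chosen only to make the mechanism visible.

```python
import numpy as np

def va_slope_under_sorting(sort_strength, n_teachers=60, class_size=100, seed=1):
    """Return the slope from regressing estimated VA on true VA.

    sort_strength=0 gives random assignment; sort_strength=1 sends the students
    with the highest unobserved trait to the highest-VA teachers.
    """
    rng = np.random.default_rng(seed)
    n = n_teachers * class_size
    mu = rng.normal(0.0, 0.15, n_teachers)  # true teacher effects
    lagged = rng.normal(0.0, 1.0, n)        # observed lagged score
    unobs = rng.normal(0.0, 0.5, n)         # trait the model cannot control for

    # Rank students on a noisy index of the unobserved trait, rank teachers on
    # mu, then fill classrooms in rank order.
    index = sort_strength * unobs + (1.0 - sort_strength) * rng.normal(0.0, 0.5, n)
    teacher = np.empty(n, dtype=int)
    teacher[np.argsort(index)] = np.repeat(np.argsort(mu), class_size)

    score = mu[teacher] + 0.7 * lagged + unobs + rng.normal(0.0, 0.5, n)

    # Value-added model with teacher dummies and the lagged score only.
    dummies = np.zeros((n, n_teachers))
    dummies[np.arange(n), teacher] = 1.0
    design = np.column_stack([dummies, lagged])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    mu_hat = coef[:n_teachers]

    return np.polyfit(mu, mu_hat, 1)[0]

slope_random = va_slope_under_sorting(0.0)
slope_sorted = va_slope_under_sorting(1.0)
print("slope under random assignment:", slope_random)
print("slope under sorting on unobservables:", slope_sorted)
```

Under random assignment the slope is close to one. Under sorting, the estimated effects absorb the classroom average of the unobserved trait, so the estimates overstate differences between teachers.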

3 The Quasi-Experimental Validation

The identification strategy exploits variation across cohorts in the average value-added of the teachers assigned to a given classroom, variation that arises from the natural process of teacher turnover.

3.1 The "Drift" Test

If teacher VA genuinely causes student performance, average test scores in a school should rise when a high-VA teacher moves in and fall when that teacher leaves. Chetty et al. [2014] use teacher entries and exits across schools to test this directly. The test is implemented as:

$$\bar{A}_{ct} = \alpha + \beta\,\overline{VA}_{ct} + \theta_c + \phi_t + u_{ct}, \tag{2}$$

where $\bar{A}_{ct}$ is the mean residual test score in classroom $c$ in year $t$, $\overline{VA}_{ct}$ is the mean predicted VA of the teachers assigned to that classroom (computed out of sample), $\theta_c$ are classroom fixed effects, and $\phi_t$ are year fixed effects. An estimate of $\beta \approx 1$ would indicate that VA estimates track genuine teacher effects rather than student selection.

The key identifying assumption is that teacher mobility is uncorrelated with trends in the underlying quality of students assigned to that classroom. Chetty et al. [2014] find no pre-trend in student scores before teacher arrivals, supporting this assumption.
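To make the logic of equation (2) concrete, the sketch below simulates classroom cells whose teachers turn over at random and then regresses mean scores on mean predicted VA with cell and year fixed effects. It is an illustration under stated assumptions, not the authors' implementation: the turnover rate, the number of teacher slots per cell, and all variances are hypothetical, and predicted VA is modelled as a shrunken noisy signal of true VA so that it is forecast-unbiased by construction.

```python
import numpy as np

def teacher_switching_test(n_cells=300, n_years=6, slots=3, turnover=0.3, seed=2):
    """Estimate beta in equation (2) on simulated data with random turnover."""
    rng = np.random.default_rng(seed)
    sd_mu, sd_noise = 0.15, 0.10  # hypothetical SDs of true VA and estimation error
    shrink = sd_mu**2 / (sd_mu**2 + sd_noise**2)

    def draw(shape):
        true = rng.normal(0.0, sd_mu, shape)
        # Shrunken out-of-sample prediction, so that E[true | pred] = pred.
        pred = shrink * (true + rng.normal(0.0, sd_noise, shape))
        return true, pred

    true, pred = draw((n_cells, slots))
    cell_effect = rng.normal(0.0, 0.3, n_cells)
    year_effect = rng.normal(0.0, 0.1, n_years)

    scores, mean_va = [], []
    for t in range(n_years):
        if t > 0:
            # Each teacher slot is refilled with a new teacher with fixed probability.
            leave = rng.random((n_cells, slots)) < turnover
            new_true, new_pred = draw((n_cells, slots))
            true = np.where(leave, new_true, true)
            pred = np.where(leave, new_pred, pred)
        noise = rng.normal(0.0, 0.05, n_cells)
        scores.append(true.mean(axis=1) + cell_effect + year_effect[t] + noise)
        mean_va.append(pred.mean(axis=1))

    y = np.concatenate(scores)
    va = np.concatenate(mean_va)
    cell = np.tile(np.arange(n_cells), n_years)
    year = np.repeat(np.arange(n_years), n_cells)

    # Two-way fixed effects via dummies; the first year is the omitted category.
    cell_d = (cell[:, None] == np.arange(n_cells)).astype(float)
    year_d = (year[:, None] == np.arange(1, n_years)).astype(float)
    design = np.column_stack([va, cell_d, year_d])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

beta_hat = teacher_switching_test()
print("estimated beta:", beta_hat)
```

Because turnover in the simulation is unrelated to student quality, the estimate should be close to one. If departures were correlated with changes in student composition, the identifying assumption would fail and the coefficient would no longer be interpretable in this way.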

4 Data

The study uses administrative records from a large urban school district in the United States (widely reported to be New York City, although the paper does not name the district), combining three sources:

School district records: Test scores in grades 3-8 for students enrolled between 1989 and 2009, covering approximately 2.5 million students.

IRS tax records: W-2 earnings, college enrolment (via 1098-T tuition forms), and teenage births (inferred from dependent claims), linked by name and date of birth. Earnings are observed at age 28 for students in the sample.

Teacher assignment records: Roster data linking each student to their classroom teacher by year, essential for constructing the teacher fixed effects.

The linkage of school district records to tax data is the methodological centrepiece. Without the IRS data, the analysis could only establish effects on test scores. The administrative linkage enables a causal chain from teacher to test score to long-run earnings, exactly the chain that previous research could not close.

5 Key Findings

5.1 VA Measures True Teacher Effects

The quasi-experimental validation yields a coefficient of approximately 0.97 in equation (2). An estimate this close to one indicates that VA estimates carry little bias from student sorting. There is no evidence of pre-trends, and the effect on scores materialises in the year the teacher arrives. This is the paper's most important methodological contribution: it provides evidence, against prior scepticism, that standard VA estimates are credible.

5.2 Effects on Test Scores Predict Long-Run Outcomes

Using the quasi-experimental VA estimates, Chetty et al. [2014] trace the effect of teacher quality on outcomes at age 28:

A one standard deviation increase in teacher VA raises annual earnings at age 28 by approximately 1.3%.

It raises college attendance by 0.5 percentage points.

It reduces the teenage birth rate among female students by approximately 1.2 percentage points.

It increases the likelihood of living in a higher-income neighbourhood (as measured by the percentile rank of the neighbourhood's average income).

These are intent-to-treat effects: the average effect of being assigned a teacher one standard deviation above mean VA, where assignment is treated as good as random conditional on the controls. Scaled to the VA distribution, replacing a teacher at the 5th percentile with an average teacher would raise the present value of each student's lifetime earnings by roughly $9,000 (in 2010 dollars), or approximately $250,000 per classroom of 28 students.
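The per-classroom figure follows from simple multiplication. Using the rounded per-student gain and the class size given in the text, the product is in line with the approximately $250,000 per classroom quoted in Section 1:

```python
# Rounded inputs taken from the text above.
per_student_gain = 9_000  # approximate gain in lifetime earnings per student, 2010 dollars
class_size = 28           # students per classroom

per_classroom_gain = per_student_gain * class_size
print(per_classroom_gain)  # 252000
```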

5.3 Fade-out in Test Scores, Persistence in Outcomes

An important nuance is that test score effects fade: two years after exposure to a high-VA teacher, the boost to test scores has largely dissipated. Yet the earnings effects persist to age 28. This finding challenges the standard "fade-out" critique of educational interventions, which uses test score fade-out as evidence that interventions have no lasting value. Chetty et al. [2014] argue that non-cognitive skills (persistence, classroom behaviour, health habits) may transmit the teacher effect to adult outcomes even as test scores revert.

6 Limitations

The paper has attracted significant methodological scrutiny, focused on several fronts:

The selection of the outcome variables. Earnings at age 28 may not fully reflect lifetime earnings. The authors acknowledge that college graduates' returns to education accrue later in life, so the age-28 measure may understate the full lifetime earnings effect.

The quasi-experimental design's scope. The drift test validates VA estimates using teachers who move between schools. Teachers who never move (possibly those most stably assigned to advantaged or disadvantaged classrooms) contribute to VA estimates but are not validated quasi-experimentally. Rothstein [2010] documents non-random sorting of students to teachers, which raises the concern that the validation may not generalise to the full VA distribution.

The earnings linkage. The IRS linkage is probabilistic: students are matched by name and date of birth rather than by a unique identifier. Random match errors would attenuate the estimated long-run effects, and non-random mismatches could bias them in either direction. Chetty et al. [2014] report high match rates and conduct robustness checks, but probabilistic linkage introduces noise that an exact administrative linkage would avoid.
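The attenuation from purely random match errors can be sketched with a short simulation. If a share p of students is linked to some other person's earnings record at random, the estimated slope shrinks by roughly a factor of (1 - p). The slope of 1.0, the noise level, and the mismatch rate below are hypothetical values in normalised units, not estimates from the paper.

```python
import numpy as np

def slope_with_mismatch(mismatch_rate, n=100_000, true_slope=1.0, seed=3):
    """OLS slope of the outcome on VA when some records are mismatched at random."""
    rng = np.random.default_rng(seed)
    va = rng.normal(0.0, 1.0, n)                      # teacher VA in SD units
    outcome = true_slope * va + rng.normal(0.0, 1.0, n)

    # A mismatched student receives the outcome of a randomly chosen other record.
    mismatched = rng.random(n) < mismatch_rate
    observed = outcome.copy()
    observed[mismatched] = rng.permutation(outcome)[mismatched]

    return np.polyfit(va, observed, 1)[0]

slope_clean = slope_with_mismatch(0.0)
slope_noisy = slope_with_mismatch(0.2)
print("slope with exact linkage:", slope_clean)
print("slope with 20% random mismatches:", slope_noisy)
```

This covers only mismatches that are unrelated to VA and to outcomes. If match quality varied systematically with student characteristics, the direction of the bias would depend on that pattern.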

External validity. The study covers a single large urban school district. Whether the findings hold for rural or suburban districts, or for countries with different institutional structures for teacher assignment, is unknown.

7 Policy Implications and the Debate Over Value-Added

Chetty et al. [2014] conclude that VA estimates should play a role in teacher retention and compensation decisions. This policy implication has been sharply contested. Critics note that high-stakes use of VA scores creates incentives to teach to the test, manipulate classroom composition, or coach students in score-relevant dimensions while neglecting broader development [Hanushek, 1997].

The debate illustrates a general tension in causal policy analysis. Establishing that teacher VA causes better outcomes (identification) does not automatically resolve the question of how policy should respond (policy design). Structural changes (how VA is measured, how teachers are selected, how incentives are designed) affect the mapping from the estimated parameter to the policy-relevant counterfactual.

8 Conclusion

Chetty et al. [2014] provide one of the most carefully identified causal studies of the long-run effects of teacher quality. The quasi-experimental design, exploiting teacher mobility as a source of plausibly exogenous variation in classroom-level VA, addresses the principal threat to validity in value-added research. The administrative data linkage to IRS earnings records closes the chain from classroom experience to adult outcomes. The core finding, that high-VA teachers produce lasting earnings and socioeconomic gains, has reshaped the debate over teacher evaluation policy and the long-run effects of schooling.

References

Chetty, R., Friedman, J. N., and Rockoff, J. E. (2014). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9):2593-2632.
Hanushek, E. A. (1997). Assessing the effects of school resources on student performance: An update. Educational Evaluation and Policy Analysis, 19(2):141-164.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1):175-214.
Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114(2):497-532.
Chetty, R., Friedman, J. N., Hilger, N., Saez, E., Schanzenbach, D. W., and Yagan, D. (2011). How does your kindergarten classroom affect your earnings? Evidence from Project STAR. Quarterly Journal of Economics, 126(4):1593-1660.

