The Causal Review

1 Introduction

Does putting fewer students in a classroom improve learning? The question sounds simple, but the answer proved elusive for decades. Observational studies produced conflicting results, partly because smaller classes are not randomly assigned: they tend to appear in schools that are also better resourced, or alternatively in struggling schools receiving targeted interventions.

The Tennessee Student/Teacher Achievement Ratio (STAR) project, launched in 1985, was designed to answer this question with a large-scale randomised controlled trial. Krueger [1999] analysed the STAR data and produced one of the most influential education studies of the modern era: a clean experimental benchmark that has shaped education policy worldwide.

2 The Causal Question

The estimand is the average treatment effect of small class assignment on student test scores:

$$ ATE = \mathbb{E}[Y_i^{small} - Y_i^{regular}] \tag{1} $$

where $Y_i^{small}$ and $Y_i^{regular}$ are the potential test scores of student $i$ in a small class and in a regular class, respectively.
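Because each student reveals only one of the two potential outcomes, the ATE in equation (1) cannot be computed student by student. The short simulation below is a hypothetical sketch with made-up score distributions (not STAR data): it generates both potential outcomes so the true ATE is known, assigns treatment at random, and shows that the simple difference in observed means recovers the ATE.

```python
import numpy as np


def simulate_ate(n=10_000, seed=1):
    """Hypothetical potential-outcomes simulation; all distributions are illustrative."""
    rng = np.random.default_rng(seed)
    y_regular = rng.normal(50, 20, size=n)            # Y_i^regular
    y_small = y_regular + rng.normal(5, 2, size=n)    # Y_i^small, with heterogeneous gains
    true_ate = np.mean(y_small - y_regular)           # knowable only in a simulation

    # Random assignment: half the students are assigned to a small class
    small = rng.permutation(np.repeat([1, 0], [n // 2, n - n // 2]))
    y_observed = np.where(small == 1, y_small, y_regular)  # only one outcome is observed
    diff_in_means = y_observed[small == 1].mean() - y_observed[small == 0].mean()
    return true_ate, diff_in_means


true_ate, estimate = simulate_ate()
print(f"true ATE {true_ate:.2f}, difference in means {estimate:.2f}")
```

Randomisation is what licenses the last step: because assignment is independent of the potential outcomes, each arm's observed mean is an unbiased estimate of the corresponding potential-outcome mean.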

Secondary questions concern: (1) whether adding a full-time teaching aide to a regular class (a cheaper intervention) produces similar gains; (2) whether effects differ by race, income, or school urbanicity; and (3) whether effects persist after students return to regular classes.

3 The STAR Experiment

3.1 Design

The STAR project enrolled approximately 11,600 students in kindergarten through third grade across 79 Tennessee schools from 1985 to 1989. Within each participating school, students and teachers were randomly assigned to one of three conditions:

• Small class: 13-17 students.
• Regular class: 22-26 students.
• Regular class with aide: 22-26 students, with a full-time teaching aide.

Random assignment occurred within schools, so the identifying variation is within-school. Schools themselves were not randomly selected (they volunteered), but conditional on being in a school, assignment to class type was random.

3.2 Outcomes

The primary outcome is the Stanford Achievement Test (SAT) score, administered each spring in reading and mathematics. Scores are converted to percentile equivalents for comparability.
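The text does not spell out the conversion. One common approach, and the one assumed in this hypothetical sketch, is to rank each raw score against a reference distribution (for example, the scores of students in regular classes), so that a percentile of 60 means the student scored at or above 60% of the reference group. The function name and the reference scores are illustrative assumptions.

```python
import numpy as np


def percentile_rank(scores, reference):
    """Percent of reference scores at or below each score (an assumed convention)."""
    ref = np.sort(np.asarray(reference, dtype=float))
    # side="right" counts reference scores <= score, so ties count as "at or below"
    counts = np.searchsorted(ref, np.asarray(scores, dtype=float), side="right")
    return 100.0 * counts / ref.size


reference_scores = [410, 430, 455, 470, 500]   # hypothetical regular-class raw scores
print(percentile_rank([430, 480, 400], reference_scores))  # percentiles 40, 80 and 0
```

Ranking every arm against the same reference group puts reading and mathematics scores on a common 0-100 scale, which is what allows effects to be reported in percentile points.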

3.3 Randomisation Checks

A key concern with any RCT is whether randomisation was successfully implemented. Krueger [1999] reports balance checks showing that students in small and regular classes are similar on pre-randomisation characteristics: race, gender, free lunch eligibility, and (in a small subsample) measured pre-kindergarten ability. The balance is reassuring.
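A balance check asks whether pre-randomisation characteristics differ across arms more than chance would allow. As an illustration only (not necessarily Krueger's procedure), the sketch below uses randomisation inference: it computes the size-weighted average within-school gap in a covariate between small and regular classes, then reshuffles assignment within each school to see how often a gap that large arises by chance. The data, the free-lunch variable, and the function names are hypothetical.

```python
import numpy as np


def within_school_diff(x, treat, school):
    """Size-weighted average of within-school (small - regular) covariate mean gaps."""
    diffs, weights = [], []
    for s in np.unique(school):
        in_school = school == s
        t = x[in_school & (treat == 1)]
        c = x[in_school & (treat == 0)]
        if t.size and c.size:                  # skip schools lacking either arm
            diffs.append(t.mean() - c.mean())
            weights.append(in_school.sum())
    return np.average(diffs, weights=weights)


def balance_pvalue(x, treat, school, n_perm=500, seed=0):
    """Randomisation-inference p-value: reshuffle assignment within each school."""
    rng = np.random.default_rng(seed)
    observed = abs(within_school_diff(x, treat, school))
    extreme = 0
    for _ in range(n_perm):
        shuffled = treat.copy()
        for s in np.unique(school):
            idx = np.flatnonzero(school == s)
            shuffled[idx] = rng.permutation(treat[idx])  # respects within-school design
        if abs(within_school_diff(x, shuffled, school)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data: 20 schools of 30 students, half randomised to small classes
rng = np.random.default_rng(42)
school = np.repeat(np.arange(20), 30)
treat = np.concatenate([rng.permutation(np.repeat([1, 0], 15)) for _ in range(20)])
free_lunch = rng.binomial(1, 0.5, size=school.size)  # pre-treatment covariate
print(within_school_diff(free_lunch, treat, school), balance_pvalue(free_lunch, treat, school))
```

Shuffling only within schools mirrors the design described in Section 3.1: the reference distribution is built from assignments that could actually have occurred.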

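Compliance and retention were imperfect: not all students remained in their assigned class type throughout, and some transferred schools. Krueger [1999] handles class switching with an intent-to-treat (ITT) analysis, using initial assignment rather than the class size actually attended. ITT does not by itself correct for students who leave the sample, so attrition remains a separate concern.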

4 Empirical Strategy

Krueger [1999] estimates the treatment effect via OLS with school fixed effects:

$$ Y_{ics} = \alpha_s + \beta_1 \text{Small}_{cs} + \beta_2 \text{RegAide}_{cs} + \gamma' X_{ics} + \varepsilon_{ics} \tag{2} $$

where $Y_{ics}$ is the test score of student $i$ in class $c$ in school $s$, $\alpha_s$ are school fixed effects (absorbing between-school variation, since assignment is within-school), $\text{Small}_{cs}$ and $\text{RegAide}_{cs}$ are class-type dummies, and $X_{ics}$ are student characteristics.

Standard errors are clustered at the class level, since treatment is assigned at the class level and residuals within a class are likely correlated.
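A minimal sketch of estimating equation (2), assuming hypothetical simulated data rather than the STAR files: the school fixed effects $\alpha_s$ are absorbed by demeaning within school (which yields the same slope coefficients as including one dummy per school), and standard errors come from a cluster-robust sandwich with classes as clusters. The data-generating numbers, the function names, and the simple $G/(G-1)$ small-sample factor are illustrative assumptions; the student controls $X_{ics}$ are omitted.

```python
import numpy as np


def simulate_star_like(n_schools=40, n_per_class=20, small_effect=5.0, seed=0):
    """Hypothetical STAR-like data: one class of each type in every school."""
    rng = np.random.default_rng(seed)
    school, klass, small, aide, y = [], [], [], [], []
    class_id = 0
    for s in range(n_schools):
        school_effect = rng.normal(0, 10)        # between-school differences (alpha_s)
        for arm in ("small", "regular", "aide"):
            class_shock = rng.normal(0, 5)       # shock shared by all students in a class
            for _ in range(n_per_class):
                school.append(s)
                klass.append(class_id)
                small.append(arm == "small")
                aide.append(arm == "aide")
                y.append(50 + school_effect + class_shock
                         + small_effect * (arm == "small") + rng.normal(0, 20))
            class_id += 1
    return (np.array(school), np.array(klass), np.array(small, dtype=float),
            np.array(aide, dtype=float), np.array(y))


def demean_within(values, groups):
    """Subtract group means: the within transformation that absorbs alpha_s."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = values[mask] - values[mask].mean(axis=0)
    return out


def fe_ols_cluster(y, X, fe_groups, clusters):
    """OLS with fixed effects for fe_groups and cluster-robust SEs by clusters."""
    Xd = demean_within(X, fe_groups)
    yd = demean_within(y, fe_groups)
    bread = np.linalg.inv(Xd.T @ Xd)
    beta = bread @ Xd.T @ yd
    resid = yd - Xd @ beta                       # equals the dummy-regression residuals
    k = Xd.shape[1]
    meat = np.zeros((k, k))
    for c in np.unique(clusters):
        in_c = clusters == c
        score = Xd[in_c].T @ resid[in_c]         # cluster-level score vector
        meat += np.outer(score, score)
    n_clusters = len(np.unique(clusters))
    vcov = n_clusters / (n_clusters - 1) * bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))


school, klass, small, aide, y = simulate_star_like()
X = np.column_stack([small, aide])
beta, se = fe_ols_cluster(y, X, school, klass)
print("small:", beta[0], "+/-", se[0], " aide:", beta[1], "+/-", se[1])
```

Demeaning avoids building a dummy column for every school. Statistical packages typically apply a slightly different small-sample correction, so their standard errors may differ marginally from this sketch.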

5 Key Findings

5.1 Main Effect

The core finding: students assigned to small classes score approximately 4-5 percentile points higher than students in regular classes by the end of kindergarten. The effect grows slightly over the four years of the experiment.

The teaching-aide condition shows no effect statistically distinguishable from the regular-class condition: aides do not substitute for smaller classes.

Table 1: Estimated effects of small class assignment on test scores (percentile points)
Grade	Small vs. Regular	Regular+Aide vs. Regular
Kindergarten	4.5***	0.9
First grade	5.4***	0.6
Second grade	4.7***	1.3
Third grade	4.9***	1.1

*** p < 0.01. Source: Krueger 1999, Table IV.

5.2 Heterogeneous Effects

A key finding is that the effect of small class assignment is larger for minority students and students eligible for free lunch, the most disadvantaged groups. Black students assigned to small classes score roughly 7-8 percentile points higher than Black students in regular classes, compared to 3-4 points for white students. This heterogeneity matters for how resources should be targeted.
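A common way to test whether the small-class effect differs across groups is to interact the treatment dummy with a group indicator. The sketch below uses hypothetical, noise-free cell means (chosen for easy arithmetic, not STAR estimates) and omits school fixed effects and controls; it shows that in this saturated model the interaction coefficient equals the difference between the two subgroup effects.

```python
import numpy as np


def subgroup_effects(y, small, group):
    """OLS of y on [1, small, group, small*group]; returns (effect in group 0, gap)."""
    X = np.column_stack([np.ones_like(y), small, group, small * group])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    effect_group0 = coef[1]      # small-class effect when group == 0
    effect_gap = coef[3]         # additional small-class effect when group == 1
    return effect_group0, effect_gap


# Hypothetical data: two students per (class type, group) cell
small = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.array([50, 50, 52, 52, 40, 40, 46, 46], dtype=float)
base, gap = subgroup_effects(y, small, group)
print(base, base + gap)  # effect for group 0 (2) and for group 1 (6)
```

A t-test on the interaction coefficient is then a direct test of whether the two subgroup effects differ.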

5.3 Persistence

Krueger and Whitmore [2001] examine what happens after the experiment ends and students return to regular classes. They find that the test score gains from small class assignment persist into fourth and fifth grades, even though students are now in classes of the same size as those who were never in small classes. The gains are not merely a short-term adjustment effect.

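Long-run work by Chetty et al. [2011], which links STAR records to adult tax data, finds that students assigned to small classes are more likely to attend college. Their clearest earnings effects, however, come from kindergarten class quality (measured by classmates' end-of-kindergarten test scores) rather than from class size itself. Together these results provide some of the cleanest evidence on the long-run returns to early childhood education quality.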

6 Methodological Contributions

The STAR study illustrates several important features of large-scale RCTs in education:

• Within-school randomisation: By randomising within schools, STAR controls for the enormous between-school variation in resources, culture, and demographics that confounds observational class-size studies.
• ITT vs. LATE: Some students switched class types during the experiment. The ITT analysis (using initial assignment) is conservative and avoids post-randomisation bias. A LATE estimate that uses initial assignment as an instrument for the class size actually attended rescales the ITT effect by the first-stage effect of assignment on treatment received; because compliance was incomplete, this yields larger effects than the ITT.
• Cluster-robust inference: The STAR data are a natural setting for cluster-robust standard errors; standard errors that ignore within-class correlation would be too small.
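The ITT and LATE logic can be illustrated with the Wald estimator. This is a hypothetical sketch that simplifies the treatment to a binary "attended a small class" indicator (Krueger's instrument targets actual class size) and ignores school fixed effects: the ITT is the difference in mean scores by assignment, the first stage is the difference in attendance rates by assignment, and their ratio is the LATE under the usual instrumental-variable assumptions (exclusion and monotonicity).

```python
import numpy as np


def itt_and_wald(y, assigned, attended):
    """ITT, first stage, and Wald (LATE) estimates for a binary treatment."""
    y, z, d = (np.asarray(a, dtype=float) for a in (y, assigned, attended))
    itt = y[z == 1].mean() - y[z == 0].mean()            # effect of assignment on scores
    first_stage = d[z == 1].mean() - d[z == 0].mean()    # effect of assignment on attendance
    return itt, first_stage, itt / first_stage           # Wald ratio = LATE


# Hypothetical example: one assigned-small student attends a regular class,
# and one assigned-regular student attends a small class.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
attended = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [10, 10, 10, 4, 4, 4, 4, 10]
itt, fs, late = itt_and_wald(scores, assigned, attended)
print(itt, fs, late)  # 3.0 0.5 6.0
```

With half of the assignment difference translating into attendance, the LATE is twice the ITT, which is why the ITT is described as conservative.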

7 Critiques and Limitations

Despite its influence, STAR has attracted criticism:

• Generalisability: STAR schools volunteered to participate. They may not be representative of all Tennessee schools, let alone schools nationwide.

• Hawthorne effects: Teachers in small classes may have worked harder, not because of the smaller class per se, but because they knew they were being studied.

• Implementation fidelity: Actual class sizes in STAR did not always fall within the target ranges. The ITT analysis addresses the intention to assign but not the fidelity of implementation.

• Hanushek [1999] critique: Eric Hanushek has argued that the STAR effects are modest relative to their cost, and that teacher quality dwarfs class size as a determinant of achievement. The debate over cost-effectiveness continues.

8 Policy Impact

The STAR findings influenced education policy in California, which in 1996 funded a statewide class-size reduction initiative. The California experience was more mixed than STAR predictions would suggest, partly because the rapid hiring of additional teachers led to a decline in average teacher quality. This illustrates the general equilibrium effects, such as strain on the teacher labour market, that an RCT confined to a limited set of schools cannot capture.

9 Conclusion

The Tennessee STAR experiment remains one of the most carefully designed and widely cited education RCTs ever conducted. Its main finding, that small classes improve student achievement by 4-5 percentile points on average, with larger effects for disadvantaged students, is consistent with evidence from other contexts (e.g. Angrist and Lavy [1999] in Israel) and has been extended to long-run outcomes such as college attendance. The study also illustrates the challenges of large-scale field experiments: noncompliance, attrition, and the gap between experimental findings and real-world policy implementation.

References

Krueger, A.B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114(2):497-532.
Krueger, A.B. and Whitmore, D.M. (2001). The effect of attending a small class in the early grades on college-test taking and middle school test results: Evidence from project STAR. Economic Journal, 111(468):1-28.
Chetty, R., Friedman, J.N., Hilger, N., Saez, E., Schanzenbach, D.W., and Yagan, D. (2011). How does your kindergarten classroom affect your earnings? Evidence from Project STAR. Quarterly Journal of Economics, 126(4):1593-1660.
Hanushek, E.A. (1999). Some findings from an independent investigation of the Tennessee STAR experiment and from other investigations of class size effects. Educational Evaluation and Policy Analysis, 21(2):143-163.
Angrist, J.D. and Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114(2):533-575.

Class Size and Student Achievement: The Tennessee STAR Experiment
