1 The Causal Question
Does reducing class size improve student achievement? The question is both practically and methodologically central to education economics. Practically, class-size reduction is expensive—hiring more teachers is one of the largest potential expenditures in any school reform budget—and policymakers need to know whether the investment pays off in learning outcomes. Methodologically, the question illustrates why causal identification is hard: schools with smaller classes often differ from those with larger classes in ways that also affect achievement (teacher quality, parental income, school resources), making simple comparisons misleading.
The landmark contribution of Angrist and Lavy [1999] tackled the identification problem by exploiting a rule that dates back to the twelfth century.
2 Identification Strategy: The Maimonides Rule
The philosopher and jurist Maimonides, writing in the twelfth century, codified a rule for Jewish religious education: a class may have at most 40 students; once enrollment exceeds 40, a new class must be formed. This rule was adopted as administrative policy in Israeli public schools and governs class formation mechanically based on grade enrollment.
The Maimonides rule creates a regression discontinuity design (RDD) in class size. Let e denote grade enrollment in a school. The predicted class size under Maimonides' rule is:
f(e) = e / (⌊(e − 1)/40⌋ + 1)
where ⌊⋅⌋ denotes the floor function. This function has discontinuities at e = 41, 81, 121, ...: at enrollment 40 the predicted class size equals 40, but at enrollment 41 it drops to 41/2 = 20.5.
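The sawtooth pattern is easy to verify numerically. The following is a minimal sketch of the rule as described above; the function name `predicted_class_size` and the `cap` parameter are illustrative choices, not names from the original study.

```python
from math import floor


def predicted_class_size(enrollment: int, cap: int = 40) -> float:
    """Average class size when a grade is split into the fewest classes of at most `cap` pupils."""
    if enrollment < 1:
        raise ValueError("enrollment must be a positive integer")
    n_classes = floor((enrollment - 1) / cap) + 1  # a new class opens at 41, 81, 121, ...
    return enrollment / n_classes


# Predicted class size just below and just above the first three cutoffs.
for e in (40, 41, 80, 81, 120, 121):
    print(e, predicted_class_size(e))
```

The drops shrink as enrollment grows (from 40 to 20.5 at the first cutoff, from 40 to 27 at the second), so the first cutoff carries the largest discontinuity in predicted class size.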
Sharp drops at enrollment multiples of 40 (plus 1) create regression discontinuities that Angrist and Lavy [1999] exploit for identification. The key insight is that enrollment near these cutoffs is essentially random with respect to achievement-relevant characteristics: a school with grade enrollment of 39 is almost identical to one with enrollment of 41, but the latter has classes of about 20 while the former has classes of 39. This near-randomness around the cutoff—ensured if families cannot precisely manipulate enrollment to land on the favourable side—provides the identifying variation.
3 Data and Empirical Strategy
Angrist and Lavy [1999] use administrative data from the Israeli Ministry of Education covering roughly 2,000 classes in about 1,000 schools per grade, with grade-level enrollment and standardised test scores in reading and mathematics for 4th and 5th graders in 1991.
3.1 Fuzzy RDD via 2SLS
The Maimonides rule predicts class size but does not enforce it perfectly: some schools deviate from the rule due to administrative discretion, multi-grade classrooms, or special education pulls. This makes the design fuzzy: the rule is a strong but imperfect predictor of actual class size, not a deterministic assignment. The fuzzy RDD is estimated by two-stage least squares (2SLS):
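First stage:
n̄ₛ꜀ = π₀ + π₁ f(eₛ) + g(eₛ) + δₛ + νₛ꜀
where n̄ₛ꜀ is actual class size in class c of school s, f(eₛ) is predicted class size under Maimonides' rule, eₛ is grade enrollment, g(⋅) is a smooth (polynomial) control function of enrollment, δₛ are school fixed effects, and νₛ꜀ is an error term. The coefficient symbols π₀ and π₁ are notation adopted here for the intercept and the first-stage slope.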
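Second stage:
yₛ꜀ = β₀ + β₁ n̂̄ₛ꜀ + g(eₛ) + δₛ + εₛ꜀
where yₛ꜀ is the average test score in class c, n̂̄ₛ꜀ is the fitted value from the first stage, g(⋅) is a smooth (polynomial) control function of enrollment, δₛ are school effects, and εₛ꜀ is an error term. The coefficient symbols β₀ and β₁ are notation adopted here; β₁ is the class-size effect of interest. The instrument f(eₛ) satisfies: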
- Relevance: f(eₛ) strongly predicts actual class size (first-stage F-statistic well above conventional thresholds).
- Exclusion: Conditional on enrollment and school effects, f(eₛ) affects test scores only through class size—not directly.
- Monotonicity: Higher predicted class size increases (or at least does not decrease) actual class size for all schools.
The exclusion restriction is the key identifying assumption. It requires that schools just above an enrollment threshold (which get smaller classes) do not differ in unobserved dimensions from schools just below (which get larger classes). Angrist and Lavy [1999] test this by checking whether observable school characteristics—percent disadvantaged, mean parental education—are smooth at the enrollment thresholds. They find no discontinuities, supporting the validity of the instrument.
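The two-stage logic can be illustrated on simulated data. This is a sketch under stated assumptions, not the paper's data or code: the data-generating process, the true effect of -0.25 score points per student, the compliance slope of 0.8, one grade per school, the omission of school effects, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def maimonides(e, cap=40):
    """Predicted class size under the enrollment cap (vectorised)."""
    e = np.asarray(e, dtype=float)
    return e / (np.floor((e - 1) / cap) + 1)


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_sls(y, treatment, instrument, control):
    """Coefficient on `treatment`, instrumented by `instrument`, controlling for `control`."""
    const = np.ones_like(y)
    Z = np.column_stack([const, instrument, control])
    fitted = Z @ ols(Z, treatment)                 # first stage: fitted class size
    X_hat = np.column_stack([const, fitted, control])
    return ols(X_hat, y)[1]                        # second stage: slope on fitted class size


def simulate(n=5000, beta=-0.25, seed=0):
    """Hypothetical schools in which an unobserved factor u raises both class size and scores."""
    rng = np.random.default_rng(seed)
    e = rng.integers(10, 201, size=n).astype(float)        # grade enrollment
    f = maimonides(e)
    u = rng.normal(size=n)                                 # unobserved confounder
    size = 4.0 + 0.8 * f + 2.0 * u + rng.normal(size=n)    # imperfect compliance with the rule
    y = 70.0 + beta * size + 0.01 * e + 3.0 * u + rng.normal(scale=2.0, size=n)
    return e, f, size, y


e, f, size, y = simulate()
naive = ols(np.column_stack([np.ones_like(y), size, e]), y)[1]
iv = two_sls(y, size, f, e)
print(f"OLS slope on class size:  {naive:+.3f}")
print(f"2SLS slope on class size: {iv:+.3f} (true effect in the simulation: -0.250)")
```

Because the simulated confounder pushes class size and scores in the same direction, the OLS slope is biased toward zero, while the 2SLS slope should land close to the simulated effect. The point estimate from running the second stage by hand is correct, but the standard errors from that regression are not; a dedicated IV routine would be needed for inference.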
4 Key Findings
4.1 First-Stage Relationship
The Maimonides rule is a powerful predictor of actual class size. Each unit increase in predicted class size f(e) is associated with approximately a 0.8-unit increase in actual average class size within the school-grade. The first-stage F-statistic exceeds 30 in most specifications, well above the conventional rule-of-thumb threshold of 10, so weak-instrument bias is unlikely to be a concern.
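With a single excluded instrument, the first-stage F-statistic is simply the squared t-statistic on the instrument. The sketch below computes it on simulated data; the compliance slope of 0.8 is set to mirror the figure quoted above, while the noise level, sample size, homoskedastic standard errors, and function names are assumptions made for illustration. It does not account for clustering of classes within schools.

```python
import numpy as np


def maimonides(e, cap=40):
    """Predicted class size under the enrollment cap (vectorised)."""
    e = np.asarray(e, dtype=float)
    return e / (np.floor((e - 1) / cap) + 1)


def first_stage(size, f, e):
    """Return (slope on f, F-statistic for excluding f) from an OLS first stage."""
    X = np.column_stack([np.ones_like(f), f, e])
    coef = np.linalg.lstsq(X, size, rcond=None)[0]
    resid = size - X @ coef
    sigma2 = resid @ resid / (len(size) - X.shape[1])   # homoskedastic error variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], (coef[1] / se) ** 2                 # one instrument: F equals t squared


rng = np.random.default_rng(1)
e = rng.integers(10, 201, size=2000).astype(float)
f = maimonides(e)
size = 4.0 + 0.8 * f + rng.normal(scale=2.0, size=2000)  # assumed compliance slope of 0.8
slope, F = first_stage(size, f, e)
print(f"first-stage slope: {slope:.3f}, F-statistic: {F:.1f}")
```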
4.2 Effect on Test Scores
The 2SLS estimates imply that a 10-student reduction in class size raises average test scores by approximately 0.2 to 0.3 standard deviations, though estimates vary across subject and grade:
Table 1: Selected 2SLS estimates from Angrist and Lavy [1999]
The negative sign means larger classes reduce scores: a 10-student increase in class size is associated with a 0.19-0.28 standard deviation decrease in reading performance. These are economically meaningful effects—a 0.2 SD improvement is roughly equivalent to a few months of additional schooling.
5 Threats to Validity and Robustness
5.1 Manipulation of Enrollment
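If principals can "sort" students to land above or below enrollment thresholds (e.g., holding students back to push enrollment above 40), the random-assignment logic breaks down. Manipulation of this kind would show up as bunching of schools just above threshold values, where crossing the cutoff triggers an extra class. The formal diagnostic is the density test of McCrary [2008]; because it postdates Angrist and Lavy [1999], it is a check later applied to designs of this type rather than one the original paper could have reported.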
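The intuition behind a density check can be sketched by comparing the number of schools in narrow windows on either side of a cutoff. This is a crude stand-in, not the local-linear density estimator of McCrary [2008]: it assumes the enrollment density is locally flat, so that absent manipulation a school in the window is equally likely to fall on either side. The function name and window width are hypothetical.

```python
from math import sqrt


def bunching_z(enrollments, cutoff=41, width=5):
    """z-statistic for 'equal counts just below and just above the cutoff'.

    Under a locally flat density with no sorting, the count above the cutoff is
    Binomial(n, 0.5), which gives z = (above - below) / sqrt(n).
    """
    below = sum(cutoff - width <= e < cutoff for e in enrollments)
    above = sum(cutoff <= e < cutoff + width for e in enrollments)
    n = below + above
    if n == 0:
        raise ValueError("no schools inside the comparison window")
    return (above - below) / sqrt(n)


# Hypothetical data: ten schools at every enrollment level from 30 to 51.
balanced = list(range(30, 52)) * 10
# Same schools, but those at 39 or 40 pupils are pushed to 41 to trigger a split.
sorted_up = [41 if e in (39, 40) else e for e in balanced]

print(bunching_z(balanced))
print(bunching_z(sorted_up))
```

A large positive value indicates excess mass just above the cutoff, the pattern one would expect if schools manipulated enrollment to obtain smaller classes.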
5.2 Polynomial Controls for Enrollment
The running variable (enrollment) is included as a polynomial to control for smooth trends in outcomes across enrollment levels. The authors test sensitivity to the polynomial order (linear, quadratic) and find broadly similar results, though higher-order polynomials reduce precision.
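A sensitivity check of this kind amounts to re-estimating the 2SLS model with different polynomial orders in enrollment. The sketch below does so on simulated data; the data-generating process (a linear enrollment trend and a true effect of -0.25), the rescaling of enrollment by 100, and the helper names are illustrative assumptions rather than the authors' specification.

```python
import numpy as np


def maimonides(e, cap=40):
    """Predicted class size under the enrollment cap (vectorised)."""
    e = np.asarray(e, dtype=float)
    return e / (np.floor((e - 1) / cap) + 1)


def two_sls(y, treatment, instrument, controls):
    """2SLS slope on `treatment`; `controls` is an (n, k) array of included regressors."""
    const = np.ones((len(y), 1))
    Z = np.column_stack([const, instrument, controls])
    fitted = Z @ np.linalg.lstsq(Z, treatment, rcond=None)[0]   # first stage
    X_hat = np.column_stack([const, fitted, controls])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]          # second stage


def enrollment_poly(e, order):
    """Polynomial terms in enrollment, rescaled to keep the design matrix well conditioned."""
    scaled = e / 100.0
    return np.column_stack([scaled ** k for k in range(1, order + 1)])


rng = np.random.default_rng(2)
n = 5000
e = rng.integers(10, 201, size=n).astype(float)
f = maimonides(e)
u = rng.normal(size=n)                                   # unobserved confounder
size = 4.0 + 0.8 * f + 2.0 * u + rng.normal(size=n)
y = 70.0 - 0.25 * size + 0.01 * e + 3.0 * u + rng.normal(scale=2.0, size=n)

estimates = {order: two_sls(y, size, f, enrollment_poly(e, order)) for order in (1, 2)}
for order, est in estimates.items():
    print(f"polynomial order {order}: 2SLS slope = {est:+.3f}")
```

Because the simulated trend is linear, both specifications should give similar answers here. In real data, estimates that move substantially with the polynomial order would signal that the result depends on how the enrollment trend is modelled.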
5.3 Endogenous Enrollment?
A subtler concern is that a school's total enrollment may itself be a choice variable. Parents might select into schools partly based on expected class sizes, which could confound the design at a different level. Angrist and Lavy [1999] argue that in the Israeli context, school assignment is largely residence-based and not subject to substantial choice, reducing this concern.
6 Comparison with the STAR Experiment
The Tennessee STAR experiment [Krueger, 1999] randomly assigned students to small (13-17 students) or regular (22-25 students) classes in kindergarten through 3rd grade. As a properly randomised experiment, STAR provides the cleanest causal estimate of class-size effects, but it was costly, limited in scope, and subject to implementation problems (teacher reassignments, compliance issues). Krueger [1999] found sizeable positive effects of small classes, especially for disadvantaged and minority students, broadly consistent with Angrist and Lavy [1999].
The two studies are complementary. STAR has stronger internal validity (randomisation) but limited external validity (a specific period, age range, and US context). Angrist and Lavy [1999] rely on a quasi-experimental design whose credibility rests on defending the exclusion restriction, and their estimates carry the standard LATE interpretation: they apply to schools whose class-size assignment actually changed at the threshold (compliers), not to all Israeli schools.
7 Long-Run Effects and Extensions
A limitation of the original Angrist and Lavy [1999] study is that outcomes are short-run test scores. Subsequent work has explored whether class-size effects persist. Chetty et al. [2011] link STAR participants to tax records in adulthood and find that assignment to a small class raises college attendance, while the estimated effect on earnings at age 27 is statistically insignificant and imprecise; kindergarten classroom quality more broadly does predict adult earnings. This suggests that short-run test scores may not capture the full long-run benefits. Whether these findings generalise to the fuzzy RD context of Maimonides is unknown.
Extensions of the Maimonides design have been applied to other countries with similar enrollment-cap rules: France, Bolivia, and India all have class-size caps that generate analogous discontinuities [Angrist and Lavy, 2002]. Results are generally consistent in sign but vary in magnitude, reinforcing the importance of context for interpreting LATE estimates.
8 What We Learn
The Angrist-Lavy study delivers several lessons beyond its substantive finding:
- Institutional rules as instruments. Bureaucratic rules that assign treatment mechanically but imperfectly are among the most credible instruments in applied work. The Maimonides rule is an archetype.
- The fuzzy RD as 2SLS. A fuzzy discontinuity is a local 2SLS problem: crossing the threshold is the excluded instrument, the first stage is the local jump in treatment (here, actual class size), and the ratio of the reduced-form jump in outcomes to the first-stage jump is the LATE at the threshold.
- LATE interpretation. The 2SLS estimate identifies the average treatment effect for compliers—schools whose actual class size changed because enrollment crossed the threshold. Schools that deviate from the rule regardless (always-small or always-large) are unaffected, and the estimate does not speak to them.
- Robustness matters. The density test, covariate balance at the threshold, and polynomial sensitivity checks have become standard reporting requirements for any RD study, in part because of the discipline instilled by this paper.
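The "ratio of jumps" reading of the fuzzy RD can be made concrete with a simple Wald estimator at the first cutoff. The sketch below uses simulated data with a true effect of -0.25 and compares mean outcomes and mean class sizes in narrow windows on either side of enrollment 41. The data-generating process, window width, and function name are assumptions for illustration, and the comparison of raw means ignores any trend in enrollment within the window, which a real application would model.

```python
import numpy as np


def wald_at_cutoff(e, size, y, cutoff=41, width=5):
    """Reduced-form jump in outcomes divided by first-stage jump in class size."""
    below = (e >= cutoff - width) & (e < cutoff)
    above = (e >= cutoff) & (e < cutoff + width)
    first_stage_jump = size[above].mean() - size[below].mean()
    reduced_form_jump = y[above].mean() - y[below].mean()
    return reduced_form_jump / first_stage_jump


rng = np.random.default_rng(3)
n = 4000
e = rng.integers(30, 52, size=n)                      # enrollment between 30 and 51
f = e / ((e - 1) // 40 + 1)                           # predicted class size under the rule
u = rng.normal(size=n)                                # unobserved confounder
size = 4.0 + 0.8 * f + 2.0 * u + rng.normal(size=n)   # imperfect compliance
y = 70.0 - 0.25 * size + 3.0 * u + rng.normal(scale=2.0, size=n)

late = wald_at_cutoff(e, size, y)
print(f"Wald estimate at the cutoff: {late:+.3f} (true effect in the simulation: -0.250)")
```

In this simulation every school responds to the rule with the same slope, so the LATE coincides with the effect for all schools; with heterogeneous compliance the same ratio would recover the effect for compliers only.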
9 Conclusion
Angrist and Lavy [1999] transformed a medieval scholar's rule about class size into one of the most cited natural experiments in education economics. By exploiting the discontinuous predictions of the Maimonides rule, they obtained credible answers to a question that had resisted identification for decades. Their paper is a masterclass in using institutional knowledge to solve identification problems, in the rigorous defence of instrument validity, and in the careful interpretation of local average treatment effects. It remains required reading for any student of causal inference.
References
- Angrist, J. D. and Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114(2):533-575.
- Angrist, J. D. and Lavy, V. (2002). New evidence on classroom computers and pupil learning. Economic Journal, 112(482):735-765.
- Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295-2326.
- Chetty, R., Friedman, J. N., Hilger, N., Saez, E., Schanzenbach, D. W., and Yagan, D. (2011). How does your kindergarten classroom affect your earnings? Evidence from Project STAR. Quarterly Journal of Economics, 126(4):1593-1660.
- Imbens, G. W. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615-635.
- Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114(2):497-532.
- McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):698-714.