1 The Causal Question
What is the causal effect of an additional year of schooling on wages? OLS estimates of the Mincerian returns to education are likely to be biased upward: individuals who obtain more education plausibly differ in unobserved ability, family background, and motivation, all of which also raise wages. Naive regression conflates the effect of schooling with selection on unobservables.
Duflo (2001) addresses this identification problem using one of the most compelling natural experiments in development economics: Indonesia's massive school construction programme of the 1970s. The paper remains a touchstone for the IV-DiD design—an identification strategy that combines instrumental variables with a difference-in-differences framework to exploit variation across both geography and birth cohort.
2 The Setting
Between 1973 and 1978, the Indonesian government implemented the Instruksi Presiden (INPRES) programme, constructing approximately 61,000 new primary schools—the largest school construction programme in the world at that time. At its peak, new schools were being built at a rate of more than 10,000 per year.
Programme intensity was not uniform across districts. The government allocated schools in proportion to the number of school-age children who were not enrolled in primary school in 1972. Districts with large out-of-school populations received more schools per 1,000 children; those with already high enrolment received fewer. This targeting rule creates cross-district variation in programme intensity that is plausibly uncorrelated with potential outcomes, conditional on baseline enrolment rates.
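The targeting rule can be made concrete with a small sketch. It assumes, purely for illustration, that a fixed national budget of schools is split across districts in proportion to each district's 1972 count of unenrolled school-age children. It then computes programme intensity as new schools per 1,000 children. The district names, populations, enrolment rates and budget are all hypothetical, and the function `allocate` is an illustrative helper rather than anything from the paper.

```python
# Hypothetical 1972 figures: (district name, school-age children, enrolment rate).
# None of these numbers come from Duflo (2001); they only illustrate the rule.
DISTRICTS = [
    ("A", 200_000, 0.90),
    ("B", 150_000, 0.60),
    ("C", 100_000, 0.40),
]
TOTAL_SCHOOLS = 1_000  # hypothetical national budget of new schools


def allocate(districts, total_schools):
    """Split total_schools across districts in proportion to unenrolled children,
    returning programme intensity as new schools per 1,000 school-age children."""
    unenrolled = {name: kids * (1 - rate) for name, kids, rate in districts}
    total_unenrolled = sum(unenrolled.values())
    intensity = {}
    for name, kids, _ in districts:
        schools = total_schools * unenrolled[name] / total_unenrolled
        intensity[name] = 1_000 * schools / kids
    return intensity


print({name: round(p, 2) for name, p in allocate(DISTRICTS, TOTAL_SCHOOLS).items()})
# {'A': 0.71, 'B': 2.86, 'C': 4.29}
```

Under this stylised proportional rule, intensity in a district equals 1,000 × (total schools / total unenrolled) × (1 − enrolment rate). Intensity is therefore proportional to the district's unenrolment rate: district C has six times A's unenrolment rate and receives six times A's intensity. The actual INPRES formula may have involved adjustments not modelled here.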
The data come from the 1995 Intercensal Survey (SUPAS), which records educational attainment and wages for a large sample of Indonesian men. Duflo (2001) restricts the sample to men born between 1950 and 1972 who were 2-24 years old when the programme launched.
3 The Identification Strategy
The key insight is that exposure to the programme varies along two orthogonal dimensions:
- Cross-sectional (district) variation: districts received different numbers of new schools per 1,000 children, determined by the 1972 enrolment shortfall.
- Birth-cohort variation: individuals born in 1968 or later were at most 5 years old when construction began in 1973. They therefore reached primary-school age after the new schools had started opening and could attend them. Those born before 1962 were too old to benefit, and individuals born between 1962 and 1967 received partial exposure.
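These cohort definitions translate into a simple classification rule. The sketch below uses hypothetical helper names, `exposure_group` and `instrument`, to encode the cutoffs stated in the text and build the interaction of district intensity with the young-cohort indicator for one person. The young-cohort indicator equals 1 only for those born in 1968 or later, so partially exposed cohorts get an instrument value of zero here. This is one modelling choice among several.

```python
# Cohort cutoffs as stated in the text; construction ran from 1973 to 1978.
CONSTRUCTION_START = 1973


def exposure_group(birth_year: int) -> str:
    """Classify a birth cohort's exposure to INPRES using the text's cutoffs."""
    if birth_year >= 1968:
        return "young"    # at most 5 in 1973: reached school age after schools opened
    if birth_year >= 1962:
        return "partial"  # some primary-school years overlapped construction
    return "old"          # too old to benefit


def instrument(p_district: float, birth_year: int) -> float:
    """Interaction of district intensity with the young-cohort indicator.
    Partially exposed cohorts get an indicator of 0 under this definition."""
    c_young = 1.0 if exposure_group(birth_year) == "young" else 0.0
    return p_district * c_young


for year in (1955, 1965, 1970):
    print(year, CONSTRUCTION_START - year, exposure_group(year), instrument(2.5, year))
# 1955 18 old 0.0
# 1965 8 partial 0.0
# 1970 3 young 2.5
```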
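The interaction of these two sources of variation, district programme intensity times young-cohort status, instruments for individual years of schooling $S_i$. Let $P_i$ denote the number of INPRES schools built per 1,000 children in individual $i$'s district, and let $C_i^{\text{young}} = 1$ if the individual was born in 1968 or later (0 otherwise).
First stage. The causal effect of the programme on schooling:
$$S_i = \pi \,(P_i \times C_i^{\text{young}}) + X_i'\gamma + \varepsilon_i \qquad (1)$$
where $X_i$ includes cohort fixed effects and district fixed effects, which absorb the main effects of $C_i^{\text{young}}$ and $P_i$. The parameter $\pi$ captures the additional years of schooling induced by each additional school per 1,000 children, for the young cohort relative to the old cohort.
Reduced form. The programme's effect on log wages:
$$\ln w_i = \rho \,(P_i \times C_i^{\text{young}}) + X_i'\delta + u_i \qquad (2)$$
IV estimate. The return to schooling is identified as:
$$\beta = \frac{\rho}{\pi} \qquad (3)$$
that is, the ratio of the reduced-form wage effect to the first-stage schooling effect. To see why, substitute (1) into the wage equation $\ln w_i = \beta S_i + X_i'\theta + e_i$, which gives $\rho = \beta\pi$. The model is exactly identified, with one excluded instrument for one endogenous regressor and the same controls $X_i$ throughout. Consequently, this Wald-type (indirect least squares) ratio is numerically identical to the 2SLS coefficient from instrumenting $S_i$ with $P_i \times C_i^{\text{young}}$ in the wage equation.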
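The claim that the reduced-form/first-stage ratio coincides with 2SLS can be checked numerically. The sketch below simulates hypothetical data: 50 districts, cohorts born 1950 to 1972, a true return of 0.08, and an unobserved ability term that raises both schooling and wages. All of these values are assumptions, not Duflo's data. The sketch estimates the first stage and reduced form with district and cohort dummies, then compares their ratio with an explicit two-stage regression. It uses NumPy's `lstsq` rather than a dedicated IV package.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical simulated data; every parameter value is an illustrative assumption ---
n_districts, n = 50, 50_000
district = rng.integers(0, n_districts, size=n)
birth_year = rng.integers(1950, 1973, size=n)            # cohorts born 1950-1972
P = rng.uniform(0, 5, size=n_districts)[district]        # schools per 1,000 children
young = (birth_year >= 1968).astype(float)               # young-cohort indicator
Z = P * young                                            # excluded instrument

ability = rng.normal(size=n)                             # unobserved; raises S and wages
district_effect = rng.normal(size=n_districts)[district]
S = 6 + 0.5 * Z + 0.5 * ability + district_effect + 0.05 * (birth_year - 1950) + rng.normal(size=n)
log_w = 1 + 0.08 * S + 0.3 * ability + 0.5 * district_effect + rng.normal(scale=0.3, size=n)


def dummies(codes):
    """One-hot encode integer codes into an (n, k) 0/1 matrix."""
    _, idx = np.unique(codes, return_inverse=True)
    return np.eye(idx.max() + 1)[idx]


# Controls: district and cohort fixed effects (one cohort dummy dropped to avoid collinearity).
X = np.column_stack([dummies(district), dummies(birth_year)[:, 1:]])


def coef_on_first(y, *regressors):
    """OLS of y on the stacked regressors; return the coefficient on the first one."""
    A = np.column_stack(regressors)
    return np.linalg.lstsq(A, y, rcond=None)[0][0]


pi_hat = coef_on_first(S, Z, X)          # first stage
rho_hat = coef_on_first(log_w, Z, X)     # reduced form
wald = rho_hat / pi_hat                  # ratio of reduced form to first stage

# 2SLS: regress log wages on first-stage fitted values of S plus the same controls.
A1 = np.column_stack([Z, X])
S_hat = A1 @ np.linalg.lstsq(A1, S, rcond=None)[0]
beta_2sls = coef_on_first(log_w, S_hat, X)
beta_ols = coef_on_first(log_w, S, X)    # contaminated by the simulated ability term

print(f"pi={pi_hat:.3f} rho={rho_hat:.4f} wald={wald:.4f} 2sls={beta_2sls:.4f} ols={beta_ols:.4f}")
```

In this simulation the OLS coefficient is pushed above 0.08 by the ability term, illustrating the mechanism from Section 1. It is not meant to reproduce Duflo's finding that IV and OLS are close. The standard errors from a manual second-stage regression are not valid 2SLS standard errors, so inference should use a proper IV routine.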
4 Key Findings
First stage. Each additional INPRES school per 1,000 children increased years of completed education by 0.124-0.258 years for the young cohort relative to the old cohort, depending on specification. The F-statistic on the excluded instrument exceeds 20 in all specifications. This is well above the rule-of-thumb threshold of 10 associated with Staiger and Stock (1997), so weak-instrument bias is unlikely to be a serious concern.
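With a single excluded instrument, the first-stage F-statistic is simply the squared t-statistic on that instrument, so the weak-instrument check reduces to arithmetic. The coefficient and standard error below are hypothetical placeholders, not values from the paper, and the helper names are illustrative. The threshold of 10 is the conventional rule of thumb. It was derived under homoskedastic errors, so with clustered or heteroskedasticity-robust standard errors it is only a rough guide.

```python
def first_stage_f(pi_hat: float, se: float) -> float:
    """First-stage F for a single excluded instrument: the squared t-statistic."""
    return (pi_hat / se) ** 2


def max_se_for_f(pi_hat: float, f_threshold: float = 10.0) -> float:
    """Largest standard error on pi_hat that still reaches f_threshold."""
    return abs(pi_hat) / f_threshold ** 0.5


# Hypothetical first-stage coefficient and standard error (not taken from Duflo, 2001).
pi_hat, se = 0.15, 0.03
print(first_stage_f(pi_hat, se))        # about 25
print(round(max_se_for_f(pi_hat), 4))   # standard error at which F would fall to 10
```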
Returns to education. The IV estimates of the return to a year of primary schooling range from 6.8 to 10.6 percent, compared with OLS estimates of approximately 7.5 percent. Two findings stand out:
- The IV estimate is not systematically above OLS, which is surprising if ability bias is the dominant source of OLS bias. Two explanations are possible. Either (a) ability bias is modest for this population and margin, or (b) endogenous programme placement pushes the IV estimate down, narrowing the IV-OLS gap that ability bias alone would produce. Placement could matter because schools were deliberately targeted at low-enrolment districts, which may be negatively selected on observable determinants of wages.
- The IV estimates identify the LATE for compliers: children who attended school specifically because a new school was built in their district. This population may have lower-than-average returns to schooling if they were on the margin of attending. The comparison of IV and OLS must account for this heterogeneity (Angrist, 1991).
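The LATE point can be illustrated with a stripped-down simulation. It replaces the paper's continuous intensity × cohort instrument with a single randomised binary instrument (a new school nearby or not). It also assumes three hypothetical types:

- always-takers, who get the extra year of school regardless;
- compliers, who get it only if a school is built;
- never-takers, who never get it.

Compliers are assumed to have a lower return (0.05) than everyone else (0.10), matching the possibility raised above. None of these numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical types and returns, chosen only to illustrate LATE versus ATE.
types = rng.choice(["always", "complier", "never"], size=n, p=[0.5, 0.3, 0.2])
returns = np.where(types == "complier", 0.05, 0.10)   # log-wage gain from an extra year
z = rng.integers(0, 2, size=n)                        # 1 = new school nearby (randomised here)
# Extra year of schooling: always-takers get it anyway, compliers only if z == 1.
d = np.where(types == "always", 1, np.where(types == "complier", z, 0))
log_w = 1.0 + returns * d + rng.normal(scale=0.3, size=n)

# Wald estimator: reduced-form difference divided by first-stage difference.
wald = (log_w[z == 1].mean() - log_w[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
ate = returns.mean()
print(f"Wald/IV = {wald:.3f}, population average return = {ate:.3f}")
```

The Wald ratio lands near the compliers' 0.05 rather than the population average of about 0.085. Always-takers and never-takers have the same schooling whichever value the instrument takes, so they drop out of both the numerator and the denominator.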
Robustness. Duflo (2001) reports three key checks:
- Placebo test: estimating the same regression for older cohorts who could not have benefited from the programme shows no differential effect of programme intensity on wages. This supports the identifying assumption that, absent the programme, outcomes would have evolved similarly in high- and low-intensity districts.
- Heterogeneity by initial enrolment: the effects are larger in districts with lower 1972 enrolment, consistent with the programme having a larger impact where school access was more constrained.
- Alternative cohort cutoffs: results are robust to varying the age threshold that defines "young" vs. "old" cohorts.
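The logic of the placebo test, and the differential-trend concern raised under Limitations, can be sketched with simulated data. The sample is restricted to cohorts born 1950 to 1961, none of whom could benefit. The sketch treats those born 1957 to 1961 as a fake "young" cohort, an arbitrary split chosen for illustration. It then estimates the same intensity × cohort interaction with district and cohort fixed effects twice: once for outcomes with parallel trends, and once with a hypothetical trend that grows with programme intensity. All data and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical simulated data: only cohorts too old to benefit from the programme.
n_districts, n = 50, 40_000
district = rng.integers(0, n_districts, size=n)
birth_year = rng.integers(1950, 1962, size=n)         # born 1950-1961
P = rng.uniform(0, 5, size=n_districts)[district]     # schools per 1,000 children
fake_young = (birth_year >= 1957).astype(float)       # placebo "treated" cohort
Z_placebo = P * fake_young


def dummies(codes):
    """One-hot encode integer codes into an (n, k) 0/1 matrix."""
    _, idx = np.unique(codes, return_inverse=True)
    return np.eye(idx.max() + 1)[idx]


# District and cohort fixed effects (one cohort dummy dropped to avoid collinearity).
X = np.column_stack([dummies(district), dummies(birth_year)[:, 1:]])


def placebo_coef(y):
    """Coefficient on the placebo interaction, controlling for the fixed effects."""
    A = np.column_stack([Z_placebo, X])
    return np.linalg.lstsq(A, y, rcond=None)[0][0]


base = rng.normal(scale=0.5, size=n) + rng.normal(size=n_districts)[district]
y_parallel = base                                     # no differential trend
y_trend = base + 0.01 * P * (birth_year - 1950)       # hypothetical trend rising with P

coef_parallel = placebo_coef(y_parallel)
coef_trend = placebo_coef(y_trend)
print(f"parallel trends: {coef_parallel:.4f}, intensity-correlated trend: {coef_trend:.4f}")
```

With parallel trends the placebo coefficient is close to zero. With the intensity-correlated trend it picks up roughly 0.01 × 6 = 0.06: the trend slope times the 6-year gap in average birth year between the fake young group (mean 1959) and fake old group (mean 1953). A real placebo only detects trends already present among older cohorts.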
5 Limitations
LATE and external validity. The IV estimates a local average treatment effect for children induced to attend school by the construction programme. These compliers are concentrated in high-programme-intensity districts and were at the margin of schooling before the programme. Their returns to education may differ from the average in the population, limiting the scope for policy extrapolation (Imbens and Angrist, 1994).
Exclusion restriction. The instrument (programme intensity × young cohort) must affect wages only through schooling. A threat arises if districts with high programme intensity also received other government investments in the same period (e.g., health clinics, roads, nutrition programmes). The INPRES programme included several components, and Duflo (2001) addresses this by controlling for other INPRES inputs directly and noting that wage effects emerge primarily through the education channel.
Selection of programme intensity. Programme intensity was inversely related to baseline enrolment. If low-enrolment districts were on different long-run economic trajectories, the cohort × programme interaction may capture differential growth rates rather than school effects. The placebo tests on older cohorts partially, but not fully, address this.
6 What We Learn
Duflo (2001) demonstrates the power of the IV-DiD design: exploiting a policy's targeting rule (variation across geographic units) in combination with birth-cohort variation in exposure timing. This exposure-intensity × birth-cohort design has since been applied in many other settings, including Bleakley's (2007) study of hookworm eradication in the American South. It has become a standard template for evaluating large-scale government programmes.
The paper also illustrates a broader lesson: the comparison of IV and OLS estimates is informative even when they are similar. Because 2SLS with a weak instrument is biased toward OLS, a small IV-OLS gap could in principle signal a weak instrument. With a strong first stage, it may instead indicate that ability bias is smaller than commonly assumed, or that programme targeting counteracts it.
References
- Angrist, J. D. (1991). Instrumental variables estimation of average treatment effects in econometrics and epidemiology. NBER Technical Working Paper No. 115.
- Bleakley, H. (2007). Disease and development: Evidence from hookworm eradication in the American South. Quarterly Journal of Economics, 122(1):73-117.
- Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, pages 201-222. University of Toronto Press.
- Duflo, E. (2001). Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment. American Economic Review, 91(4):795-813.
- Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2):467-475.
- Staiger, D. and Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65(3):557-586.