The Basic Idea
Suppose the government introduces a new school lunch programme in some districts (the "treated" group) but not others (the "control" group). You want to know whether the programme improved student test scores.
A naive approach compares test scores in treated and control districts after the programme. But treated districts might have started with higher (or lower) scores. This comparison is contaminated by pre-existing differences between groups.
A better approach: compare test scores before and after the programme in treated districts. But maybe scores were improving everywhere over this period, regardless of the programme. This simple before-after comparison is contaminated by time trends.
Difference-in-differences solves both problems at once. The idea is:
- Look at how scores changed over time in the treated group.
- Look at how scores changed over time in the control group.
- The DiD estimate is the difference between these two changes.
By taking the difference of two differences, we remove both the pre-existing level difference between groups and the common time trend. What remains is (under the key assumption) the causal effect of the programme.
A Worked Numerical Example
Setup
Suppose we observe average test scores (out of 100) in two sets of school districts in two years:
| Group | Before (2019) | After (2021) | Change |
|---|---|---|---|
| Treated districts | 64 | 71 | \(+7\) |
| Control districts | 72 | 75 | \(+3\) |
| Difference-in-differences | | | \(7 - 3 = \mathbf{+4}\) |
Interpretation
- Treated districts improved by 7 points between 2019 and 2021.
- Control districts improved by 3 points over the same period.
- The DiD estimate is \(7 - 3 = 4\) points.
Our estimate is that the school lunch programme caused a 4-point improvement in test scores. Here is the logic:
- The 3-point increase in control districts captures the "background trend" — the improvement that would have occurred in treated districts too, in the absence of the programme.
- After removing this background trend, the 4-point excess improvement in treated districts is attributed to the programme.
Formally, the DiD estimator is: \[ \widehat{ATT} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}) = (71-64) - (75-72) = 7 - 3 = 4 \]
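The arithmetic above can be checked in a few lines of Python. This is a minimal sketch using the worked example's cell means, not real data:

```python
# Cell means from the worked example (average test scores out of 100)
treated_pre, treated_post = 64, 71
control_pre, control_post = 72, 75

# DiD: the difference between the two before-after changes
treated_change = treated_post - treated_pre   # +7
control_change = control_post - control_pre   # +3
did = treated_change - control_change         # +4

print(did)  # → 4
```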
Why Not Just Compare After-Periods?
In 2021, treated districts score 71 and control districts score 75. The simple after-period comparison gives \(71 - 75 = -4\), suggesting the programme lowered scores! This is because treated districts started with lower scores (64 vs. 72). The after-period comparison confounds the programme effect with the pre-existing gap.
Why Not Just Look at the Before-After Change for Treated Districts?
The 7-point increase in treated districts includes both the programme effect and any general improvement over this period (perhaps due to teacher training, economic growth, or other factors that affected all districts). The control group tells us that 3 points of improvement would have occurred anyway. DiD removes this common trend.
The Parallel Trends Assumption
The DiD estimate is only valid if the "parallel trends" assumption holds. This assumption states:
In the absence of the programme, test scores in treated districts would have followed the same trend as test scores in control districts.
In our example: if the programme had not been introduced, treated districts would have improved by 3 points (like the control group), not by 7 points. The remaining 4 points are caused by the programme.
This is an assumption — it cannot be directly tested, because we never observe what treated districts' trend would have been without the programme. But we can provide supporting evidence by checking whether the two groups were on parallel trends before the programme. If test scores in treated and control districts were moving in parallel in the years leading up to the programme, this gives us more confidence that they would have continued in parallel after the programme in the absence of treatment.
DiD as a Regression
The DiD estimator can be implemented as a linear regression. Define:
- \(\text{Treated}_i = 1\) if unit \(i\) is in the treated group, 0 otherwise.
- \(\text{Post}_t = 1\) if time period \(t\) is after the treatment, 0 otherwise.
- \(\text{Treated}_i \times \text{Post}_t\): the interaction term (1 only for treated units in the post-period).
The regression is: \[ Y_{it} = \beta_0 + \beta_1 \text{Treated}_i + \beta_2 \text{Post}_t + \beta_3 (\text{Treated}_i \times \text{Post}_t) + \varepsilon_{it} \]
The coefficient \(\beta_3\) is the DiD estimator.
Interpreting the Coefficients
- \(\beta_0\): average outcome for control group in pre-period = 72
- \(\beta_1\): level difference between treated and control groups in pre-period = \(64 - 72 = -8\)
- \(\beta_2\): time trend for control group = \(75 - 72 = 3\)
- \(\beta_3\): DiD = excess trend for treated group = \((71-64) - (75-72) = 4\)
Let us verify: the model predicts the treated group's post-period score as \(72 + (-8) + 3 + 4 = 71\). Correct.
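The mapping from cell means to coefficients can be verified numerically. Because the regression is saturated (four parameters, four cells), least squares on the cell means recovers the coefficients exactly; here the system is solved with NumPy rather than a regression package, and the data are the worked example's cell means, not real observations:

```python
import numpy as np

# One row per (group, period) cell: [1, Treated, Post, Treated*Post]
X = np.array([
    [1, 0, 0, 0],  # control, pre
    [1, 0, 1, 0],  # control, post
    [1, 1, 0, 0],  # treated, pre
    [1, 1, 1, 1],  # treated, post
], dtype=float)
y = np.array([72.0, 75.0, 64.0, 71.0])  # cell means from the table

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # ≈ [72, -8, 3, 4]: beta_0, beta_1, beta_2, beta_3
```

With individual-level data, the same coefficients would come from an OLS regression of \(Y_{it}\) on the three dummy variables.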
Testing the Parallel Trends Assumption
If we have data on multiple pre-treatment periods, we can test whether the treated and control groups were on parallel trends before the treatment. We run an event-study regression: \[ Y_{it} = \alpha_i + \lambda_t + \sum_{k \neq -1} \delta_k \cdot \mathbf{1}\{t - g_i = k\} + \varepsilon_{it} \] where \(g_i\) is the treatment date and \(k\) is the number of periods relative to treatment. The coefficients \(\delta_k\) for \(k < 0\) are "pre-trend" coefficients. If parallel trends holds, these should be close to zero. A plot of all \(\delta_k\) coefficients against \(k\) is called an event-study plot.
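With only two groups and a common treatment date, each \(\delta_k\) reduces to a double difference relative to the omitted period \(k = -1\): \(\hat{\delta}_k = (\bar{Y}_{T,k} - \bar{Y}_{C,k}) - (\bar{Y}_{T,-1} - \bar{Y}_{C,-1})\). A sketch with hypothetical group means, chosen so that the pre-trends are flat:

```python
# Hypothetical mean scores by event time k (periods relative to treatment)
event_times = [-3, -2, -1, 0, 1]
treated_means = [60.0, 62.0, 64.0, 71.0, 73.0]
control_means = [68.0, 70.0, 72.0, 75.0, 77.0]

# Treated-control gap in the omitted baseline period k = -1
base_gap = treated_means[2] - control_means[2]  # 64 - 72 = -8

# delta_k: change in the treated-control gap relative to the baseline
deltas = {
    k: (t - c) - base_gap
    for k, t, c in zip(event_times, treated_means, control_means)
    if k != -1
}
print(deltas)  # → {-3: 0.0, -2: 0.0, 0: 4.0, 1: 4.0}
```

Here the pre-period coefficients are exactly zero (supporting parallel trends) and the post-period coefficients show a 4-point effect. With many units, each \(\delta_k\) would instead be estimated by OLS with unit and time fixed effects, and plotted with confidence intervals.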
Rambachan and Roth (2023) formalise this logic: even if pre-trend coefficients are small, post-treatment estimates can be sensitive to violations of parallel trends. They propose confidence intervals that are valid even if trends diverge by a bounded amount after treatment.
Common Pitfalls
Selecting control groups based on outcomes. If you choose control groups because they look similar to treated groups in the post-period, you have introduced bias. Control groups should be chosen based on pre-treatment characteristics and prior trends.
Violation of SUTVA. The DiD framework assumes that one unit's treatment does not affect another unit's outcomes (the "stable unit treatment value assumption"). If the lunch programme in treated schools draws students away from control schools, control schools' scores might fall, biasing the estimate.
Heterogeneous timing. If different treated units are treated at different times ("staggered adoption"), the simple 2-period DiD can be misleading. The Callaway–Sant'Anna estimator (Callaway and Sant'Anna, 2021) is designed for this case.
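The building block of the Callaway–Sant'Anna approach is the group-time effect \(ATT(g,t)\): a 2-by-2 DiD comparing the cohort first treated at time \(g\) to never-treated (or not-yet-treated) units, baselined at period \(g-1\). A minimal sketch with hypothetical cohort means:

```python
# Hypothetical mean outcomes by period for two treated cohorts
# and a never-treated comparison group
cohort_2 = {1: 50.0, 2: 55.0, 3: 57.0, 4: 59.0}  # first treated in period 2
cohort_3 = {1: 48.0, 2: 50.0, 3: 56.0, 4: 58.0}  # first treated in period 3
never    = {1: 52.0, 2: 54.0, 3: 56.0, 4: 58.0}  # never treated

def att(group_means, g, t, control_means):
    """2x2 DiD for the cohort treated at g, evaluated at period t,
    baselined at the last pre-treatment period g - 1."""
    return ((group_means[t] - group_means[g - 1])
            - (control_means[t] - control_means[g - 1]))

print(att(cohort_2, 2, 2, never))  # → 3.0
print(att(cohort_3, 3, 4, never))  # → 4.0
```

The full estimator aggregates these \(ATT(g,t)\) building blocks across cohorts and periods, avoiding the "forbidden comparisons" (already-treated units used as controls) that bias the two-way fixed effects regression under staggered adoption.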
Small samples. DiD is often applied to aggregate data (e.g., states or districts). With few clusters, conventional standard errors are unreliable, and permutation-based inference may be needed (Angrist and Pischke, 2009).
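A permutation test reassigns the treatment label across districts and recomputes the DiD statistic for every reassignment; the p-value is the share of reassignments at least as extreme as the observed statistic. A sketch with hypothetical district-level before-after changes:

```python
from itertools import combinations

# Hypothetical before-after score changes for 8 districts;
# districts 0-3 are the actual treated group
changes = [6.5, 7.0, 7.5, 7.0, 3.0, 2.5, 3.5, 3.0]
treated_idx = {0, 1, 2, 3}

def did_stat(treated):
    """DiD statistic: mean change of 'treated' minus mean change of the rest."""
    t = [changes[i] for i in range(len(changes)) if i in treated]
    c = [changes[i] for i in range(len(changes)) if i not in treated]
    return sum(t) / len(t) - sum(c) / len(c)

observed = did_stat(treated_idx)  # 7.0 - 3.0 = 4.0

# Enumerate all C(8, 4) = 70 ways to label 4 of 8 districts as treated
stats = [did_stat(set(combo)) for combo in combinations(range(8), 4)]
p_value = sum(abs(s) >= abs(observed) for s in stats) / len(stats)
print(observed, p_value)  # → 4.0, 2/70 ≈ 0.029
```

With more districts, exact enumeration becomes infeasible and a random sample of permutations is used instead.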
Conclusion
Difference-in-differences is a powerful and intuitive method for estimating causal effects. Its logic — remove common time trends by differencing, remove permanent group differences by differencing again — makes it one of the cleanest quasi-experimental designs available. Its main limitation is the parallel trends assumption, which is untestable but can be supported by pre-period evidence and sensitivity analysis. When the assumption is plausible and the comparison group well-chosen, DiD provides credible causal evidence from non-experimental data.
References
- Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, Princeton, NJ.
- Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200--230.
- Card, D. and Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. American Economic Review, 84(4):772--793.
- Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge.
- Rambachan, A. and Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies, 90(5):2555--2591.
- Roth, J., Sant'Anna, P. H. C., Bilinski, A., and Poe, J. (2023). What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235(2):2218--2244.