The Causal Review

1 The Issue

Difference-in-differences (DiD) requires that, absent treatment, treated and control units would have followed the same trend in the outcome variable. This is the parallel trends assumption. When the unconditional version seems implausible because treated and control units differ on observable characteristics that have their own trends, researchers routinely add covariates to the DiD regression, claiming that parallel trends holds conditional on those covariates.

Conditional parallel trends is now standard practice. But it carries its own hidden assumption: that the covariates included in the regression are sufficient to make the comparison credible. A recent strand of work, spearheaded by Ghanem et al. [2022], argues that conditional DiD has a selection problem that is rarely acknowledged: conditioning on post-treatment covariates—or even pre-treatment covariates that are themselves affected by anticipation of treatment—can actually introduce bias rather than remove it. This article lays out both sides of the debate: the case for conditional DiD as a standard tool, and the case for caution raised by the new testing literature.

2 The Case for Conditional DiD

2.1 When Unconditional Parallel Trends Fails

The standard justification for conditioning on covariates is simple. Suppose treated firms are larger than control firms, and larger firms have systematically different outcome trends. Then unconditional parallel trends fails: even absent treatment, treated and control firms' outcomes would diverge.

Adding firm-size controls to the DiD regression, either interacted with the period indicators or entered as additional control variables, adjusts for the differential trend that is attributable to the size difference. Under the assumption that, conditional on size, treated and control firms would have followed parallel trends, the adjusted DiD estimator recovers the average treatment effect on the treated (ATT). Callaway and Sant'Anna [2021] formalise the conditional parallel trends approach in the staggered adoption setting, and show that their doubly robust estimator remains consistent if either the outcome regression or the propensity score model is correctly specified. Because the doubly robust approach combines an outcome regression with inverse probability weighting, it tolerates misspecification of one of the two working models, though not of both at once.
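The firm-size argument can be illustrated with a small simulation. The sketch below is a toy example under assumed numbers (a binary size indicator, treatment probabilities of 0.8 and 0.2, a true ATT of 1.0), not an implementation of the Callaway and Sant'Anna estimator. It uses the simplest form of covariate adjustment: compare outcome changes within each size cell, then average over the size distribution of the treated. In the population implied by these assumptions, the unconditional contrast equals 2.2 while the within-cell contrast equals the true ATT of 1.0.

```python
import random

random.seed(0)
TRUE_ATT = 1.0  # assumed treatment effect in this toy simulation

def simulate_firm():
    """Return (large, treated, change in outcome) for one hypothetical firm."""
    large = random.random() < 0.5
    # Large firms are treated more often, so size is unbalanced across groups.
    treated = random.random() < (0.8 if large else 0.2)
    # The untreated trend depends on size, which breaks unconditional parallel trends.
    untreated_trend = 1.0 + 2.0 * large
    delta_y = untreated_trend + TRUE_ATT * treated + random.gauss(0, 1)
    return large, treated, delta_y

firms = [simulate_firm() for _ in range(20000)]

def mean_change(treated, large=None):
    """Average outcome change for a treatment group, optionally within a size cell."""
    changes = [dy for x, d, dy in firms
               if d == treated and (large is None or x == large)]
    return sum(changes) / len(changes)

# Unconditional DiD: raw difference in average outcome changes.
unconditional_did = mean_change(True) - mean_change(False)

# Conditional DiD: difference within each size cell, averaged over the
# size distribution of the treated (the weighting that targets the ATT).
treated_sizes = [x for x, d, _ in firms if d]
share_large = sum(treated_sizes) / len(treated_sizes)
conditional_did = (
    share_large * (mean_change(True, True) - mean_change(False, True))
    + (1 - share_large) * (mean_change(True, False) - mean_change(False, False))
)

print(f"unconditional DiD: {unconditional_did:.2f}")
print(f"conditional DiD:   {conditional_did:.2f}  (true ATT = {TRUE_ATT})")
```

The adjustment works here only because size is the sole source of differential trends and is unaffected by treatment; neither property is guaranteed in applied work.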

2.2 The Conditional Parallel Trends Assumption

Formally, conditional DiD requires:

$$E[Y_t(0) - Y_{t-1}(0) \mid X, D = 1] = E[Y_t(0) - Y_{t-1}(0) \mid X, D = 0] \tag{1}$$

for all X in the common support. Given this condition and the overlap condition (0 < P(D = 1 | X) < 1 for all X; see Section 4), the ATT is identified.
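The identification argument can be written out in a few lines. The derivation below is a sketch that relies on one assumption beyond condition (1) and overlap: no anticipation, so that the observed pre-period outcome of treated units equals their untreated potential outcome, Y_{t-1} = Y_{t-1}(0).

```latex
\begin{align*}
\text{ATT} &= E[Y_t(1) - Y_t(0) \mid D = 1] \\
&= E[Y_t - Y_{t-1} \mid D = 1] - E[Y_t(0) - Y_{t-1}(0) \mid D = 1] \\
&= E[Y_t - Y_{t-1} \mid D = 1]
   - E\big[\, E[Y_t(0) - Y_{t-1}(0) \mid X, D = 1] \,\big|\, D = 1 \big] \\
&= E[Y_t - Y_{t-1} \mid D = 1]
   - E\big[\, E[Y_t - Y_{t-1} \mid X, D = 0] \,\big|\, D = 1 \big].
\end{align*}
```

The second line adds and subtracts Y_{t-1}(0) and uses no anticipation; the third applies the law of iterated expectations; the fourth uses condition (1) to replace the unobserved treated-group trend with the control-group trend at the same X, which requires control units to exist at every value of X found among the treated. Every term in the last line is a function of observed data.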

The empirical practice of including pre-treatment covariates is broadly defensible when:

Covariates are measured before treatment and cannot be affected by treatment or its anticipation.

The researcher includes covariates that are genuinely predictive of the outcome trend, not just mechanically correlated with the outcome level.

The overlap condition is plausible—there are control units at all values of X found among treated units.

3 The Case for Caution: Selection on Parallel Trends

3.1 The Ghanem-Sant'Anna-Wüthrich Framework

Ghanem et al. [2022] develop a framework for testing the plausibility of conditional parallel trends, noting that the assumption is not directly testable from the data but that it can be falsified in certain settings. Their key contribution is to clarify when conditioning on covariates helps versus when it can harm identification.

The central concern is bad controls: covariates that are either (1) themselves affected by the treatment (or anticipation of it), or (2) common effects of treatment and unobserved confounders. In both cases, conditioning on such variables can create or amplify bias, in the same way that conditioning on a collider in a DAG opens a spurious path.

Example: anticipation effects. Suppose a firm anticipates being regulated next year. It may reduce investment this year in response to the anticipated regulatory burden. If the researcher includes lagged investment as a covariate, she is conditioning on a variable that has already been affected by the anticipation of treatment, which makes it a bad control. The conditional DiD estimate will be biased even though the regression appears well-specified.

Example: collider bias. Suppose treatment status and a firm's unobserved productivity growth are both causes of a covariate (say, merger activity). Conditioning on merger activity opens the collider path: within each level of merger activity, treatment status and unobserved productivity growth become correlated, which destroys identification.
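The collider example can be made concrete with a simulation. The sketch below uses an entirely hypothetical data-generating process: treatment is randomly assigned, so unconditional parallel trends holds by construction, while merger activity is caused by both treatment and unobserved productivity growth. All coefficients and thresholds are assumptions chosen for illustration.

```python
import random

random.seed(1)
TRUE_ATT = 1.0  # assumed treatment effect

def simulate_firm():
    """Return (merger, treated, change in outcome) for one hypothetical firm."""
    treated = random.random() < 0.5   # independent of productivity growth
    growth = random.gauss(0, 1)       # unobserved productivity growth
    # Merger activity is a common effect of treatment and growth: a collider.
    merger = 1.5 * treated + growth + random.gauss(0, 0.5) > 1.0
    delta_y = growth + TRUE_ATT * treated + random.gauss(0, 0.5)
    return merger, treated, delta_y

firms = [simulate_firm() for _ in range(20000)]

def avg(values):
    return sum(values) / len(values)

def did(rows, condition_on_merger):
    """DiD on outcome changes, optionally stratified by merger activity."""
    if not condition_on_merger:
        return (avg([dy for _, d, dy in rows if d])
                - avg([dy for _, d, dy in rows if not d]))
    n_treated = sum(1 for _, d, _ in rows if d)
    estimate = 0.0
    for m in (False, True):
        treated_changes = [dy for mm, d, dy in rows if mm == m and d]
        control_changes = [dy for mm, d, dy in rows if mm == m and not d]
        # Weight each stratum by its share of treated firms (ATT weighting).
        weight = len(treated_changes) / n_treated
        estimate += weight * (avg(treated_changes) - avg(control_changes))
    return estimate

print(f"unconditional DiD:        {did(firms, False):.2f}")
print(f"conditional on mergers:   {did(firms, True):.2f}  (true ATT = {TRUE_ATT})")
```

Within each merger stratum, treated firms needed less productivity growth to end up there than control firms did, so the stratified comparison sets treated firms against controls with systematically higher growth. The unconditional estimate is close to the true effect, and the "adjusted" one falls well below it.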

3.2 A Testable Implication

One contribution of Ghanem et al. [2022] is to show that, under mild conditions, the conditional parallel trends assumption implies a testable restriction: the coefficients on treatment indicators in a pre-trend regression (event study) should be jointly zero, even after conditioning on the covariates. This is a stronger test than the standard pre-trend test, because it checks not just the unconditional pre-trend but also whether conditioning on the covariates moves the pre-trend toward or away from zero. When the covariate-adjusted pre-trend is worse than the unadjusted pre-trend—a counterintuitive result that can occur when the covariates are bad controls—the researcher should question whether the conditioning set is appropriate.
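The comparison of adjusted and unadjusted pre-trends can be sketched as follows. This is a deliberately simplified stand-in for the joint event-study test: it uses a single placebo period, coarsens the covariate into bins, and reports a z-statistic whose standard error treats the bin weights as fixed. The data-generating process is hypothetical: firms that anticipate treatment cut investment, so lagged investment is a bad control even though the raw pre-trends are parallel.

```python
import math
import random
from statistics import mean, variance

random.seed(2)

def simulate_firm():
    """Return (treated, pre-period outcome change, lagged investment)."""
    treated = random.random() < 0.5
    firm_type = random.gauss(0, 1)               # latent driver of outcome trends
    pre_change = firm_type + random.gauss(0, 1)  # change between two pre-periods
    # Firms that anticipate treatment cut investment before it starts.
    investment = firm_type - 1.0 * treated + random.gauss(0, 0.5)
    return treated, pre_change, investment

firms = [simulate_firm() for _ in range(20000)]

def investment_bin(investment):
    """Coarsen lagged investment into unit-width bins, clipped to [-3, 2]."""
    return max(-3, min(2, math.floor(investment)))

def placebo_did(rows, adjust):
    """Placebo DiD on pre-period changes; returns (estimate, standard error)."""
    cells = {}
    for treated, pre_change, investment in rows:
        key = investment_bin(investment) if adjust else 0
        cells.setdefault(key, {True: [], False: []})[treated].append(pre_change)
    n_treated = sum(len(c[True]) for c in cells.values())
    estimate, var = 0.0, 0.0
    for c in cells.values():
        if len(c[True]) < 2 or len(c[False]) < 2:
            continue  # no usable comparison in this cell
        w = len(c[True]) / n_treated  # share of treated units in the cell
        estimate += w * (mean(c[True]) - mean(c[False]))
        var += w ** 2 * (variance(c[True]) / len(c[True])
                         + variance(c[False]) / len(c[False]))
    return estimate, math.sqrt(var)

for label, adjust in [("unadjusted", False), ("adjusted", True)]:
    est, se = placebo_did(firms, adjust)
    print(f"{label:>10}: placebo = {est:+.3f}, z = {est / se:+.1f}")
```

In this setup the unadjusted placebo is close to zero while the adjusted placebo is clearly positive, which is the pattern that should prompt a second look at the conditioning set. A real application would estimate all pre-period coefficients jointly and use standard errors clustered at the unit level.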

4 The Overlap Condition: An Underappreciated Failure Mode

A second concern raised in the conditional DiD literature is that the overlap condition, which requires 0 < P(D = 1 | X) < 1 for all X, is often violated in practice and is rarely checked. When treated and control units have non-overlapping covariate distributions, the ATT cannot be nonparametrically identified under conditional parallel trends: for treated units with no comparable controls, the comparison extrapolates out of support. In practice, many DiD papers include control variables without checking overlap, relying on functional form assumptions (usually linearity) to extrapolate. If the true conditional outcome trend is nonlinear, this extrapolation can produce substantial bias.

Callaway and Sant'Anna [2021] address this by restricting the ATT estimation to the common support, trimming units whose propensity score is outside a range where overlap holds. This increases transparency but may reduce the sample and require careful communication about the target population.
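A basic overlap check needs little machinery when the covariate is discrete. The sketch below estimates the propensity score as the treated share within each covariate cell and keeps only cells whose estimate lies inside a trimming band. The size classes, the treatment probabilities, and the 0.05 threshold are all hypothetical choices for illustration; this is not the trimming rule of any particular package.

```python
import random

random.seed(3)

# Hypothetical treatment probabilities by firm-size class 0..4.
TRUE_PROPENSITY = [0.02, 0.2, 0.5, 0.8, 1.0]

units = []
for _ in range(10000):
    size_class = random.randrange(5)
    treated = random.random() < TRUE_PROPENSITY[size_class]
    units.append((size_class, treated))

def cell_propensities(rows):
    """Estimate P(D = 1 | X) as the treated share within each covariate cell."""
    counts = {}
    for x, d in rows:
        n, n_treated = counts.get(x, (0, 0))
        counts[x] = (n + 1, n_treated + d)
    return {x: n_treated / n for x, (n, n_treated) in sorted(counts.items())}

def common_support(propensities, eps=0.05):
    """Keep cells whose estimated propensity lies in [eps, 1 - eps]."""
    return [x for x, p in propensities.items() if eps <= p <= 1 - eps]

propensities = cell_propensities(units)
kept = common_support(propensities)

for x, p in propensities.items():
    status = "kept" if x in kept else "trimmed"
    print(f"size class {x}: estimated propensity = {p:.3f} ({status})")
```

The largest size class contains no control units at all, so no conditional comparison is possible there without extrapolation. Dropping it changes the estimand from the ATT for all treated firms to the ATT for treated firms in the retained cells, which is the shift in target population that needs to be reported.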

5 Finding Middle Ground

The debate between proponents of conditional DiD and the critics who emphasise selection and overlap is not a binary one. The following principles represent a reasonable synthesis:

Pre-specify the covariate set. Avoid adding or removing covariates based on whether they improve the pre-trend. Covariates should be chosen on substantive grounds before looking at the results.
Check for bad controls. Use a DAG or economic reasoning to identify which covariates are safe (pre-determined, not affected by treatment or anticipation) and which are potentially bad (post-treatment or colliders). Include only safe covariates.
Check overlap. Plot propensity score distributions for treated and control groups. If they do not overlap substantially, the conditional DiD estimate is extrapolating and should be reported with caution.
Test the conditional pre-trend. Run the event study with and without covariates. If adding covariates makes the pre-trend look worse, investigate why before proceeding.
Report unconditional and conditional estimates. If they diverge, explain why; which interpretation is more credible depends on the research design.

6 What Evidence Would Resolve the Debate?

The most informative evidence would come from within-study comparisons: settings where a randomised trial and a conditional DiD study are run on the same population. If the conditional DiD estimate systematically matches the experimental benchmark, confidence in the approach increases; if it does not, the failure modes highlighted by Ghanem et al. [2022] are empirically relevant.

Such comparisons are rare, but the available evidence is sobering. LaLonde [1986] compares experimental and non-experimental estimates for the National Supported Work (NSW) Demonstration, finding that observational estimators, including DiD, can diverge substantially from the experimental benchmark. More recent within-study comparisons for specific programmes have been more encouraging, suggesting that the performance of conditional DiD depends heavily on context and covariate quality.

7 Conclusion

Conditional DiD is a powerful and widely used identification strategy, but it is not automatically safe. The selection-on-parallel-trends literature has clarified that conditioning on the wrong covariates can introduce bias rather than remove it, and that overlap failures can lead to dependence on functional form assumptions. These concerns are not arguments against conditional DiD in general—they are arguments for more careful implementation and more transparent reporting. The field is moving toward a set of best practices: pre-specified covariate sets, overlap checks, DAG-based covariate screening, and heterogeneity-robust estimators that make fewer assumptions about the functional form of the conditioning. Whether these practices become standard will shape the credibility of the next generation of DiD papers.

References

Callaway, B. and Sant'Anna, P. H. C. Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200-230, 2021.
Ghanem, D., Sant'Anna, P. H. C., and Wüthrich, K. Selection and parallel trends. Working paper, 2022.
LaLonde, R. J. Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76(4):604-620, 1986.
Rambachan, A. and Roth, J. A more credible approach to parallel trends. Review of Economic Studies, 90(5):2555-2591, 2023.
Sant'Anna, P. H. C. and Zhao, J. Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1):101-122, 2020.

Selection on Parallel Trends: The Hidden Assumption Behind Conditional DiD
