Debates & Controversies

The Parallel Trends Assumption: Testable, Credible, or Dogma?

The Problem Stated Precisely

Let \(Y_{it}(0)\) denote the potential untreated outcome for unit \(i\) at time \(t\). The parallel trends assumption states: \[ \mathbb{E}[Y_{it}(0) - Y_{i,t-1}(0) \mid G_i = g] = \mathbb{E}[Y_{it}(0) - Y_{i,t-1}(0) \mid G_i = \infty] \] for all periods \(t\) and all cohorts \(g\). That is, the trend in untreated potential outcomes is the same for each treated cohort and the never-treated group (\(G_i = \infty\)).

The crucial word is "untreated." After treatment begins, we observe \(Y_{it}(g)\) for treated units, not \(Y_{it}(0)\). Parallel trends makes a claim about \(Y_{it}(0)\) in the post-treatment period, which is inherently counterfactual and unobservable. This is not merely a technical limitation — it is a logical one. No amount of post-treatment data can directly test whether the counterfactual trend was parallel.
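
Before turning to the debate, a minimal simulation sketch makes the mechanics concrete. All parameter values below are invented for illustration: parallel trends holds by construction, so the simple two-group, two-period DiD estimator recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                      # units per group

# Untreated potential outcomes share a common period-to-period trend,
# so parallel trends holds by construction; groups differ only in level.
trend = 1.0
level_treated, level_control = 3.0, 1.0
att = 2.0                     # true effect of treatment on the treated

y0_treat_pre  = level_treated + rng.normal(0, 1, n)
y0_treat_post = level_treated + trend + rng.normal(0, 1, n)
y0_ctrl_pre   = level_control + rng.normal(0, 1, n)
y0_ctrl_post  = level_control + trend + rng.normal(0, 1, n)

# Observed outcome for treated units in the post period includes the effect.
y_treat_post = y0_treat_post + att

did = (y_treat_post.mean() - y0_treat_pre.mean()) \
    - (y0_ctrl_post.mean() - y0_ctrl_pre.mean())
print(f"DiD estimate: {did:.3f}  (true ATT = {att})")
```

Adding a trend to only one group's untreated outcomes would bias `did` by exactly the trend gap; that failure mode, and whether it can be detected, is what the rest of this section is about.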

The Case That Parallel Trends Is Untestable Dogma

The Logical Point

The most fundamental objection to treating parallel trends as falsifiable is logical. The assumption makes a claim about a counterfactual world. The pre-treatment data, where outcomes for treated units under no treatment are observed, can provide evidence about pre-period trends, but not about post-period counterfactual trends. A group that was trending in parallel before treatment could diverge after treatment for reasons entirely unrelated to the treatment — a region that tracked national employment trends in the 1990s might diverge from national trends in the 2000s due to a local industry shock.

Pre-Trend Tests Are Necessary but Not Sufficient

The standard diagnostic for parallel trends is a pre-trend test: estimate treatment effects for pre-treatment periods (where the true effect is known to be zero under no-anticipation) and check whether they are statistically distinguishable from zero. If pre-period estimates are close to zero, the researcher takes this as evidence that parallel trends holds.
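
In practice, the pre-trend test is usually run as an event-study regression with leads and lags of treatment, followed by a joint test that the lead coefficients are zero. Below is a minimal sketch on simulated data using statsmodels; all variable names and parameter values are illustrative, and the simulated panel satisfies parallel trends so the true lead coefficients are zero.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
units, periods, event_time = 60, 8, 4   # treatment begins in period 4

df = pd.DataFrame(
    [(i, t) for i in range(units) for t in range(periods)],
    columns=["unit", "t"],
)
df["treated"] = (df["unit"] < units // 2).astype(int)
df["rel"] = df["t"] - event_time        # event time for treated units

# Parallel trends holds in this simulated panel: both groups share the
# 0.5 * t trend, and the true effect of 1.5 starts at the event time.
df["y"] = (
    2.0 * df["treated"] + 0.5 * df["t"]
    + np.where((df["treated"] == 1) & (df["t"] >= event_time), 1.5, 0.0)
    + rng.normal(0, 1, len(df))
)

# Lead/lag dummies for treated units, omitting rel == -1 as the reference
terms = []
for k in range(-event_time, periods - event_time):
    if k == -1:
        continue
    name = f"lead{-k}" if k < 0 else f"lag{k}"
    df[name] = ((df["treated"] == 1) & (df["rel"] == k)).astype(int)
    terms.append(name)

formula = "y ~ " + " + ".join(terms) + " + C(unit) + C(t)"
res = smf.ols(formula, data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]}
)

leads = [c for c in terms if c.startswith("lead")]
print(res.params[leads])                                  # pre-period estimates
print(res.f_test(", ".join(f"{c} = 0" for c in leads)))   # joint zero test
```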

But pre-trend tests have fundamental limitations. Roth (2022) formalises the problem: pre-trend tests are tests of a necessary condition for parallel trends (no differential pre-trends), not a sufficient condition. Failure to reject zero pre-trends does not imply parallel trends holds in the post-period. Moreover, standard pre-trend tests are underpowered in typical applications: Roth (2022) shows that common specifications have low power to detect trend violations of the magnitude that would materially bias DiD estimates.

Perhaps most damaging, pre-trend tests create a selection problem of their own. If sample or model selection is conditioned on passing a pre-trend test, selective reporting bias follows: researchers who fail the test revise their approach, while those who pass report the result. The distribution of published DiD estimates is therefore tilted toward cases where the pre-trend test was passed, which is not the same thing as cases where parallel trends holds.
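
A stylised simulation can illustrate the mechanism. The data-generating process and all parameter values below are invented: a small differential trend biases the DiD estimate, most draws nonetheless pass a conventional pre-trend test, and in this setup the draws that pass are, if anything, slightly more biased than the unconditional average, echoing the conditional-inference results in Roth (2022).

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta, sims = 100, 0.1, 4000    # delta: small differential trend per period
est_all, est_pass = [], []

for _ in range(sims):
    # Two pre periods (t = 0, 1) and one post period (t = 2); the true
    # treatment effect is zero, but the treated group drifts by delta.
    yT = np.column_stack([delta * t + rng.normal(0, 1, n) for t in range(3)])
    yC = np.column_stack([rng.normal(0, 1, n) for _ in range(3)])

    # Pre-trend test: placebo DiD across the two pre periods
    pre = (yT[:, 1] - yT[:, 0]) - (yC[:, 1] - yC[:, 0])
    t_stat = pre.mean() / (pre.std(ddof=1) / np.sqrt(n))

    # Main DiD: post period versus last pre period; the true effect is
    # zero, so any nonzero mean is bias from the differential trend.
    main = (yT[:, 2] - yT[:, 1]) - (yC[:, 2] - yC[:, 1])
    est_all.append(main.mean())
    if abs(t_stat) < 1.96:          # the draw "passes" the pre-trend test
        est_pass.append(main.mean())

print(f"share of draws passing the pre-test: {len(est_pass) / sims:.2f}")
print(f"mean bias, all draws:                {np.mean(est_all):.3f}")
print(f"mean bias, draws that passed:        {np.mean(est_pass):.3f}")
```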

The Threat of Differential Seasonality and Local Shocks

In many applications, treated and control units differ in ways that are correlated with the outcome, precisely because treatment was not assigned at random. A policy that expands health insurance in some states but not others was likely adopted first by states that were already on a different health trajectory. Differential trends may be driven by differential responses to macroeconomic conditions, differential seasonality, or differential exposure to concurrent policies, none of which would necessarily show up in a pre-trend test over a short pre-period.

The Case That Parallel Trends Can Be Empirically Disciplined

Pre-Trends Provide Genuine Evidence

While pre-trend tests are not sufficient for identifying the post-period counterfactual trend, they are not without evidential value. A long and clean pre-treatment period with no discernible differential trend provides genuine evidence that the two groups have similar dynamics. The longer and more stable the pre-period, the harder it is to tell a story in which trends diverge dramatically immediately after treatment for non-treatment reasons.

Moreover, researchers can choose comparison groups specifically because they track the treatment group's pre-period outcome closely. Border discontinuity designs (Dube et al., 2010), synthetic control methods (Abadie et al., 2010), and matching on pre-period trends all select comparison groups partly on the basis of pre-period parallel movement. While this does not guarantee post-period parallelism, it substantially reduces the range of plausible violations.

Placebo Tests on Outcomes and Times

Beyond pre-trend tests, researchers can conduct placebo tests that provide additional evidence:

  • Placebo outcomes: Apply the DiD estimator to an outcome that the treatment should not affect. If the treatment raises wages, it should not raise divorce rates. A significant DiD estimate on a theoretically unaffected outcome suggests a confounded comparison group.
  • Placebo timing: Pretend the treatment occurred at an earlier date and re-run the DiD on pre-period data only. If the estimate is significant for the false treatment date, it suggests a trend break in the pre-period that would bias the main estimate (a code sketch appears below).
  • Leave-one-out permutations: Sequentially remove individual units from the control group and check whether estimates are sensitive to the composition of the control group.

None of these tests directly tests parallel trends in the post-period, but together they can substantially raise or lower the credibility of the assumption.
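
The placebo-timing check referenced above can be implemented in a few lines. The sketch below simulates a panel (all variable names and parameter values are invented), restricts to the pre-period, assigns a false treatment date, and re-estimates the two-way fixed-effects DiD.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
units, periods, true_date = 60, 8, 4

panel = pd.DataFrame(
    [(i, t) for i in range(units) for t in range(periods)],
    columns=["unit", "t"],
)
panel["treated_group"] = (panel["unit"] < units // 2).astype(int)
panel["y"] = (
    2.0 * panel["treated_group"] + 0.5 * panel["t"]
    + np.where((panel["treated_group"] == 1) & (panel["t"] >= true_date), 1.5, 0.0)
    + rng.normal(0, 1, len(panel))
)

def did_estimate(df, cutoff):
    """Two-way fixed-effects DiD with treatment switched on at `cutoff`."""
    d = df.copy()
    d["post_treat"] = ((d["treated_group"] == 1) & (d["t"] >= cutoff)).astype(int)
    res = smf.ols("y ~ post_treat + C(unit) + C(t)", data=d).fit(
        cov_type="cluster", cov_kwds={"groups": d["unit"]})
    return res.params["post_treat"], res.pvalues["post_treat"]

# Placebo: keep only the pre-period and pretend treatment began at t = 2.
# A significant estimate here would signal a pre-period trend break.
est, p = did_estimate(panel[panel["t"] < true_date], cutoff=2)
print(f"placebo DiD at false date: {est:.3f} (p = {p:.3f})")

# The real estimate, for comparison:
est, p = did_estimate(panel, cutoff=true_date)
print(f"DiD at true date:          {est:.3f} (p = {p:.3f})")
```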

Honest Sensitivity Analysis

Rambachan and Roth (2023) propose a formal framework that sidesteps the testing problem. Rather than asking whether parallel trends holds, the researcher asks: how large would the violation have to be to overturn my conclusion? They parameterise violations of parallel trends by \(\bar{M}\), which bounds the post-period deviation from parallel trends at \(\bar{M}\) times the largest deviation observed in the pre-period. They then compute confidence sets for the ATT that are valid for all trend violation magnitudes up to \(\bar{M}\).

This approach turns the untestability of parallel trends into a feature rather than a bug: instead of claiming that an untestable assumption holds, the researcher reports how fragile or robust the conclusion is to assumption violations. A result that is significant for \(\bar{M} \in [0, 2]\) is much more credible than one that is significant only for \(\bar{M} = 0\).
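
The full Rambachan and Roth (2023) procedure constructs confidence sets that account for estimation error in both the pre- and post-period coefficients (the authors provide an HonestDiD package for R). The toy calculation below is only meant to convey the logic of the relative-magnitudes bound: it widens a conventional interval by \(\bar{M}\) times the largest pre-period deviation and reports the resulting breakdown value, ignoring the inference subtleties the real method handles. All numbers are invented.

```python
import numpy as np

# Illustrative event-study output (invented numbers, not from a real study):
pre_devs = np.array([0.02, -0.05, 0.04])  # pre-period coefficient estimates
post_att = 0.30                           # post-period effect estimate
ci_half = 0.12                            # conventional CI half-width

# Relative-magnitudes restriction: the post-period violation of parallel
# trends may be as large as Mbar times the biggest pre-period deviation.
# (This toy version ignores estimation error in pre_devs, which the
# actual Rambachan-Roth procedure accounts for.)
max_pre = np.abs(pre_devs).max()
for mbar in [0.0, 0.5, 1.0, 1.5, 2.0]:
    lo = post_att - ci_half - mbar * max_pre
    hi = post_att + ci_half + mbar * max_pre
    verdict = "excludes zero" if lo > 0 or hi < 0 else "includes zero"
    print(f"Mbar = {mbar:.1f}: [{lo:.2f}, {hi:.2f}]  {verdict}")

# Breakdown value: the Mbar at which the interval first touches zero
print(f"breakdown Mbar: {(post_att - ci_half) / max_pre:.1f}")
```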

Roth et al. (2023) situate this approach within the broader modern DiD literature and argue that honest sensitivity analysis should be standard practice rather than an optional robustness check.

A Synthesis

Both sides of this debate capture something real. The logical point — that parallel trends is untestable in the strict sense — is correct and important. Researchers should never claim that a pre-trend test proves that parallel trends holds. But the claim that parallel trends is merely dogma goes too far. The assumption can be made more or less plausible by the choice of comparison group, the length and stability of the pre-period, and the battery of auxiliary tests that can rule out specific violations.

The most defensible position is that parallel trends should be treated as a working hypothesis whose plausibility is assessed empirically and whose implications are explored through sensitivity analysis. The question to ask is not "does parallel trends hold?" but "for what range of violations would my conclusion survive?" A conclusion that survives a wide range of violations is credible; one that rests entirely on exact parallel trends in a single comparison group is fragile.

This is, in many ways, the position that the modern methodology literature has converged on. The Rambachan and Roth (2023) framework, combined with careful pre-period analysis and transparent comparison group selection, allows researchers to make honest, credible claims from DiD designs without pretending that the identifying assumption is verified.

Conclusion

The parallel trends assumption is strictly untestable, in the sense that the post-period counterfactual trend for treated units is never observed. Pre-trend tests are valuable but insufficient: they test a necessary condition, not the assumption itself. However, the assumption is not beyond empirical discipline. Careful comparison group selection, long pre-periods, placebo tests, and honest sensitivity analysis can jointly make the assumption more or less credible. The appropriate response to untestability is not to abandon DiD but to be honest about fragility — to report results conditional on different degrees of assumption violation and let readers judge whether the evidence is convincing.

References

  1. Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493--505.
  2. Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, Princeton, NJ.
  3. Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200--230.
  4. Dube, A., Lester, T. W., and Reich, M. (2010). Minimum wage effects across state borders: Estimates using contiguous counties. Review of Economics and Statistics, 92(4):945--964.
  5. Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2):254--277.
  6. Rambachan, A. and Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies, 90(5):2555--2591.
  7. Roth, J. (2022). Pretest with caution: Event-study estimates after testing for parallel trends. American Economic Review: Insights, 4(3):305--322.
  8. Roth, J., Sant'Anna, P. H. C., Bilinski, A., and Poe, J. (2023). What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235(2):2218--2244.
