1 Introduction
The canonical difference-in-differences (DiD) design compares units that receive a binary treatment to those that do not. Yet many of the most consequential policy questions involve treatments that vary continuously in intensity: how much does a one-percentage-point increase in the minimum wage affect employment? Does an additional year of education exposure shift adult wages linearly or with diminishing returns? How sensitive is mortality to a one-unit change in air-pollution concentration?
For two decades, researchers answered such questions by discretising the treatment—above-median versus below-median exposure, attainment versus non-attainment counties, high-dose versus low-dose regions—and applying standard binary DiD. The discretisation approach is simple but discards information, forces arbitrary cutoffs, and can produce estimates that mix fundamentally different causal objects depending on where the threshold is placed.
Callaway et al. [2024] provide a comprehensive framework for DiD with a continuous treatment, extending the staggered-adoption logic of Callaway and Sant'Anna [2021] to settings where the dose itself is the object of interest. Their framework introduces a careful taxonomy of treatment-effect parameters, identifies precisely what additional assumptions are needed relative to the binary case, and delivers an accompanying contdid R package. This article explains the key ideas, works through the identification assumptions, and illustrates with their running empirical application.
2 Two Objects That Collapse to One Under Binary Treatment
The first conceptual contribution of Callaway et al. [2024] is a distinction that is invisible in binary settings. Let Yᵢₜ(d) denote the potential outcome for unit i at time t if it receives dose d ∈ D ⊆ ℝ₊. Two separate objects can be defined:
Level treatment effect (LTE). Comparing dose d to a baseline dose d₀ (typically zero):
$$\mathrm{LTE}(d, d_0) = \mathbb{E}\left[Y_{it}(d) - Y_{it}(d_0)\right] \tag{1}$$
This measures the overall effect of receiving dose d rather than d₀.
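Causal response (CR). The derivative of the average potential outcome with respect to the dose:
$$\mathrm{CR}(d) = \frac{\partial\, \mathbb{E}\left[Y_{it}(d)\right]}{\partial d} \tag{2}$$
This measures the marginal effect of a unit increase in the dose at level d.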
When the treatment is binary (d ∈ {0, 1}), the only possible change in dose is from 0 to 1. The marginal effect of that single change is therefore the level effect LTE(1, 0), and the distinction disappears. With a continuous treatment, the level effect accumulates the causal response over the dose range: LTE(d, d₀) equals the integral of CR(s) over s from d₀ to d. The two can diverge sharply when the dose-response curve is non-linear. Mistaking one for the other leads to policy errors. For example, linearly extrapolating the marginal effect measured at low doses of a concave response function will overstate the gains from high-dose treatment.
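A small numerical check makes the LTE/CR distinction concrete. The sketch assumes a made-up concave average dose-response, E[Y(d)] = 10·log(1 + d), which is not taken from Callaway et al. [2024]. It computes the level effect and the causal response (via a finite difference) and verifies that integrating the causal response reproduces the level effect. It also shows how far a linear extrapolation of the marginal effect at zero overshoots.

```python
import math

# Hypothetical concave average dose-response: mu(d) = E[Y_t(d)] = 10 * log(1 + d).
def mu(d: float) -> float:
    return 10.0 * math.log1p(d)

def lte(d: float, d0: float = 0.0) -> float:
    """Level treatment effect LTE(d, d0) = mu(d) - mu(d0)."""
    return mu(d) - mu(d0)

def cr(d: float, h: float = 1e-5) -> float:
    """Causal response CR(d) = d mu / d d, via a central finite difference."""
    return (mu(d + h) - mu(d - h)) / (2.0 * h)

def integrated_cr(d: float, d0: float = 0.0, n: int = 10_000) -> float:
    """Midpoint-rule integral of CR(s) over [d0, d]; should reproduce LTE(d, d0)."""
    step = (d - d0) / n
    return sum(cr(d0 + (k + 0.5) * step) for k in range(n)) * step

for d in (1.0, 5.0, 10.0):
    linear_guess = cr(0.0) * d  # extrapolate the marginal effect at d = 0 linearly
    print(f"d={d:4.1f}  LTE={lte(d):6.2f}  integral of CR={integrated_cr(d):6.2f}  "
          f"CR(d)={cr(d):5.2f}  linear extrapolation={linear_guess:6.2f}")
```

The integrated causal response matches the level effect at every dose. The linear extrapolation predicts about 100 at d = 10, against a level effect of about 24.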
3 The Identification Framework
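Callaway et al. [2024] consider a panel with T periods and a period t* > 1 at which units receive their (time-invariant) dose. Units with Dᵢ = 0 are untreated; units with Dᵢ = d > 0 receive dose d. The key parameter of interest is the average dose-response function (ADRF). For a post-treatment period t ≥ t*, it is the average effect of dose d relative to no treatment among the units that actually received dose d:
$$\mathrm{ADRF}(d) = \mathbb{E}\left[Y_{it}(d) - Y_{it}(0) \mid D_i = d\right], \qquad t \ge t^* \tag{3}$$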
3.1 Parallel Trends for Continuous Doses
The identifying assumption generalises parallel trends, but in a way that is strictly stronger than the binary case:
Assumption PT-C (Parallel Trends for Continuous Treatment):
For all doses d in the support, E[Yₜ(0) - Yₜ₋₁(0) | Dᵢ = d] = E[Yₜ(0) - Yₜ₋₁(0) | Dᵢ = 0].
This says that the untreated potential-outcome trend for units assigned dose d would have matched the trend for the control group (Dᵢ = 0) for every value of d. This is a statement about an entire continuum of counterfactual paths, not just one comparison group. It rules out units self-selecting into higher or lower doses on the basis of their expected untreated trajectories.
An important practical point: if treatment intensity correlates with pre-treatment trends, PT-C fails. Researchers should visually inspect pre-treatment event-study plots indexed by dose level before proceeding.
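A pre-treatment placebo check by dose bin can be sketched as follows. The data are simulated and purely hypothetical: one third of units are untreated, and the rest draw doses uniformly on (0, 10). The parameter `trend_slope_on_dose` deliberately builds in a PT-C violation. For each quantile bin of treated doses, the helper reports the mean pre-period change minus the untreated units' mean change. Under PT-C these gaps should hover around zero. Bin count and data layout are illustrative choices, not part of any package.

```python
import random
from statistics import mean

def simulate_pre_period(n=3000, trend_slope_on_dose=0.0, seed=0):
    """Hypothetical rows (dose, Y_{t*-2}, Y_{t*-1}). Every third unit is untreated.
    A nonzero `trend_slope_on_dose` makes pre-trends grow with the dose (PT-C fails)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        dose = 0.0 if i % 3 == 0 else rng.uniform(0.0, 10.0)
        y_before = rng.gauss(5.0 + 0.2 * dose, 1.0)  # levels may differ by dose; DiD allows this
        y_after = y_before + 1.0 + trend_slope_on_dose * dose + rng.gauss(0.0, 0.5)
        rows.append((dose, y_before, y_after))
    return rows

def placebo_by_dose_bin(rows, n_bins=3):
    """Mean pre-trend in each quantile bin of treated doses minus the untreated mean pre-trend."""
    control = mean(y1 - y0 for d, y0, y1 in rows if d == 0.0)
    treated = sorted((r for r in rows if r[0] > 0.0), key=lambda r: r[0])
    size = len(treated) // n_bins
    out = []
    for b in range(n_bins):
        chunk = treated[b * size:(b + 1) * size] if b < n_bins - 1 else treated[b * size:]
        dose_range = (chunk[0][0], chunk[-1][0])
        out.append((dose_range, mean(y1 - y0 for d, y0, y1 in chunk) - control))
    return out

for label, slope in (("PT-C holds", 0.0), ("PT-C violated", 0.1)):
    print(label)
    for (lo, hi), gap in placebo_by_dose_bin(simulate_pre_period(trend_slope_on_dose=slope)):
        print(f"  dose in [{lo:4.1f}, {hi:4.1f}]: pre-trend gap = {gap:+.3f}")
```

In practice one would compute these gaps for several pre-periods and plot them as dose-indexed event studies with confidence intervals.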
3.2 No Anticipation
Parallel to the binary case, identification also requires:
Assumption NA: Yᵢₜ(d) = Yᵢₜ(0) for all t < t* and all d > 0.
Units do not adjust behavior before treatment is received.
3.3 The Identification Result
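Under PT-C and NA, the ADRF is identified by:
$$\mathrm{ADRF}(d) = \mathbb{E}\left[Y_{it} - Y_{i,t^*-1} \mid D_i = d\right] - \mathbb{E}\left[Y_{it} - Y_{i,t^*-1} \mid D_i = 0\right] \tag{4}$$
This is exactly the standard DiD formula, but evaluated at each value of d separately. Differentiating the ADRF with respect to d does not by itself deliver the causal response; Section 4 explains what more is required.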
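When the dose takes only a few discrete values, the identification formula above can be computed directly with cell averages. The sketch simulates a hypothetical two-period panel with doses in {0, 1, 2, 4}. Untreated outcomes share a common trend, so PT-C and NA hold by construction, and the true ADRF is assumed to be 3·√d. For each positive dose, the estimate is the mean outcome change at that dose minus the mean change among untreated units. A genuinely continuous dose needs the smoothing discussed in Section 6.

```python
import math
import random
from collections import defaultdict
from statistics import mean

def simulate_two_periods(n_per_dose=2000, doses=(0.0, 1.0, 2.0, 4.0), seed=1):
    """Hypothetical rows (dose, Y_{t*-1}, Y_t): common trend 2.0, true ADRF(d) = 3 * sqrt(d)."""
    rng = random.Random(seed)
    rows = []
    for d in doses:
        for _ in range(n_per_dose):
            level = rng.gauss(10.0 + d, 2.0)  # outcome levels may differ by dose group
            y_pre = level + rng.gauss(0.0, 1.0)
            y_post = level + 2.0 + 3.0 * math.sqrt(d) + rng.gauss(0.0, 1.0)
            rows.append((d, y_pre, y_post))
    return rows

def adrf_did(rows):
    """DiD at each dose: mean change at dose d minus mean change at dose 0."""
    changes = defaultdict(list)
    for d, y_pre, y_post in rows:
        changes[d].append(y_post - y_pre)
    baseline = mean(changes[0.0])
    return {d: mean(v) - baseline for d, v in sorted(changes.items()) if d > 0.0}

for d, est in adrf_did(simulate_two_periods()).items():
    print(f"ADRF({d:.0f}): estimate {est:.3f}, truth {3.0 * math.sqrt(d):.3f}")
```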
4 Additional Complication: Comparisons Across Doses
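Equation (4) identifies ADRF(d) for each d relative to zero. But researchers also want to compare, say, ADRF(10) to ADRF(5), i.e., the causal effect of receiving ten units versus five. Adding and subtracting E[Yᵢₜ(5) − Yᵢₜ(0) | Dᵢ = 10] shows what this difference contains:
$$\mathrm{ADRF}(10) - \mathrm{ADRF}(5) = \underbrace{\mathbb{E}\left[Y_{it}(10) - Y_{it}(5) \mid D_i = 10\right]}_{\text{causal effect of 10 vs. 5}} + \underbrace{\mathbb{E}\left[Y_{it}(5) - Y_{it}(0) \mid D_i = 10\right] - \mathbb{E}\left[Y_{it}(5) - Y_{it}(0) \mid D_i = 5\right]}_{\text{selection bias}} \tag{5}$$
Crucially, Callaway et al. [2024] show that PT-C does not eliminate the selection-bias term. PT-C restricts untreated trends only. It says nothing about whether units that chose dose 10 would have responded to dose 5 in the same way as units that actually chose dose 5. Interpreting cross-dose comparisons causally therefore requires a stronger "strong parallel trends" assumption, which effectively rules out units selecting their dose on the basis of their treatment effects. The causal response (derivative) inherits the same problem and additionally requires a smoothness condition on the potential-outcome function.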
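The made-up numbers below show the cross-dose selection problem in the simplest possible setting. Untreated trends are identical across dose groups, so PT-C holds. However, units that chose dose 10 are assumed to respond twice as strongly to any dose as units that chose dose 5. The effect functions and the trend value are hypothetical illustrations with no noise, so every quantity is an exact population value.

```python
# Hypothetical effect functions: units that chose dose 10 respond twice as strongly
# to ANY dose as units that chose dose 5 (selection on treatment effects).
def effect(dose: float, chosen_dose: float) -> float:
    """E[Y_t(dose) - Y_t(0) | D = chosen_dose] in this made-up example."""
    slope = 2.0 if chosen_dose == 10.0 else 1.0
    return slope * dose

COMMON_TREND = 1.5  # identical untreated trend in every dose group, so PT-C holds

def observed_change(chosen_dose: float) -> float:
    """E[Y_t - Y_{t*-1} | D = chosen_dose] under NA: common trend plus realized effect."""
    return COMMON_TREND + effect(chosen_dose, chosen_dose)

def adrf(d: float) -> float:
    """What DiD recovers at dose d: observed change at d minus observed change at 0."""
    return observed_change(d) - observed_change(0.0)

difference = adrf(10.0) - adrf(5.0)
causal = effect(10.0, 10.0) - effect(5.0, 10.0)          # E[Y(10) - Y(5) | D = 10]
selection_bias = effect(5.0, 10.0) - effect(5.0, 5.0)    # dose-10 vs dose-5 units, both at dose 5
print(f"ADRF(10) - ADRF(5) = {difference}, causal part = {causal}, selection bias = {selection_bias}")
```

Here the DiD comparison ADRF(10) − ADRF(5) equals 15. The causal effect of moving the dose-10 units from 5 to 10 is only 10; the remaining 5 is selection bias that PT-C cannot remove.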
5 Staggered Continuous Treatment
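Many empirical settings feature staggered treatment adoption where both the timing and the dose vary across units. Callaway et al. [2024] extend their framework to this case. Let Gᵢ denote the period in which unit i is first treated, and call the units with Gᵢ = g group g; within a group, units may receive different doses Dᵢ. Writing Yᵢₜ(g, d) for the potential outcome at time t of a unit first treated in period g with dose d, the group-time ADRF is:
$$\mathrm{ADRF}(g, t, d) = \mathbb{E}\left[Y_{it}(g, d) - Y_{it}(0) \mid G_i = g, D_i = d\right], \qquad t \ge g \tag{6}$$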
Under a staggered version of PT-C (using never-treated or not-yet-treated units as the control group), these group-time parameters are identified and can be aggregated in the same spirit as Callaway and Sant'Anna [2021]. Event-study plots can be constructed for each dose level d, showing how the treatment effect at dose d evolves across event time.
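A group-time estimate for one (g, t, d) cell can be sketched with a long difference from the last pre-treatment period g − 1. The sketch uses never-treated units only as controls; the not-yet-treated alternative is omitted to keep it short. The balanced panel is simulated and hypothetical. Groups first treated in periods 2 and 3 receive doses 1 or 3, and the effect of dose d at event time e = t − g is assumed to be d·(1 + 0.5e). Looping over t for a fixed (g, d) traces out a dose-specific event study; cells with t < g serve as placebos.

```python
import random
from statistics import mean

NEVER = None  # marker for never-treated units

def simulate_staggered(n_cell=500, seed=2):
    """Hypothetical rows (g, dose, {t: y}) over periods 1..4 with unit and time effects."""
    rng = random.Random(seed)
    time_effect = {1: 0.0, 2: 0.5, 3: 1.2, 4: 1.5}
    rows = []
    for g, d in [(2, 1.0), (2, 3.0), (3, 1.0), (3, 3.0), (NEVER, 0.0)]:
        for _ in range(n_cell):
            alpha = rng.gauss(0.0, 2.0)  # unit fixed effect
            y = {}
            for t in range(1, 5):
                effect = d * (1.0 + 0.5 * (t - g)) if g is not NEVER and t >= g else 0.0
                y[t] = alpha + time_effect[t] + effect + rng.gauss(0.0, 0.5)
            rows.append((g, d, y))
    return rows

def group_time_adrf(rows, g, t, d):
    """Long-difference DiD for cell (g, t, d) with base period g - 1 and never-treated controls."""
    base = g - 1
    treated = [y[t] - y[base] for gi, di, y in rows if gi == g and di == d]
    control = [y[t] - y[base] for gi, _, y in rows if gi is NEVER]
    return mean(treated) - mean(control)

rows = simulate_staggered()
for g in (2, 3):
    path = [round(group_time_adrf(rows, g, t, 3.0), 2) for t in range(1, 5)]
    print(f"group {g}, dose 3, periods 1-4: {path}")
```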
6 Estimation
Because Dᵢ is continuous, essentially no two units share exactly the same dose. The conditional mean E[Yᵢₜ − Yᵢ,ₜ*₋₁ | Dᵢ = d] therefore cannot be computed as a simple cell average and must be estimated nonparametrically by borrowing information from units with nearby doses. Callaway et al. [2024] implement a sieve (series) estimator. It approximates the dose-response function with a polynomial or spline basis in d, using cross-validation to select the degree of approximation.
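A minimal sieve sketch follows, assuming a plain polynomial basis chosen by 5-fold cross-validation. The contdid package's actual basis (for example B-splines) and its tuning rule may differ. Among treated units, the outcome change is regressed on powers of the rescaled dose. The untreated units' mean change is then subtracted. The data are simulated with a hypothetical concave ADRF(d) = 2d − 0.1d², and least squares is solved by hand to keep the example dependency-free.

```python
import random
from statistics import mean

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is small and square)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def poly_fit(xs, ys, degree):
    """OLS of y on (1, x, ..., x^degree) via the normal equations."""
    k = degree + 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, y in zip(xs, ys):
        powers = [x ** j for j in range(k)]
        for i in range(k):
            xty[i] += powers[i] * y
            for j in range(k):
                xtx[i][j] += powers[i] * powers[j]
    return solve(xtx, xty)

def poly_eval(coefs, x):
    return sum(c * x ** j for j, c in enumerate(coefs))

def choose_degree(xs, ys, degrees=(1, 2, 3, 4), folds=5):
    """K-fold cross-validation: the degree with the smallest held-out squared error wins."""
    def cv_error(deg):
        err = 0.0
        for f in range(folds):
            train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % folds != f]
            test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % folds == f]
            coefs = poly_fit([x for x, _ in train], [y for _, y in train], deg)
            err += sum((y - poly_eval(coefs, x)) ** 2 for x, y in test)
        return err
    return min(degrees, key=cv_error)

def sieve_adrf(doses, changes, d_max):
    """Fitted E[dY | D = d] among treated minus mean dY among untreated.
    Doses are rescaled to [0, 1] so the normal equations stay well conditioned."""
    treated = [(d / d_max, dy) for d, dy in zip(doses, changes) if d > 0]
    control_mean = mean(dy for d, dy in zip(doses, changes) if d == 0)
    xs, ys = [x for x, _ in treated], [y for _, y in treated]
    degree = choose_degree(xs, ys)
    coefs = poly_fit(xs, ys, degree)
    return degree, (lambda d: poly_eval(coefs, d / d_max) - control_mean)

rng = random.Random(6)
doses = [0.0 if i % 4 == 0 else rng.uniform(0.0, 10.0) for i in range(4000)]
# Hypothetical changes: common trend 1.0 plus a concave ADRF(d) = 2d - 0.1 d^2 plus noise.
changes = [1.0 + 2.0 * d - 0.1 * d * d + rng.gauss(0.0, 1.0) for d in doses]
degree, adrf_hat = sieve_adrf(doses, changes, d_max=10.0)
print("selected degree:", degree)
for d in (2.0, 5.0, 8.0):
    print(f"d={d}: estimate {adrf_hat(d):.2f}, truth {2.0 * d - 0.1 * d * d:.2f}")
```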
The authors also provide doubly-robust versions of the estimator, combining outcome-regression and inverse-probability-weighting components analogous to Sant'Anna and Zhao [2020]. Double robustness means the estimator is consistent if either (but not necessarily both) the outcome model or the propensity score model is correctly specified.
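Double robustness can be seen in a stripped-down setting. The sketch compares one dose group with untreated units, using a single binary covariate that raises both the chance of being in the dose group and the untreated trend. The estimator follows the panel-data form in Sant'Anna and Zhao [2020]: the treated mean of (ΔY − m(X)) minus an odds-weighted control mean of (ΔY − m(X)). It is not the contdid implementation, and the data, covariate, and cell-mean models are hypothetical. With one model correct and the other deliberately wrong, the estimate stays near the true effect of 3. With both wrong, it does not.

```python
import random
from statistics import mean

def simulate(n=4000, seed=3):
    """Hypothetical rows (x, treated, dy). Only conditional parallel trends holds; true effect 3.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        treated = rng.random() < (0.7 if x else 0.2)
        dy = 1.0 + 2.0 * x + (3.0 if treated else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((x, treated, dy))
    return rows

def dr_did(rows, pscore, outcome_model):
    """Treated mean of (dY - m(X)) minus the p/(1-p)-weighted control mean of (dY - m(X))."""
    t_resid = [dy - outcome_model(x) for x, d, dy in rows if d]
    c_pairs = [(pscore(x) / (1.0 - pscore(x)), dy - outcome_model(x)) for x, d, dy in rows if not d]
    weight_sum = sum(w for w, _ in c_pairs)
    return mean(t_resid) - sum(w * r for w, r in c_pairs) / weight_sum

rows = simulate()
# Correctly specified (saturated) models: cell means on the discrete covariate.
p_cell = {v: mean(1.0 if d else 0.0 for x, d, _ in rows if x == v) for v in (0, 1)}
m_cell = {v: mean(dy for x, d, dy in rows if x == v and not d) for v in (0, 1)}
# Misspecified models that ignore the covariate.
p_const = mean(1.0 if d else 0.0 for _, d, _ in rows)
m_const = mean(dy for _, d, dy in rows if not d)

estimates = {
    "right p, wrong m": dr_did(rows, lambda x: p_cell[x], lambda x: m_const),
    "wrong p, right m": dr_did(rows, lambda x: p_const, lambda x: m_cell[x]),
    "wrong p, wrong m": dr_did(rows, lambda x: p_const, lambda x: m_const),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}  (true effect 3.0)")
```

With a saturated covariate, the two singly-correct combinations coincide exactly. Both reduce to averaging covariate-specific DiDs with the treated group's covariate weights.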
7 An Empirical Application: The Affordable Care Act Medicaid Expansion
Callaway et al. [2024] illustrate their methods using variation in Medicaid expansion rates across US counties following the Affordable Care Act. The dose is the county-level percentage increase in Medicaid enrollment (a continuous variable), and the outcomes are various healthcare-utilization and financial measures.
Their dose-response plots reveal non-linearities that binary DiD analyses would miss: the effect on emergency-department visits is concave in the dose, with large gains at low enrollment increases that flatten out at higher intensities. The causal response (derivative) declines significantly above a threshold enrollment increase of roughly 5 percentage points.
8 Comparison to Existing Approaches
Table 1 summarises the main alternatives to the Callaway et al. [2024] approach. The linear two-way fixed effects (TWFE) approach regressing outcomes on the dose with unit and time fixed effects is the most common practice, but de Chaisemartin and D'Haultfœuille [2020] show that with treatment effect heterogeneity, TWFE estimates a weighted average of treatment effects with potentially negative weights.
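Even without unit-level heterogeneity, the TWFE dose coefficient need not equal an average causal response. The sketch below shows this. With two periods and a balanced panel, the TWFE coefficient on Dᵢ·Postₜ equals the slope from regressing the outcome change on the dose. It uses a hypothetical noise-free design: 1,000 untreated units, 1,000 treated units with doses spread evenly over (0, 10], and a concave ADRF(d) = 10·log(1 + d). It then compares that slope with the average causal response among treated units. The two differ because the regression weights marginal effects by the spread of the dose distribution. This simple case illustrates only that mismatch, not the negative-weight problem.

```python
import math
from statistics import mean

def adrf(d):  # hypothetical concave dose-response, ADRF(d) = 10 * log(1 + d)
    return 10.0 * math.log1p(d)

def cr(d):    # its derivative, the causal response
    return 10.0 / (1.0 + d)

# 1,000 untreated units and 1,000 treated units with doses spread evenly over (0, 10].
doses = [0.0] * 1000 + [10.0 * k / 1000 for k in range(1, 1001)]
# Two-period changes: common trend 1.0, no noise, so PT-C and NA hold exactly.
changes = [1.0 + adrf(d) for d in doses]

def ols_slope(xs, ys):
    """With two periods, the TWFE coefficient on D_i * Post_t equals this first-difference slope."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

twfe = ols_slope(doses, changes)
avg_cr = mean(cr(d) for d in doses if d > 0)
print(f"TWFE slope: {twfe:.3f}   average causal response among treated: {avg_cr:.3f}")
```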
Table 1: Approaches to continuous treatment in DiD settings
9 The contdid Package
Implementation is provided by the contdid R package, available on CRAN. Key functions include:
- cont_did(): estimates the ADRF and causal response at a user-specified grid of dose values
- cont_did_staggered(): handles staggered continuous treatment adoption
- aggte_cont(): aggregates group-time ADRFs into overall summaries
- ggcont_did(): produces dose-response plots with pointwise and simultaneous confidence bands
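Standard errors are obtained by the multiplier bootstrap. It perturbs each unit's estimated influence-function contribution with independent random weights instead of re-estimating the model on resampled data, which keeps it computationally cheap. Taking the bootstrap distribution of the largest t-statistic across the dose grid also yields the critical value needed for simultaneous confidence bands.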
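The multiplier bootstrap can be sketched for pointwise and simultaneous bands across a dose grid. The sketch takes as given a hypothetical n × K matrix `psi` of estimated influence-function values, one column per grid point, and uses Rademacher (±1) multipliers. Mammen weights are another common choice, and contdid's internal choices may differ. Pointwise critical values come from each column's |t| draws. The simultaneous critical value comes from the maximum |t| across the grid, so it is always at least as large.

```python
import math
import random

def multiplier_bootstrap(psi, n_boot=499, alpha=0.05, seed=4):
    """psi[i][k]: influence function of unit i for the estimate at grid point k, so that
    estimate_k - truth_k is approximately mean_i psi[i][k].
    Returns (standard errors, pointwise critical values, simultaneous sup-t critical value)."""
    rng = random.Random(seed)
    n, k = len(psi), len(psi[0])
    se = [math.sqrt(sum(row[j] ** 2 for row in psi)) / n for j in range(k)]
    abs_t = [[0.0] * k for _ in range(n_boot)]
    for b in range(n_boot):
        v = [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]  # Rademacher multipliers
        for j in range(k):
            draw = sum(vi * row[j] for vi, row in zip(v, psi)) / n
            abs_t[b][j] = abs(draw) / se[j]

    def quantile(xs, q):
        xs = sorted(xs)
        return xs[min(len(xs) - 1, math.ceil(q * len(xs)) - 1)]

    pointwise = [quantile([abs_t[b][j] for b in range(n_boot)], 1 - alpha) for j in range(k)]
    simultaneous = quantile([max(row) for row in abs_t], 1 - alpha)
    return se, pointwise, simultaneous

# Hypothetical influence functions: 300 units, 8 grid points, strongly correlated across doses.
rng = random.Random(0)
n, k = 300, 8
common = [rng.gauss(0.0, 1.0) for _ in range(n)]
psi = [[c + 0.5 * rng.gauss(0.0, 1.0) for _ in range(k)] for c in common]
col_means = [sum(row[j] for row in psi) / n for j in range(k)]
psi = [[row[j] - col_means[j] for j in range(k)] for row in psi]  # center each column

se, pw, sim = multiplier_bootstrap(psi)
print("pointwise critical values:", [round(p, 2) for p in pw])
print("simultaneous critical value:", round(sim, 2))
```

A band of the form estimate ± `sim` × se then covers the whole dose-response curve jointly, while ± `pw[k]` × se covers each grid point separately.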
10 Practical Guidance for Researchers
Several practical lessons emerge from this framework:
- Pre-trend plots by dose level. Before estimating the ADRF, plot pre-treatment trends separately for low-, medium-, and high-dose units. Divergence in pre-trends is evidence against PT-C.
- Support of the dose. The ADRF is identified only over the empirical support of the dose distribution. Report estimates only for dose values with meaningful density.
- Report both level effects and causal responses. The two objects have different policy interpretations. A policy that fixes a dose level should use the LTE; a policy that incrementally adjusts doses should use the CR.
- Sensitivity to dose definition. If the dose is measured with error, units are compared at the wrong dose values, and the estimated ADRF will generally be biased, typically toward a flatter curve. Consider IV methods for the dose if a plausible instrument is available.
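The support advice can be operationalised with a simple density screen. The sketch keeps a grid point only if enough treated units have doses within a chosen bandwidth of it. The bandwidth, the minimum count, and the right-skewed simulated doses are all hypothetical choices. Grid points that fail the screen are where an ADRF estimate would mostly reflect extrapolation of the basis functions rather than data.

```python
import random

def supported_grid(doses, grid, bandwidth, min_count):
    """Keep grid points with at least `min_count` treated units whose dose lies within
    `bandwidth` of the point; elsewhere the ADRF estimate is mostly extrapolation."""
    treated = [d for d in doses if d > 0]
    keep = []
    for g in grid:
        count = sum(1 for d in treated if abs(d - g) <= bandwidth)
        if count >= min_count:
            keep.append(g)
    return keep

rng = random.Random(5)
# Hypothetical right-skewed doses: most treated units get small doses, a few get very large ones.
doses = [0.0] * 500 + [rng.expovariate(0.5) for _ in range(1500)]
grid = [0.5 * j for j in range(1, 41)]  # candidate dose values 0.5, 1.0, ..., 20.0
print(supported_grid(doses, grid, bandwidth=0.5, min_count=30))
```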
11 Conclusion
The extension of DiD to continuous treatments is not merely a technical refinement—it opens the door to richer, more policy-relevant analysis. The distinction between level treatment effects and causal responses, the generalization of parallel trends to a continuum of doses, and the doubly-robust sieve estimator of Callaway et al. [2024] together provide a principled framework. Researchers who have been discretising continuous treatments out of methodological habit now have rigorous tools to do otherwise.
References
- Callaway, B., Goodman-Bacon, A., and Sant'Anna, P. H. C. (2024). Difference-in-differences with a continuous treatment. arXiv:2107.02637.
- Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200-230.
- de Chaisemartin, C. and D'Haultfœuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review, 110(9):2964-2996.
- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2):254-277.
- Hirano, K. and Imbens, G. W. (2004). The propensity score with continuous treatments. In Gelman, A. and Meng, X.-L. (eds.), Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, pp. 73-84. Wiley.
- Roth, J., Sant'Anna, P. H. C., Bilinski, A., and Poe, J. (2023). What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235(2):2218-2244.
- Sant'Anna, P. H. C. and Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1):101-122.
- Sun, L. and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2):175-199.