1 What Problem Does This Tool Solve?
When the parallel trends assumption holds only after conditioning on covariates, researchers must adjust for those covariates in DiD estimation. The two standard approaches, outcome regression (OR) and inverse probability weighting (IPW), each rely on a single nuisance model: OR needs a correctly specified model for the outcome evolution of the comparison group, and IPW needs a correctly specified propensity score model. If the model an approach relies on is misspecified, the resulting estimate is biased.
The drdid package (distributed for R on CRAN under the name DRDID) implements the doubly robust DiD (DR-DiD) estimator of Sant'Anna and Zhao [2020], which combines OR and IPW so that the estimator is consistent if either the outcome model or the propensity score model is correctly specified; the two need not both be right. The package handles both balanced panel data and repeated cross-sections, and it serves as the estimation engine inside the widely used did package [Callaway and Sant'Anna, 2021].
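To make the combination of OR and IPW concrete, the panel-data estimand of Sant'Anna and Zhao [2020] can be written as below. The notation is an assumption introduced here for illustration: D is the treatment-group indicator, ΔY = Y₁ − Y₀ is the outcome change, p(X) is a working propensity score model, and μ₀(X) is a working model for the comparison group's expected outcome change given X.

```latex
\tau^{dr} = \mathbb{E}\Big[ \big( w_1(D) - w_0(D, X) \big) \big( \Delta Y - \mu_0(X) \big) \Big],
\qquad
w_1(D) = \frac{D}{\mathbb{E}[D]},
\qquad
w_0(D, X) = \frac{\dfrac{p(X)\,(1 - D)}{1 - p(X)}}{\mathbb{E}\!\left[ \dfrac{p(X)\,(1 - D)}{1 - p(X)} \right]}
```

If μ₀(X) is correct, the residual ΔY − μ₀(X) has conditional mean zero among comparison units, so errors in p(X) do not matter; if p(X) is correct, the reweighted comparison group matches the treated group's covariate distribution, so errors in μ₀(X) cancel.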
2 Installation and Setup
The package ships with the nsw_long dataset, a panel version of the National Supported Work (NSW) training programme data from LaLonde [1986], commonly used to benchmark DiD estimators.
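A minimal setup sketch, assuming the CRAN release of the package (named DRDID, in upper case) is the one being used. The installation line is commented out so the snippet can be re-run safely.

```r
# install.packages("DRDID")   # run once; CRAN package name is upper case
library(DRDID)

# Load the NSW panel that ships with the package
data(nsw_long)

# Inspect the variables and the time periods available
str(nsw_long)
table(nsw_long$year)
```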
3 Core Functions
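The package provides four main estimation functions:
- drdid_panel(): locally efficient DR-DiD estimator for panel data.
- drdid_imp_panel(): improved DR-DiD estimator for panel data.
- drdid_rc(): locally efficient DR-DiD estimator for repeated cross-sections.
- drdid_imp_rc(): improved DR-DiD estimator for repeated cross-sections.
A formula-based wrapper, drdid(), dispatches to these functions through its panel and estMethod arguments.
The "improved" variants implement the further improved estimators of Sant'Anna and Zhao [2020]: the propensity score is estimated by inverse probability tilting and the outcome model by weighted least squares, which makes the estimator doubly robust for inference as well as for consistency (the form of the asymptotic variance does not depend on which nuisance model is correctly specified). Both the standard and the improved variants use normalised IPW weights (weights sum to one within each group), which keeps them better-behaved in finite samples when propensity scores are close to one. The improved variants are generally preferred in practice.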
4 A Minimal Working Example: Panel Data
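The following sketch follows the example in the package documentation. It assumes the DRDID package is installed and uses the variable names of nsw_long (re, year, id, experimental, and the listed covariates). As far as the package documentation describes it, this subset contrasts the experimental controls with the CPS comparison group, so the quantity being estimated is an evaluation bias whose benchmark value is zero rather than a programme effect.

```r
library(DRDID)
data(nsw_long)

# Experimental controls plus the CPS comparison group
eval_lalonde_cps <- subset(nsw_long,
                           nsw_long$treated == 0 | nsw_long$sample == 2)

# Improved doubly robust DiD for panel data via the formula wrapper
out <- drdid(
  yname   = "re",            # outcome: real earnings
  tname   = "year",          # time variable (two periods)
  idname  = "id",            # unit identifier
  dname   = "experimental",  # treatment-group indicator
  xformla = ~ age + educ + black + married + nodegree + hisp + re74,
  data    = eval_lalonde_cps,
  panel   = TRUE,
  estMethod = "imp"          # "imp" = improved, "trad" = traditional
)

summary(out)
```

The wrapper builds the covariate matrix from xformla and then calls the corresponding low-level function, so the same estimate could be obtained by passing vectors and a matrix to drdid_imp_panel() directly.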
4.1 Interpreting the Output
The main output is:
- ATT: the doubly robust estimate of the average treatment effect on the treated.
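- se: the standard error of the ATT estimate (analytical, based on the influence function, by default; bootstrap-based when boot = TRUE).
- lci, uci: lower and upper bounds of the 95% confidence interval, built from the analytical standard error by default or from the bootstrap draws when boot = TRUE.
- boots: the vector of bootstrap replications, available only when boot = TRUE (useful for custom inference).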
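The elements listed above can be pulled out of the returned list directly. The sketch below refits the same illustrative model as the package documentation example (an assumption about which model is of interest) and rebuilds a Wald-type interval by hand for comparison with the reported bounds.

```r
library(DRDID)
data(nsw_long)
eval_lalonde_cps <- subset(nsw_long,
                           nsw_long$treated == 0 | nsw_long$sample == 2)

out <- drdid(yname = "re", tname = "year", idname = "id",
             dname = "experimental",
             xformla = ~ age + educ + black + married + nodegree + hisp + re74,
             data = eval_lalonde_cps, panel = TRUE)

# Point estimate, standard error, and the interval reported by the package
c(ATT = out$ATT, se = out$se, lci = out$lci, uci = out$uci)

# A Wald-type 95% interval rebuilt from ATT and se, for comparison
wald <- out$ATT + c(-1, 1) * qnorm(0.975) * out$se
wald

# Expected to be NULL when boot = FALSE (the default)
is.null(out$boots)
```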
5 Repeated Cross-Sections
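When the same individuals are not tracked across periods (e.g. repeated surveys), use the repeated cross-section functions. The key difference is that the pre-period and post-period observations come from different samples, so the within-unit outcome difference Yᵢ₁ − Yᵢ₀ is never observed. The functions therefore take a single outcome vector together with a post-period indicator, instead of separate pre-period and post-period outcomes.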
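A minimal sketch with simulated repeated cross-sections. The data-generating process, the variable names, and the true ATT of 1 are all assumptions made up for illustration. The low-level function takes vectors and a covariate matrix; per the package documentation, a column of ones must be added by hand if an intercept is wanted.

```r
library(DRDID)

set.seed(123)
n    <- 2000
x    <- rnorm(n)                              # single covariate
post <- rbinom(n, 1, 0.5)                     # 1 = sampled in the post period
d    <- rbinom(n, 1, plogis(-0.5 + 0.5 * x))  # 1 = treatment group

# Outcome with a covariate-specific trend and a true ATT of 1
y <- 1 + x + post * (0.5 + 0.5 * x) + 0.3 * d + 1 * d * post + rnorm(n)

# Improved DR-DiD for repeated cross-sections; each row is a distinct unit
out_rc <- drdid_imp_rc(
  y = y,
  post = post,
  D = d,
  covariates = cbind(1, x)   # intercept column added manually
)

c(ATT = out_rc$ATT, se = out_rc$se)
```

Because the trend depends on x and treatment probability also depends on x, an unconditional DiD would be biased in this design, which is the situation covariate adjustment is meant to handle.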
6 Connection to the did Package
The drdid package is the estimation engine inside the did package of Callaway and Sant'Anna [2021]. When you call att_gt() with the default est_method = "dr" (doubly robust), it calls drdid's doubly robust routines for each (group, time) cell; in current versions of did these are drdid_panel() for panel data and drdid_rc() for repeated cross-sections. Understanding drdid directly helps you interpret what did is doing and allows you to customise the estimation if needed.
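As an illustration of that connection, the sketch below calls att_gt() with the doubly robust method on the mpdta example data that ships with did. It assumes the did package is installed; the variable names are those of mpdta.

```r
library(did)
data(mpdta)

# Group-time ATTs, each estimated with a doubly robust 2x2 DiD
res <- att_gt(
  yname  = "lemp",         # log county-level teen employment
  tname  = "year",
  idname = "countyreal",
  gname  = "first.treat",  # first treated period (0 = never treated)
  xformla = ~ lpop,        # covariate used for conditional parallel trends
  data   = mpdta,
  est_method = "dr"        # the default; shown explicitly here
)

summary(res)
```

Setting est_method to "ipw" or "reg" instead switches each cell to the IPW or outcome regression estimator, which makes it easy to see how sensitive the group-time estimates are to the adjustment method.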
7 Key Options and Pitfalls
7.1 Bootstrap Type
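drdid supports two bootstrap types, used only when boot = TRUE:
- boot.type = "weighted": the weighted bootstrap (the default). Each draw re-estimates the nuisance models and the ATT under random observation weights, which is more computationally demanding.
- boot.type = "multiplier": the multiplier bootstrap. Each draw perturbs the estimated influence function without re-estimating any model, so it is much faster.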
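A short sketch of bootstrap-based inference, reusing the illustrative model from the package documentation. The choice of 199 draws is an assumption made to keep the run time short, not a recommendation.

```r
library(DRDID)
data(nsw_long)
eval_lalonde_cps <- subset(nsw_long,
                           nsw_long$treated == 0 | nsw_long$sample == 2)

set.seed(2020)  # bootstrap draws are random
out_boot <- drdid(
  yname = "re", tname = "year", idname = "id", dname = "experimental",
  xformla = ~ age + educ + black + married + nodegree + hisp + re74,
  data = eval_lalonde_cps, panel = TRUE,
  boot = TRUE,
  boot.type = "multiplier",  # perturbs the influence function; no refitting
  nboot = 199
)

c(ATT = out_boot$ATT, se = out_boot$se,
  lci = out_boot$lci, uci = out_boot$uci)

# Bootstrap draws are stored for custom inference
summary(as.numeric(out_boot$boots))
```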
7.2 Overlap Violations
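If the estimated propensity score p̂(Xᵢ) is near 1 for some control units (they look almost exactly like treated units), their IPW weights p̂(Xᵢ)/(1 − p̂(Xᵢ)) become extreme, inflating variance. Inspect the propensity score distribution before running DR-DiD.
Scores close to 1 among control units (say, above 0.9) signal overlap problems, because the weight grows without bound as the score approaches 1. Scores close to 0 are less of a concern for the ATT, since those controls simply receive very small weights. Consider trimming observations with extreme scores or using alternative methods.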
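One way to carry out this check is to fit the propensity score model by hand on the pre-period cross-section and look at the fitted values by group. This is a sketch under assumptions: it uses a plain logistic regression with the same covariates as the package documentation example, which mirrors the working model but is not necessarily identical to the scores drdid computes internally (the improved variants estimate the score differently), and the 0.9 cut-off is only a rule of thumb.

```r
library(DRDID)
data(nsw_long)
eval_lalonde_cps <- subset(nsw_long,
                           nsw_long$treated == 0 | nsw_long$sample == 2)

# One row per unit: keep the earliest period only
pre <- subset(eval_lalonde_cps, year == min(year))

# Logistic propensity score model for the treatment-group indicator
ps_fit <- glm(experimental ~ age + educ + black + married + nodegree +
                hisp + re74,
              family = binomial(link = "logit"), data = pre)
pre$ps <- predict(ps_fit, newdata = pre, type = "response")

# Distribution of fitted scores by group (0 = comparison, 1 = treated group)
tapply(pre$ps, pre$experimental, summary)

# Share of comparison units with scores above the rule-of-thumb cut-off
mean(pre$ps[pre$experimental == 0] > 0.9, na.rm = TRUE)

# Implied ATT weights for comparison units; a long right tail is a warning sign
w0 <- with(subset(pre, experimental == 0), ps / (1 - ps))
quantile(w0, probs = c(0.5, 0.9, 0.99, 1), na.rm = TRUE)
```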
7.3 Model Specification
By default, drdid uses logistic regression for the propensity score and OLS for the outcome regression. Both models are linear in the provided covariates. To capture nonlinearity:
- Include polynomial terms or interactions in the covariates matrix.
- Use the DoubleML package if you want machine learning nuisance estimation (cross-fitting with nonparametric learners).
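The first option can be sketched by adding transformed terms to the covariate formula. The particular squared terms and the interaction below are illustrative assumptions, not a recommended specification.

```r
library(DRDID)
data(nsw_long)
eval_lalonde_cps <- subset(nsw_long,
                           nsw_long$treated == 0 | nsw_long$sample == 2)

# Same call as the baseline example, with squares and one interaction added.
# I() protects arithmetic inside a formula; educ:black is an interaction term.
out_flex <- drdid(
  yname = "re", tname = "year", idname = "id", dname = "experimental",
  xformla = ~ age + I(age^2) + educ + I(educ^2) + black + married +
    nodegree + hisp + re74 + educ:black,
  data = eval_lalonde_cps, panel = TRUE
)

c(ATT = out_flex$ATT, se = out_flex$se)
```

The added terms enter both the propensity score model and the outcome model, since drdid uses one covariate set for the two nuisance functions.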
8 Comparison to Alternatives
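A quick empirical comparison can be run inside the package itself, which also exports outcome regression and IPW DiD estimators through the ordid() and ipwdid() wrappers. The sketch below applies all three to the same illustrative sample and covariates as the package documentation example; the side-by-side table is an assumption about what is useful to report.

```r
library(DRDID)
data(nsw_long)
eval_lalonde_cps <- subset(nsw_long,
                           nsw_long$treated == 0 | nsw_long$sample == 2)

fml <- ~ age + educ + black + married + nodegree + hisp + re74

# Outcome regression DiD: relies on the outcome model only
est_or <- ordid(yname = "re", tname = "year", idname = "id",
                dname = "experimental", xformla = fml,
                data = eval_lalonde_cps, panel = TRUE)

# IPW DiD with normalised weights: relies on the propensity score model only
est_ipw <- ipwdid(yname = "re", tname = "year", idname = "id",
                  dname = "experimental", xformla = fml,
                  data = eval_lalonde_cps, panel = TRUE, normalized = TRUE)

# Improved doubly robust DiD: consistent if either model is correct
est_dr <- drdid(yname = "re", tname = "year", idname = "id",
                dname = "experimental", xformla = fml,
                data = eval_lalonde_cps, panel = TRUE, estMethod = "imp")

data.frame(
  estimator = c("OR", "IPW (normalised)", "DR (improved)"),
  ATT = c(est_or$ATT, est_ipw$ATT, est_dr$ATT),
  se  = c(est_or$se, est_ipw$se, est_dr$se)
)
```

Large gaps between the OR and IPW estimates suggest that at least one nuisance model is misspecified, which is the case where the doubly robust estimate is most informative.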
9 Conclusion
The drdid package implements the doubly robust DiD estimator of Sant'Anna and Zhao [2020] with a clean interface for panel and repeated cross-section data. Its double robustness property (consistency when either the propensity score or the outcome model is correctly specified) makes it a natural default for covariate-adjusted DiD in observational settings. As the workhorse inside the did package, it underpins much of the modern staggered DiD literature. Applied researchers should generally prefer drdid_imp_panel() or drdid_imp_rc() over OR-only or IPW-only alternatives whenever covariate adjustment is needed.
References
- Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200-230.
- LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76(4):604-620.
- Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846-866.
- Sant'Anna, P. H. C. and Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1):101-122.
- Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41-55.