Toolbox

The `did` Package in R: A Complete Workflow

Installation and Setup

The did package is available on CRAN and is developed by Callaway and Sant'Anna. Install it as follows:


install.packages("did")
install.packages("HonestDiD") # for sensitivity analysis
install.packages("dplyr")
install.packages("ggplot2")

library(did)
library(HonestDiD)
library(dplyr)
library(ggplot2)

Data Requirements

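The workflow below uses a panel dataset in long format (one row per unit and period); the package can also handle repeated cross-sections through the panel argument of att_gt(). The data need the following variables: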

  • A unit identifier (e.g., state, firm, individual ID).
  • A time variable (integer years or periods).
  • A treatment cohort variable: the period in which each unit was first treated. For never-treated units, this should be set to 0 (or Inf). Crucially, the cohort variable should be constant for each unit across all time periods.
  • The outcome variable.
  • Optionally, pre-treatment covariates for conditional parallel trends.
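
The requirements above can be checked before estimation. The following is a minimal base-R sketch on a small made-up data frame; the object name toy and the column names id, time, and cohort are illustrative assumptions, not something the package prescribes.

```r
# Toy long-format panel: 3 units observed in periods 1..4 (hypothetical data)
toy <- data.frame(
  id     = rep(1:3, each = 4),
  time   = rep(1:4, times = 3),
  cohort = rep(c(3, 0, 4), each = 4)  # 0 marks the never-treated unit
)

# 1. The cohort variable must take exactly one value per unit
n_cohorts_per_unit <- tapply(toy$cohort, toy$id, function(g) length(unique(g)))
cohort_is_constant <- all(n_cohorts_per_unit == 1)

# 2. Each unit-period pair should appear once (no duplicated rows)
no_duplicates <- !any(duplicated(toy[, c("id", "time")]))

# 3. Balanced panel: every unit is observed in every period
is_balanced <- all(table(toy$id) == length(unique(toy$time)))

# 4. Time and cohort should be numeric and measured on the same scale
types_ok <- is.numeric(toy$time) && is.numeric(toy$cohort)

c(cohort_is_constant, no_duplicates, is_balanced, types_ok)
```

If any of these checks fails on real data, it is usually easier to fix the data first than to interpret the resulting error or warning from the estimation step.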

Example: Simulating a Dataset


set.seed(42)
n_units <- 200 # number of units
n_times <- 8 # time periods (1 to 8)

# Assign cohorts: treated in period 3, 5, or 7; rest never treated
cohort_prob <- c(0.3, 0.3, 0.3, 0.1) # prob for cohort 3, 5, 7, never
cohorts_all <- sample(c(3, 5, 7, 0), size = n_units, replace = TRUE, prob = cohort_prob)

panel <- expand.grid(id = 1:n_units, time = 1:n_times)
panel <- panel %>%
 left_join(data.frame(id = 1:n_units, cohort = cohorts_all), by = "id") %>%
 mutate(
  D = as.integer(cohort > 0 & time >= cohort),
  # True ATT grows with time since treatment
  att_true = ifelse(D == 1, 2 + 0.5 * (time - cohort), 0),
  # Unit fixed effect
  unit_fe = rnorm(n_units)[id],
  # Time fixed effect
  time_fe = 0.3 * time,
  # Outcome
  y = unit_fe + time_fe + att_true + rnorm(n(), sd = 1.5)
 )

head(panel, 10)

Estimating Group-Time ATTs with att_gt()

The main estimation function is att_gt(). Its key arguments are:

  • yname: name of the outcome variable (character string).
  • tname: name of the time variable.
  • idname: name of the unit identifier.
  • gname: name of the cohort variable (0 for never-treated).
  • xformla: a one-sided formula for covariates (optional; NULL for unconditional parallel trends).
  • data: the data frame.
  • est_method: estimation method, one of "dr" (doubly robust, default), "ipw" (inverse probability weighting), or "reg" (outcome regression).
  • control_group: either "nevertreated" or "notyettreated".
  • anticipation: number of periods of anticipation to allow (default 0).
  • clustervars: variable to cluster standard errors on (default: unit identifier).

# Unconditional parallel trends, doubly robust, never-treated controls
cs_out <- att_gt(
 yname = "y",
 tname = "time",
 idname = "id",
 gname = "cohort",
 data = panel,
 est_method = "dr",
 control_group = "nevertreated",
 anticipation = 0,
 clustervars = "id"
)

summary(cs_out)

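The output of summary() lists the estimated \(\widehat{ATT}(g,t)\) for each \((g,t)\) pair, along with standard errors and 95% simultaneous confidence bands (computed with the multiplier bootstrap for uniform inference). Pairs with \(t \geq g\) are post-treatment effects; pairs with \(t < g\) are pre-treatment estimates that serve as placebo checks. Where available, the summary also reports a p-value for a joint pre-test of parallel trends.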
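
To make the estimand concrete: without covariates and with never-treated controls, \(ATT(g,t)\) for \(t \geq g\) is the change in the mean outcome of cohort \(g\) between period \(g-1\) and period \(t\), minus the same change among never-treated units. The sketch below computes this by hand in base R on a small simulated panel. The object names (sim, att_by_hand) and the simulation design are assumptions made for illustration, and the sketch does not reproduce the package's standard errors or confidence bands.

```r
set.seed(1)
n_units <- 400
n_times <- 6

# Hypothetical design: cohorts first treated in period 3 or 5; 0 = never treated
cohort <- sample(c(3, 5, 0), n_units, replace = TRUE)
unit_fe <- rnorm(n_units)

sim <- expand.grid(id = seq_len(n_units), time = seq_len(n_times))
sim$cohort <- cohort[sim$id]
treated <- sim$cohort > 0 & sim$time >= sim$cohort
# Constant treatment effect of 2 (an assumption of this toy example)
sim$y <- unit_fe[sim$id] + 0.3 * sim$time + 2 * treated + rnorm(nrow(sim))

# ATT(g, t) by hand: change from period g - 1 to period t,
# cohort g versus never-treated units
att_by_hand <- function(data, g, t) {
  base <- g - 1
  mean_change <- function(in_group) {
    mean(data$y[in_group & data$time == t]) -
      mean(data$y[in_group & data$time == base])
  }
  mean_change(data$cohort == g) - mean_change(data$cohort == 0)
}

att_by_hand(sim, g = 3, t = 4)  # should be near the true effect of 2
att_by_hand(sim, g = 5, t = 6)  # should be near the true effect of 2
```

Because the toy panel is balanced, the difference in mean changes equals the mean of unit-level changes, so unit fixed effects and the common time trend cancel.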

Aggregating ATTs

The function aggte() aggregates the group-time ATTs into policy-relevant summaries.

Simple Aggregate


agg_simple <- aggte(cs_out, type = "simple")
summary(agg_simple)
# Reports the weighted average of all ATT(g,t) with t >= g
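
The weighting behind the simple aggregate can be sketched by hand. Following Callaway and Sant'Anna (2021), each post-treatment \(ATT(g,t)\) is weighted in proportion to the size of cohort \(g\). The numbers below are made up for illustration, and the object names (gt, cohort_size) are assumptions, not part of the aggte() output.

```r
# Hypothetical group-time estimates (not output from any real data set)
gt <- data.frame(
  g   = c(3, 3, 3, 5, 5, 5),
  t   = c(2, 3, 4, 3, 5, 6),
  att = c(0.1, 2.0, 2.5, -0.1, 1.8, 2.4)
)
# Hypothetical cohort sizes (number of units first treated in period g)
cohort_size <- c("3" = 60, "5" = 40)

# Keep post-treatment cells only (t >= g)
post <- gt[gt$t >= gt$g, ]

# Weight each cell by its cohort's size, normalised to sum to one
w <- cohort_size[as.character(post$g)]
w <- w / sum(w)

simple_att <- sum(w * post$att)
simple_att  # 2.19
```

One consequence of this scheme is that cohorts treated early contribute more cells, and so more total weight, than cohorts treated late.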

Dynamic (Event-Study) Aggregation


agg_dynamic <- aggte(cs_out, type = "dynamic", min_e = -3, max_e = 4)
summary(agg_dynamic)
# Reports theta^dynamic(ell) for ell = -3, ..., 4
# ell < 0: pre-trend tests (should be approximately 0 under parallel trends and no anticipation)
# ell >= 0: dynamic treatment effects
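
The event-study aggregation can also be sketched by hand: for each event time \(\ell\), average \(ATT(g, g+\ell)\) over the cohorts observed at that event time, weighting by cohort size. The numbers and object names below are hypothetical. The sketch also shows that the set of contributing cohorts changes with \(\ell\), which is worth keeping in mind when comparing estimates across event times; aggte() has a balance_e argument intended to hold that composition fixed.

```r
# Hypothetical group-time estimates and cohort sizes (illustrative numbers only)
gt <- data.frame(
  g   = c(3, 3, 3, 3, 5, 5, 5, 5),
  t   = c(2, 3, 4, 5, 3, 4, 5, 6),
  att = c(0.1, 2.0, 2.5, 3.0, -0.1, 0.2, 1.8, 2.4)
)
cohort_size <- c("3" = 60, "5" = 40)

# Event time: periods elapsed since first treatment
gt$e <- gt$t - gt$g

# For each event time, average across the cohorts observed at that event
# time, weighting by cohort size
event_study <- sapply(split(gt, gt$e), function(cell) {
  w <- cohort_size[as.character(cell$g)]
  sum(w * cell$att) / sum(w)
})
event_study

# Which cohorts contribute at each event time
split(gt$g, gt$e)
```

In this toy table only cohort 5 contributes at \(\ell = -2\) and only cohort 3 at \(\ell = 2\), so differences between those endpoints mix dynamics with cohort composition.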

Calendar-Time Aggregation


agg_calendar <- aggte(cs_out, type = "calendar")
summary(agg_calendar)

Plotting Event Studies

The ggdid() function produces publication-quality event-study plots:


ggdid(agg_dynamic) +
 labs(
 title = "Event-Study Plot: Dynamic ATT Estimates",
 subtitle = "Callaway-Sant'Anna (2021), doubly robust",
 x = "Periods relative to treatment",
 y = "Estimated ATT"
 ) +
 geom_hline(yintercept = 0, linetype = "dashed") +
 theme_bw()

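In the output, estimates at negative values of \(\ell\) (to the left of event time 0) are pre-treatment estimates; they should be statistically indistinguishable from zero if parallel trends and no anticipation hold in the pre-treatment periods. Insignificant pre-trends are supportive evidence, not proof, that parallel trends also holds after treatment.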
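
Inspecting pre-period estimates one at a time ignores their correlation, so a joint test is often more informative. The sketch below shows the logic of a Wald test that all pre-period coefficients are zero. The estimates and covariance matrix are hypothetical numbers typed in by hand, and the sketch is not the pre-test that the did package itself reports.

```r
# Hypothetical pre-period event-study estimates (ell = -3, -2, -1)
# and their covariance matrix; neither comes from a real estimation
pre_est <- c(0.10, -0.05, 0.08)
pre_vcov <- matrix(
  c(0.040, 0.010, 0.005,
    0.010, 0.035, 0.010,
    0.005, 0.010, 0.030),
  nrow = 3, byrow = TRUE
)

# Wald statistic for H0: all pre-period coefficients equal zero
wald_stat <- drop(t(pre_est) %*% solve(pre_vcov, pre_est))

# Under H0 the statistic is approximately chi-squared with 3 degrees of freedom
p_value <- pchisq(wald_stat, df = length(pre_est), lower.tail = FALSE)

c(wald = wald_stat, p = p_value)
```

A large p-value here means the data do not reject flat pre-trends; it does not rule out violations that are too small for the test to detect.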

Conditional Parallel Trends with Covariates

If you have pre-treatment covariates that you wish to condition on:


# Add a covariate to the simulated data
panel <- panel %>%
 group_by(id) %>%
 mutate(x1 = rnorm(1, mean = first(cohort) / 5, sd = 1)) %>% # one draw per unit, constant over time
 ungroup()

# Estimate with covariate
cs_cond <- att_gt(
 yname = "y",
 tname = "time",
 idname = "id",
 gname = "cohort",
 xformla = ~ x1, # condition on x1 (must be a one-sided formula)
 data = panel,
 est_method = "dr",
 control_group = "nevertreated"
)

agg_cond <- aggte(cs_cond, type = "dynamic")
ggdid(agg_cond)

When xformla is specified, the doubly robust estimator fits both a propensity score model and an outcome regression model, each including the covariates in xformla.
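
To illustrate how the two models combine, here is a simplified base-R sketch of a doubly robust DiD estimate for a single two-group, two-period comparison. The simulated data, the variable names (x, d, delta_y), and the true effect of 2 are assumptions of this example, and the code is a teaching sketch in the spirit of the doubly robust approach, not the package's implementation.

```r
set.seed(123)
n <- 2000

# Hypothetical two-period setting: x is a pre-treatment covariate that
# drives both treatment take-up and the outcome trend
x <- rnorm(n)
d <- rbinom(n, size = 1, prob = plogis(0.5 * x))  # treated-group indicator
true_att <- 2
delta_y <- 1 + 0.8 * x + true_att * d + rnorm(n)  # outcome change, post minus pre

# Model 1: propensity score, P(treated | x)
ps_fit <- glm(d ~ x, family = binomial())
ps <- fitted(ps_fit)

# Model 2: outcome regression for the change in y, fitted on controls only
or_fit <- lm(delta_y ~ x, subset = d == 0)
m0 <- predict(or_fit, newdata = data.frame(x = x))

# Doubly robust combination: treated units get weight d / mean(d); control
# units are reweighted by the propensity odds so they resemble the treated
w_treat <- d / mean(d)
odds <- ps * (1 - d) / (1 - ps)
w_ctrl <- odds / mean(odds)
att_dr <- mean((w_treat - w_ctrl) * (delta_y - m0))

# Naive comparison ignores x and is biased because trends depend on x
att_naive <- mean(delta_y[d == 1]) - mean(delta_y[d == 0])

c(doubly_robust = att_dr, naive = att_naive, truth = true_att)
```

The estimator stays consistent if either the propensity score model or the outcome regression is correctly specified, which is the sense in which it is doubly robust.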

Sensitivity Analysis with HonestDiD

The HonestDiD package implements the Rambachan and Roth (2023) sensitivity analysis. It allows you to ask: how large would a violation of parallel trends have to be to overturn my conclusion?


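# HonestDiD expects event-study coefficients measured relative to a common
# reference period, so re-estimate with a universal base period (e = -1)
cs_univ <- att_gt(
 yname = "y", tname = "time", idname = "id", gname = "cohort",
 data = panel, est_method = "dr", control_group = "nevertreated",
 base_period = "universal"
)
es_univ <- aggte(cs_univ, type = "dynamic", min_e = -3, max_e = 4)

# Covariance matrix of the event-study estimates, built from the influence
# functions stored in the aggte output (the pattern used in the HonestDiD README)
inf_func <- es_univ$inf.function$dynamic.inf.func.e
n_obs <- nrow(inf_func)
sigma <- t(inf_func) %*% inf_func / n_obs^2

# Drop the reference period e = -1, whose coefficient is normalized to zero
ref <- which(es_univ$egt == -1)
betas <- es_univ$att.egt[-ref]
sigma <- sigma[-ref, -ref]
n_pre <- sum(es_univ$egt < -1)
n_post <- sum(es_univ$egt >= 0)

# Run HonestDiD with the relative magnitudes restriction (M-bar);
# betahat and sigma must contain the pre- AND post-period coefficients
honestdid_out <- createSensitivityResults_relativeMagnitudes(
 betahat = betas, sigma = sigma,
 numPrePeriods = n_pre, numPostPeriods = n_post,
 Mbarvec = seq(0, 2, by = 0.5)
)
original_cs <- constructOriginalCS(
 betahat = betas, sigma = sigma,
 numPrePeriods = n_pre, numPostPeriods = n_post
)

# Plot the sensitivity results against the original confidence interval
createSensitivityPlot_relativeMagnitudes(honestdid_out, original_cs)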

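The sensitivity plot shows the robust confidence set for the target post-period effect (by default the first post-treatment period; other periods or averages can be selected through the l_vec argument) as a function of \(\bar{M}\). Under the relative magnitudes restriction, \(\bar{M}\) bounds the violation of parallel trends between consecutive post-treatment periods by \(\bar{M}\) times the largest violation between consecutive pre-treatment periods. When the confidence set excludes zero for all \(\bar{M}\) in a plausible range, the conclusion is robust to parallel trends violations of that size.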
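
The intuition behind \(\bar{M}\) can be seen in a plug-in calculation that ignores sampling uncertainty. The sketch below uses hypothetical event-study estimates and treats the pre-period estimates as if they measured the trend violations exactly. HonestDiD does considerably more than this, since its confidence sets account for estimation error in every coefficient, so the numbers here only illustrate the restriction.

```r
# Hypothetical event-study estimates relative to the reference period e = -1
# (illustrative numbers; the reference coefficient is zero by construction)
event_time <- c(-3, -2, -1, 0)
estimate   <- c(0.15, 0.05, 0.00, 1.20)

# Largest change between consecutive pre-treatment periods: the benchmark
# violation of parallel trends under the relative magnitudes restriction
pre <- estimate[event_time <= -1]
max_pre_change <- max(abs(diff(pre)))

# If the violation between e = -1 and e = 0 is at most Mbar times that
# benchmark, the first post-period effect lies in estimate +/- Mbar * benchmark
mbar <- seq(0, 2, by = 0.5)
first_post <- estimate[event_time == 0]
bounds <- data.frame(
  Mbar  = mbar,
  lower = first_post - mbar * max_pre_change,
  upper = first_post + mbar * max_pre_change
)
bounds

# Smallest Mbar at which the bounds reach zero (a plug-in "breakdown" value)
breakdown_mbar <- abs(first_post) / max_pre_change
breakdown_mbar
```

In this toy example the post-period violation would have to be about twelve times the largest pre-period change before the bounds include zero; with real data the confidence sets are wider and the breakdown value correspondingly smaller.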

Not-Yet-Treated Control Group

In settings with few never-treated units, the not-yet-treated comparison group can be used:


cs_nyt <- att_gt(
 yname = "y",
 tname = "time",
 idname = "id",
 gname = "cohort",
 data = panel,
 est_method = "dr",
 control_group = "notyettreated" # use not-yet-treated
)

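This requires an additional assumption: units not yet treated at time \(t\), including cohorts that are treated later, would have followed the same outcome trend as cohort \(g\) in the absence of treatment. This is a stronger assumption than the one needed when only never-treated units serve as controls, because it restricts the trends of more groups.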
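
The difference between the two comparison groups is easiest to see by listing which units qualify as controls. The base-R sketch below does this for a handful of hypothetical units, assuming no anticipation; the helper controls_for() is defined here for illustration and is not a function of the did package.

```r
# Hypothetical cohorts for six units (0 = never treated)
units <- data.frame(id = 1:6, cohort = c(3, 3, 5, 7, 0, 0))

# Units usable as controls for cohort g at time t under each option,
# assuming no anticipation
controls_for <- function(units, g, t, control_group) {
  never <- units$cohort == 0
  if (control_group == "nevertreated") {
    return(units$id[never])
  }
  # Not yet treated at time t: never treated, or first treated after t
  not_yet <- units$cohort > t
  units$id[(never | not_yet) & units$cohort != g]
}

controls_for(units, g = 3, t = 4, control_group = "nevertreated")   # 5 6
controls_for(units, g = 3, t = 4, control_group = "notyettreated")  # 3 4 5 6
controls_for(units, g = 3, t = 6, control_group = "notyettreated")  # 4 5 6
```

The not-yet-treated group is larger early on and shrinks as later cohorts become treated, so the gain in precision is concentrated in the earlier periods.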

Conclusion

The did package provides a complete workflow for staggered DiD estimation with the Callaway–Sant'Anna methodology. Key steps are: (1) prepare a long-format panel with a constant cohort variable; (2) call att_gt() to estimate group-time ATTs; (3) call aggte() with type = "dynamic" to obtain an event-study aggregation; (4) plot with ggdid(); and (5) assess robustness with HonestDiD. Combined with the methods article in this issue, this toolbox provides everything needed to run a modern staggered DiD analysis.

References

  1. Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200–230.
  2. Rambachan, A. and Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies, 90(5):2555–2591.
  3. Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2):254–277.
  4. Roth, J., Sant'Anna, P. H. C., Bilinski, A., and Poe, J. (2023). What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235(2):2218–2244.
  5. Sun, L. and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2):175–199.
