Toolbox

The staggered Package in R: Efficient DiD Under Staggered Adoption

1 What Problem Does This Tool Solve?

The last several years have seen an explosion of new estimators for difference-in-differences (DiD) with staggered treatment adoption—settings where units are treated at different times. Established packages like did (Callaway-Sant'Anna) and fixest (with Sun-Abraham) provide consistent, heterogeneity-robust estimates. But consistency alone is not the whole story: among the estimators that are all consistent under the same assumptions, some are more precise than others. Efficiency matters when data are limited or effect sizes are small.

Roth and Sant'Anna [2023] study staggered rollout designs in which the timing of treatment is as good as randomly assigned, an assumption stronger than parallel trends. In that setting the standard Callaway-Sant'Anna (CS) estimator remains valid but is not efficient: other estimators in the same class achieve lower variance. They derive the most efficient estimator in a class that nests CS and related DiD estimators, and construct a feasible plug-in version, available in the staggered R package.

This article introduces the package, explains the efficiency gain, and demonstrates usage with a worked example.

2 Why Is CS Not Efficient?

Callaway and Sant'Anna [2021] target group-time average treatment effects ATT(g,t), the average effect for units first treated at time g, measured at time t. They use never-treated (or not-yet-treated) units as controls and estimate each ATT(g,t) as a separate 2x2 DiD.
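To make the 2x2 building block concrete, here is a minimal sketch in base R on simulated data. The data frame toy and the helper att_gt_2x2 are hypothetical names introduced for illustration; they are not part of did or staggered, and the helper ignores inference entirely.

```r
# Toy balanced panel: 6 units, 4 periods.
# Units 1-3 are first treated in period 3; units 4-6 are never treated (G = Inf).
toy <- expand.grid(i = 1:6, t = 1:4)
toy$G <- ifelse(toy$i <= 3, 3, Inf)
set.seed(42)
# Unit effect + common trend + treatment effect of 2 + small noise
toy$y <- toy$i + 0.5 * toy$t + 2 * (toy$t >= toy$G) + rnorm(nrow(toy), sd = 0.1)

# Hypothetical helper: ATT(g, t) as a 2x2 DiD against never-treated units,
# using period g - 1 as the baseline.
att_gt_2x2 <- function(df, g, t) {
  change <- function(rows) {
    mean(df$y[rows & df$t == t]) - mean(df$y[rows & df$t == g - 1])
  }
  change(df$G == g) - change(is.infinite(df$G))
}

att_gt_2x2(toy, g = 3, t = 4)  # should be close to the simulated effect of 2
```

Each ATT(g,t) in the CS approach is a comparison of this kind; the aggregation step then combines many of them.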

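The inefficiency arises because each 2x2 DiD subtracts the pre-treatment difference between the treated and comparison cohorts with a fixed coefficient of one. When treatment timing is as good as random, that pre-treatment difference has mean zero and behaves like a baseline covariate in a randomized experiment: it can be used for adjustment, but a coefficient of one is generally not the variance-minimizing choice. The efficient estimator of Roth and Sant'Anna [2023] instead chooses the adjustment coefficients from the data, pooling information across cohorts and time periods to reduce variance.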

The efficiency gain is not just theoretical. Roth and Sant'Anna [2023] show in simulations that the staggered estimator can have standard errors 20-40% smaller than CS in typical staggered designs, with the gain increasing when the fraction of treated units is high or when early treatment cohorts are large.

3 The Method: Semiparametrically Efficient Staggered DiD

3.1 Setup

Let Dᵢₜ ∈ {0,1} denote the treatment status of unit i at time t, and let Gᵢ ∈ {g₁, ..., gₖ, ∞} denote the unit's cohort (its first treatment period, or ∞ if never treated). The parameter of interest is the cohort-averaged ATT:

$$ \mathrm{SATT} = \sum_{(g,t):\, t \ge g} \omega_{gt} \cdot \mathrm{ATT}(g,t) \tag{1} $$

for some weights ωgt that aggregate the group-time effects into a scalar summary. Common choices include:

  • Simple average ATT: equal weights, ωgt = 1/|{(g,t): t ≥ g}|
  • Cohort-weighted ATT: weights proportional to cohort size
  • Cohort-specific average: weights that isolate one cohort
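As a small numerical illustration of equation (1), the sketch below aggregates a made-up table of group-time effects under two of the weighting schemes listed above. All numbers and object names are hypothetical, and the weights follow the verbal definitions in this article, which may not match the package's internal definitions exactly.

```r
# Hypothetical group-time estimates for two cohorts (g = 2, 3) over periods 2..4.
att <- data.frame(
  g   = c(2, 2, 2, 3, 3),
  t   = c(2, 3, 4, 3, 4),
  att = c(0.2, 0.4, 0.6, 0.1, 0.3),
  n_g = c(30, 30, 30, 10, 10)   # cohort sizes (made up)
)

# Simple average: the same weight on every post-treatment (g, t) pair.
w_simple <- rep(1 / nrow(att), nrow(att))

# Cohort-size weights: each pair weighted in proportion to its cohort's size.
w_cohort <- att$n_g / sum(att$n_g)

satt_simple <- sum(w_simple * att$att)
satt_cohort <- sum(w_cohort * att$att)
c(simple = satt_simple, cohort_size = satt_cohort)
```

Both weight vectors sum to one; they differ only in how much the larger cohort dominates the summary.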

3.2 The Efficient Estimator

Under random treatment timing and no anticipation, pre-treatment differences in mean outcomes between cohorts have expectation zero, so subtracting any linear combination of them from a post-treatment comparison leaves the estimand unchanged. Roth and Sant'Anna [2023] characterize the combination that minimizes variance; it takes the form of a regression-adjustment coefficient built from the covariance between the pre-treatment differences and the unadjusted estimator.
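In symbols, the construction can be sketched as follows. The labels used here (θ̂₀ for an unadjusted comparison of post-treatment cohort means, X̂ for the vector of pre-treatment differences between cohorts, β for the adjustment coefficients) are chosen for this article and simplify the paper's notation:

```latex
\hat{\theta}_{\beta} = \hat{\theta}_{0} - \hat{X}^{\top}\beta,
\qquad
\beta^{*} = \operatorname{Var}(\hat{X})^{-1}\,\operatorname{Cov}(\hat{X}, \hat{\theta}_{0})
```

Under random treatment timing and no anticipation, X̂ has mean zero, so every choice of β targets the same estimand. Conventional DiD-type estimators correspond to fixed choices of β, while the efficient estimator uses β*, replaced in practice by a plug-in estimate.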

The key insight is that pre-treatment outcome data contain information about the covariance structure of Yᵢₜ(0) paths, and this information can be used to construct an efficient weighting matrix. Concretely, the efficient estimator:

  1. Estimates the covariance matrix of outcome changes from pre-treatment data
  2. Uses this covariance matrix to optimally weight the 2x2 DiD components
  3. Combines them into an efficient aggregate estimate of SATT

The result is a linear combination of group-time DiDs with weights that depend on the data structure rather than being fixed a priori.
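The logic of the variance-minimizing adjustment can be illustrated in a deliberately simple setting: two periods, one randomly assigned treatment, and a single pre-treatment difference. The Monte Carlo sketch below is an illustration under those assumptions, not the package's algorithm. In particular, the coefficient is computed across simulation draws, whereas the package estimates it from a single sample. All names and parameter values are hypothetical.

```r
set.seed(1)
n    <- 200    # units per simulated sample
reps <- 2000   # Monte Carlo replications
rho  <- 0.3    # correlation between pre- and post-period untreated outcomes

sim_once <- function() {
  d  <- sample(rep(c(0, 1), each = n / 2))                 # random assignment
  y0 <- rnorm(n)                                           # pre-period outcome
  y1 <- rho * y0 + sqrt(1 - rho^2) * rnorm(n) + 0.4 * d    # post-period, true effect 0.4
  c(theta0 = mean(y1[d == 1]) - mean(y1[d == 0]),          # post-period difference in means
    x      = mean(y0[d == 1]) - mean(y0[d == 0]))          # pre-period difference (mean zero)
}

draws <- t(replicate(reps, sim_once()))

# Variance-minimizing coefficient on the pre-period difference
beta_star <- cov(draws[, "x"], draws[, "theta0"]) / var(draws[, "x"])

# Sampling standard deviation of theta0 - beta * x for a given beta
sd_for <- function(beta) sd(draws[, "theta0"] - beta * draws[, "x"])

round(c(diff_in_means = sd_for(0),   # beta = 0: ignore the pre-period
        did           = sd_for(1),   # beta = 1: conventional DiD
        efficient     = sd_for(beta_star)), 3)
```

All three estimators are centered on the same effect because assignment is random; they differ only in spread. Varying rho changes whether β = 0 or β = 1 is closer to the optimum, which is why fixing β in advance is generally not efficient.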

4 Installation

# Install from CRAN
install.packages("staggered")

# Or development version from GitHub
remotes::install_github("jonathandroth/staggered")

library(staggered)
library(dplyr)

The package depends on dplyr and Matrix; no external system libraries are required.

5 A Worked Example: Employment Effects of a State Policy

We use a data frame named df_staggered: a balanced panel of 500 firms across 10 states observed over 12 quarters, with states adopting a labour regulation policy at different times. The code assumes df_staggered is already loaded in the workspace. The example dataset documented in the package itself is pj_officer_level_balanced, which can be substituted by changing the column names.

library(staggered)

# df_staggered is assumed to be loaded already (see above)
# Columns: i (unit), t (time), G (first treatment period, NA if never treated), y (outcome)
# Inspect data structure
head(df_staggered)
# i t  G      y
# 1 1 NA -0.234
# 1 2 NA  0.102

# Estimate simple average ATT (equal weighting)
result_satt <- staggered(
  df = df_staggered,
  i = "i",        # unit identifier
  t = "t",        # time identifier
  g = "G",        # cohort (first treatment period); NA = never treated
  y = "y",        # outcome
  estimand = "simple" # equally weighted ATT across all post-treatment (g,t) pairs
)

print(result_satt)
# staggered() returns a data frame with columns estimate, se, and se_neyman.
# Values for this example: estimate 0.412, se 0.089
# (t-stat 4.63, 95% CI 0.237 to 0.587)
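
The t-statistic and a normal-approximation confidence interval follow directly from the estimate and its standard error. A minimal sketch in base R, plugging in the estimate and standard error reported above; wald_summary is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper: t-statistic and Wald confidence interval
wald_summary <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  c(t_stat  = estimate / se,
    ci_low  = estimate - z * se,
    ci_high = estimate + z * se)
}

round(wald_summary(0.412, 0.089), 3)
```

With the rounded inputs shown here, the result matches the reported t-statistic and interval up to rounding in the last digit.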

5.1 Estimand Options

staggered() offers several choices for the estimand argument:

Table 1: Available estimands in the staggered package

estimand       Description
"simple"       Simple average ATT: equal weight on each (g, t) pair with t ≥ g
"cohort"       Cohort-weighted ATT: weights proportional to cohort size
"calendar"     Calendar-time ATT: weights proportional to number of newly treated
"eventstudy"   Event-study coefficients: ATT at each relative time k = t - g

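# Event study: ATT at each relative event-time
result_es <- staggered(
  df = df_staggered,
  i = "i", t = "t", g = "G", y = "y",
  estimand = "eventstudy",
  eventTime = 0:6   # event times k = t - g to estimate
)

# Plot the event study with pointwise 95% confidence intervals
library(ggplot2)
ggplot(result_es, aes(x = eventTime, y = estimate)) +
  geom_pointrange(aes(ymin = estimate - 1.96 * se,
                      ymax = estimate + 1.96 * se)) +
  geom_hline(yintercept = 0, linetype = "dashed")

With estimand = "eventstudy", staggered() returns one row per requested event time, with columns estimate, se, and eventTime. The code above plots these with pointwise 95% confidence intervals using ggplot2, which is how the package documentation draws event-study plots; no dedicated plotting function is assumed here.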

6 Comparing staggered with did (Callaway-Sant'Anna)

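# Also estimate using Callaway-Sant'Anna for comparison
library(did)

# did codes never-treated units as gname = 0, so recode NA before calling att_gt()
df_did <- df_staggered
df_did$G[is.na(df_did$G)] <- 0

cs_result <- att_gt(
  yname = "y",
  gname = "G",
  idname = "i",
  tname = "t",
  data = df_did,           # never-treated units are kept as the comparison group
  control_group = "nevertreated"
)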

cs_agg <- aggte(cs_result, type = "simple")

cat("CS estimate:", cs_agg$overall.att, "SE:", cs_agg$overall.se, "\n")

# Compare SEs: staggered should be smaller
cat("staggered SE:", result_satt$se, "\n")
cat("CS SE:", cs_agg$overall.se, "\n")

When treatment timing is plausibly random, staggered will typically report a smaller standard error than CS, reflecting the data-driven choice of adjustment weights. The point estimates may also differ slightly: the two estimators weight pre-treatment differences differently, and the two packages use different aggregation weights by default.
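For a like-for-like comparison within a single package, staggered also provides staggered_cs(), which computes a Callaway-Sant'Anna-type estimate with the same interface and aggregation. The sketch below uses the pj_officer_level_balanced data and column names from the package documentation rather than the df_staggered example. It runs only if staggered is installed, and no output is shown because results depend on the package version.

```r
# Runs only if the staggered package is installed.
if (requireNamespace("staggered", quietly = TRUE)) {
  df <- staggered::pj_officer_level_balanced

  # Efficient estimator
  eff <- staggered::staggered(
    df = df, i = "uid", t = "period", g = "first_trained",
    y = "complaints", estimand = "simple"
  )

  # Callaway-Sant'Anna-type estimator, same data and estimand
  cs <- staggered::staggered_cs(
    df = df, i = "uid", t = "period", g = "first_trained",
    y = "complaints", estimand = "simple"
  )

  # Side-by-side comparison of point estimates and standard errors
  comparison <- data.frame(
    estimator = c("efficient", "cs"),
    estimate  = c(eff$estimate, cs$estimate),
    se        = c(eff$se, cs$se)
  )
  print(comparison)
}
```

Because both calls share the same estimand argument, any difference in the reported standard errors reflects the estimator rather than the aggregation scheme.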

7 Key Options and Pitfalls

7.1 Never-Treated vs. Not-Yet-Treated Controls

The estimator does not require never-treated units. When they are present, they serve as comparisons; when they are absent, cohorts that are not yet treated serve as comparisons for earlier-treated cohorts, which is valid under random treatment timing. The use_last_treated_only argument of staggered() restricts comparisons to the last-treated cohort. If only parallel trends is plausible, did with control_group = "notyettreated" is an alternative.

7.2 Unbalanced Panels

The staggered efficiency bound derivation assumes a balanced panel (all units observed in all periods). With an unbalanced panel, the package will attempt to work with available data but may not achieve the full efficiency bound. Check for balance before applying the package.
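A balance check takes only a few lines of base R. The helper below is a hypothetical sketch (is_balanced is not a function from the package); it treats a panel as balanced when every unit appears exactly once in every period.

```r
# Hypothetical helper: TRUE if every unit is observed exactly once in every period.
is_balanced <- function(df, i = "i", t = "t") {
  counts <- table(df[[i]], df[[t]])   # unit-by-period cell counts
  all(counts == 1)
}

panel <- expand.grid(i = 1:3, t = 1:4)   # 3 units x 4 periods, fully observed
is_balanced(panel)            # TRUE
is_balanced(panel[-5, ])      # FALSE: one unit-period is missing
```

The same check flags duplicated unit-period rows, since any cell count other than one fails the test.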

7.3 Binary Outcomes

The estimator is derived for continuous outcomes. For binary or count outcomes, the linear DiD estimator remains valid as a linear probability model approximation, but may benefit from alternative modelling approaches (e.g., doubly robust estimators for binary outcomes from the DRDID package).

7.4 Large Panels

Computing the efficient adjustment requires inverting an estimated covariance matrix whose dimension grows with the number of cohorts and time periods. In very long panels (T > 50), this can be slow. One way to reduce the cost is to shorten the pre-treatment window or coarsen the time dimension before calling staggered(); check ?staggered for the options available in the installed version.

8 Relation to Other Packages

When efficiency is the primary concern (for example, in small-sample settings or when detecting small effects) and treatment timing is plausibly random, staggered is preferable. For richer covariate conditioning, IV instruments, or designs where only parallel trends is credible, did or fixest may be better choices.

Table 2: Comparison of staggered DiD packages

Package    Method                    Efficiency  Event study  IV support
did        Callaway-Sant'Anna        Consistent  Yes          Limited
staggered  Roth-Sant'Anna efficient  Efficient   Yes          No
fixest     TWFE, Sun-Abraham         Consistent  Yes          Yes
did2s      Gardner (2021)            Consistent  Yes          No
drdid      Sant'Anna-Zhao (DR)       Consistent  No           No

9 Conclusion

The staggered package implements the Roth and Sant'Anna [2023] efficient DiD estimator for staggered treatment adoption, providing meaningful precision gains over conventional Callaway-Sant'Anna estimates in balanced panel settings. Its simple interface, clear estimand taxonomy, and built-in event-study plots make it a productive addition to the applied researcher's toolkit. For researchers working in data-limited settings where standard errors matter, staggered is worth reaching for when other packages are delivering wide confidence intervals.

References

  1. Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200-230.
  2. de Chaisemartin, C. and D'Haultfœuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review, 110(9):2964-2996.
  3. Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2):254-277.
  4. Roth, J., Sant'Anna, P. H. C., Bilinski, A., and Poe, J. (2023). What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235(2):2218-2244.
  5. Roth, J. and Sant'Anna, P. H. C. (2023). Efficient estimation for staggered rollout designs. Journal of Political Economy Microeconomics, 1(4):669-709.
  6. Sun, L. and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2):175-199.
