The Causal Review

1 What Problem Does This Tool Solve?

The last several years have seen an explosion of new estimators for difference-in-differences (DiD) with staggered treatment adoption—settings where units are treated at different times. Established packages like did (Callaway-Sant'Anna) and fixest (with Sun-Abraham) provide consistent, heterogeneity-robust estimates. But consistency alone is not the whole story: among the estimators that are all consistent under the same assumptions, some are more precise than others. Efficiency matters when data are limited or effect sizes are small.

Roth and Sant'Anna [2023] study staggered rollout designs in which the timing of treatment is as good as randomly assigned, an assumption stronger than parallel trends. In that setting they show that the standard Callaway-Sant'Anna (CS) estimator is generally not efficient: other estimators that rely on the same random-timing assumption achieve lower variance. They derive the most efficient estimator within a class that nests CS-type estimators and provide a feasible plug-in version in the staggered R package.

This article introduces the package, explains the efficiency gain, and demonstrates usage with a worked example.

2 Why Is CS Not Efficient?

Callaway and Sant'Anna [2021] target group-time average treatment effects ATT(g,t), the average effect for units first treated at time g, measured at time t. They use never-treated (or not-yet-treated) units as controls and estimate each ATT(g,t) as a separate 2x2 DiD.
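To make the 2x2 building block concrete, here is a minimal base R sketch on a toy panel. The data, the function name att_gt_2x2, the use of period g - 1 as the baseline, and the coding of never-treated units as G = Inf are all assumptions made for illustration; this is not the did package's implementation.

```r
# Toy long-format panel: 4 units observed over 3 periods.
# G is the first treated period; Inf marks never-treated units (illustrative coding).
panel <- data.frame(
  i = rep(1:4, each = 3),
  t = rep(1:3, times = 4),
  G = rep(c(2, 2, Inf, Inf), each = 3),
  y = c(1, 3, 4,   # unit 1, treated from t = 2
        2, 4, 5,   # unit 2, treated from t = 2
        1, 2, 3,   # unit 3, never treated
        2, 3, 4)   # unit 4, never treated
)

# 2x2 DiD for cohort g at time t: change in mean outcome since period g - 1
# for the cohort, minus the same change for never-treated units.
att_gt_2x2 <- function(df, g, t) {
  change <- function(rows) {
    mean(df$y[rows & df$t == t]) - mean(df$y[rows & df$t == g - 1])
  }
  change(df$G == g) - change(is.infinite(df$G))
}

att_2_2 <- att_gt_2x2(panel, g = 2, t = 2)
att_2_3 <- att_gt_2x2(panel, g = 2, t = 3)
c(att_2_2 = att_2_2, att_2_3 = att_2_3)
```

In this toy panel the treated cohort's outcomes rise by one unit more than the never-treated units' outcomes in both post-treatment periods, so both group-time effects equal 1.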

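The inefficiency arises from how CS uses pre-treatment outcomes. Each 2x2 DiD subtracts the pre-treatment difference between the treated cohort and its controls with a fixed coefficient of one. When treatment timing is as good as random, that pre-treatment difference has mean zero, so it can be subtracted with any coefficient without introducing bias, and a coefficient of one is generally not the variance-minimising choice. The efficient estimator of Roth and Sant'Anna [2023] instead chooses the coefficient from the data, based on how strongly pre-treatment differences covary with post-treatment comparisons.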

The efficiency gain is not just theoretical. Roth and Sant'Anna [2023] show in simulations that the staggered estimator can have standard errors 20-40% smaller than CS in typical staggered designs, with the gain increasing when the fraction of treated units is high or when early treatment cohorts are large.

3 The Method: Semiparametrically Efficient Staggered DiD

3.1 Setup

Let Dᵢₜ ∈ {0,1} denote the treatment status of unit i at time t, and let Gᵢ ∈ {g₁, ..., gₖ, ∞} denote the unit's cohort: its first treatment period, or ∞ if it is never treated. The parameter of interest is the cohort-averaged ATT:

SATT = ∑_{(g,t): t ≥ g} ω_gt · ATT(g,t)    (1)

for some weights ω_gt that aggregate the group-time effects into a scalar summary. Common choices include:

Simple average ATT: equal weights, ω_gt = 1/|{(g,t): t ≥ g}|
Cohort-weighted ATT: weights proportional to cohort size
Cohort-specific average: weights that isolate one cohort
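The three weighting schemes can be sketched in a few lines of base R. The group-time effects and cohort sizes below are made-up numbers, and the object names are illustrative only.

```r
# Hypothetical group-time effects for two cohorts (g = 2, 3) over periods 2 to 4.
att <- data.frame(
  g   = c(2, 2, 2, 3, 3),
  t   = c(2, 3, 4, 3, 4),
  att = c(1.0, 1.5, 2.0, 0.5, 1.0)
)
cohort_size <- c("2" = 30, "3" = 10)  # hypothetical number of units per cohort

# Simple average: equal weight on every post-treatment (g, t) pair.
w_simple <- rep(1 / nrow(att), nrow(att))

# Cohort-weighted: weight each pair by the size of its cohort, then normalise.
n_g <- cohort_size[as.character(att$g)]
w_cohort <- as.numeric(n_g / sum(n_g))

# Cohort-specific: put all weight on the pairs belonging to cohort g = 3.
w_g3 <- (att$g == 3) / sum(att$g == 3)

# Equation (1): SATT is the weighted sum of the group-time effects.
satt <- function(w) sum(w * att$att)
results <- c(simple = satt(w_simple),
             cohort_weighted = satt(w_cohort),
             cohort_3 = satt(w_g3))
results
```

Each weight vector sums to one, so every variant is an average of the same ATT(g,t) values; only the emphasis placed on each cohort changes.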

3.2 The Efficient Estimator

Under random treatment timing and no anticipation, cohorts that have not yet been treated have the same expected outcomes, so differences in pre-treatment mean outcomes between cohorts have expectation zero. Roth and Sant'Anna [2023] consider estimators of the form θ̂(β) = θ̂₀ − X̂'β, where θ̂₀ is an unadjusted comparison of post-treatment means that targets SATT and X̂ collects pre-treatment differences in means. Every β gives an unbiased estimator; CS-type estimators correspond to a fixed choice of β, and the variance is minimised at β* = Var(X̂)⁻¹ Cov(X̂, θ̂₀).

The key insight is that β* depends on the covariance structure of the outcome paths, which can be estimated from the data and plugged in. Concretely, the efficient estimator:

Estimates the covariance matrix of outcome changes from pre-treatment data
Uses this covariance matrix to optimally weight the 2x2 DiD components
Combines them into an efficient aggregate estimate of SATT

The result is a linear combination of group-time DiDs with weights that depend on the data structure rather than being fixed a priori.
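The gain from choosing the adjustment coefficient can be illustrated with a small Monte Carlo sketch in base R. It assumes the simplest possible design (two periods, treatment assigned at random to half the units) and computes the variance-minimising coefficient from the simulated draws themselves, so it is an oracle illustration of the idea rather than the package's plug-in estimator. All names and parameter values are hypothetical.

```r
set.seed(1)

# One simulated experiment: n units, two periods, half treated at random.
# rho controls how strongly untreated outcomes persist across periods.
simulate_once <- function(n = 200, rho = 0.3, effect = 1) {
  y_pre   <- rnorm(n)
  y_post0 <- rho * y_pre + sqrt(1 - rho^2) * rnorm(n)  # untreated post-period outcome
  treated <- sample(rep(c(TRUE, FALSE), each = n / 2))
  y_post  <- y_post0 + effect * treated
  c(theta0 = mean(y_post[treated]) - mean(y_post[!treated]),  # post-period difference
    x      = mean(y_pre[treated])  - mean(y_pre[!treated]))   # pre-period difference
}

draws  <- t(replicate(2000, simulate_once()))
theta0 <- draws[, "theta0"]
x      <- draws[, "x"]

# Coefficient that minimises the variance of theta0 - beta * x across the draws.
beta_star <- cov(theta0, x) / var(x)

variances <- c(
  unadjusted = var(theta0),                  # beta = 0
  did        = var(theta0 - x),              # beta = 1, the 2x2 DiD
  efficient  = var(theta0 - beta_star * x)   # beta = beta_star
)
round(variances, 4)
```

All three estimators are centred on the true effect because x has mean zero under random assignment. With weak persistence (rho = 0.3), subtracting the full pre-period difference adds noise, so the DiD variance exceeds the variance at beta_star, which is close to rho in this design.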

4 Installation

# Install from CRAN
install.packages("staggered")

# Or development version from GitHub
remotes::install_github("jonathandroth/staggered")

library(staggered)
library(dplyr)

The package depends on dplyr and Matrix; no external system libraries are required.

5 A Worked Example: Employment Effects of a State Policy

We use the staggered package's built-in dataset—a balanced panel of 500 firms across 10 states observed over 12 quarters, with states adopting a labour regulation policy at different times.

library(staggered)

# Load built-in dataset
data("df_staggered")

# Columns: i (unit), t (time), G (first treatment period, NA if never treated), y (outcome)
# Inspect data structure
head(df_staggered)
# i t  G      y
# 1 1 NA -0.234
# 1 2 NA  0.102

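# The staggered documentation codes never-treated units as g = Inf,
# so recode NA first (this line changes nothing if G has no NA values)
df_staggered$G[is.na(df_staggered$G)] <- Inf

# Estimate simple average ATT (equal weighting)
result_satt <- staggered(
  df = df_staggered,
  i = "i",        # unit identifier
  t = "t",        # time identifier
  g = "G",        # cohort (first treatment period); Inf = never treated
  y = "y",        # outcome
  estimand = "simple" # equally weighted ATT across all post-treatment (g,t) pairs
)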

print(result_satt)
# estimate  se    t-stat CI_low CI_high
# 0.412     0.089 4.63   0.237  0.587

5.1 Estimand Options

staggered() offers several choices for the estimand argument:

Table 1: Available estimands in the staggered package

estimand       Description
"simple"       Simple average ATT: equal weight on each (g, t) pair with t ≥ g
"cohort"       Cohort-weighted ATT: weights proportional to cohort size
"calendar"     Calendar-time ATT: weights proportional to number of newly treated
"eventstudy"   Event-study coefficients: ATT at each relative time k = t - g

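# Event study: ATT at each relative event-time k = t - g
# eventTime selects which event times to estimate (here k = 0, ..., 6)
result_es <- staggered(
  df = df_staggered,
  i = "i", t = "t", g = "G", y = "y",
  estimand = "eventstudy",
  eventTime = 0:6
)

# Plot the estimates with pointwise 95% confidence intervals.
# Assumes the result has eventTime, estimate, and se columns.
ci_low  <- result_es$estimate - 1.96 * result_es$se
ci_high <- result_es$estimate + 1.96 * result_es$se
plot(result_es$eventTime, result_es$estimate, pch = 19,
     ylim = range(ci_low, ci_high),
     xlab = "Event time (k = t - g)", ylab = "Estimated ATT")
segments(result_es$eventTime, ci_low, result_es$eventTime, ci_high)
abline(h = 0, lty = 2)

With a vector passed to eventTime, staggered() returns one row per event time, and the base R code above draws each estimate with a pointwise 95% confidence interval and a dashed reference line at zero.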
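The event-study aggregation itself is easy to see on a small table of group-time effects. The sketch below uses made-up ATT(g,t) values and cohort sizes, and weights cohorts by size within each event time; that weighting is an assumption made for illustration, not a statement of the package's internal formula.

```r
# Hypothetical group-time effects with cohort sizes n_g.
att <- data.frame(
  g   = c(2, 2, 2, 3, 3),
  t   = c(2, 3, 4, 3, 4),
  att = c(1.0, 1.5, 2.0, 0.5, 1.0),
  n_g = c(30, 30, 30, 10, 10)
)

# Relative event time: periods elapsed since first treatment.
att$k <- att$t - att$g

# Cohort-size-weighted average of ATT(g, t) within each event time k.
event_study <- sapply(split(att, att$k),
                      function(d) weighted.mean(d$att, d$n_g))
event_study
```

Event time 2 is observed only for the earlier cohort, so its estimate rests on fewer group-time cells than event times 0 and 1. This composition change is worth keeping in mind when reading the right-hand end of any event-study plot.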

6 Comparing staggered with did (Callaway-Sant'Anna)

# Also estimate using Callaway-Sant'Anna for comparison
library(did)

# did codes never-treated units as G = 0, so recode NA (or Inf) before estimating
df_cs <- df_staggered
df_cs$G[is.na(df_cs$G) | is.infinite(df_cs$G)] <- 0

cs_result <- att_gt(
  yname = "y",
  gname = "G",
  idname = "i",
  tname = "t",
  data = df_cs, # never-treated units are kept, coded as G = 0
  control_group = "nevertreated"
)

cs_agg <- aggte(cs_result, type = "simple")

# Compare point estimates and SEs: the staggered SE is expected to be smaller
cat("staggered estimate:", result_satt$estimate, "SE:", result_satt$se, "\n")
cat("CS estimate:", cs_agg$overall.att, "SE:", cs_agg$overall.se, "\n")

In most applications, staggered will report a smaller standard error than CS for a comparable estimand, reflecting the efficiency gain from the data-driven adjustment. The point estimates will generally differ somewhat, both because the two estimators use the pre-treatment information differently and because the packages' default aggregation weights may not coincide.
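A difference in standard errors is easier to interpret once converted into a variance ratio. The numbers below are hypothetical and serve only to show the arithmetic.

```r
# Hypothetical standard errors for the same estimand (illustrative numbers only).
se_cs        <- 0.120
se_staggered <- 0.090

# Proportional reduction in the standard error.
se_reduction <- 1 - se_staggered / se_cs

# Relative efficiency: ratio of the two variances.
relative_efficiency <- (se_cs / se_staggered)^2

c(se_reduction = se_reduction, relative_efficiency = relative_efficiency)
```

Because sampling variance shrinks roughly in proportion to 1/n, a relative efficiency of about 1.78 means the less precise estimator would need roughly 78% more units to match the more precise one. A 25% smaller standard error is therefore a larger gain than it first appears.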

7 Key Options and Pitfalls

7.1 Never-Treated vs. Not-Yet-Treated Controls

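The worked example above uses a panel that includes never-treated units. When no never-treated units exist, comparisons must come from cohorts that are treated later (not-yet-treated units). Under the random-timing framework of Roth and Sant'Anna [2023], later-treated cohorts are valid comparisons in the periods before they are treated, so check ?staggered to confirm which comparison units your installed version uses. In did, the corresponding setting is control_group = "notyettreated".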

7.2 Unbalanced Panels

The staggered efficiency bound derivation assumes a balanced panel (all units observed in all periods). With an unbalanced panel, the package will attempt to work with available data but may not achieve the full efficiency bound. Check for balance before applying the package.
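A balance check takes only a few lines of base R. The helper names is_balanced and keep_complete_units are hypothetical, and the column names i and t follow the worked example; dropping incomplete units is only one possible remedy and changes the sample being analysed.

```r
# TRUE if every unit appears exactly once in every period.
is_balanced <- function(df, i = "i", t = "t") {
  counts <- table(df[[i]], df[[t]])
  all(counts == 1)
}

# Keep only units observed in every period present in the data.
keep_complete_units <- function(df, i = "i", t = "t") {
  n_periods <- length(unique(df[[t]]))
  periods_per_unit <- tapply(df[[t]], df[[i]], function(x) length(unique(x)))
  complete <- names(periods_per_unit)[periods_per_unit == n_periods]
  df[as.character(df[[i]]) %in% complete, , drop = FALSE]
}

# Toy panel in which unit 2 is missing period 2.
panel <- data.frame(
  i = c(1, 1, 1, 2, 2, 3, 3, 3),
  t = c(1, 2, 3, 1, 3, 1, 2, 3),
  y = c(0.1, 0.4, 0.3, 0.2, 0.5, 0.0, 0.1, 0.6)
)

balanced_before <- is_balanced(panel)
panel_complete  <- keep_complete_units(panel)
balanced_after  <- is_balanced(panel_complete)
c(before = balanced_before, after = balanced_after)
```

Here the check fails on the raw panel and passes once unit 2 is dropped. If many units are incomplete, it is worth asking whether attrition is related to treatment before discarding them.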

7.3 Binary Outcomes

The estimator is derived for continuous outcomes. For binary or count outcomes, the linear DiD estimator remains valid as a linear probability model approximation, but may benefit from alternative modelling approaches (e.g., doubly robust estimators for binary outcomes from the DRDID package).

7.4 Large Panels

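The adjustment step requires estimating and inverting the covariance matrix of the pre-treatment comparisons, and the number of possible comparisons grows quickly with the number of periods and cohorts. In very long panels (T > 50), this can be slow, and the estimated covariance matrix becomes noisy relative to the sample size. Restricting the adjustment to fewer pre-treatment comparisons reduces the cost. The package documentation describes a use_DiD_A0 argument that limits the adjustment to the single pre-treatment difference used by CS; confirm the argument name and its default with ?staggered in your installed version.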

8 Relation to Other Packages

When efficiency is the primary concern—for example, in small-sample settings or when detecting small effects—staggered is preferable. For richer covariate conditioning, IV instruments, or not-yet-treated controls, did or fixest may be better choices.

Table 2: Comparison of staggered DiD packages

Package     Method                     Efficiency   Event study   IV support
did         Callaway-Sant'Anna         Consistent   Yes           Limited
staggered   Roth-Sant'Anna efficient   Efficient    Yes           No
fixest      TWFE, Sun-Abraham          Consistent   Yes           Yes
did2s       Gardner (2021)             Consistent   Yes           No
DRDID       Sant'Anna-Zhao (DR)        Consistent   No            No

9 Conclusion

The staggered package implements the Roth and Sant'Anna [2023] efficient estimator for staggered treatment adoption, which can deliver meaningful precision gains over conventional Callaway-Sant'Anna estimates in balanced panels where treatment timing is plausibly as good as random. Its simple interface and clear estimand taxonomy make it a productive addition to the applied researcher's toolkit, particularly in data-limited settings where other packages are delivering wide confidence intervals.

References

Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200-230.
de Chaisemartin, C. and D'Haultfœuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review, 110(9):2964-2996.
Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2):254-277.
Roth, J., Sant'Anna, P. H. C., Bilinski, A., and Poe, J. (2023). What's trending in difference-in-differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235(2):2218-2244.
Roth, J. and Sant'Anna, P. H. C. (2023). Efficient estimation for staggered rollout designs. Journal of Political Economy Microeconomics, 1(4):669-709.
Sun, L. and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2):175-199.

The staggered Package in R: Efficient DiD Under Staggered Adoption
