Toolbox

The gsynth Package in R: Generalized Synthetic Control with Interactive Fixed Effects

1 What Problem Does gsynth Solve?

The classic synthetic control method (SCM) of Abadie et al. [2010] handles one treated unit beautifully but creaks under realistic complications: multiple treated units, treatment that turns on at different times for different units, and a desire for principled standard errors. Difference-in-differences handles many units and staggered timing but imposes parallel trends, which fails whenever units respond differently to common shocks (a recession, a national policy). Xu [2017]'s generalized synthetic control (GSC), implemented in the gsynth package, bridges the two. It models untreated potential outcomes with an interactive fixed effects (IFE) structure,  

$$Y_{it}^{N} = x_{it}'\beta + \lambda_i' f_t + \alpha_i + \xi_t + \varepsilon_{it}, \tag{1}$$

where fₜ is a vector of r latent common factors, λᵢ is the vector of unit-specific loadings, and αᵢ and ξₜ are additive unit and time fixed effects. The control units are used to estimate the factors fₜ; each treated unit's loadings λᵢ are then estimated from its pre-treatment outcomes, and its counterfactual is predicted by (1). This generalises both SCM (which approximates the factor structure with a weighted average of donors) and DiD (the special case of no interactive terms), accommodates many treated units and staggered adoption, and delivers uncertainty intervals from a parametric or nonparametric bootstrap.
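The two-step logic can be illustrated in base R without the package. The block below is a minimal sketch under simplifying assumptions, not gsynth's own estimator: there are no covariates, the additive effects αᵢ and ξₜ are treated as two extra factors, and the factors are extracted with a plain SVD of the control outcomes. All names (`Y_co`, `F_hat`, `att_gsc`, and so on) and the simulated design are illustrative choices.

```r
set.seed(42)
n_co <- 40; n_periods <- 30; t0 <- 20; tau <- 3   # tau = true treatment effect
time <- seq_len(n_periods)
post <- time > t0

# Two latent factors (a trend and a cycle) plus a common time shock xi_t.
f  <- cbind(trend = 3 * (time - 1) / (n_periods - 1),
            cycle = sin(2 * pi * time / 10))
xi <- rnorm(n_periods)

# Control outcomes: periods in rows, units in columns.
lam_co   <- matrix(rnorm(n_co * 2), n_co, 2)
alpha_co <- rnorm(n_co)
Y_co <- f %*% t(lam_co) +
  matrix(xi, n_periods, n_co) +                       # xi_t: same in every column
  matrix(alpha_co, n_periods, n_co, byrow = TRUE) +   # alpha_i: same in every row
  matrix(rnorm(n_periods * n_co, sd = 0.3), n_periods, n_co)

# One treated unit that loads heavily on the trend factor.
lam_tr <- c(2, -1)
Y_tr_N <- drop(f %*% lam_tr) + 0.5 + xi + rnorm(n_periods, sd = 0.3)
Y_tr   <- Y_tr_N + tau * post

# Step 1: estimate the factor space from the controls only
# (2 interactive factors + 2 additive effects = 4 components).
n_factors <- 4
F_hat <- svd(Y_co, nu = n_factors, nv = 0)$u

# Step 2: estimate the treated unit's loadings from its pre-treatment outcomes,
# then predict its untreated path in every period.
lam_hat <- lm.fit(x = F_hat[!post, , drop = FALSE], y = Y_tr[!post])$coefficients
Y0_hat  <- drop(F_hat %*% lam_hat)
att_gsc <- mean(Y_tr[post] - Y0_hat[post])

# Plain two-group DiD for comparison.
att_did <- (mean(Y_tr[post]) - mean(Y_tr[!post])) -
           (mean(Y_co[post, ]) - mean(Y_co[!post, ]))

cat(sprintf("true effect: %.2f | factor-model ATT: %.2f | DiD: %.2f\n",
            tau, att_gsc, att_did))
```

Because the treated unit loads more heavily on the trending factor than the average control does, the DiD estimate absorbs part of that trend, while the factor-based prediction should land much closer to the true effect.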

2 Installation and Setup

gsynth is on CRAN. The cross-validation routine that picks the number of factors can be slow, so parallelisation helps.  

    install.packages("gsynth")
    library(gsynth)

    # A balanced (or unbalanced) long panel with columns:
    #   id     : unit identifier
    #   time   : time period
    #   Y      : outcome
    #   D      : treatment indicator (1 once treated; may turn on at
    #            different times across units)
    #   X1, X2 : time-varying covariates

    data(gsynth)  # ships with two example data sets: simdata, turnout
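Before estimating anything, it helps to confirm that the panel has the structure described in the comments above: one row per unit-period and a treatment indicator that stays on once it turns on. The following base-R check is a sketch on a small made-up data frame; `toy` and `summarise_unit()` are hypothetical names, not part of gsynth.

```r
# Toy long panel: units 1 and 2 adopt in periods 4 and 3, unit 3 is never
# treated, and unit 4 is miscoded (treatment switches off again).
toy <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, times = 4),
  D    = c(0, 0, 0, 1, 1,
           0, 0, 1, 1, 1,
           0, 0, 0, 0, 0,
           0, 1, 1, 0, 0)
)

# One row per unit-period means every id-by-time cell appears exactly once.
is_balanced <- all(table(toy$id, toy$time) == 1)

# Per-unit summary: first treated period, number of pre-treatment periods,
# and whether D is absorbing (never drops back to 0).
summarise_unit <- function(u) {
  u  <- u[order(u$time), ]
  on <- u$time[u$D == 1]
  data.frame(
    id            = u$id[1],
    first_treated = if (length(on) > 0) min(on) else NA_integer_,
    n_pre         = if (length(on) > 0) sum(u$time < min(on)) else NA_integer_,
    absorbing     = all(diff(u$D) >= 0)
  )
}
timing <- do.call(rbind, lapply(split(toy, toy$id), summarise_unit))

is_balanced
timing
```

Units flagged with `absorbing = FALSE` or a very small `n_pre` are worth fixing or inspecting before they reach the estimator.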

3 A Minimal Working Example

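The package ships with simdata, a simulated panel of 5 treated and 45 control units over 30 periods in which treatment starts in period 21 for every treated unit (the bundled turnout data illustrate staggered adoption). The core call estimates the factor model, chooses r by cross-validation, and bootstraps standard errors.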

    out <- gsynth(Y ~ D + X1 + X2,
                  data      = simdata,
                  index     = c("id", "time"),
                  force     = "two-way",     # unit and time fixed effects
                  CV        = TRUE,          # cross-validate r
                  r         = c(0, 5),       # search 0 to 5 factors
                  se        = TRUE,
                  inference = "parametric",
                  nboots    = 1000,
                  parallel  = TRUE, cores = 4,
                  seed      = 123)

    print(out)
    out$est.att   # ATT by period relative to treatment, with uncertainty
    out$est.avg   # ATT averaged over post-treatment periods
    out$r.cv      # number of factors chosen by cross-validation

The returned object stores the estimated counterfactual for every treated unit, the gap (treated minus counterfactual) at each period, and the aggregated average treatment effect on the treated (ATT). Plotting is built in:  

    plot(out, type = "gap")                           # estimated ATT over event time, with CI
    plot(out, type = "counterfactual", id = 101)      # treated vs synthetic path
    plot(out, type = "raw")                           # raw outcome trajectories

The "gap" plot is the workhorse: it shows the dynamic ATT relative to the treatment date, with pre-treatment gaps near zero serving as the analogue of a pre-trends test.  
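The same information can be read off the returned object directly. The sketch below refits the minimal example with fewer bootstrap draws and no parallel backend to keep it quick, then computes the share of pre-treatment periods whose interval covers zero. It assumes that the row names of `est.att` are event times with values at or below 0 marking pre-treatment periods, and that the interval columns are named `CI.lower` and `CI.upper`; check `colnames(out$est.att)` on your installed version before relying on this.

```r
library(gsynth)
data(gsynth)  # loads simdata

out <- gsynth(Y ~ D + X1 + X2, data = simdata, index = c("id", "time"),
              force = "two-way", CV = TRUE, r = c(0, 5),
              se = TRUE, inference = "parametric", nboots = 200,
              parallel = FALSE, seed = 123)

dim(out$Y.ct)   # periods x treated units: estimated counterfactuals
dim(out$eff)    # same shape: observed outcome minus counterfactual
out$att.avg     # ATT averaged over post-treatment periods

# Pre-treatment rows of the by-period table (event time <= 0 assumed
# to be pre-treatment).
att        <- out$est.att
event_time <- as.numeric(rownames(att))
pre        <- att[event_time <= 0, , drop = FALSE]

# Share of pre-treatment periods whose interval covers zero.
covers_zero <- pre[, "CI.lower"] <= 0 & pre[, "CI.upper"] >= 0
mean(covers_zero)
```

A share well below one, or pre-treatment gaps that drift in one direction, suggests the factor model is not tracking the treated units before treatment.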

4 Key Options and Pitfalls

  • Choosing the estimator. The default IFE estimator works well when the pre-treatment window is reasonably long. For short pre-periods or many missing cells, set estimator = "mc" to use the matrix completion estimator [Athey et al., 2021], which regularises the factor structure with a nuclear-norm penalty and is more stable when data are sparse.  
  • Number of factors r. Leave CV = TRUE unless theory dictates r. Cross-validation holds out pre-treatment periods and picks the r that best predicts them. Forcing too many factors overfits noise; too few reintroduces DiD-style bias.  
  • Pre-treatment length. Each treated unit's loadings are estimated from its pre-period. Units treated very early have little pre-treatment information and unreliable counterfactuals, so inspect them individually.  
  • Parametric vs nonparametric inference. Use inference = "parametric" when the number of treated units is small (it does not rely on cross-sectional asymptotics in the treated group); switch to "nonparametric" (a block bootstrap over units) when treated units are plentiful.  
  • Balanced panels and missingness. The IFE estimator wants a balanced panel; the "mc" estimator tolerates an unbalanced one. Anticipation effects bias the loadings, so confirm treatment timing is coded correctly.
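Several of the options above translate into small changes to the basic call. The sketch below assumes the `estimator`, `CV`, `r`, and `min.T0` arguments behave as described in the package help page (`min.T0` being the minimum number of pre-treatment periods a treated unit needs to be kept). Standard errors are switched off to keep the run short, and the values r = 2 and min.T0 = 7 are arbitrary illustrations.

```r
library(gsynth)
data(gsynth)  # loads simdata

# Matrix completion estimator; its penalty is tuned by cross-validation.
out_mc <- gsynth(Y ~ D + X1 + X2, data = simdata, index = c("id", "time"),
                 estimator = "mc", force = "two-way", CV = TRUE, se = FALSE)

# IFE estimator with the number of factors fixed rather than cross-validated,
# dropping treated units with fewer than 7 pre-treatment periods.
out_ife <- gsynth(Y ~ D + X1 + X2, data = simdata, index = c("id", "time"),
                  force = "two-way", CV = FALSE, r = 2, min.T0 = 7, se = FALSE)

# Compare the average ATT across the two specifications.
c(mc = out_mc$att.avg, ife_r2 = out_ife$att.avg)
```

If the two estimators disagree sharply, that is a signal to look at the pre-treatment fit of each before trusting either number.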

5 Comparison to Alternatives

Package    Method           Treated units     Key assumption
---------  ---------------  ----------------  ---------------------------
Synth      Classic SCM      One               Convex-hull match
gsynth     GSC/IFE/MC       Many, staggered   Factor model
augsynth   Augmented SCM    One or many       SCM + outcome model
synthdid   Synthetic DiD    Block adoption    Unit + time weights
did        Group-time ATT   Many, staggered   Conditional parallel trends

Table 1: Where gsynth sits among panel causal-inference tools.

Choose gsynth when you suspect that units load differently on common shocks, so that parallel trends is implausible, and you have several treated units and a decent pre-treatment window. For a single treated unit with a long, clean pre-period, the original Synth or augsynth may suffice; for staggered binary treatment under conditional parallel trends, the did package of Callaway and Sant'Anna [2021] is the natural choice. gsynth's distinctive strength is letting the data, through the estimated factors, decide how units would have co-moved absent treatment: a more flexible counterfactual than either SCM weights or parallel trends alone.

References

  1. Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.  
  2. Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, 116(536), 1716-1730.  
  3. Callaway, B., and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.  
  4. Xu, Y. (2017). Generalized synthetic control method: causal inference with interactive fixed effects models. Political Analysis, 25(1), 57-76.
