Background: SCM and ASCM
The Synthetic Control Method (SCM; Abadie et al. (2010)) constructs a counterfactual for a single treated unit by finding a convex combination of donor (control) units that best matches the treated unit's pre-treatment outcome trajectory. In each post-treatment period, the gap between the treated unit's observed outcome and its synthetic control is the estimated treatment effect on the treated unit.
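As a rough sketch of the idea, the weight problem can be written as follows. The notation is an assumption introduced here, not taken from the package: unit 1 is treated, units \(2, \dots, N\) are donors, \(T_0\) is the number of pre-treatment periods, and only pre-treatment outcomes are matched (the original method also allows other predictors and predictor weights).

```latex
\[
\hat{\gamma}^{\text{scm}}
  = \arg\min_{\gamma}\ \sum_{t=1}^{T_0}
    \Big( Y_{1t} - \sum_{i=2}^{N} \gamma_i Y_{it} \Big)^{2}
  \quad \text{subject to} \quad
  \gamma_i \ge 0,\ \ \sum_{i=2}^{N} \gamma_i = 1,
\]
\[
\hat{\tau}_t = Y_{1t} - \sum_{i=2}^{N} \hat{\gamma}^{\text{scm}}_i Y_{it},
  \qquad t > T_0 .
\]
```

The two constraints are what make the synthetic control a convex combination: no donor gets a negative weight and the weights sum to one.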
The key limitation of classical SCM is that the pre-treatment fit may be imperfect, especially when the treated unit lies outside the convex hull of the donor pool or when the pre-treatment period is short. Ben-Michael et al. (2021) propose the Augmented Synthetic Control Method (ASCM), which adds a ridge regression bias correction term to the SCM estimate. Under a linear factor model, the correction reduces the bias from imperfect pre-treatment fit, and the estimator inherits desirable properties of both SCM and regression.
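The bias correction can be sketched in one line. The notation is again an assumption for illustration: \(X_i\) is unit \(i\)'s vector of pre-treatment outcomes, \(\hat{\gamma}^{\text{scm}}\) are the SCM weights, and \(\hat{m}_t\) is an outcome model fitted on the donor units (ridge regression in the default case).

```latex
\[
\hat{Y}^{\text{aug}}_{1t}(0)
  = \sum_{i=2}^{N} \hat{\gamma}^{\text{scm}}_i Y_{it}
  + \Big( \hat{m}_t(X_1) - \sum_{i=2}^{N} \hat{\gamma}^{\text{scm}}_i \hat{m}_t(X_i) \Big),
  \qquad
  \hat{m}_t(X_i) = \hat{\eta}_{0t} + X_i^{\top} \hat{\eta}_t .
\]
```

The term in parentheses is the model's prediction of the remaining imbalance. It is zero when the synthetic control matches the treated unit's pre-treatment outcomes exactly, so the correction only matters when the SCM fit is imperfect.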
Installation
The augsynth package is available on GitHub:
# install.packages("remotes")
remotes::install_github("ebenmichael/augsynth")
library(augsynth)
library(dplyr)
library(ggplot2)
Data Requirements
augsynth() takes a panel data frame in long format, with one row per unit and time period. For single-unit SCM, it needs:
- Columns identifying the unit and the time period.
- A column indicating treatment status (binary: 1 for the treated unit in post-treatment periods, 0 otherwise).
- The outcome variable.
For multisynth (staggered adoption), the data must be in long format with unit, time, a binary treatment indicator, and the outcome.
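A minimal sketch of such a long-format panel in base R follows. The data frame toy_panel, its unit labels, and the treatment date are hypothetical and exist only to show the expected layout.

```r
# Hypothetical toy panel: 3 units observed over 4 periods (long format)
toy_panel <- expand.grid(unit = c("A", "B", "C"), time = 1:4,
                         stringsAsFactors = FALSE)

# Assumption for illustration: unit "A" is treated from period 3 onward
toy_panel$treated <- as.integer(toy_panel$unit == "A" & toy_panel$time >= 3)

# Simulated outcome, for illustration only
set.seed(1)
toy_panel$y <- rnorm(nrow(toy_panel))

# Each unit-period combination should appear exactly once
stopifnot(!any(duplicated(toy_panel[, c("unit", "time")])))

head(toy_panel)
```

Checking for duplicated unit-period rows before fitting is a cheap way to catch a mis-specified time column, such as using the year in a quarterly panel.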
Single-Unit Synthetic Control
Using the augsynth() Function
The main function is augsynth(). We use the built-in kansas dataset (about the Kansas tax cut) included in the package for illustration:
# Load built-in data
data(kansas)
# kansas: quarterly panel of US states; the outcome used below is lngdpcapita (log GDP per capita)
# Treatment: Kansas cuts taxes in 2012 (treated = 1 for Kansas 2012 onward)
head(kansas)
# Fit augmented synthetic control
syn_out <- augsynth(
  form = lngdpcapita ~ treated,  # outcome ~ treatment indicator
  unit = state,
  time = year_qtr,               # quarterly time index; year alone repeats within a state
  data = kansas,
  progfunc = "Ridge",            # bias correction: ridge regression
  scm = TRUE                     # include standard SCM weights
)
summary(syn_out)
The progfunc argument specifies the bias correction model. Options include:
- "Ridge": ridge regression (the default augmentation in Ben-Michael et al.).
- "None": classical SCM with no bias correction.
- "EN": elastic net.
- "RF": random forest (non-parametric bias correction).
Examining the Weights
# Extract synthetic control weights
weights <- syn_out$weights
print(round(weights, 3))
# Units with weight > 0 form the synthetic control
# Classical SCM weights are non-negative and sum to 1; with a ridge
# augmentation the weights still sum to 1 but some may be negative
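A few quick checks on a weight vector can make these properties concrete. The sketch below uses a hypothetical named vector w with made-up donor weights rather than output from the package, so it runs without a fitted model.

```r
# Hypothetical donor weights (made-up numbers, not fitted output)
w <- c(Nebraska = 0.42, Iowa = 0.31, Missouri = -0.02,
       Oklahoma = 0.29, Texas = 0.00)

# Do the weights sum to one, up to numerical tolerance?
sums_to_one <- isTRUE(all.equal(sum(w), 1))

# Negative weights mean the estimate extrapolates beyond the donors' convex hull
negative <- names(w)[w < 0]

# Donors that contribute materially, largest weight first
active <- sort(w[abs(w) > 1e-3], decreasing = TRUE)

sums_to_one
negative
round(active, 3)
```

The same checks could be applied to the extracted weights after converting them to a named numeric vector. How the weights are stored (for example as a one-column matrix) is an assumption to verify against the fitted object.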
Pre-Treatment Fit
plot(syn_out) +
  labs(
    title = "Augmented Synthetic Control: Kansas Tax Cut",
    subtitle = "Outcome: log GDP per capita",
    x = "Year",
    y = "Log GDP per capita"
  ) +
  theme_bw()
The plot shows the outcome trajectory for Kansas and its synthetic control. In the pre-treatment period, the synthetic control should track Kansas closely (indicating good fit). Post-treatment, any divergence is attributed to the tax cut.
Inference by Permutation
Since there is typically only one treated unit, large-sample standard errors are not available. The standard approach is a permutation (placebo) test: re-run the synthetic control for each donor unit pretending it was treated at the same time, and compare the estimated treatment effect for Kansas to the distribution of placebo effects.
syn_inf <- permutation_inference(syn_out, n_perm = 1000)
plot(syn_inf) +
  labs(
    title = "Permutation Inference: Kansas",
    x = "Year",
    y = "Estimated treatment effect"
  ) +
  theme_bw()
# The p-value is the fraction of placebo effects at least as
# large as the actual treatment effect
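The p-value calculation itself is simple enough to sketch in base R. Everything below is hypothetical: treated_effect and placebo_effects are simulated stand-ins, not output from the package, and the two-sided comparison on absolute effects is one common convention rather than necessarily the one the package uses.

```r
set.seed(42)

# Hypothetical inputs: one post-treatment effect estimate for the treated
# unit and placebo estimates for 49 donor units (simulated, not real output)
treated_effect <- -0.05
placebo_effects <- rnorm(49, mean = 0, sd = 0.02)

# Two-sided permutation p-value: share of all units (placebos plus the
# treated unit itself) whose absolute effect is at least as large
all_effects <- c(treated_effect, placebo_effects)
p_value <- mean(abs(all_effects) >= abs(treated_effect))

p_value
```

Including the treated unit in the reference set means the smallest attainable p-value is 1 divided by the number of units, which is why a small donor pool limits how strong the evidence from a placebo test can be.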
Staggered Adoption with multisynth
The multisynth() function extends ASCM to settings with multiple treated units treated at different times. It estimates a separate synthetic control for each treated unit, then averages the treatment effects, optionally weighting by group size.
Data Preparation
set.seed(123)
n_units <- 40
n_times <- 12
# Assign treatment timing: 10 units treated at t=5, 10 at t=8, 20 never
cohorts <- c(rep(5, 10), rep(8, 10), rep(Inf, 20))
unit_ids <- 1:n_units
unit_effects <- rnorm(n_units)  # one fixed effect per unit

staggered_panel <- expand.grid(unit = unit_ids, time = 1:n_times) %>%
  left_join(data.frame(unit = unit_ids, first_treat = cohorts), by = "unit") %>%
  mutate(
    treated = as.integer(time >= first_treat),
    unit_fe = unit_effects[unit],  # index by unit so the effect is constant within a unit
    att = ifelse(treated == 1, 3 + 0.5 * (time - first_treat), 0),
    y = unit_fe + 0.5 * time + att + rnorm(n())
  )
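Because the effect path is built into the simulation, the true ATT at each event time is known and can serve as a benchmark for the estimates. A small sketch, where event_time and true_att are names introduced here for illustration:

```r
# Effect path used in the simulation: 3 + 0.5 * (periods since first treatment)
event_time <- 0:3
true_att <- 3 + 0.5 * event_time

data.frame(event_time, true_att)
```

Estimates that land close to 3, 3.5, 4, and 4.5 at event times 0 through 3 would suggest the estimator is recovering the simulated effects.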
Fitting multisynth
ms_out <- multisynth(
  form = y ~ treated,
  unit = unit,
  time = time,
  data = staggered_panel,
  lambda = NULL,  # auto-select ridge penalty
  n_leads = 4     # number of post-treatment periods to estimate
)
summary(ms_out)
The output reports the estimated ATT for each treated cohort (\(g\)) at each horizon \(\ell\) since treatment, as well as an averaged estimate across cohorts. This is directly analogous to the Callaway–Sant'Anna event-study aggregation, but using synthetic control rather than DiD-style comparisons.
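The averaging step can be written compactly. The notation is an assumption made for this sketch: \(J\) treated units, \(T_j\) the first treated period of unit \(j\), \(\hat{\gamma}_{ij}\) the weight on donor \(i\) in unit \(j\)'s synthetic control, and a simple unweighted average across treated units (any augmentation terms are omitted).

```latex
\[
\hat{\tau}_{j\ell}
  = Y_{j,\,T_j+\ell} - \sum_{i} \hat{\gamma}_{ij}\, Y_{i,\,T_j+\ell},
  \qquad
\widehat{\mathrm{ATT}}_{\ell}
  = \frac{1}{J} \sum_{j=1}^{J} \hat{\tau}_{j\ell} .
\]
```

Here \(\ell = 0\) denotes the first treated period, so each unit's effects are aligned by time since its own treatment rather than by calendar time.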
Plotting Multisynth Results
ms_plot <- plot(ms_out, levels = "Average") +
  labs(
    title = "Augmented Synthetic Control: Average ATT by Event Time",
    x = "Periods since treatment",
    y = "Average treatment effect"
  ) +
  theme_bw()
print(ms_plot)
# Plot each level in its own panel (omitting levels keeps all of them)
ms_plot_cohort <- plot(ms_out) +
  facet_wrap(~ Level) +
  theme_bw()
print(ms_plot_cohort)
Adding Covariates
Both augsynth() and multisynth() allow covariates to be included in the balance constraints:
# Assume the data includes a covariate x1 measured pre-treatment
syn_cov <- augsynth(
  form = y ~ treated | x1,  # outcome ~ treatment | covariates
  unit = unit,
  time = time,
  data = staggered_panel,
  progfunc = "Ridge",
  scm = TRUE
)
Covariates after the | are included in the pre-treatment balance optimisation: the synthetic control weights are chosen to match both the pre-treatment outcome trajectory and the covariate values.
Choosing Between SCM, ASCM, and DiD
The choice between synthetic control methods and difference-in-differences depends on the setting:
- Use SCM/ASCM when there are one or a few treated units and a larger donor pool. SCM is especially suited to comparative case studies where no single donor unit is a natural comparison.
- Use DiD when there are many treated and control units, and you want to leverage the panel structure for efficiency. DiD's parallel trends assumption is transparent; SCM's factor model assumption may be harder to assess.
- ASCM bridges the two: it inherits the SCM's flexibility in choosing comparison units and the bias reduction from ridge regression when the pre-treatment fit is imperfect.
Ben-Michael et al. (2021) provide guidance on when ASCM outperforms SCM: primarily when the donor pool is large relative to the treated unit and when the pre-treatment period is short.
Conclusion
The augsynth package provides a streamlined interface to both classical and augmented synthetic control estimation in R. For single treated units, the workflow is: (1) call augsynth() with progfunc = "Ridge"; (2) examine pre-treatment fit; (3) run permutation inference. For staggered adoption, use multisynth() and plot event-study results by cohort. Together, these tools implement the state-of-the-art comparative case study methodology in empirical social science.
References
- Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493–505.
- Ben-Michael, E., Feller, A., and Rothstein, J. (2021). The augmented synthetic control method. Journal of the American Statistical Association, 116(536):1789–1803.
- Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200–230.
- Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2):391–425.
- Doudchenko, N. and Imbens, G. W. (2016). Balancing, regression, difference-in-differences and synthetic control methods: A synthesis. NBER Working Paper No. 22791.