Toolbox

The Synth Package in R: Implementing the Original Abadie Synthetic Control

1 What Problem Does This Tool Solve?

The synthetic control method [Abadie et al., 2010] addresses the evaluation of a policy applied to a single aggregate unit (a state, country, or city) when no single comparison unit is a credible counterfactual. Instead, it constructs a weighted combination of control units, the "synthetic" version of the treated unit, whose pre-treatment characteristics closely match those of the treated unit. The Synth package [Abadie et al., 2011] is the reference implementation of the original Abadie-Diamond-Hainmueller (ADH) estimator. It provides functions for:

  • Constructing the synthetic control by solving the nested optimisation problem.
  • Summarising the balance between the treated unit and its synthetic counterpart.
  • Producing the canonical "path plot" (treated vs synthetic trajectories) and "gap plot" (difference over time).
  • Supporting permutation inference (placebo studies in space), which is run by looping the functions above over the donor units.
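
As a minimal numeric sketch of the idea (all numbers are invented for illustration and come from no dataset), the synthetic unit is a convex combination of donor outcome paths, and the estimated effect in each period is the treated outcome minus that combination:

```r
# Toy outcome paths: rows are periods, columns are three hypothetical donors.
# matrix() fills column by column, so donor A = (10, 12, 14), and so on.
Y0 <- matrix(c(10, 12, 14,
               20, 22, 24,
               30, 31, 32),
             nrow = 3,
             dimnames = list(c("t1", "t2", "t3"), c("A", "B", "C")))

# Hypothetical donor weights: non-negative and summing to one.
w <- c(A = 0.5, B = 0.5, C = 0)
stopifnot(all(w >= 0), isTRUE(all.equal(sum(w), 1)))

# Synthetic path = weighted average of the donor paths.
Y_synth <- drop(Y0 %*% w)

# Observed path of the treated unit (also invented).
Y1 <- c(t1 = 15, t2 = 17, t3 = 16)

# Period-by-period gap: treated minus synthetic.
gap <- Y1 - Y_synth
print(gap)
```

In this toy case the gap is zero in the first two periods and negative in the third, which is the pattern a path plot and gap plot are meant to reveal.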

2 Installation and Setup

# Install from CRAN
install.packages("Synth")

library(Synth)

# The package ships with two datasets:
#   basque:     the Basque Country panel (Abadie & Gardeazabal 2003)
#   synth.data: a small artificial panel used in the package examples
data(basque)
data(synth.data)

3 Data Preparation with dataprep()

The dataprep() function structures your panel data into the matrices required by the optimisation routine. It requires:

  • A balanced panel (long format: one row per unit-period).
  • A single treated unit with a known treatment start period.
  • A donor pool of untreated units.

# Basque terrorism study. Treated unit: Basque Country (regionno 17).
# Treatment: ETA terrorism, dated here to 1970. Outcome: GDP per capita.
data(basque)
dataprep.out <- dataprep(
  foo = basque,
  predictors = c("school.illit", "school.prim", "school.med", "school.high", "school.post.high", "invest"),
  predictors.op = "mean", # average predictors over pre-treatment period
  time.predictors.prior = 1964:1969,
  special.predictors = list(
    list("gdpcap", 1960:1969, "mean"), # GDP as special predictor
    list("sec.agriculture", seq(1961, 1969, 2), "mean"), # sector shares are recorded every second year
    list("sec.energy", seq(1961, 1969, 2), "mean"),
    list("sec.industry", seq(1961, 1969, 2), "mean")
  ),
  dependent = "gdpcap", # outcome variable
  unit.variable = "regionno",
  unit.names.variable = "regionname",
  time.variable = "year",
  treatment.identifier = 17, # unit code for Basque Country
  controls.identifier = c(2:16, 18), # donor pool
  time.optimize.ssr = 1960:1969, # pre-treatment period
  time.plot = 1955:1997 # full plot range
)

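The special.predictors argument lets you give an individual predictor its own time window and summary operator. It is commonly used to include pre-treatment values of the outcome (here, mean gdpcap over 1960-1969) as predictors, which is standard practice for improving pre-treatment fit on the outcome trajectory.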
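
To make the semantics of one special.predictors entry concrete, here is a minimal base-R sketch. The helper special_predictor() and the toy panel are hypothetical; they illustrate what a (variable, periods, operator) triple means and are not the package's internal code.

```r
# Toy long-format panel (values invented): two units, years 1960-1963.
panel <- data.frame(
  unit = rep(c(1, 2), each = 4),
  year = rep(1960:1963, times = 2),
  y    = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Hypothetical helper mimicking one special.predictors entry:
# list(variable, periods, operator) -> one summary value per unit.
special_predictor <- function(data, spec, unit_col = "unit", time_col = "year") {
  variable <- spec[[1]]
  periods  <- spec[[2]]
  op       <- match.fun(spec[[3]])
  # Keep only the rows inside the requested time window.
  window <- data[data[[time_col]] %in% periods, ]
  # Apply the operator within each unit, ignoring missing values.
  tapply(window[[variable]], window[[unit_col]], op, na.rm = TRUE)
}

# Mean of y over 1960-1961 for each unit.
res <- special_predictor(panel, list("y", 1960:1961, "mean"))
print(res)
```

Each entry therefore contributes one row to the predictor matrices: a single number per unit.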

4 Running the Optimisation with synth()

synth.out <- synth(dataprep.out)

The synth() function solves the nested optimisation: for a given predictor weight vector v, it finds the unit weights w(v) that minimise the v-weighted predictor imbalance; it then searches over v to minimise the pre-treatment MSPE of the outcome variable over time.optimize.ssr. The result is a list containing:

  • synth.out$solution.w: the optimal unit weights wⱼ.
  • synth.out$solution.v: the optimal predictor weights vₛ.
  • synth.out$loss.v: the pre-treatment MSPE of the outcome.
  • synth.out$loss.w: the weighted predictor loss from the inner optimisation.
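
The two levels of the optimisation can be sketched in base R on toy matrices. This is an illustration under stated assumptions, not the package's algorithm: the data are invented, the names inner_loss, to_simplex, solve_w and outer_loss are hypothetical, and the inner problem is solved with optim() over a softmax parameterisation rather than with a quadratic-programming solver.

```r
# Toy predictors (invented): 2 predictors x 3 donors, filled column by column.
X0 <- matrix(c(1, 0,
               0, 1,
               1, 1), nrow = 2)
X1 <- c(0.5, 0.5)   # treated unit's predictors

# Toy pre-treatment outcomes: 2 periods x 3 donors, plus the treated unit.
Z0 <- matrix(c(2, 3,
               4, 5,
               9, 9), nrow = 2)
Z1 <- c(3, 4)

# Inner loss for fixed predictor weights v: sum_k v_k * (X1_k - (X0 w)_k)^2.
inner_loss <- function(w, v) {
  d <- X1 - drop(X0 %*% w)
  sum(v * d^2)
}

# Map unconstrained parameters to the simplex (w >= 0, sum(w) = 1).
to_simplex <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Inner problem: unit weights w(v) for a given v.
solve_w <- function(v) {
  fit <- optim(par = rep(0, ncol(X0)),
               fn = function(theta) inner_loss(to_simplex(theta), v),
               method = "BFGS")
  to_simplex(fit$par)
}

# Outer loss: pre-treatment MSPE of the outcome evaluated at w(v).
outer_loss <- function(v) mean((Z1 - drop(Z0 %*% solve_w(v)))^2)

w_star <- solve_w(v = c(0.5, 0.5))
print(round(w_star, 3))
print(outer_loss(c(0.5, 0.5)))
```

A full implementation would also search over v (the outer step); here the outer loss is only evaluated at one candidate v to keep the sketch short.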

5 Summarising Results with synth.tab()

# Balance table: treated vs synthetic on predictors
synth.tables <- synth.tab(dataprep.res = dataprep.out, synth.res = synth.out)
print(synth.tables$tab.pred) # predictor balance
print(synth.tables$tab.w) # unit weights (donor pool composition)

The tab.pred output shows the pre-treatment values of each predictor for the treated unit, the synthetic control, and the simple donor-pool average. A good synthetic control should match the treated unit closely on the predictors, especially those with large weights in solution.v; substantial imbalance on such a predictor is a warning sign. The tab.w table lists all donor units and their weights. In the published Basque study [Abadie and Gardeazabal, 2003], the synthetic Basque Country is composed almost entirely of Catalonia (about 85%) and Madrid (about 15%); the weights obtained from the shorter predictor list used here may differ somewhat.
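
When the donor pool is large, it helps to reduce the weight table to the donors that matter. The sketch below uses a stand-in data frame with invented weights; the column names w.weights, unit.names and unit.numbers are assumed to match what synth.tab() returns in tab.w, so check names(synth.tables$tab.w) before reusing it.

```r
# Stand-in for synth.tables$tab.w (weights invented for illustration).
tab_w <- data.frame(
  w.weights    = c(0.000, 0.700, 0.001, 0.299),
  unit.names   = c("Donor A", "Donor B", "Donor C", "Donor D"),
  unit.numbers = c(2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Sanity check: unit weights should sum to (approximately) one.
stopifnot(abs(sum(tab_w$w.weights) - 1) < 1e-6)

# Keep donors with non-negligible weight, largest first.
active <- tab_w[tab_w$w.weights > 0.01, ]
active <- active[order(-active$w.weights), ]
print(active)
```

The 0.01 cut-off is an arbitrary display threshold, not a rule from the package.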

6 Plotting: Path and Gap Plots

# Path plot: treated unit (solid) vs synthetic control (dashed)
path.plot(
  synth.res = synth.out,
  dataprep.res = dataprep.out,
  Ylab = "Per Capita GDP (1986 USD, thousands)",
  Xlab = "Year",
  Ylim = c(0, 12), # gdpcap is recorded in thousands of 1986 USD
  Legend = c("Basque Country", "Synthetic Basque Country"),
  Legend.position = "bottomright"
)

# Add vertical line at treatment start
abline(v=1970, lty=2, col="red")

# Gap plot: estimated treatment effect = treated minus synthetic
gaps.plot(
  synth.res = synth.out,
  dataprep.res = dataprep.out,
  Ylab = "Effect of ETA Terrorism on GDP per Capita (1986 USD, thousands)",
  Xlab = "Year",
  Ylim = c(-1.5, 1.5) # gaps are in thousands of 1986 USD
)

abline(v=1970, lty=2, col="red")

A good synthetic control will show:

  • Path plot: The synthetic control trajectory closely overlapping the treated unit's trajectory before the treatment year.
  • Gap plot: The gap hovering near zero in the pre-treatment period, then diverging after treatment.
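
The numbers behind the gap plot can also be computed by hand, which is useful for reporting pre-treatment fit. The sketch below uses small invented matrices as stand-ins for dataprep.out$Y1plot, dataprep.out$Y0plot and synth.out$solution.w; with real output, the same three lines of arithmetic apply.

```r
years <- 1966:1973

# Stand-ins for the plot matrices (values invented).
Y1plot <- matrix(c(5.0, 5.2, 5.4, 5.6, 5.5, 5.4, 5.3, 5.2),
                 ncol = 1,
                 dimnames = list(as.character(years), "treated"))
Y0plot <- cbind(A = c(4.9, 5.1, 5.3, 5.5, 5.7, 5.9, 6.1, 6.3),
                B = c(5.1, 5.3, 5.5, 5.7, 5.9, 6.1, 6.3, 6.5))
rownames(Y0plot) <- as.character(years)

# Stand-in for synth.out$solution.w: a column of unit weights.
w <- matrix(c(0.5, 0.5), ncol = 1)

# Gap series: treated minus synthetic, one value per year.
gaps <- drop(Y1plot - Y0plot %*% w)

# Pre-treatment fit, summarised as root mean squared prediction error.
pre <- years < 1970
pre_rmspe <- sqrt(mean(gaps[pre]^2))

print(round(gaps, 2))
print(pre_rmspe)
```

A pre-treatment RMSPE that is small relative to the level of the outcome corresponds to the "hovering near zero" pattern described above.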

7 Permutation Inference: Placebo Studies

To assess statistical significance, apply the synthetic control to each control unit in turn ("space placebos"):

# Loop over all control units to compute placebo gaps.
# Each donor is treated as if it were the treated unit; the Basque Country (17)
# is left out of every placebo donor pool.
donor_ids <- c(2:16, 18)
placebos <- list()
for (ctrl_unit in donor_ids) {
  dp_placebo <- dataprep(
    foo = basque,
    predictors = c("school.illit", "school.prim", "school.med", "school.high", "school.post.high", "invest"),
    predictors.op = "mean",
    time.predictors.prior = 1964:1969,
    special.predictors = list(
      list("gdpcap", 1960:1969, "mean"),
      list("sec.agriculture", seq(1961, 1969, 2), "mean"),
      list("sec.energy", seq(1961, 1969, 2), "mean"), # same specification as the main fit
      list("sec.industry", seq(1961, 1969, 2), "mean")
    ),
    dependent = "gdpcap",
    unit.variable = "regionno",
    unit.names.variable = "regionname",
    time.variable = "year",
    treatment.identifier = ctrl_unit,
    controls.identifier = setdiff(donor_ids, ctrl_unit),
    time.optimize.ssr = 1960:1969,
    time.plot = 1955:1997
  )
  synth_placebo <- synth(dp_placebo)
  # Gap series over time.plot: placebo unit minus its own synthetic control
  placebos[[as.character(ctrl_unit)]] <- dp_placebo$Y1plot - (dp_placebo$Y0plot %*% synth_placebo$solution.w)
}

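The key diagnostic is the ratio of post-treatment MSPE to pre-treatment MSPE, computed for the treated unit and for every placebo. Because the ratio scales each unit's post-treatment gap by its own pre-treatment fit, it can be compared across all units. When comparing raw gap plots instead, a common convention is to discard placebos with poor pre-treatment fit (for example, pre-MSPE more than twice the treated unit's). The permutation p-value is the share of units, treated unit included, whose ratio is at least as large as the treated unit's; with 16 donors the smallest attainable value is 1/17, about 0.06. If the treated unit's ratio exceeds all (or nearly all) placebo ratios, the estimated effect is unlikely to be due to chance.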
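
The ratio and the permutation p-value can be computed from gap series such as those stored in placebos. The sketch below is a minimal version under stated assumptions: mspe_ratio() is a hypothetical helper, and the four gap series are invented so that the arithmetic is easy to follow.

```r
# Hypothetical helper: post/pre MSPE ratio for one gap series.
mspe_ratio <- function(gap, years, treat_year) {
  pre <- years < treat_year
  mean(gap[!pre]^2) / mean(gap[pre]^2)
}

years <- 1968:1971

# Toy gap series (invented): one treated unit and three placebos.
gap_list <- list(
  treated  = c(0.1, -0.1, 1.0,  1.2),
  placebo1 = c(0.1,  0.1, 0.2, -0.2),
  placebo2 = c(0.2, -0.2, 0.1,  0.3),
  placebo3 = c(0.1, -0.1, 0.1, -0.1)
)

ratios <- sapply(gap_list, mspe_ratio, years = years, treat_year = 1970)

# Permutation p-value: share of units (treated included) whose ratio
# is at least as large as the treated unit's.
p_value <- mean(ratios >= ratios["treated"])

print(round(ratios, 2))
print(p_value)
```

With real results, each element of gap_list would be one gap series computed over time.plot (for example via drop() on the stored matrices), and years would be the corresponding time.plot vector.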

8 Key Options and Pitfalls

8.1 Predictor Choice

Include pre-treatment lags of the outcome (typically 3-4 time points spanning the pre-period) as special predictors. This ensures the synthetic control matches the treated unit's outcome trajectory. Poor pre-treatment fit on the outcome is the primary warning sign of an invalid synthetic control.
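
Outcome lags can be generated programmatically rather than typed by hand. A minimal sketch, assuming the outcome is gdpcap and that 1960, 1965 and 1969 are the chosen lag years (both are illustrative choices, not a recommendation):

```r
# Years at which the outcome should be matched (illustrative choice).
lag_years <- c(1960, 1965, 1969)

# One special-predictor entry per lag: list(variable, period, operator).
# With a single period, the "mean" operator simply returns that year's value.
outcome_lags <- lapply(lag_years, function(yr) list("gdpcap", yr, "mean"))

str(outcome_lags)
```

The resulting list can be passed as special.predictors, or combined with other entries using c(outcome_lags, other_entries).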

8.2 Balanced Panel Requirement

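Synth requires a balanced panel: all units observed in all time periods, with no missing outcome values in the periods used for fitting. If your data has gaps, you must impute or restrict the sample (drop the affected units or periods) before calling dataprep(). The alternative packages in Table 1 also expect balanced panels, so switching package does not remove this step.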
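
A quick check before calling dataprep() can catch an unbalanced panel early. The helper is_balanced() below is a hypothetical base-R sketch; it verifies that every unit-period combination appears exactly once and does not inspect missing values inside the rows.

```r
# Hypothetical helper: TRUE if every unit appears exactly once in every period.
is_balanced <- function(data, unit_col, time_col) {
  counts <- table(data[[unit_col]], data[[time_col]])
  all(counts == 1)
}

# Toy panel: 3 units x 3 years, fully crossed.
panel <- expand.grid(unit = 1:3, year = 2000:2002)

balanced_full    <- is_balanced(panel, "unit", "year")
balanced_dropped <- is_balanced(panel[-1, ], "unit", "year")  # one row removed

print(balanced_full)
print(balanced_dropped)
```

For the Basque data the equivalent call would be is_balanced(basque, "regionno", "year"); missing values in the outcome can be checked separately with anyNA().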

8.3 Optimisation Failures

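The nested optimisation occasionally fails to converge or produces degenerate solutions. To check robustness, rerun synth() with optimxmethod = "All", which tries every optimisation algorithm available through optimx and keeps the best-performing one, and compare the resulting weights and loss with the default fit.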
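
Comparing fits is easier with a small summary table. The helper compare_fits() below is hypothetical, and the two stand-in results use invented numbers; real inputs would be the lists returned by calls such as synth(dataprep.out) and synth(dataprep.out, optimxmethod = "All").

```r
# Hypothetical helper: summarise several synth()-style results side by side.
# Each element of `fits` needs solution.w (unit weights) and loss.v (MSPE).
compare_fits <- function(fits) {
  ref <- fits[[1]]$solution.w
  data.frame(
    fit        = names(fits),
    loss.v     = sapply(fits, function(f) as.numeric(f$loss.v)),
    # Largest absolute change in any unit weight relative to the first fit.
    max_w_diff = sapply(fits, function(f) max(abs(f$solution.w - ref))),
    row.names  = NULL
  )
}

# Stand-in results (numbers invented for illustration).
fits <- list(
  default = list(solution.w = c(0.60, 0.40, 0.00), loss.v = 0.010),
  all     = list(solution.w = c(0.55, 0.40, 0.05), loss.v = 0.009)
)

res <- compare_fits(fits)
print(res)
```

Large weight differences combined with nearly identical loss.v values suggest a flat objective, in which case the donor weights should be interpreted with caution.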

8.4 Comparison to Modern Extensions

The original Synth estimator minimises pre-treatment MSPE but does not correct for residual imbalance. For settings with many pre-treatment periods and some remaining imbalance, the augmented synthetic control (augsynth) or synthetic DiD (synthdid) may produce lower bias.
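
For reference, the bias correction in the augmented estimator of Ben-Michael et al. [2021] can be written as below, where \hat{w}_j are the synthetic control weights, X_j are pre-treatment predictors (typically lagged outcomes), and \hat{m} is an outcome model fitted on the donors, for example a ridge regression. The notation is chosen for this note and is not taken from the package documentation:

```latex
\hat{Y}^{\mathrm{aug}}_{1T}(0)
  = \sum_{j} \hat{w}_j \, Y_{jT}
  + \Bigl( \hat{m}(X_1) - \sum_{j} \hat{w}_j \, \hat{m}(X_j) \Bigr)
```

When \hat{m} is linear and the synthetic control matches the treated unit exactly on X, the term in parentheses vanishes and the estimator reduces to the original synthetic control; the correction matters only when some imbalance remains.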

9 Comparison to Alternatives

Package    Method                        Panel required   Staggered adoption?
Synth      Original ADH                  Balanced         No
augsynth   Augmented SC (ASCM)           Balanced         Yes
synthdid   Synthetic DiD                 Balanced         Limited
SCtools    Utilities + parallelisation   Balanced         No
Table 1: Synthetic Control Packages in R

10 Conclusion

The Synth package is the reference R implementation of the Abadie et al. [2010] synthetic control estimator. Its dataprep() -> synth() -> synth.tab() -> path.plot() -> gaps.plot() workflow is straightforward and well-documented. For the canonical single-treated-unit comparative case study with a long pre-treatment panel, Synth remains the natural starting point.

References

  1. Abadie, A. and Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1):113-132.
  2. Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493-505.
  3. Abadie, A., Diamond, A., and Hainmueller, J. (2011). Synth: An R package for synthetic control methods in comparative case studies. Journal of Statistical Software, 42(13):1-17.
  4. Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2):391-425.
  5. Ben-Michael, E., Feller, A., and Rothstein, J. (2021). The augmented synthetic control method. Journal of the American Statistical Association, 116(536):1789-1803.
  6. Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12):4088-4118.
