The Causal Review

1 What Problem Does This Tool Solve?

The synthetic control method [Abadie et al., 2010] addresses the evaluation of a policy applied to a single aggregate unit (a state, country, or city) when no single comparison unit provides a credible counterfactual. Instead, it constructs a weighted combination of control units, the "synthetic" version of the treated unit, whose pre-treatment characteristics closely match those of the treated unit. The Synth package [Abadie et al., 2011] is the reference implementation of the original Abadie-Diamond-Hainmueller (ADH) estimator. It provides functions for:

Constructing the synthetic control by solving the nested optimisation problem.

Summarising the balance between the treated unit and its synthetic counterpart.

Producing the canonical "path plot" (treated vs synthetic trajectories) and "gap plot" (difference over time).

Conducting permutation inference (placebo studies in space).
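To make the idea of a "weighted combination of control units" concrete, here is a minimal base-R sketch with made-up numbers. The matrix Y0, the vector Y1, and the weights w are hypothetical and chosen by hand; in practice the weights come out of the optimisation described later, not from the analyst.

```r
# Toy illustration: outcomes of 3 hypothetical donor units over 4 periods
Y0 <- cbind(A = c(10, 11, 12, 13),
            B = c(20, 21, 22, 23),
            C = c(30, 31, 32, 33))

# Outcome path of the hypothetical treated unit (treatment hits in period 4)
Y1 <- c(15, 16, 17, 15)

# Hand-picked donor weights: non-negative and summing to one
w <- c(A = 0.5, B = 0.5, C = 0)
stopifnot(all(w >= 0), isTRUE(all.equal(sum(w), 1)))

# Synthetic control path = weighted average of donor outcomes
Y_synth <- as.numeric(Y0 %*% w)

# Gap = treated minus synthetic; in post-treatment periods this is the
# estimated effect, provided the pre-treatment match is credible
gap <- Y1 - Y_synth
print(cbind(treated = Y1, synthetic = Y_synth, gap = gap))
```

In this toy example the synthetic path matches the treated unit exactly in the first three periods, so the gap in the fourth period is read as the effect.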

2 Installation and Setup

    # Install from CRAN
    install.packages("Synth")
    library(Synth)

    # The package ships with two datasets:
    #   basque     - the Basque Country panel (Abadie & Gardeazabal 2003)
    #   synth.data - a small toy panel used in the package examples
    data(basque)
    data(synth.data)

3 Data Preparation with dataprep()

The dataprep() function structures your panel data into the matrices required by the optimisation routine. It requires:

A balanced panel (long format: one row per unit-period).

A single treated unit with a known treatment start period.

A donor pool of untreated units.

    # Basque terrorism study. Treated unit: Basque Country (regionno 17);
    # treatment: ETA terrorism, dated to 1970 here; outcome: GDP per capita.
    data(basque)
    dataprep.out <- dataprep(
      foo = basque,
      predictors = c("school.illit", "school.prim", "school.med",
                     "school.high", "school.post.high", "invest"),
      predictors.op = "mean",            # average predictors over time.predictors.prior
      time.predictors.prior = 1964:1969,
      special.predictors = list(
        list("gdpcap", 1960:1969, "mean"),  # pre-treatment outcome as a special predictor
        # sector shares: odd years only, as in the package's own example
        list("sec.agriculture", seq(1961, 1969, 2), "mean"),
        list("sec.energy",      seq(1961, 1969, 2), "mean"),
        list("sec.industry",    seq(1961, 1969, 2), "mean")
      ),
      dependent = "gdpcap",              # outcome variable
      unit.variable = "regionno",
      unit.names.variable = "regionname",
      time.variable = "year",
      treatment.identifier = 17,         # unit code for Basque Country
      controls.identifier = c(2:16, 18), # donor pool
      time.optimize.ssr = 1960:1969,     # pre-treatment years over which outcome MSPE is minimised
      time.plot = 1955:1997              # full plot range
    )

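The special.predictors argument lets each predictor have its own time window and aggregation operator, independent of time.predictors.prior and predictors.op. It is most often used to include pre-treatment values of the outcome (here, mean gdpcap over 1960-1969), which is standard practice for improving pre-treatment fit on the outcome trajectory.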

4 Running the Optimisation with synth()

    synth.out <- synth(dataprep.out)

The synth() function solves the nested optimisation: for a given predictor weight vector v, it finds the unit weights w(v) that minimise the v-weighted predictor imbalance; it then searches over v to minimise the pre-treatment MSPE of the outcome variable. The result is a list containing:

synth.out$solution.w: the optimal unit weights wⱼ.

synth.out$solution.v: the optimal predictor weights vₛ.

synth.out$loss.v: the pre-treatment MSPE of the outcome at the solution.

synth.out$loss.w: the v-weighted predictor imbalance at the solution.
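In symbols, the nested problem can be sketched as follows. The notation is our own shorthand, not taken from the package: X_1 and X_0 hold the predictor values for the treated unit and the donors, Z_1 and Z_0 hold their outcomes over the T_0 periods in time.optimize.ssr, and V is a diagonal matrix of non-negative predictor weights.

```latex
\begin{aligned}
W^{*}(V) &= \arg\min_{W}\; (X_1 - X_0 W)^{\top} V \,(X_1 - X_0 W)
  \quad \text{s.t.}\quad w_j \ge 0,\;\; \sum_{j} w_j = 1, \\
V^{*} &= \arg\min_{V}\; \frac{1}{T_0}\,
  \bigl(Z_1 - Z_0 W^{*}(V)\bigr)^{\top} \bigl(Z_1 - Z_0 W^{*}(V)\bigr).
\end{aligned}
```

The inner problem is a constrained quadratic programme in W; the outer problem is a general nonlinear search over V, which is why the choice of optimiser can matter.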

5 Summarising Results with synth.tab()

    # Balance table: treated vs synthetic on predictors
    synth.tables <- synth.tab(dataprep.res = dataprep.out, synth.res = synth.out)
    print(synth.tables$tab.pred) # predictor balance
    print(synth.tables$tab.w)    # unit weights (donor pool composition)

The tab.pred output shows the pre-treatment values of each predictor for the treated unit, the synthetic control, and the simple donor-pool average. A good synthetic control should match the treated unit closely, especially on predictors that receive large weights in solution.v; substantial imbalance on such a predictor is a warning sign. The tab.w table lists all donor units and their weights. In the original Basque study (Abadie & Gardeazabal 2003), the synthetic Basque Country is composed almost entirely of Catalonia (about 85%) and Madrid (about 15%); the weights obtained with the reduced predictor set used here may differ.

6 Plotting: Path and Gap Plots

    # Path plot: treated unit (solid) vs synthetic control (dashed)
    # Note: gdpcap is recorded in thousands of 1986 USD, so the axis limits are small numbers
    path.plot(synth.res = synth.out, dataprep.res = dataprep.out,
              Ylab = "Per Capita GDP (1986 USD, thousand)", Xlab = "Year",
              Ylim = c(0, 12),
              Legend = c("Basque Country", "Synthetic Basque Country"),
              Legend.position = "bottomright")
    # Add vertical line at treatment start
    abline(v = 1970, lty = 2, col = "red")

    # Gap plot: estimated treatment effect = treated minus synthetic
    gaps.plot(synth.res = synth.out, dataprep.res = dataprep.out,
              Ylab = "Effect of ETA Terrorism on GDP per Capita (1986 USD, thousand)",
              Xlab = "Year", Ylim = c(-1.5, 1.5))
    abline(v = 1970, lty = 2, col = "red")

A good synthetic control will show:

Path plot: The synthetic control trajectory closely overlapping the treated unit's trajectory before the treatment year.

Gap plot: The gap hovering near zero in the pre-treatment period, then diverging after treatment.

7 Permutation Inference: Placebo Studies

To assess statistical significance, apply the synthetic control to each control unit in turn ("space placebos"):

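    # Loop over all control units to compute placebo gaps.
    # Each donor is treated as if it were the treated unit, with the same specification as above.
    donors <- c(2:16, 18)
    placebos <- list()
    for (ctrl_unit in donors) {
      dp_placebo <- dataprep(
        foo = basque,
        predictors = c("school.illit", "school.prim", "school.med",
                       "school.high", "school.post.high", "invest"),
        predictors.op = "mean",
        time.predictors.prior = 1964:1969,
        special.predictors = list(
          list("gdpcap", 1960:1969, "mean"),
          list("sec.agriculture", seq(1961, 1969, 2), "mean"),
          list("sec.energy",      seq(1961, 1969, 2), "mean"),
          list("sec.industry",    seq(1961, 1969, 2), "mean")
        ),
        dependent = "gdpcap",
        unit.variable = "regionno",
        unit.names.variable = "regionname",
        time.variable = "year",
        treatment.identifier = ctrl_unit,
        controls.identifier = setdiff(donors, ctrl_unit),
        time.optimize.ssr = 1960:1969,
        time.plot = 1955:1997
      )
      synth_placebo <- synth(dp_placebo)
      # Gap = placebo unit's outcome minus its own synthetic control
      placebos[[as.character(ctrl_unit)]] <-
        dp_placebo$Y1plot - (dp_placebo$Y0plot %*% synth_placebo$solution.w)
    }

The key diagnostic is the ratio of post-treatment MSPE to pre-treatment MSPE, computed for the treated unit and for each placebo. Placebos with poor pre-treatment fit are uninformative and are often discarded (for example, those with pre-treatment MSPE more than twice the treated unit's). The permutation p-value is the share of units, treated unit included, whose ratio is at least as large as the treated unit's. With 16 donors the smallest attainable p-value is 1/17 (about 0.06), so the treated unit must rank first, or very nearly first, for the effect to be considered statistically distinguishable from the placebo distribution.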
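The MSPE-ratio test described above can be sketched in base R as follows. The helper names (pre_mspe, mspe_ratio, perm_p_value) and the gap vectors are hypothetical and made up for illustration; they are not part of Synth. The factor of 2 in the pre-fit filter follows the rule of thumb mentioned in the text and is a choice, not a fixed standard.

```r
# Pre-treatment MSPE of a gap series
pre_mspe <- function(gap, years, treat_year) {
  mean(gap[years < treat_year]^2)
}

# Ratio of post-treatment MSPE to pre-treatment MSPE
mspe_ratio <- function(gap, years, treat_year) {
  mean(gap[years >= treat_year]^2) / pre_mspe(gap, years, treat_year)
}

# Permutation p-value: share of units (treated included) whose ratio
# is at least as large as the treated unit's
perm_p_value <- function(treated_ratio, placebo_ratios) {
  mean(c(treated_ratio, placebo_ratios) >= treated_ratio)
}

# Made-up gap series, for illustration only
years <- 1966:1973
treat_year <- 1970
treated_gap <- c(0.1, -0.1, 0.1, -0.1, -1, -1, -1, -1)
placebo_gaps <- list(
  p1 = c(0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1, -0.1),
  p2 = c(0.2, -0.2, 0.2, -0.2, 0.4, -0.4, 0.4, -0.4),
  p3 = c(0.1, -0.1, 0.1, -0.1, 0.2, 0.2, -0.2, -0.2)
)

# Drop placebos whose pre-treatment MSPE exceeds twice the treated unit's
keep <- sapply(placebo_gaps, pre_mspe, years = years, treat_year = treat_year) <=
  2 * pre_mspe(treated_gap, years, treat_year)

treated_ratio  <- mspe_ratio(treated_gap, years, treat_year)
placebo_ratios <- sapply(placebo_gaps[keep], mspe_ratio,
                         years = years, treat_year = treat_year)
p_value <- perm_p_value(treated_ratio, placebo_ratios)
print(p_value)
```

With real output, each element of the placebos list from the loop is a one-column matrix, so it would need to be passed through as.numeric() first, with years set to the time.plot range (assuming the rows follow that order).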

8 Key Options and Pitfalls

8.1 Predictor Choice

Include pre-treatment lags of the outcome (typically 3-4 time points spanning the pre-period) as special predictors. This ensures the synthetic control matches the treated unit's outcome trajectory. Poor pre-treatment fit on the outcome is the primary warning sign of an invalid synthetic control.

8.2 Balanced Panel Requirement

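Synth requires a balanced panel: all units observed in all time periods. If your data has gaps, you must impute or restrict the sample to units and periods with complete data. Note that the alternatives compared in Table 1 also expect a balanced panel, so switching among them does not by itself remove this requirement.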
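Before calling dataprep(), it can help to check balance explicitly. The following is a minimal base-R sketch; is_balanced() is a hypothetical helper written for this illustration, not a Synth function, and the toy panel is made up.

```r
# TRUE if every unit appears exactly once in every period and the
# listed columns contain no missing values
is_balanced <- function(df, unit, time, vars = character(0)) {
  cells <- table(df[[unit]], df[[time]])
  cols  <- df[, c(unit, time, vars), drop = FALSE]
  all(cells == 1) && !any(is.na(cols))
}

# Hypothetical toy panel: 2 units x 3 years, with one missing outcome
panel <- expand.grid(id = 1:2, year = 2000:2002)
panel$y <- c(1.0, 2.0, 1.5, 2.5, NA, 3.0)

is_balanced(panel, "id", "year")              # structure only
is_balanced(panel, "id", "year", vars = "y")  # also checks y for NAs
is_balanced(panel[-1, ], "id", "year")        # one unit-period row removed
```

The first call checks only that the unit-by-period grid is complete; passing the outcome (and any predictors) through vars additionally flags missing values inside an otherwise complete grid.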

8.3 Optimisation Failures

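The nested optimisation occasionally fails to converge or produces degenerate solutions. Run synth() with several optimisation algorithms (optimxmethod = "All" tries all available methods and keeps the best-performing one) and check that the unit weights and pre-treatment MSPE are stable across them.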

8.4 Comparison to Modern Extensions

The original Synth estimator minimises pre-treatment MSPE but does not correct for residual imbalance. For settings with many pre-treatment periods and some remaining imbalance, the augmented synthetic control (augsynth) or synthetic DiD (synthdid) may produce lower bias.

9 Comparison to Alternatives

Package	Method	Panel required	Staggered?
Synth	Original ADH	Balanced	No
augsynth	Augmented SC (ASCM)	Balanced	Yes
synthdid	Synthetic DiD	Balanced	Limited
SCtools	Utilities + parallelisation	Balanced	No

Table 1: Synthetic Control Packages in R

10 Conclusion

The Synth package is the reference R implementation of the Abadie et al. [2010] synthetic control estimator. Its dataprep() -> synth() -> synth.tab() -> path.plot() -> gaps.plot() workflow is straightforward and well-documented. For the canonical single-treated-unit comparative case study with a long pre-treatment panel, Synth remains the natural starting point.

References

Abadie, A. and Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1):113-132.
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493-505.
Abadie, A., Diamond, A., and Hainmueller, J. (2011). Synth: An R package for synthetic control methods in comparative case studies. Journal of Statistical Software, 42(13):1-17.
Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2):391-425.
Ben-Michael, E., Feller, A., and Rothstein, J. (2021). The augmented synthetic control method. Journal of the American Statistical Association, 116(536):1789-1803.
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12):4088-4118.

The Synth Package in R: Implementing the Original Abadie Synthetic Control
