Toolbox

synthdid in R: Implementing Synthetic Difference-in-Differences

1 What Problem Does This Tool Solve?

Staggered difference-in-differences (DiD) has been under heavy scrutiny since the literature established that two-way fixed effects (TWFE) estimates can be contaminated when treatment effects are heterogeneous [Goodman-Bacon, 2021]. One increasingly popular alternative is the Synthetic Difference-in-Differences (SDiD) estimator of Arkhangelsky et al. [2021], which combines the best properties of DiD (double differencing for common trends) and synthetic control (reweighting units and time periods to improve pre-treatment fit).

The synthdid R package implements the SDiD estimator, provides placebo-based variance estimation, and produces publication-quality plots. This article walks through a complete workflow using the package's built-in California tobacco data.

2 Installation and Setup

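The synthdid package is developed on GitHub in the synth-inference/synthdid repository. If install.packages("synthdid") reports that the package is not available from CRAN, install it from source with devtools:

# install.packages("devtools")  # run once if devtools is not yet installed
devtools::install_github("synth-inference/synthdid")

library(synthdid)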

The package depends on Matrix, ggplot2, and grDevices, all of which are standard CRAN packages. No external dependencies are required.

3 The SDiD Estimator

The SDiD estimator [Arkhangelsky et al., 2021] takes a panel dataset with N units and T time periods, in which N1 of the units are treated from period T0 + 1 onwards and the remaining N0 = N − N1 units are never treated. It estimates two sets of weights:

  • Unit weights ω̂i: Control units are reweighted so that the weighted average of their pre-treatment outcomes tracks the pre-treatment path of the treated group, up to a constant level shift. This is the synthetic control idea applied to unit reweighting.
  • Time weights λ̂t: Pre-treatment time periods are reweighted so that, for the control units, the weighted average of pre-treatment outcomes predicts the post-treatment average, up to a constant shift. This is the DiD idea applied to time reweighting: pre-treatment periods that resemble the post-treatment period count more.
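To make the unit-weight idea concrete, the sketch below fits non-negative weights that sum to one by least squares on simulated data, using a basic Frank-Wolfe loop in base R. It is an illustration only: the helper simplex_ls() and the simulated matrices are hypothetical, and the intercept and ridge penalty used by Arkhangelsky et al. [2021] are left out to keep the example short.

```r
# Hypothetical helper: minimise ||A w - b||^2 over weights w >= 0 with sum(w) = 1,
# using Frank-Wolfe steps with exact line search.
simplex_ls <- function(A, b, iters = 500) {
  w <- rep(1 / ncol(A), ncol(A))          # start from uniform weights
  for (k in seq_len(iters)) {
    resid <- drop(A %*% w - b)
    grad <- drop(crossprod(A, resid))     # gradient up to a factor of 2
    d <- -w
    i <- which.min(grad)
    d[i] <- d[i] + 1                      # direction toward the best vertex e_i
    Ad <- drop(A %*% d)
    if (sum(Ad^2) < 1e-12) break          # no further progress possible
    step <- min(1, max(0, -sum(resid * Ad) / sum(Ad^2)))
    w <- w + step * d                     # convex combination, stays on the simplex
  }
  w
}

set.seed(1)
N0 <- 5; T0 <- 8
Y_control_pre <- matrix(rnorm(N0 * T0), N0, T0)   # simulated control outcomes
y_treated_pre <- rnorm(T0)                        # simulated treated pre-treatment path

# Columns of A are control units, rows are pre-treatment periods
omega <- simplex_ls(t(Y_control_pre), y_treated_pre)
round(omega, 3)
```

The same routine could be applied to the time weights by swapping the roles of units and periods; in practice synthdid_estimate() computes both sets of weights internally.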

The SDiD estimate is:

$$\hat{\tau}_{\text{SDiD}} = \frac{1}{N_1(T - T_0)} \sum_{i \in \text{treated}} \sum_{t > T_0} \left( Y_{it} - \hat{Y}_{it}^{\text{SDiD}} \right) , \tag{1}$$

where the counterfactual $\hat{Y}_{it}^{\text{SDiD}}$ is constructed from the doubly-weighted control group. Arkhangelsky et al. [2021] show that SDiD has lower variance than both DiD and synthetic control in many settings and is consistent under a factor model for outcomes.
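Once unit and time weights are given, the estimate can be written as a weighted double difference: the treated group's post-minus-pre change, less the weighted controls' post-minus-pre change. The base-R sketch below illustrates this on simulated data. The helper weighted_did() and the data are hypothetical; with uniform weights the contrast collapses to ordinary DiD, which the last lines check.

```r
# Hypothetical helper: weighted double difference for a block-treatment panel.
# Y has controls in the first N0 rows and pre-treatment periods in the first T0 columns.
weighted_did <- function(Y, N0, T0, omega, lambda) {
  N1 <- nrow(Y) - N0
  T1 <- ncol(Y) - T0
  unit_contrast <- c(-omega, rep(1 / N1, N1))   # treated average minus weighted controls
  time_contrast <- c(-lambda, rep(1 / T1, T1))  # post average minus weighted pre periods
  drop(t(unit_contrast) %*% Y %*% time_contrast)
}

set.seed(42)
N0 <- 6; N1 <- 2; T0 <- 5; T1 <- 3
Y <- matrix(rnorm((N0 + N1) * (T0 + T1)), N0 + N1, T0 + T1)
treated <- (N0 + 1):(N0 + N1)
post <- (T0 + 1):(T0 + T1)
Y[treated, post] <- Y[treated, post] + 2        # simulated treatment effect of 2

# Uniform weights reproduce the ordinary DiD contrast
did_uniform <- weighted_did(Y, N0, T0, rep(1 / N0, N0), rep(1 / T0, T0))
did_manual <- (mean(Y[treated, post]) - mean(Y[treated, -post])) -
  (mean(Y[-treated, post]) - mean(Y[-treated, -post]))
all.equal(did_uniform, did_manual)
```

Replacing the uniform vectors with estimated weights gives the SDiD-style contrast; the package handles weight estimation and this final step together.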

4 A Minimal Working Example

The package ships with california_prop99, a balanced panel of US state cigarette sales from 1970 to 2000, with California receiving the Proposition 99 tobacco tax in 1989. This is the canonical application from Abadie et al. [2010].

library(synthdid)

# Load the built-in California Proposition 99 dataset
data("california_prop99")

# panel.matrices() converts the long data frame to the matrix format the estimators expect.
# By default it reads the columns in this order: unit ID, time ID, outcome, treatment indicator.
setup <- panel.matrices(california_prop99)

# Fit the SDiD estimator
tau_hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0)

# Print a summary of the fit, including the point estimate
print(tau_hat)

# The point estimate is about -15.6; the standard error is computed separately with vcov() (Section 5).
# Interpretation: Proposition 99 reduced cigarette sales by about 15.6 packs
# per capita per year (compared to the synthetic control)

The synthdid_estimate() function takes three required arguments:

  • Y: an N × T matrix of outcomes (units as rows, time periods as columns).
  • N0: the number of control units. The first N0 rows of Y are the controls; the remaining N1 = N − N0 rows are the treated units.
  • T0: the number of pre-treatment time periods (the first T0 columns precede treatment).
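The layout these arguments assume can be reproduced by hand. The base-R sketch below builds Y, N0, and T0 from a small long-format panel; the toy data and column names are hypothetical, and in practice panel.matrices() does this conversion for you.

```r
# Hypothetical toy panel: 3 units, 4 years, unit "B" treated from 2002 onwards
panel <- expand.grid(unit = c("A", "B", "C"), year = 2000:2003,
                     stringsAsFactors = FALSE)
panel$y <- seq_len(nrow(panel))
panel$treated <- as.integer(panel$unit == "B" & panel$year >= 2002)

# Pivot to unit-by-period matrices (one observation per cell, so sum() just copies it)
Y <- tapply(panel$y, list(panel$unit, panel$year), sum)
W <- tapply(panel$treated, list(panel$unit, panel$year), sum)

# Put control units first and treated units last
ever_treated <- rowSums(W) > 0
Y <- Y[order(ever_treated), , drop = FALSE]

N0 <- sum(!ever_treated)      # number of control units
T0 <- sum(colSums(W) == 0)    # number of periods before anyone is treated

rownames(Y)
c(N0 = N0, T0 = T0)
```

Here the treated unit "B" is moved to the last row, so the first N0 rows are controls and the first T0 columns are pre-treatment periods, matching the convention described above.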

5 Variance Estimation

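With a single treated unit, as in the California example, SDiD standard errors are computed with the placebo method, requested through the method argument of vcov(). The idea: reassign treatment to control units, compute the SDiD estimate for each placebo, and use the distribution of placebo estimates to gauge sampling variability. The package implements this as:

# Variance estimation via placebo permutations.
# Placebo draws are random, so fix the seed for reproducibility.
set.seed(12345)
se_placebo <- sqrt(as.numeric(vcov(tau_hat, method = "placebo")))
cat("Placebo SE:", se_placebo, "\n")

# Alternative: bootstrap SE. It resamples units and needs more than one treated unit,
# so with California as the only treated unit it returns NA.
se_boot <- sqrt(as.numeric(vcov(tau_hat, method = "bootstrap")))
cat("Bootstrap SE:", se_boot, "\n")

# 95% confidence interval (normal approximation)
est <- as.numeric(tau_hat)
ci_lower <- est - 1.96 * se_placebo
ci_upper <- est + 1.96 * se_placebo
cat("95% CI: [", round(ci_lower, 2), ",", round(ci_upper, 2), "]\n")

Arkhangelsky et al. [2021] describe three variance estimators: bootstrap, jackknife, and placebo. The bootstrap and jackknife resample units and therefore need several treated units. The placebo method is the fallback when only one or a few units are treated; it uses control units only, requires more control units than treated units, and assumes that the noise in control and treated units is similarly distributed.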
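The placebo procedure can be sketched in a few lines of base R. To keep the example self-contained, a plain DiD contrast stands in for the SDiD estimator; that substitution, the helpers did_contrast() and placebo_se(), and the simulated data are all assumptions of this sketch rather than the package's implementation.

```r
# Plain DiD contrast used as a stand-in estimator (an assumption of this sketch;
# a real placebo run would re-fit the SDiD estimator for each draw).
did_contrast <- function(Y, N0, T0) {
  treated <- (N0 + 1):nrow(Y)
  post <- (T0 + 1):ncol(Y)
  (mean(Y[treated, post]) - mean(Y[treated, -post])) -
    (mean(Y[-treated, post]) - mean(Y[-treated, -post]))
}

# Hypothetical helper: placebo standard error computed from control units only
placebo_se <- function(Y, N0, T0, estimator = did_contrast, replications = 200) {
  N1 <- nrow(Y) - N0
  stopifnot(N0 > N1)                       # need enough controls to form placebos
  Y_controls <- Y[seq_len(N0), , drop = FALSE]
  draws <- replicate(replications, {
    shuffled <- sample(N0)                 # last N1 shuffled controls act as "treated"
    estimator(Y_controls[shuffled, , drop = FALSE], N0 - N1, T0)
  })
  sqrt(mean((draws - mean(draws))^2))      # spread of the placebo estimates
}

set.seed(7)
N0 <- 20; T0 <- 10; T1 <- 5
Y <- matrix(rnorm((N0 + 1) * (T0 + T1)), N0 + 1, T0 + T1)  # one treated unit, no true effect
se <- placebo_se(Y, N0, T0)
se
```

Because every placebo "treated" unit is in fact untreated, the spread of the placebo estimates reflects noise alone, which is what makes it usable as a standard error.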

6 Comparison: DiD, SC, and SDiD

The synthdid package makes it easy to compare SDiD to standard DiD and synthetic control on the same data:

# Estimate all three on the same dataset
tau_did <- did_estimate(setup$Y, setup$N0, setup$T0)
tau_sc <- sc_estimate(setup$Y, setup$N0, setup$T0)
tau_sdid <- synthdid_estimate(setup$Y, setup$N0, setup$T0)

# Collect point estimates and placebo standard errors in one table
# (placebo draws are random, so fix the seed for reproducibility)
set.seed(12345)
estimates <- list(DiD = tau_did, SC = tau_sc, SDiD = tau_sdid)
sapply(estimates, function(e) c(estimate = as.numeric(e), se = sqrt(as.numeric(vcov(e, method = "placebo")))))

Typical output shows that SDiD has a smaller standard error than either DiD or synthetic control alone because it uses both unit and time weighting to reduce residual variance.

7 Visualisation

The package produces event-study style plots showing the weighted pre-treatment parallel trends and the post-treatment divergence:

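# Plot the synthetic DiD fit. synthdid_plot() returns a ggplot object,
# so the title is added with ggplot2::ggtitle() rather than a title argument.
synthdid_plot(tau_sdid, treated.name = "California", control.name = "Synthetic control") +
  ggplot2::ggtitle("California Prop 99: SDiD Estimate")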

The plot shows two lines: the treated unit (California) and the SDiD-weighted synthetic control. In the pre-treatment periods, the two lines should run closely in parallel, reflecting good pre-treatment fit after reweighting. They need not coincide, because SDiD allows a constant level difference. In the post-treatment period, the change in the gap between them relative to the weighted pre-treatment gap is the estimated treatment effect.

8 Key Options and Pitfalls

  • Balanced panel required. The synthdid_estimate() function requires a balanced rectangular panel (all units observed in all periods). Unbalanced panels must be balanced by imputation or restriction to the balanced subsample before use.
  • Staggered adoption. The base function handles a single treatment cohort in which all treated units start treatment in the same period. For staggered adoption, one workaround is to apply synthdid_estimate() separately to each adoption cohort, using untreated units as controls, and then aggregate the cohort estimates with weights proportional to cohort size.
  • Donor pool selection. As in the synthetic control method, the donor pool (control units) should consist of units that are plausibly unaffected by the treatment and are comparable to the treated unit in the pre-treatment period. Exclude units with very different pre-treatment trends.
  • Pre-treatment fit diagnostics. Always inspect the synthdid_plot() output to verify that unit and time weights produce good pre-treatment parallel trends. Poor pre-treatment fit indicates that the identifying assumptions are not supported by the data.
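The balanced-panel requirement is easy to check before building the outcome matrix. The base-R sketch below tests whether every unit appears exactly once in every period and, if not, restricts the data to fully observed units. The helper is_balanced() and the toy data are hypothetical.

```r
# Hypothetical helper: TRUE when every unit is observed exactly once in every period
is_balanced <- function(panel, unit, time) {
  counts <- table(panel[[unit]], panel[[time]])
  all(counts == 1)
}

balanced <- expand.grid(state = c("A", "B", "C"), year = 2000:2002,
                        stringsAsFactors = FALSE)
unbalanced <- balanced[-5, ]   # drop one state-year observation

is_balanced(balanced, "state", "year")
is_balanced(unbalanced, "state", "year")

# Restrict an unbalanced panel to units observed in every period
n_periods <- length(unique(unbalanced$year))
complete_units <- names(which(table(unbalanced$state) == n_periods))
restricted <- unbalanced[unbalanced$state %in% complete_units, ]
is_balanced(restricted, "state", "year")
```

Restriction discards information, so it is worth reporting how many units were dropped; imputation is the alternative when too many units would be lost.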

9 Comparison to Alternatives

Method               Unit weights   Time weights   Variance
DiD (TWFE)           No             No             Lowest (but biased under heterogeneous treatment effects)
Synthetic control    Yes            No             Medium
SDiD                 Yes            Yes            Lower than SC; robust to heterogeneous treatment effects

SDiD is particularly advantageous when: (i) there are many pre-treatment periods for weight calibration; (ii) treatment effects are heterogeneous; (iii) the treated unit is an outlier among potential controls in the pre-treatment period.


10 Conclusion

The synthdid package provides a streamlined interface for implementing Synthetic Difference-in-Differences in R. The estimator improves on both TWFE DiD and synthetic control by simultaneously reweighting units and time periods to improve pre-treatment fit, reducing variance relative to either method alone. For panels with a moderate number of units and clear treatment timing, SDiD is a strong default choice.

References

  1. Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493-505.
  2. Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12):4088-4118.
  3. Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2):254-277.
