Introduction
The synthetic control method (Abadie et al., 2010) transformed the analysis of aggregate case studies by providing a principled algorithm for constructing counterfactuals from donor units. But it was designed for a narrow setting: one treated unit, many pre-treatment periods, and a donor pool from which a sparse weighted average mimics the treated unit's pre-treatment trajectory.
What happens when there are many treated units, few pre-treatment periods, or when no convex combination of donors fits the treated unit well? Athey et al. (2021) answer with matrix completion (MC), a fundamentally different approach that treats the entire panel as a matrix with missing entries and fills in the counterfactuals using a nuclear norm penalty that favours a low-rank (factor model) structure. The result is a method that nests DiD and synthetic-control-style estimators as special cases of a single low-rank framework, is closely related to synthetic DiD, and outperforms each in a range of practically important settings.
1 The Counterfactual Panel Problem
Consider a balanced panel with N units observed over T periods. Let Yᵢₜᵒᵇˢ be the observed outcome and Yᵢₜ(0) the potential outcome under no treatment. For untreated observations (Wᵢₜ = 0), Yᵢₜᵒᵇˢ = Yᵢₜ(0). For treated observations (Wᵢₜ = 1), Yᵢₜ(0) is the missing counterfactual we wish to recover.
Stacking all potential outcomes into an N × T matrix L with Lᵢₜ = Yᵢₜ(0), the problem reduces to: given that we observe Lᵢₜ wherever Wᵢₜ = 0, estimate the full matrix L. The treatment effect for treated cell (i, t) is then τ̂ᵢₜ = Yᵢₜᵒᵇˢ − L̂ᵢₜ.
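The key modelling assumption is that L has low rank: beyond unit effects αᵢ and time effects βₜ, a small number r of latent factors uᵢ ∈ ℝʳ and vₜ ∈ ℝʳ drive the variation:

$$
L_{it} = \alpha_i + \beta_t + u_i^\top v_t. \qquad (1)
$$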
When r = 0, (1) is the two-way fixed effects DiD model. When r = 1 and the factor loadings are constrained to be non-negative and sum to one, it approaches synthetic control.
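As a concrete illustration of the setup, the sketch below simulates a panel with Lᵢₜ = αᵢ + βₜ + uᵢᵀvₜ, builds the treatment indicator W and the observed-cell mask, and computes τ̂ᵢₜ = Yᵢₜᵒᵇˢ − L̂ᵢₜ. It is a minimal NumPy sketch under stated assumptions: the Gaussian draws, the block treatment design, the constant effect tau, and the helper name simulate_panel are hypothetical choices, and the true L stands in as an oracle L̂ so that only the bookkeeping is shown.

```python
import numpy as np

def simulate_panel(N=40, T=30, r=3, n_treated=10, n_post=8, tau=2.0, noise=0.0, seed=0):
    """Hypothetical helper: draw L_it = alpha_i + beta_t + u_i' v_t and a block design."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=(N, 1))        # unit effects alpha_i
    beta = rng.normal(size=(1, T))         # time effects beta_t
    U = rng.normal(size=(N, r))            # unit loadings u_i (one row per unit)
    V = rng.normal(size=(T, r))            # time factors v_t (one row per period)
    L = alpha + beta + U @ V.T             # untreated potential outcomes Y_it(0)
    W = np.zeros((N, T), dtype=int)
    W[N - n_treated:, T - n_post:] = 1     # last n_treated units treated in the last n_post periods
    Y_obs = L + tau * W + noise * rng.normal(size=(N, T))
    return Y_obs, L, W

Y_obs, L, W = simulate_panel()
observed = W == 0                          # control cells, the set of observed Y_it(0)
L_hat = L                                  # oracle stand-in for an estimate of L
tau_hat = Y_obs - L_hat                    # tau_hat_it = Y_obs_it - L_hat_it
print("rank of L:", np.linalg.matrix_rank(L))          # at most r + 2
print("mean effect on treated cells:", tau_hat[W == 1].mean())
```

Calling simulate_panel(r=0) gives a matrix of rank at most 2, the pure two-way fixed effects case.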
2 From Synthetic Control to Matrix Completion
To understand where MC fits, it helps to trace the lineage of panel counterfactual methods.
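- Synthetic control (Abadie et al., 2010). Chooses non-negative weights wⱼ ≥ 0 for the donor units j = 2, …, N so that the weighted average L̂₁ₜ = Σⱼ₌₂ᴺ wⱼYⱼₜ minimises the pre-treatment mean squared error against the treated unit's outcomes. The weights are constrained to sum to one, so the counterfactual stays in the convex hull of the donors, which limits extrapolation. Inference uses placebo permutation tests.
- Augmented synthetic control (Ben-Michael et al., 2021). Debiases the SC estimate by adding an outcome model: L̂₁ₜᴬˢᶜᴹ = L̂₁ₜˢᶜ + m̂(1,t) − Σⱼ wⱼm̂(j,t), where m̂ is a regression estimate of the outcome model. The correction targets the bias that remains when the pre-treatment fit is imperfect.
- Synthetic DiD (Arkhangelsky et al., 2021). Combines unit weights ω̂ᵢ and time weights λ̂ₜ in a weighted two-way fixed effects regression: (τ̂ˢᴰⁱᴰ, μ̂, α̂, β̂) = arg min over (τ, μ, α, β) of Σᵢ Σₜ (Yᵢₜ − μ − αᵢ − βₜ − Wᵢₜτ)² ω̂ᵢλ̂ₜ. The time weights down-weight pre-treatment periods that differ structurally from post-treatment periods, improving robustness to time trends.
- Matrix completion (Athey et al., 2021). Minimises the penalised least squares objective

$$
(\hat\alpha, \hat\beta, \hat M) = \arg\min_{\alpha,\, \beta,\, M} \; \frac{1}{|\mathcal{O}|} \sum_{(i,t) \in \mathcal{O}} \left( Y_{it}^{\mathrm{obs}} - \alpha_i - \beta_t - M_{it} \right)^2 + \lambda \, \lVert M \rVert_*, \qquad (2)
$$

and sets L̂ᵢₜ = α̂ᵢ + β̂ₜ + M̂ᵢₜ. Here 𝒪 = {(i,t) : Wᵢₜ = 0} is the set of observed (control) cells, αᵢ and βₜ are unpenalised unit and time effects, M is the N × T matrix of interactive terms uᵢᵀvₜ, ‖M‖⁎ = Σₖ σₖ(M) is the nuclear norm (sum of singular values), and λ > 0 is a regularisation parameter. The nuclear norm is the convex relaxation of rank, making (2) computationally tractable via semidefinite programming or iterative soft-thresholding algorithms.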
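The soft-thresholding route to (2) can be sketched in a few lines. This minimal NumPy sketch assumes the specification with unpenalised unit and time effects and an average squared loss over the control cells; it alternates an exact update of the additive effects with a singular value soft-thresholding step on the interactive component, filling treated cells with the current fit each round (an EM-style, soft-impute scheme). The names mc_nnm, svt, and two_way_fit, the iteration cap, and the stopping rule are hypothetical; this is not the MCPanel implementation, and λ is passed in rather than cross-validated.

```python
import numpy as np

def svt(A, threshold):
    """Prox of threshold * nuclear norm: soft-threshold the singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - threshold, 0.0)) @ Vt

def two_way_fit(A):
    """Least-squares fit alpha_i + beta_t to a complete matrix A."""
    return A.mean(axis=1, keepdims=True) + A.mean(axis=0, keepdims=True) - A.mean()

def mc_nnm(Y, observed, lam, n_iter=2000, tol=1e-10):
    """Sketch of MC-NNM: returns L_hat = alpha_i + beta_t + M_hat for every cell.

    observed: boolean mask of control cells (W_it == 0); other entries of Y are ignored.
    The objective (1/|O|) * SSR + lam * ||M||_* is rescaled by |O| / 2, so the
    singular-value threshold is lam * |O| / 2.
    """
    threshold = lam * observed.sum() / 2.0
    fe = np.zeros(Y.shape)
    M = np.zeros(Y.shape)
    fit = fe + M
    for _ in range(n_iter):
        Z = np.where(observed, Y, fit)     # fill treated cells with the current fit
        fe = two_way_fit(Z - M)            # exact update of the unpenalised effects
        M = svt(Z - fe, threshold)         # proximal step on the interactive part
        new_fit = fe + M
        change = np.linalg.norm(new_fit - fit) / max(np.linalg.norm(fit), 1.0)
        fit = new_fit
        if change < tol:
            break
    return fit
```

Multiplying the objective by |𝒪|/2 turns it into ½ Σ(residual)² + (λ|𝒪|/2)‖M‖⁎, which is why the threshold is λ|𝒪|/2; each step lowers a majorising surrogate of that objective. Treatment effects then follow as Y − L̂ on the treated cells.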
3 Why Nuclear Norm?
The nuclear norm is the tightest convex relaxation of rank: on the set of matrices with spectral norm at most one, it is the convex envelope of the rank function. Candès and Recht (2009) showed that minimising the nuclear norm subject to agreement with the observed entries exactly recovers the true low-rank matrix with high probability, provided the factor structure satisfies incoherence conditions, the observed entries are sampled uniformly at random, and enough of them are observed.
In the panel setting, the missingness is not random (it follows the treatment assignment pattern), so exact recovery results do not directly apply. Nevertheless, Athey et al. (2021) establish consistency of L̂ under a factor model with fixed rank r and the standard DiD-style identification assumption that treatment assignment is weakly exogenous.
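The nuclear norm penalty performs automatic rank selection: soft-thresholding sets small singular values exactly to zero, so the estimated low-rank component has reduced rank, and that rank falls as λ grows. For λ large enough the low-rank component vanishes and the estimate reduces to OLS with unit and time effects (two-way fixed effects); as λ → 0 the penalty disappears and the fit interpolates the observed control cells. Cross-validation over held-out control cells selects λ in practice.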
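The rank-selection behaviour comes from the proximal operator of the nuclear norm, which shrinks every singular value by a fixed amount and zeroes those that fall below it. The sketch below, assuming NumPy and a synthetic rank-3-plus-noise matrix (illustrative data, not taken from any paper), applies that operator at several thresholds; svt is a hypothetical helper name, and the threshold plays the role of λ up to the rescaling used in the objective.

```python
import numpy as np

def svt(A, threshold):
    """Prox of threshold * nuclear norm; also returns the rank of the result."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)   # singular values below threshold become 0
    return (U * s_shrunk) @ Vt, int(np.count_nonzero(s_shrunk))

rng = np.random.default_rng(0)
N, T, r = 50, 40, 3
# Illustrative rank-3 signal plus small noise.
A = rng.normal(size=(N, r)) @ rng.normal(size=(r, T)) + 0.1 * rng.normal(size=(N, T))
s_max = np.linalg.norm(A, 2)                    # largest singular value

for threshold in [0.0, 5.0, 1.01 * s_max]:
    _, rank = svt(A, threshold)
    print(f"threshold = {threshold:7.2f} -> rank {rank}")
```

At a threshold of zero the matrix keeps full rank, a threshold between the noise level and the signal singular values leaves only the signal rank, and a threshold above the largest singular value removes the interactive component entirely, which is the two-way fixed effects limit.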
4 Key Results and Comparisons
4.1 Special Cases
Table 1 shows how the main panel methods are nested within the matrix completion family.
4.2 Simulation Evidence
Athey et al. (2021) simulate panels under the factor model (1) with r = 3 latent factors. With N = 50 units and T = 40 periods, MC-NNM reduces RMSE by 30-50% relative to synthetic control and 15-25% relative to SDiD. The gains are largest when: (a) the number of latent factors exceeds one; (b) treatment occurs early (few pre-treatment periods); or (c) no individual donor unit tracks the treated unit's trajectory well.
Conversely, when a perfect synthetic control exists (all variation is captured by one factor and the treated unit lies in the interior of the donors' convex hull), SC and MC perform similarly.
4.3 Inference
Athey et al. (2021) develop a jackknife variance estimator for the average effect τ̂ = (N₁T₁)⁻¹ Σ τ̂ᵢₜ, where the sum runs over the treated cells {(i,t) : Wᵢₜ = 1} and N₁ and T₁ are the numbers of treated units and treated periods (so N₁T₁ is the number of treated cells when treatment occurs in a block). Under standard regularity conditions, the estimator is asymptotically normal. Alternatively, permutation inference analogous to SC placebo tests can be applied by randomly reassigning treatment to control units.
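The averaging step and the placebo alternative can be sketched as follows, assuming NumPy and a design in which the treated units are the rows of W that contain any 1. To keep the block self-contained it uses an additive (two-way fixed effects) imputer as a stand-in for MC-NNM; twfe_impute, att, and placebo_pvalue are hypothetical names, and any function mapping (Y, observed mask) to a completed matrix could be passed as impute instead. The jackknife variance estimator is not shown.

```python
import numpy as np

def twfe_impute(Y, observed, n_iter=500):
    """Stand-in imputer (not MC-NNM): additive unit + time fit to the observed cells."""
    fit = np.zeros(Y.shape)
    for _ in range(n_iter):
        Z = np.where(observed, Y, fit)            # fill unobserved cells with current fit
        fit = Z.mean(axis=1, keepdims=True) + Z.mean(axis=0, keepdims=True) - Z.mean()
    return fit

def att(Y, W, impute):
    """Average of tau_hat_it = Y_it - L_hat_it over the treated cells (W_it = 1)."""
    L_hat = impute(Y, W == 0)
    return (Y - L_hat)[W == 1].mean()

def placebo_pvalue(Y, W, impute, n_perm=99, seed=0):
    """Give the treated units' treatment paths to randomly chosen control units."""
    rng = np.random.default_rng(seed)
    is_treated = W.any(axis=1)
    tau_hat = att(Y, W, impute)
    Y_ctrl = Y[~is_treated]                       # actual treated units are dropped
    paths = W[is_treated]
    n_ctrl, n_tr = Y_ctrl.shape[0], paths.shape[0]
    exceed = 0
    for _ in range(n_perm):
        fake = rng.choice(n_ctrl, size=n_tr, replace=False)
        W_fake = np.zeros(Y_ctrl.shape, dtype=int)
        W_fake[fake] = paths
        exceed += abs(att(Y_ctrl, W_fake, impute)) >= abs(tau_hat)
    return tau_hat, (1 + exceed) / (1 + n_perm)

# Usage on simulated data: additive panel, 10 of 40 units treated in the last 5 periods.
rng = np.random.default_rng(2)
N, T = 40, 20
W = np.zeros((N, T), dtype=int)
W[30:, 15:] = 1
Y = rng.normal(size=(N, 1)) + rng.normal(size=(1, T)) + 0.1 * rng.normal(size=(N, T)) + 1.0 * W
tau_hat, p_value = placebo_pvalue(Y, W, twfe_impute)
print(tau_hat, p_value)
```

The p-value counts how often a placebo assignment produces an effect at least as large in magnitude as the real one, with the usual +1 correction so it is never zero.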
5 Recent Extensions
- Latent similarities (Deaner et al., 2025). When the factor structure is non-parametric, Deaner et al. (2025) develop an estimator that infers treatment effects in large panels by uncovering latent similarities between units. Unlike MC-NNM, this approach allows the number of factors to grow and uses local similarity in pre-treatment outcome trajectories to weight donors.
- Generalised synthetic control (Xu, 2017). The gsynth package implements interactive fixed effects (IFE) models (Bai, 2009), a parametric version of the factor model (1). IFE fits the factor model by an EM algorithm with a pre-specified rank r, rather than penalising rank. It accommodates multiple treated units and staggered adoption, but requires choosing r, which Xu (2017) does by cross-validation.
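The difference between fixing r and penalising rank is easiest to see in code: replacing soft-thresholding with a rank-r truncated SVD gives a fixed-rank imputation (often called hard impute). This is a generic EM-style sketch in NumPy, assuming no covariates and no separate fixed effects; it is not the gsynth estimator, and hard_impute and its arguments are hypothetical.

```python
import numpy as np

def hard_impute(Y, observed, r, n_iter=1000, tol=1e-12):
    """Hypothetical helper: impute unobserved cells with a fixed-rank-r fit."""
    fit = np.where(observed, Y, Y[observed].mean())   # start missing cells at the observed mean
    for _ in range(n_iter):
        Z = np.where(observed, Y, fit)                 # keep observed cells, refill the rest
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        new_fit = (U[:, :r] * s[:r]) @ Vt[:r]          # best rank-r approximation of Z
        change = np.linalg.norm(new_fit - fit) / max(np.linalg.norm(fit), 1.0)
        fit = new_fit
        if change < tol:
            break
    return fit
```

Because r enters directly, this approach needs a separate rank-selection step, whereas MC-NNM tunes λ instead.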
6 Implementation
The MCPanel package for R implements MC-NNM as described in Athey et al. (2021). The main function mcnnm_cv() accepts a matrix of outcomes Y, a binary matrix of treatment indicators W, and optionally covariate matrices for rows (units) and columns (periods). It returns the completed matrix L̂ and the cross-validated λ.
For comparison, synthdid (SDiD), Synth (original SC), and augsynth (ASCM) are available as R packages. A practical recommendation: when the treated unit's pre-treatment trajectory lies within the convex hull of the donors, start with SC. When the pre-treatment fit is poor, try MC-NNM or ASCM. When the panel is large and the factor model seems plausible, MC-NNM is the natural choice.
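A quick way to judge whether the convex hull condition is plausible is to fit SC weights on the pre-treatment periods and inspect the pre-treatment RMSE. The sketch below solves the simplex-constrained least squares problem by projected gradient in NumPy; the solver choice, the names sc_weights and project_to_simplex, and the use of outcomes only (the original method can also match on covariates through a predictor-weighting matrix) are assumptions of this sketch, not the Synth package's procedure.

```python
import numpy as np

def project_to_simplex(w):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, w.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]   # last index that stays positive
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(w - theta, 0.0)

def sc_weights(y1_pre, Y0_pre, n_iter=5000):
    """Simplex-constrained least squares by projected gradient (illustrative solver).

    y1_pre: (T0,) pre-treatment outcomes of the treated unit.
    Y0_pre: (T0, J) pre-treatment outcomes of the J donors.
    Returns the weights and the pre-treatment RMSE of the synthetic control.
    """
    J = Y0_pre.shape[1]
    w = np.full(J, 1.0 / J)
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2        # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y1_pre)         # gradient of 0.5 * ||Y0 w - y1||^2
        w = project_to_simplex(w - step * grad)
    rmse = np.sqrt(np.mean((Y0_pre @ w - y1_pre) ** 2))
    return w, rmse

# Usage on simulated donors: one treated path inside the hull, one outside it.
rng = np.random.default_rng(4)
Y0_pre = rng.normal(size=(30, 5))                       # 30 pre-periods, 5 donors
y_inside = Y0_pre @ np.array([0.3, 0.7, 0.0, 0.0, 0.0])
y_outside = 2.0 * Y0_pre[:, 0]                          # would need extrapolation
print(sc_weights(y_inside, Y0_pre)[1], sc_weights(y_outside, Y0_pre)[1])
```

A near-zero pre-treatment RMSE suggests SC is a reasonable starting point; a large one, relative to the scale of the outcome, signals the poor-fit case where MC-NNM or ASCM is worth trying.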
7 Conclusion
Matrix completion represents a genuine methodological advance for panel data counterfactuals. By framing the problem as low-rank matrix recovery, Athey et al. (2021) unify the leading approaches and provide a method that adapts automatically to the complexity of the factor structure. The nuclear norm penalty is theoretically well-grounded, computationally tractable, and requires no a priori choice of rank or donor weights. For practitioners working with panels where synthetic control pre-treatment fit is imperfect, MC-NNM should be in the standard toolkit.
References
- Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490):493-505.
- Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12):4088-4118.
- Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, 116(536):1716-1730.
- Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica, 77(4):1229-1279.
- Ben-Michael, E., Feller, A., and Rothstein, J. (2021). The augmented synthetic control method. Journal of the American Statistical Association, 116(536):1789-1803.
- Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772.
- Deaner, B., Hsiang, S., and Zeleneev, A. (2025). Inferring treatment effects in large panels by uncovering latent similarities. Working paper, revised March 2025.
- Xu, Y. (2017). Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis, 25(1):57-76.