Causal Inference Series
Synthetic Control Methods: Building Counterfactuals When DID Fails
March 28, 2026 · 16 min read · By Coefficients Health Analytics
You have one hospital that adopted a new protocol. Thirty that didn't. You want to know what would have happened without the intervention — but parallel trends don't hold and there's no clean control. The synthetic control method builds that missing counterfactual from scratch.
In this guide:
1. What Is the Synthetic Control Method?
2. When to Use SCM
3. How It Works — The Optimization Problem
4. Choosing the Donor Pool
5. Validation: Placebo Tests and Inference
6. Modern Extensions
7. Common Pitfalls
8. Reporting Checklist
9. When to Use Alternatives
1. What Is the Synthetic Control Method?
Developed by Abadie and Gardeazabal (2003) and formalized by Abadie, Diamond, and Hainmueller (2010), the synthetic control method (SCM) constructs a weighted combination of untreated units — the donor pool — that best approximates the treated unit's trajectory before the intervention. The gap between the treated unit and this synthetic twin after intervention is the estimated causal effect.
Think of it as creating a "synthetic twin" of your treated unit using pieces of real untreated units. If the twin tracks the treated unit closely before the intervention, the post-intervention gap is credibly causal.
The core equation:
Y₁ₜ(synthetic) = w₂·Y₂ₜ + w₃·Y₃ₜ + ... + wⱼ·Yⱼₜ
where weights w ≥ 0 and Σw = 1. The weights are chosen to minimize the pre-treatment prediction error for both the outcome and predictor variables.
The original application: estimating the economic impact of terrorism in the Basque Country. The synthetic Basque Country — built from other Spanish regions — tracked real GDP closely before terrorism escalated, then diverged dramatically after.
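The core equation is just a constrained weighted average, which a short sketch makes concrete (all numbers below are invented for illustration):

```python
import numpy as np

# Hypothetical data: outcomes for 4 donor hospitals over 10 periods.
rng = np.random.default_rng(0)
donor_outcomes = rng.normal(100.0, 5.0, size=(4, 10))  # rows = donors, cols = periods

# Donor weights: non-negative and summing to one, as the constraints require.
w = np.array([0.5, 0.3, 0.2, 0.0])

# Y1t(synthetic) = w2*Y2t + w3*Y3t + ... for every period t.
synthetic = w @ donor_outcomes
print(synthetic.shape)  # → (10,)
```

Because the weights form a convex combination, the synthetic outcome in each period is bounded by the donors' outcomes in that period, which is the interpolation property discussed below.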
2. When to Use SCM
SCM excels in settings where:
- One treated unit (or very few) with aggregate-level data — a hospital, region, country, or health system
- A pool of similar untreated units with the same outcome measured over time
- A reasonably long pre-treatment period to demonstrate fit
- No anticipation effects — the intervention was not predicted and acted on early
- Parallel trends may not hold — this is SCM's advantage over DID
Don't use SCM when:
- The treated unit is completely unlike any donor — no combination can match it
- The pre-treatment period is too short (<5 time points is risky)
- All donors were affected by spillover from the treatment
- You have individual-level data and many treated units — use DID or matching instead
- The outcome is highly volatile with no discernible pattern
Clinical research examples: A hospital adopts a new sepsis protocol — compare its mortality trajectory to a synthetic control from other hospitals. A country implements a drug pricing reform — compare pharmaceutical spending to a synthetic version of countries that didn't. A health system deploys AI-assisted triage — compare wait times to a weighted combination of comparable systems.
3. How It Works — The Optimization Problem
SCM solves a nested optimization. The outer problem chooses predictor importance weights (V), and the inner problem finds unit weights (W) that minimize:
min_W ‖X₁ - X₀W‖ᵥ = √((X₁ - X₀W)' V (X₁ - X₀W))
X₁ = treated unit's pre-treatment characteristics (including lagged outcomes).
X₀ = donor matrix of pre-treatment characteristics.
V = diagonal matrix of predictor importance weights (chosen to minimize pre-treatment MSPE).
Key constraints: Weights are non-negative and sum to one. This prevents extrapolation — the synthetic control is always an interpolation within the convex hull of donors. This is both a feature (interpretability, no wild extrapolation) and a limitation (can't go beyond the range of donors).
Including lagged outcomes: Abadie, Diamond, and Hainmueller (2010) argue that matching on pre-treatment outcome levels implicitly controls for unobserved confounders — if the synthetic control tracks the treated unit closely across many pre-treatment periods, it's unlikely that unobserved factors differ substantially. This is a stronger argument with more pre-treatment periods.
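The inner problem (W given a fixed V) is a small constrained quadratic program. A minimal sketch using `scipy.optimize.minimize` with SLSQP for the simplex constraints — the data and the fixed `v` vector are invented, and a full implementation would also search over V in the outer loop:

```python
import numpy as np
from scipy.optimize import minimize

def fit_unit_weights(X1, X0, v):
    """Inner SCM problem: find W minimizing (X1 - X0 W)' V (X1 - X0 W)
    subject to W >= 0 and sum(W) = 1.
    X1: (k,) treated unit's pre-treatment predictors.
    X0: (k, J) donor predictor matrix. v: (k,) diagonal of V."""
    J = X0.shape[1]

    def loss(w):
        gap = X1 - X0 @ w
        return gap @ (v * gap)          # weighted squared discrepancy

    res = minimize(
        loss,
        x0=np.full(J, 1.0 / J),         # start from equal weights
        bounds=[(0.0, 1.0)] * J,        # non-negativity
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x

# Toy data: 3 predictors, 5 donors; the treated unit lies in the convex hull.
rng = np.random.default_rng(1)
X0 = rng.normal(size=(3, 5))
X1 = X0 @ np.array([0.6, 0.4, 0.0, 0.0, 0.0])

w_hat = fit_unit_weights(X1, X0, v=np.ones(3))
print(np.round(w_hat, 2))
```

Because the treated unit is constructed inside the convex hull here, the optimizer can drive the discrepancy to (near) zero; when the treated unit lies outside the hull, the same code returns the best feasible interpolation and a non-trivial residual.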
4. Choosing the Donor Pool
The donor pool makes or breaks SCM. Bad donors → bad synthetic control → garbage estimate.
Donor pool rules:
- Same data-generating process: Donors should be driven by similar structural factors as the treated unit
- No treatment spillover: Donors must be unaffected by the intervention (SUTVA)
- No parallel interventions: Donors shouldn't have adopted similar treatments during the study period
- Enough donors: Typically 10-40 units work well; too few limits matching, too many adds noise
- Pre-screen on substantive grounds: Don't include irrelevant units just because data exists
The overfitting trap: With many donors and a flexible V matrix, SCM can achieve near-perfect pre-treatment fit through idiosyncratic noise rather than structural similarity. If your fit looks too good (RMSPE ≈ 0) with bizarre donor weights, something is wrong.
Reporting weights: Always show which donors received non-trivial weights. If one donor dominates (weight > 0.5), you're essentially doing a comparative case study with that donor — which may be fine, but should be explicit.
5. Validation: Placebo Tests and Inference
SCM doesn't produce standard errors in the classical sense. Inference relies on permutation — specifically, in-space placebo tests.
In-Space Placebos
Apply SCM to every donor unit as if it were treated. If the treated unit's gap is unusually large compared to placebo gaps, the effect is credible. The resulting "spaghetti plot" of treatment-placebo gaps is the standard SCM visualization.
The p-value analog: The fraction of placebos with a post-treatment RMSPE/pre-treatment RMSPE ratio as large as the treated unit's. With 20 donors and the treated unit ranking #1, that's a p-value of 1/21 ≈ 0.048.
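The RMSPE-ratio p-value is simple to compute once the treated and placebo gaps are in hand. A sketch with fabricated gap series (row 0 plays the treated unit, with an artificial post-treatment shift):

```python
import numpy as np

def rmspe(gaps):
    """Root mean squared prediction error of a gap series."""
    return np.sqrt(np.mean(np.square(gaps)))

# Invented gaps (actual minus synthetic) for the treated unit and 20 placebos,
# split into pre-treatment (8 periods) and post-treatment (6 periods).
rng = np.random.default_rng(2)
pre_gaps = rng.normal(0.0, 1.0, size=(21, 8))    # row 0 = treated unit
post_gaps = rng.normal(0.0, 1.0, size=(21, 6))
post_gaps[0] += 8.0                              # treated unit shows a large post gap

ratios = np.array([rmspe(post) / rmspe(pre)
                   for pre, post in zip(pre_gaps, post_gaps)])

# Permutation p-value: share of units (treated included) whose ratio is
# at least as extreme as the treated unit's.
p_value = np.mean(ratios >= ratios[0])
print(round(p_value, 3))  # treated ranked #1 of 21 gives 1/21 ≈ 0.048
```

Dividing by pre-treatment RMSPE penalizes placebos that fit poorly to begin with, so a noisy donor cannot produce a spuriously extreme ratio just by having large gaps everywhere.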
In-Time Placebos
Backdate the intervention to a period where no treatment occurred. If you find a "gap" even before the real treatment, your model has problems — either the synthetic control doesn't fit well or there's a confounding shock.
Leave-One-Out
Iteratively remove each donor with a non-zero weight and re-estimate. If the result is sensitive to dropping a single donor, the estimate is fragile. This is especially important when one donor dominates the weights.
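The leave-one-out loop itself is mechanical; the work is in re-running the fit. A sketch on invented data, using non-negative least squares (renormalized) as a rough stand-in for a full SCM solver:

```python
import numpy as np
from scipy.optimize import nnls

def fit_weights(X1, X0):
    """Rough stand-in for an SCM fit: non-negative least squares,
    renormalized so the weights sum to one. A full SCM solver would
    impose the sum-to-one constraint inside the optimization."""
    w, _ = nnls(X0, X1)
    return w / w.sum()

# Invented data: 6 pre-treatment characteristics, 5 donors;
# the treated unit mixes donors 0 and 1.
rng = np.random.default_rng(3)
X0 = rng.uniform(1.0, 2.0, size=(6, 5))
X1 = X0 @ np.array([0.5, 0.5, 0.0, 0.0, 0.0])

w_full = fit_weights(X1, X0)
fit_without = {}
for j in np.flatnonzero(w_full > 1e-6):          # drop each positively weighted donor
    keep = [k for k in range(X0.shape[1]) if k != j]
    w_j = fit_weights(X1, X0[:, keep])
    # In a real study you would re-estimate the post-treatment gap here;
    # we track the pre-treatment misfit as a simple fragility signal.
    fit_without[j] = float(np.linalg.norm(X1 - X0[:, keep] @ w_j))
print(fit_without)  # a large jump after dropping one donor flags a fragile estimate
```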
Minimum validation battery:
- Pre-treatment fit plot (treated vs. synthetic)
- Gap plot (treated minus synthetic over time)
- In-space placebo test with spaghetti plot
- RMSPE ratio ranking
- Leave-one-out robustness
6. Modern Extensions
Augmented Synthetic Control (Ben-Michael et al., 2021)
Combines SCM with an outcome model (ridge regression) to correct for residual bias when pre-treatment fit is imperfect. The augmented SCM (ASCM) relaxes the requirement for near-perfect pre-treatment fit while maintaining the transparency of SCM. Use this when your vanilla SCM doesn't fit well but the donor pool is otherwise reasonable.
Synthetic Difference-in-Differences (Arkhangelsky et al., 2021)
SDID combines the reweighting logic of SCM with the double-differencing of DID. It reweights both units (like SCM) and time periods, and adds a fixed effect. The result: a method that's valid under weaker assumptions than either SCM or DID alone, with standard errors that are asymptotically normal. This is increasingly the default for applied researchers.
Penalized SCM (Abadie & L'Hour, 2021)
Adds a penalty for pairwise discrepancies between the treated unit and donors, reducing the influence of dissimilar donors. Useful when the donor pool contains units that are structurally different from the treated unit — the penalty downweights them automatically.
Generalized SCM (Xu, 2017)
Extends SCM to settings with multiple treated units using interactive fixed effects (factor models). Implemented in the gsynth R package. Bridges the gap between SCM (one treated unit) and DID (many treated units, parallel trends).
Software:
- R: Synth (original), augsynth (ASCM + SDID), gsynth (generalized), synthdid (SDID)
- Stata: synth, synth_runner (batch placebos), sdid
- Python: SparseSC, SyntheticControlMethods
7. Common Pitfalls
Poor pre-treatment fit → invalid estimates
If the synthetic control can't track the treated unit before the intervention, it can't credibly estimate what would have happened after. Report pre-treatment RMSPE. If it's large relative to the outcome scale, consider ASCM or a different method.
Interpolation bias
If the treated unit lies outside the convex hull of donors on key predictors, SCM will fit poorly because it can only interpolate. No combination of units that are all smaller/poorer/different can match a treated unit that's fundamentally different.
Overfitting to pre-treatment noise
With many donors and few pre-treatment periods, SCM can achieve perfect fit by exploiting noise. The result: good pre-treatment match, terrible post-treatment prediction. More pre-treatment periods relative to donors reduces this risk.
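A quick simulation (purely synthetic numbers) shows the mechanism: with 20 donors and only 5 pre-treatment periods, even pure noise can be fit almost exactly in-sample, and the fit collapses out of sample. Non-negative least squares stands in for the SCM fit here:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)
T_pre, T_post, J = 5, 10, 20                     # few pre-periods, many donors
donors = rng.normal(0.0, 1.0, size=(T_pre + T_post, J))
treated = rng.normal(0.0, 1.0, size=T_pre + T_post)  # pure noise, no real relationship

# Fit weights on the short pre-period only.
w, _ = nnls(donors[:T_pre], treated[:T_pre])

pre_rmspe = np.sqrt(np.mean((treated[:T_pre] - donors[:T_pre] @ w) ** 2))
post_rmspe = np.sqrt(np.mean((treated[T_pre:] - donors[T_pre:] @ w) ** 2))
print(pre_rmspe, post_rmspe)  # tiny pre-treatment error, much larger post error
```

With 20 free weights and only 5 moments to match, the optimizer soaks up noise; lengthening the pre-period (or shrinking the donor pool) closes the gap between in-sample and out-of-sample error.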
SUTVA violations — spillover from treatment
If the treatment affects donor units indirectly (e.g., patients migrating to untreated hospitals, economic spillover to neighboring regions), the synthetic control is contaminated. The estimated effect is biased toward zero or in an unpredictable direction.
Cherry-picking the intervention date
The intervention date must be fixed a priori. If you search for the "break" in the data and call it the treatment, you're guaranteed to find an effect. Pre-register the intervention date or justify it from administrative records.
8. Reporting Checklist
Include in your paper:
- ☐ Donor pool justification: Why these units? What was excluded and why?
- ☐ Predictor variables: What covariates entered the matching (beyond lagged outcomes)?
- ☐ Pre-treatment fit: Plot + RMSPE + comparison table (treated vs. synthetic vs. sample average)
- ☐ Donor weights: Table of all non-zero weights with unit identifiers
- ☐ Gap plot: Treated minus synthetic over the full timeline
- ☐ In-space placebo tests: Spaghetti plot with RMSPE ratio ranking
- ☐ In-time placebo: Backdated intervention test
- ☐ Leave-one-out: Sensitivity to individual donor removal
- ☐ Effect magnitude: Point estimate + percentage change + contextualization
- ☐ Limitations: Spillover concerns, donor pool constraints, generalizability
9. When to Use Alternatives
| Scenario | Better Method | Why |
|---|---|---|
| Many treated units, parallel trends hold | DID | More efficient, well-understood inference |
| Many treated units, parallel trends uncertain | SDID | Combines SCM reweighting with DID, standard errors available |
| Individual-level data, rich covariates | PSM / IPW | SCM is designed for aggregate-level data |
| Sharp threshold determines treatment | RDD | Stronger identification near the cutoff |
| Single treated unit, interrupted time series | ITS | Simpler, works with one unit if pre-treatment trend is stable |
| Multiple treated units, factor model | GSCM (Xu 2017) | Handles multiple treated units with interactive fixed effects |
Key References
- Abadie A, Gardeazabal J. The economic costs of conflict: a case study of the Basque Country. Am Econ Rev. 2003;93(1):113-132.
- Abadie A, Diamond A, Hainmueller J. Synthetic control methods for comparative case studies. J Am Stat Assoc. 2010;105(490):493-505.
- Abadie A. Using synthetic controls: feasibility, data requirements, and methodological aspects. J Econ Lit. 2021;59(2):391-425.
- Ben-Michael E, Feller A, Rothstein J. The augmented synthetic control method. J Am Stat Assoc. 2021;116(536):1789-1803.
- Arkhangelsky D, Athey S, Hirshberg DA, Imbens GW, Wager S. Synthetic difference-in-differences. Am Econ Rev. 2021;111(12):4088-4118.
- Xu Y. Generalized synthetic control method. Polit Anal. 2017;25(1):57-76.
- Abadie A, L'Hour J. A penalized synthetic control estimator for disaggregated data. J Am Stat Assoc. 2021;116(536):1817-1834.