
Causal Inference Series

Synthetic Control Methods: Building Counterfactuals When DID Fails

March 28, 2026 · 16 min read · By Coefficients Health Analytics

You have one hospital that adopted a new protocol. Thirty that didn't. You want to know what would have happened without the intervention — but parallel trends don't hold and there's no clean control. The synthetic control method builds that missing counterfactual from scratch.

1. What Is the Synthetic Control Method?

Developed by Abadie and Gardeazabal (2003) and formalized by Abadie, Diamond, and Hainmueller (2010), the synthetic control method (SCM) constructs a weighted combination of untreated units — the donor pool — that best approximates the treated unit's trajectory before the intervention. The gap between the treated unit and this synthetic twin after intervention is the estimated causal effect.

Think of it as creating a "synthetic twin" of your treated unit using pieces of real untreated units. If the twin tracks the treated unit closely before the intervention, the post-intervention gap is credibly causal.

The core equation:

Y₁ₜ(synthetic) = w₂·Y₂ₜ + w₃·Y₃ₜ + ... + wⱼ·Yⱼₜ

where each weight satisfies wⱼ ≥ 0 and Σⱼ wⱼ = 1. The weights are chosen to minimize the pre-treatment prediction error for both the outcome and the predictor variables.

The original application: estimating the economic impact of terrorism in the Basque Country. The synthetic Basque Country — built from other Spanish regions — tracked real GDP closely before terrorism escalated, then diverged dramatically after.
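The weighted-average construction is simple enough to sketch directly in NumPy. Everything below is hypothetical: made-up outcome series and hand-picked weights, just to show how the synthetic series and the gap are formed.

```python
import numpy as np

# Hypothetical outcomes: one treated unit and three donors over six
# periods; the intervention takes effect at period index 4.
Y_treated = np.array([10.0, 11.0, 12.0, 13.0, 16.0, 18.0])
Y_donors = np.array([
    [10.2, 11.1, 12.1, 13.0, 13.5, 14.0],   # donor A
    [ 9.5, 10.4, 11.3, 12.2, 12.8, 13.3],   # donor B
    [11.0, 12.0, 13.0, 14.0, 14.6, 15.1],   # donor C
])
w = np.array([0.5, 0.2, 0.3])   # illustrative weights: w >= 0, sum to 1

Y_synth = w @ Y_donors          # synthetic twin, period by period
gap = Y_treated - Y_synth       # post-intervention gap = estimated effect
```

In a real analysis the weights come from the optimization described in Section 3; here they are asserted so the mechanics stay visible. The pre-intervention gap is small (the twin tracks the treated unit), and the post-intervention gap is the estimated effect.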

2. When to Use SCM

SCM excels in settings where:

  • One treated unit (or very few) with aggregate-level data — a hospital, region, country, or health system
  • A pool of similar untreated units with the same outcome measured over time
  • A reasonably long pre-treatment period to demonstrate fit
  • No anticipation effects — the intervention was not predicted and acted on early
  • Parallel trends may not hold — this is SCM's advantage over DID

Don't use SCM when:

  • The treated unit is completely unlike any donor — no combination can match it
  • The pre-treatment period is too short (<5 time points is risky)
  • All donors were affected by spillover from the treatment
  • You have individual-level data and many treated units — use DID or matching instead
  • The outcome is highly volatile with no discernible pattern

Clinical research examples: A hospital adopts a new sepsis protocol — compare its mortality trajectory to a synthetic control from other hospitals. A country implements a drug pricing reform — compare pharmaceutical spending to a synthetic version of countries that didn't. A health system deploys AI-assisted triage — compare wait times to a weighted combination of comparable systems.

3. How It Works — The Optimization Problem

SCM solves a nested optimization. The outer problem chooses predictor importance weights (V), and the inner problem finds unit weights (W) that minimize:

min_W ‖X₁ - X₀W‖ᵥ = √((X₁ - X₀W)' V (X₁ - X₀W))

X₁ = treated unit's pre-treatment characteristics (including lagged outcomes).
X₀ = donor matrix of pre-treatment characteristics.
V = diagonal matrix of predictor importance weights (chosen to minimize pre-treatment MSPE).

Key constraints: Weights are non-negative and sum to one. This prevents extrapolation — the synthetic control is always an interpolation within the convex hull of donors. This is both a feature (interpretability, no wild extrapolation) and a limitation (can't go beyond the range of donors).

Including lagged outcomes: Abadie, Diamond, and Hainmueller (2010) argue that matching on pre-treatment outcome levels implicitly controls for unobserved confounders — if the synthetic control tracks the treated unit closely across many pre-treatment periods, it's unlikely that unobserved factors differ substantially. This is a stronger argument with more pre-treatment periods.
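The inner W problem is a quadratic program over the simplex, which a general-purpose solver handles directly. The sketch below uses `scipy.optimize.minimize` with SLSQP; the data and the diagonal of V are assumptions, and in a full implementation V would itself be chosen in the outer loop to minimize pre-treatment MSPE.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: 3 predictors, 4 donors. The treated unit's predictor
# vector X1 is built as an exact convex combination of the donors, so a
# near-perfect fit exists inside the convex hull.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(3, 4))             # donor predictor matrix
w_true = np.array([0.4, 0.3, 0.2, 0.1])
X1 = X0 @ w_true                         # treated unit's predictors
v = np.array([1.0, 2.0, 0.5])            # diag(V): assumed importance weights

def loss(w):
    d = X1 - X0 @ w
    return d @ (v * d)                   # (X1 - X0 w)' V (X1 - X0 w)

J = X0.shape[1]
res = minimize(
    loss,
    x0=np.full(J, 1.0 / J),              # start from equal weights
    method="SLSQP",
    bounds=[(0.0, 1.0)] * J,             # w_j >= 0
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum(w) = 1
)
w_hat = res.x
```

Because X1 was constructed inside the convex hull, the optimizer recovers weights close to `w_true` with a loss near zero. With a real treated unit outside the hull, the minimized loss stays visibly above zero, a direct readout of the interpolation limitation noted above.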

4. Choosing the Donor Pool

The donor pool makes or breaks SCM. Bad donors → bad synthetic control → garbage estimate.

Donor pool rules:

  • Same data-generating process: Donors should be driven by similar structural factors as the treated unit
  • No treatment spillover: Donors must be unaffected by the intervention (SUTVA)
  • No parallel interventions: Donors shouldn't have adopted similar treatments during the study period
  • Enough donors: Typically 10-40 units work well; too few limits matching, too many adds noise
  • Pre-screen on substantive grounds: Don't include irrelevant units just because data exists

The overfitting trap: With many donors and a flexible V matrix, SCM can achieve near-perfect pre-treatment fit through idiosyncratic noise rather than structural similarity. If your fit looks too good (RMSPE ≈ 0) with bizarre donor weights, something is wrong.

Reporting weights: Always show which donors received non-trivial weights. If one donor dominates (weight > 0.5), you're essentially doing a comparative case study with that donor — which may be fine, but should be explicit.
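A weight report is a few lines of code. The donor names and weights below are fabricated, purely to illustrate the habit of sorting, truncating at a threshold, and flagging a dominant donor.

```python
import numpy as np

# Hypothetical estimated donor weights (e.g., from the inner optimization).
donors = np.array(["Hospital A", "Hospital B", "Hospital C",
                   "Hospital D", "Hospital E"])
w = np.array([0.55, 0.30, 0.10, 0.05, 0.00])

# Report non-trivial weights, largest first.
order = np.argsort(w)[::-1]
for name, weight in zip(donors[order], w[order]):
    if weight >= 0.01:
        print(f"{name:12s} {weight:.2f}")

# Flag a dominant donor: effectively a comparative case study with that unit.
dominant = w.max() > 0.5
```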

5. Validation: Placebo Tests and Inference

SCM doesn't produce standard errors in the classical sense. Inference relies on permutation — specifically, in-space placebo tests.

In-Space Placebos

Apply SCM to every donor unit as if it were treated. If the treated unit's gap is unusually large compared to placebo gaps, the effect is credible. The resulting "spaghetti plot" of treatment-placebo gaps is the standard SCM visualization.

The p-value analog: The fraction of placebos with a post-treatment RMSPE/pre-treatment RMSPE ratio as large as the treated unit's. With 20 donors and the treated unit ranking #1, that's a p-value of 1/21 ≈ 0.048.

In-Time Placebos

Backdate the intervention to a period where no treatment occurred. If you find a "gap" even before the real treatment, your model has problems — either the synthetic control doesn't fit well or there's a confounding shock.

Leave-One-Out

Iteratively remove each donor with a non-zero weight and re-estimate. If the result is sensitive to dropping a single donor, the estimate is fragile. This is especially important when one donor dominates the weights.
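The leave-one-out loop is mechanical once weight fitting is wrapped in a function. The sketch below uses a plain simplex-constrained least-squares fit on pre-treatment outcomes as a minimal stand-in for the full SCM inner step, with fabricated data in which the true effect is 2.0.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(y1, Y0):
    """Simplex-constrained least squares: a stand-in for SCM's inner step."""
    J = Y0.shape[1]
    res = minimize(
        lambda w: np.sum((y1 - Y0 @ w) ** 2),
        x0=np.full(J, 1.0 / J),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x

# Hypothetical data: 8 pre- and 3 post-treatment periods, 5 donors.
rng = np.random.default_rng(2)
Y0_pre = rng.normal(10.0, 1.0, size=(8, 5))
Y0_post = rng.normal(10.0, 1.0, size=(3, 5))
w_true = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
y1_pre = Y0_pre @ w_true
y1_post = Y0_post @ w_true + 2.0          # true treatment effect = 2.0

w_full = fit_weights(y1_pre, Y0_pre)
effect_full = np.mean(y1_post - Y0_post @ w_full)

# Drop each donor with non-trivial weight and re-estimate the effect.
effects_loo = {}
for j in np.flatnonzero(w_full > 0.01):
    keep = np.delete(np.arange(Y0_pre.shape[1]), j)
    w_j = fit_weights(y1_pre, Y0_pre[:, keep])
    effects_loo[j] = np.mean(y1_post - Y0_post[:, keep] @ w_j)
```

If the values in `effects_loo` swing far from `effect_full`, the estimate leans heavily on individual donors and should be reported as fragile.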

Minimum validation battery:

  1. Pre-treatment fit plot (treated vs. synthetic)
  2. Gap plot (treated minus synthetic over time)
  3. In-space placebo test with spaghetti plot
  4. RMSPE ratio ranking
  5. Leave-one-out robustness

6. Modern Extensions

Augmented Synthetic Control (Ben-Michael et al., 2021)

Combines SCM with an outcome model (ridge regression) to correct for residual bias when pre-treatment fit is imperfect. The augmented SCM (ASCM) relaxes the requirement for near-perfect pre-treatment fit while maintaining the transparency of SCM. Use this when your vanilla SCM doesn't fit well but the donor pool is otherwise reasonable.

Synthetic Difference-in-Differences (Arkhangelsky et al., 2021)

SDID combines the reweighting logic of SCM with the double-differencing of DID. It reweights both units (like SCM) and time periods, and includes unit and time fixed effects. The result: a method that's valid under weaker assumptions than either SCM or DID alone, with standard errors that are asymptotically normal. This is increasingly the default for applied researchers.

Penalized SCM (Abadie & L'Hour, 2021)

Adds a penalty for pairwise discrepancies between the treated unit and donors, reducing the influence of dissimilar donors. Useful when the donor pool contains units that are structurally different from the treated unit — the penalty downweights them automatically.

Generalized SCM (Xu, 2017)

Extends SCM to settings with multiple treated units using interactive fixed effects (factor models). Implemented in the gsynth R package. Bridges the gap between SCM (one treated unit) and DID (many treated units, parallel trends).

Software:

  • R: Synth (original), augsynth (ASCM + SDID), gsynth (generalized), synthdid (SDID)
  • Stata: synth, synth_runner (batch placebos), sdid
  • Python: SparseSC, SyntheticControlMethods

7. Common Pitfalls

Poor pre-treatment fit → invalid estimates

If the synthetic control can't track the treated unit before the intervention, it can't credibly estimate what would have happened after. Report pre-treatment RMSPE. If it's large relative to the outcome scale, consider ASCM or a different method.

Interpolation bias

If the treated unit lies outside the convex hull of donors on key predictors, SCM will fit poorly because it can only interpolate. No combination of units that are all smaller/poorer/different can match a treated unit that's fundamentally different.

Overfitting to pre-treatment noise

With many donors and few pre-treatment periods, SCM can achieve perfect fit by exploiting noise. The result: good pre-treatment match, terrible post-treatment prediction. More pre-treatment periods relative to the number of donors reduce this risk.

SUTVA violations — spillover from treatment

If the treatment affects donor units indirectly (e.g., patients migrating to untreated hospitals, economic spillover to neighboring regions), the synthetic control is contaminated. The estimated effect is biased toward zero or in an unpredictable direction.

Cherry-picking the intervention date

The intervention date must be fixed a priori. If you search for the "break" in the data and call it the treatment, you're guaranteed to find an effect. Pre-register the intervention date or justify it from administrative records.

8. Reporting Checklist

Include in your paper:

  • Donor pool justification: Why these units? What was excluded and why?
  • Predictor variables: What covariates entered the matching (beyond lagged outcomes)?
  • Pre-treatment fit: Plot + RMSPE + comparison table (treated vs. synthetic vs. sample average)
  • Donor weights: Table of all non-zero weights with unit identifiers
  • Gap plot: Treated minus synthetic over the full timeline
  • In-space placebo tests: Spaghetti plot with RMSPE ratio ranking
  • In-time placebo: Backdated intervention test
  • Leave-one-out: Sensitivity to individual donor removal
  • Effect magnitude: Point estimate + percentage change + contextualization
  • Limitations: Spillover concerns, donor pool constraints, generalizability

9. When to Use Alternatives

| Scenario | Better Method | Why |
| --- | --- | --- |
| Many treated units, parallel trends hold | DID | More efficient, well-understood inference |
| Many treated units, parallel trends uncertain | SDID | Combines SCM reweighting with DID; standard errors available |
| Individual-level data, rich covariates | PSM / IPW | SCM is designed for aggregate-level data |
| Sharp threshold determines treatment | RDD | Stronger identification near the cutoff |
| Single treated unit, interrupted time series | ITS | Simpler; works with one unit if the pre-treatment trend is stable |
| Multiple treated units, factor model | GSCM (Xu 2017) | Handles multiple treated units with interactive fixed effects |

Key References

  • Abadie A, Gardeazabal J. The economic costs of conflict: a case study of the Basque Country. Am Econ Rev. 2003;93(1):113-132.
  • Abadie A, Diamond A, Hainmueller J. Synthetic control methods for comparative case studies. J Am Stat Assoc. 2010;105(490):493-505.
  • Abadie A. Using synthetic controls: feasibility, data requirements, and methodological aspects. J Econ Lit. 2021;59(2):391-425.
  • Ben-Michael E, Feller A, Rothstein J. The augmented synthetic control method. J Am Stat Assoc. 2021;116(536):1789-1803.
  • Arkhangelsky D, et al. Synthetic difference-in-differences. Am Econ Rev. 2021;111(12):4088-4118.
  • Xu Y. Generalized synthetic control method. Polit Anal. 2017;25(1):57-76.
  • Abadie A, L'Hour J. A penalized synthetic control estimator for disaggregated data. J Am Stat Assoc. 2021;116(536):1817-1834.

Check your study design automatically

Aqrab's AI critique engine flags methodological weaknesses in seconds — including synthetic control design choices.