Difference-in-Differences: A Practical Guide for Clinical Researchers

📖 15 min read · Published Mar 27, 2026

Difference-in-Differences (DID) is one of the most elegant causal inference methods when you have a policy change, treatment rollout, or intervention that affects one group but not another. It's intuitive, visual, and widely accepted — but it has a critical assumption that, when violated, makes your results meaningless.

What Is Difference-in-Differences?

DID compares the change in outcomes over time between a treated group and a control group. The "difference" happens twice:

  1. First difference: Change in the treated group (post minus pre)
  2. Second difference: Change in the control group (post minus pre)

The DID estimate is the difference between these two differences. It answers: How much more did the treated group change compared to the control group?

Simple Formula:

DID = (Y_treated,post - Y_treated,pre) - (Y_control,post - Y_control,pre)
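The formula is just arithmetic on four cell means. A minimal sketch in Python, with made-up numbers:

```python
# Hypothetical cell means (numbers invented purely for illustration).
y_treated_pre, y_treated_post = 18.0, 12.0
y_control_pre, y_control_post = 16.0, 14.0

first_diff = y_treated_post - y_treated_pre    # change in the treated group
second_diff = y_control_post - y_control_pre   # change in the control group
did = first_diff - second_diff
print(did)  # -4.0
```

The treated group improved by 6 points, but 2 of those points would likely have happened anyway (the control group's change), leaving a DID estimate of -4.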

When to Use It

DID works best when:

  • A policy or intervention affects some units (hospitals, regions, patients) but not others
  • You have data from before and after the intervention
  • The control group provides a counterfactual trend (what would have happened without treatment)
  • Timing is clear and exogenous (treatment wasn't chosen based on outcomes)

Example: Medicaid Expansion

States that expanded Medicaid vs. states that didn't. Compare uninsured rates before and after 2014. DID = (expansion state change) - (non-expansion state change).
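In practice the same estimate is usually read off the interaction coefficient of a regression, which also gives you standard errors. A sketch with statsmodels on simulated data (not real Medicaid figures; the true effect is set to -4 points):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated state-year panel: 20 "expansion" states, 20 non-expansion states.
rng = np.random.default_rng(0)
rows = []
for s in range(40):
    treat = s < 20                          # first 20 states expand
    for year in range(2010, 2018):
        post = year >= 2014
        # Built-in true effect: expansion lowers the uninsured rate by 4 points.
        y = 18 - 2 * post - 4 * (treat and post) + rng.normal(0, 0.5)
        rows.append({"state": s, "treat": int(treat),
                     "post": int(post), "uninsured": y})
df = pd.DataFrame(rows)

# DID is the coefficient on the interaction; SEs clustered by state.
m = smf.ols("uninsured ~ treat * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["state"]})
print(m.params["treat:post"])   # close to the true effect of -4
```

Clustering at the level of treatment assignment (here, the state) is the standard choice, since outcomes within a state are serially correlated.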

The Critical Assumption: Parallel Trends

DID relies on parallel trends: in the absence of treatment, the treated and control groups would have followed the same trajectory. This is not testable because we don't observe the counterfactual.

What we can test is whether trends were parallel before treatment. If they weren't, your DID estimate is biased.

Why Parallel Trends Matters

If the treated group was already trending up faster than the control group before treatment, the post-treatment difference might just be a continuation of that trend — not a causal effect.

Common Failure Modes

1. Pre-Treatment Trends Not Parallel

Plot your outcomes over multiple pre-treatment periods. If the lines diverge before treatment, DID is not appropriate.

Fix: Use regression adjustment, matching on pre-treatment trajectories, or interactive fixed effects models to account for differential trends.
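Beyond eyeballing the plot, a group-by-time interaction fit on pre-treatment data alone gives a formal check. A sketch on simulated data where trends are deliberately not parallel:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated pre-treatment panel: the treated group drifts upward
# 0.5 units/period faster than the control group (trends NOT parallel).
rng = np.random.default_rng(1)
rows = []
for unit in range(60):
    treat = unit < 30
    for t in range(5):                      # 5 pre-treatment periods only
        y = 10 + 1.0 * t + 0.5 * t * treat + rng.normal(0, 0.3)
        rows.append({"unit": unit, "t": t, "treat": int(treat), "y": y})
pre = pd.DataFrame(rows)

# A significant group-by-time interaction flags a differential pre-trend.
m = smf.ols("y ~ t * treat", data=pre).fit(
    cov_type="cluster", cov_kwds={"groups": pre["unit"]})
print(m.params["t:treat"], m.pvalues["t:treat"])
```

A near-zero, insignificant `t:treat` coefficient is consistent with (though never proof of) parallel trends; here it recovers the built-in 0.5 divergence.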

2. Compositional Changes

If the types of units in your sample change over time (e.g., healthier patients shift hospitals after a policy), DID conflates composition effects with treatment effects.

Fix: Use individual-level panel data and control for time-invariant characteristics with fixed effects.

3. Staggered Adoption Without Accounting for It

When treatment occurs at different times for different units, the standard two-way fixed effects (TWFE) estimator can be badly biased when treatment effects change over time, because already-treated units serve as controls for later-treated units.

Fix: Use modern DID estimators like Callaway & Sant'Anna (2021), Sun & Abraham (2021), or Borusyak et al. (2024) that handle staggered timing properly.
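The problem can be shown in a few lines. This deterministic sketch (no noise, toy numbers of my own) builds a staggered panel where the true effect grows by 2 points per year since adoption, then compares naive TWFE against the true average effect among treated observations:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Four units adopting in period 2 or period 4; effect grows with time treated.
rows = []
for unit, g in enumerate([2, 2, 4, 4]):        # adoption period per unit
    for t in range(8):
        d = int(t >= g)                        # treated indicator
        effect = 2.0 * (t - g + 1) * d         # +2 per period since adoption
        rows.append({"unit": unit, "t": t, "d": d, "y": 1.0 * t + effect})
df = pd.DataFrame(rows)

# Naive TWFE: treatment dummy plus unit and time fixed effects.
twfe = smf.ols("y ~ d + C(unit) + C(t)", data=df).fit().params["d"]
# True average effect among treated observations (y minus the known baseline t).
true_att = (df.loc[df.d == 1, "y"] - df.loc[df.d == 1, "t"]).mean()
print(twfe, true_att)   # TWFE lands far below the true average effect of 6.2
```

With no noise at all, the gap between the two numbers is pure specification bias, not sampling error, which is exactly what the modern estimators are built to avoid.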

4. Anticipation Effects

If treated units change behavior before the intervention (because they know it's coming), your pre-treatment period is contaminated.

Fix: Exclude periods immediately before treatment or model anticipation explicitly.

5. Time-Varying Confounding

If an unobserved factor affects both treatment and outcomes differently over time (e.g., economic shocks), DID fails even with parallel pre-trends.

Fix: Add time-varying controls, run sensitivity analyses, or use triple-differences if you have a second comparison dimension.
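The triple-difference idea is itself just one more subtraction. A sketch with hypothetical numbers, where a second subgroup the policy cannot affect absorbs the shared time-varying shock:

```python
# Hypothetical triple-difference: the policy only targets one subgroup
# (e.g. an eligible age band); the ineligible subgroup shares the shocks.
def did(pre_t, post_t, pre_c, post_c):
    return (post_t - pre_t) - (post_c - pre_c)

did_eligible   = did(18.0, 12.0, 16.0, 14.0)   # subgroup the policy targets
did_ineligible = did(10.0,  9.0,  9.5,  8.5)   # placebo subgroup
ddd = did_eligible - did_ineligible
print(ddd)  # -4.0
```

Here the placebo DID comes out to zero, so the DDD estimate equals the eligible-group DID; a nonzero placebo DID would have signaled a confounding shock that DDD then nets out.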

What to Report

A complete DID analysis should include:

  • Pre-treatment trend plot — show parallel trends (or lack thereof)
  • Event study plot — dynamic effects over time with confidence intervals
  • Balance table — compare treated vs control at baseline
  • Regression table — with and without covariates, cluster-robust SEs
  • Specification tests — placebo tests (fake treatment dates), exclude early/late adopters
  • Sensitivity to controls — how stable is the estimate?
  • Staggered adoption checks — if applicable, use modern estimators
  • Heterogeneity — does the effect vary by subgroup?
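The event-study regression behind the plot in that list can be sketched as follows, on simulated data with a common treatment date (the lead/lag column names like `evm4` are my own; t = -1 is the omitted reference period):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel: treatment starts at event time 0, true effect +3 thereafter.
rng = np.random.default_rng(2)
rows = []
for unit in range(100):
    treat = unit < 50
    for t in range(-4, 4):
        effect = 3.0 if (treat and t >= 0) else 0.0
        y = 5 + 0.5 * t + effect + rng.normal(0, 0.5)
        rows.append({"unit": unit, "t": t, "treat": int(treat), "y": y})
df = pd.DataFrame(rows)

# One dummy per event time for treated units, omitting t = -1 as reference.
terms = []
for k in range(-4, 4):
    if k == -1:
        continue
    name = f"ev{k}".replace("-", "m")       # evm4 ... evm2, ev0 ... ev3
    df[name] = ((df["t"] == k) & (df["treat"] == 1)).astype(int)
    terms.append(name)

m = smf.ols("y ~ treat + C(t) + " + " + ".join(terms), data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})
# Leads (evm4..evm2) should hover near 0 if pre-trends are parallel;
# lags (ev0..ev3) trace out the dynamic treatment effect.
print(m.params[terms])
```

Plotting these coefficients with their confidence intervals against event time gives the standard event-study figure: flat leads support parallel trends, and the lags show how the effect evolves.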

When Not to Use DID

  • Pre-treatment trends are not parallel and you can't control for differential trends
  • Treatment timing is endogenous (units choose when to be treated based on outcomes)
  • You only have one pre-period — you can't test trends with a single point
  • Treatment effects take time to emerge but you have little post-treatment follow-up (a short window can miss lagged effects)

Alternatives to Consider

  • Interrupted Time Series (ITS): single treated unit, long time series, want to model level and slope changes
  • Synthetic Control: few treated units, many control units, want a data-driven counterfactual
  • Triple-Differences (DDD): two comparison dimensions, worried about time-varying confounding in DID
  • RDD with Time: sharp cutoff for treatment eligibility based on a running variable
  • Matched DID: treated and control differ at baseline, so match first, then run DID

Key References

  • Angrist JD, Pischke JS. Mostly Harmless Econometrics. Princeton University Press, 2009. (Chapter 5)
  • Callaway B, Sant'Anna PHC. Difference-in-differences with multiple time periods. J Econometrics 2021;225(2):200-230.
  • Goodman-Bacon A. Difference-in-differences with variation in treatment timing. J Econometrics 2021;225(2):254-277.
  • Roth J, Sant'Anna PHC, Bilinski A, Poe J. What's trending in difference-in-differences? A synthesis of the recent econometrics literature. J Econometrics 2023;235(2):2218-2244.
  • Wing C, Simon K, Bello-Gomez RA. Designing difference in difference studies: best practices for public health policy research. Annu Rev Public Health 2018;39:453-469.

Get AI-Powered DID Critique

Upload your study design and Aqrab will check your parallel trends assumption, flag staggered adoption issues, and suggest specification tests.

Try Aqrab Free →