Marginal Structural Models: A Practical Guide for Clinical Researchers

📖 17 min read · Published Apr 2, 2026

Standard regression adjusts for confounders. Propensity scores balance them. But what happens when your confounder is itself affected by treatment? When a patient's CD4 count at month 6 depends on whether they received antiretrovirals at month 0, adjusting for it will block the very pathway you want to study. Marginal Structural Models (MSMs) solve this by weighting — not adjusting — through time-varying confounders affected by prior treatment.

The Problem: Time-Varying Confounders Affected by Treatment

In longitudinal studies, patients receive treatment over time, and their characteristics change. Those changes are often confounders for subsequent treatment decisions. But they're also affected by prior treatment. This creates a dilemma:

  • If you adjust for the time-varying confounder in standard regression, you block part of the causal effect (mediation bias)
  • If you don't adjust for it, residual confounding remains
  • If you try stratification, you create artificial selection bias through collider stratification

This isn't a rare edge case. It's the standard situation in chronic disease management, HIV treatment, oncology, and any setting where treatment decisions are made sequentially based on evolving patient status.

The Core Dilemma (Illustrated):

A patient starts antiretroviral therapy (ART) at baseline. Their CD4 count rises. At month 6, their CD4 count (a confounder for mortality — lower CD4 means higher risk) determines whether they continue or switch ART. Adjusting for month-6 CD4 blocks the effect of baseline ART. Not adjusting leaves confounding. Traditional methods have no answer. MSMs do.

The key insight of MSMs, formalized by Robins, Hernán, and Brumback in 2000, is to restructure the data using weights so that, at each time point, treatment assignment becomes independent of past covariate history — mimicking sequential randomization.

How MSMs Work: The Intuition

MSMs don't change the model. They change the data. By reweighting each patient at each time point, MSMs create a pseudo-population where:

  1. At each time point, treatment assignment is independent of prior covariate history
  2. The relationship between treatment and outcome can be estimated without bias from time-varying confounders
  3. The "marginal" effect represents the causal effect of the treatment regime, not the effect in a particular subgroup

The weights are calculated from the probability of each patient receiving their actual treatment at each time point, conditional on their prior treatment and covariate history. These are called inverse probability of treatment weights (IPTW).

The Weight Formula

For a patient with treatment history A₀, A₁, ..., Aₖ and covariate history L₀, L₁, ..., Lₖ, the weight at time k is the product of the inverse probabilities of receiving each observed treatment given past treatment and covariates:

Wᵢ = ∏ₖ P(Aₖ = aₖ | Āₖ₋₁ = āₖ₋₁, L̄ₖ = l̄ₖ)⁻¹

The inverse probability weight for patient i is the product over time of the inverse of the probability of receiving their actual treatment at each time, given their past treatment and covariates.

This weight has an important interpretation: patients who received treatment in ways that were unexpected given their history get higher weights. Patients who received treatment in expected ways get weights close to 1. In the weighted pseudo-population, the time-varying confounders are no longer associated with treatment assignment.
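To make the arithmetic concrete, here is a minimal sketch (in Python, with made-up probabilities) of how a single patient's cumulative weight is the product of inverse treatment probabilities over time:

```python
import numpy as np

# Hypothetical example: one patient followed over three time points.
# p_treat[k] is the model-based probability of the treatment this patient
# actually received at time k, given their past treatment and covariates.
p_treat = np.array([0.8, 0.5, 0.1])  # the third decision was "unexpected"

# Unstabilized weight: product of the inverse probabilities over time.
w = np.prod(1.0 / p_treat)
print(w)  # 1/0.8 * 1/0.5 * 1/0.1 = 25.0
```

Note how the single unexpected decision (probability 0.1) dominates the weight; this is exactly the instability that stabilized weights address.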

Stabilized Weights: Why You Need Them

The raw weights described above work, but they have a serious problem: they can be extremely large for patients with very low or very high treatment probabilities. This inflates variance and makes estimates unstable.

The solution is stabilized weights, which multiply the inverse probability by a numerator that captures the marginal probability of the observed treatment:

SWᵢ = ∏ₖ P(Aₖ = aₖ | Āₖ₋₁ = āₖ₋₁) / P(Aₖ = aₖ | Āₖ₋₁ = āₖ₋₁, L̄ₖ = l̄ₖ)

Numerator: probability of observed treatment given only prior treatment (not covariates). Denominator: probability given prior treatment AND covariates.

Stabilized weights have two important properties:

  • Mean close to 1: Unlike unstabilized weights, whose mean is typically well above 1 and grows with the number of time points, stabilized weights should average near 1. If the mean is far from 1, something is wrong.
  • Reduced variance: By multiplying by the marginal probability in the numerator, extreme weights are compressed without introducing bias.

Practical Check:

Always check the distribution of your stabilized weights. The mean should be approximately 1. The maximum should not exceed 10–20 for binary treatment. If you see weights above 50, consider truncation — but document it and run sensitivity analyses at different truncation levels.
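The practical check above can be scripted. A minimal sketch, assuming the cumulative stabilized weights live in a pandas Series (the simulated weights here are stand-ins):

```python
import numpy as np
import pandas as pd

# Stand-in for cumulative stabilized weights from a real analysis.
rng = np.random.default_rng(0)
weights = pd.Series(np.exp(rng.normal(0, 0.3, size=1000)))

print(f"mean: {weights.mean():.2f}")             # should be close to 1
print(f"max:  {weights.max():.2f}")              # flag if far above 10-20
print(f"99th percentile: {weights.quantile(0.99):.2f}")

# One common truncation rule: cap at the 1st and 99th percentiles.
lo, hi = weights.quantile([0.01, 0.99])
weights_trunc = weights.clip(lower=lo, upper=hi)
```

Report the truncation level alongside the weight summaries, and rerun the analysis at several levels.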

Clinical Example: ART and Mortality in HIV

This is the canonical MSM application, and for good reason — it's where the method was born. Consider estimating the causal effect of cumulative antiretroviral therapy (cART) on mortality in HIV patients, using observational cohort data.

The Setup

At each clinic visit (quarterly), a physician decides whether to initiate or modify cART based on the patient's current CD4 count and viral load. These decisions evolve over time:

  • CD4 count at time t is a confounder for mortality at time t+1
  • CD4 count at time t is also affected by cART at time t-1
  • Standard regression adjusting for CD4 blocks the causal effect of cART

The MSM Solution

Step-by-step:

  1. Model treatment assignment: Fit a logistic regression for cART initiation at each time point, conditional on prior cART history, current CD4, viral load, age, and clinical site. These produce the denominator probabilities.
  2. Model the numerator: Fit a simpler logistic regression for cART initiation conditional only on prior cART history (no covariates). These produce the numerator probabilities.
  3. Compute stabilized weights: For each patient at each time point, SW = numerator / denominator. Multiply weights over time to get the cumulative stabilized weight.
  4. Fit the MSM: Fit a weighted Cox model or pooled logistic regression with the stabilized weights. The model relates cumulative cART exposure to mortality — without adjusting for CD4 (which would block the effect).

What the MSM Estimates

The MSM estimates the counterfactual mortality under a fixed treatment strategy (e.g., "initiate cART at baseline and continue indefinitely") versus "never initiate cART." This is a causal contrast between two hypothetical treatment regimens — not just an association between observed treatment and outcome.

Clinical Example: Chemotherapy Timing in Oncology

A second example that illustrates MSMs in a different context: does early initiation of adjuvant chemotherapy after surgery improve overall survival in stage II colon cancer?

The confounding structure: patients with postoperative complications (a confounder for survival) are less likely to receive chemotherapy on time. But postoperative complications may also be related to tumor characteristics that influenced surgical approach. Standard regression adjusting for postoperative complications would be biased.

MSM approach: at each post-operative week, model the probability of chemotherapy initiation given prior complications, performance status, and tumor stage. Weight patients to create a pseudo-population where chemotherapy timing is independent of these factors. The MSM estimates survival under "initiate chemo by week 8" versus "initiate chemo by week 16" — two hypothetical regimens.

Model Fitting: Weighted Regression

Once weights are computed, you fit a marginal structural model using weighted regression. The model form depends on your outcome:

| Outcome Type | Model | Software |
|---|---|---|
| Time-to-event | Weighted Cox / IPW-KM | survival (R), lifelines (Python) |
| Binary | Weighted logistic regression | survey (R), statsmodels (Python) |
| Continuous | Weighted least squares | survey (R), statsmodels (Python) |
| Count | Weighted Poisson / NB | MASS (R), statsmodels (Python) |

Critical implementation note: when fitting weighted outcome models, use robust (sandwich) variance estimators. The weights violate the independence assumption that standard errors from ordinary regression rely on. In R, the survey package handles this automatically. In Python, statsmodels with cov_type='HC0' works for weighted GLMs, and lifelines supports robust=True for weighted Cox models.

The Causal Assumptions

MSMs require three key assumptions, and violating any one of them invalidates the causal interpretation:

1. Sequential Exchangeability

At each time point, treatment assignment is independent of future potential outcomes, given past treatment and covariate history. This is the MSM equivalent of "no unmeasured confounding" — but it applies sequentially, at every time point. It cannot be tested from the data; it is usually justified with a causal diagram that includes separate treatment and covariate nodes for each time point.

2. Positivity

For every combination of past treatment and covariates, there must be a positive probability of receiving each treatment option. In practice, this means no deterministic treatment rules. If every patient with CD4 < 200 starts cART, there's a positivity violation at that threshold.
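A quick empirical positivity check is to tabulate treatment within strata of the key confounders and look for probabilities of exactly 0 or 1. A sketch with simulated data (column names are assumptions, not a fixed convention):

```python
import numpy as np
import pandas as pd

# Simulated cohort: CD4 stratum and a binary treatment indicator.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "cd4_stratum": rng.choice(["<200", "200-500", ">500"], size=300),
    "treat": rng.binomial(1, 0.5, size=300),
})

# Proportion treated within each stratum; values of exactly 0 or 1
# flag a possible positivity violation in that stratum.
p_treat = df.groupby("cd4_stratum")["treat"].mean()
violations = p_treat[(p_treat == 0) | (p_treat == 1)]
print(p_treat)
print("flagged strata:", list(violations.index))
```

Empirical checks catch random nonpositivity; structural violations (deterministic treatment rules) must be identified from clinical knowledge.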

3. Correct Model Specification

The treatment model (and numerator model for stabilized weights) must be correctly specified. Misspecification of the treatment model leads to biased weights and biased estimates. Including irrelevant covariates is safe; omitting relevant confounders is not.

Common Pitfalls

❌ Including post-treatment variables in the treatment model

Only include covariates measured before each treatment decision. Including a hospitalization that happened because of treatment contaminates the weights and introduces bias. When in doubt, draw a DAG for each time point.

❌ Using unstabilized weights

Unstabilized weights give a valid point estimate (equivalent to stabilized weights for a saturated MSM) but unnecessarily wide confidence intervals. In small samples or with rare treatments, extreme unstabilized weights can make results completely uninterpretable. Always use stabilized weights for MSMs.

❌ Not truncating extreme weights

A single patient with weight 500 can dominate your analysis. Common truncation rules: truncate at the 1st and 99th percentiles, or at 10. Always report the truncation level and run sensitivity analyses at multiple levels. If results change substantially with truncation, your estimate is not robust.
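The sensitivity analysis can be a small loop: re-truncate the same weights at several caps and compare the summaries (and, in a real analysis, the resulting effect estimates). A sketch with simulated heavy-tailed weights:

```python
import numpy as np
import pandas as pd

# Simulated heavy-tailed weights standing in for cumulative IPT weights.
rng = np.random.default_rng(3)
weights = np.exp(rng.normal(0, 1.0, size=2000))

rows = []
for cap in [np.inf, 50, 20, 10]:
    w = np.minimum(weights, cap)          # truncate at the cap
    rows.append({"cap": cap, "mean": w.mean(), "max": w.max()})
print(pd.DataFrame(rows))
```

If the effect estimate drifts as the cap tightens, the result is being driven by a few heavily weighted patients.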

❌ Confusing MSM with standard weighted regression

MSMs use time-varying weights that change at each time point. A single cross-sectional weight (like a propensity score weight) is not an MSM — it's just IPW. The "structural" in MSM refers to the model for the counterfactual outcomes under specific treatment regimes.

❌ Fitting MSMs on small samples

MSMs rely on weight stability. With fewer than 100–200 patients per treatment-time combination, weights become unstable and estimates unreliable. If your stratified treatment table shows cells with <10 patients, consider collapsing categories or using a parametric model with fewer terms.
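Before fitting the weight models, it helps to tabulate treatment by time period and flag sparse cells. A sketch with simulated data (column names `period` and `treat` are assumptions):

```python
import numpy as np
import pandas as pd

# Simulated long-format data with a rare treatment.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "period": rng.integers(0, 4, size=400),
    "treat": rng.binomial(1, 0.1, size=400),
})

# Count patients per period-by-treatment cell; flag periods with <10.
counts = df.groupby(["period", "treat"]).size().unstack(fill_value=0)
sparse = counts[(counts < 10).any(axis=1)]
print(counts)
print("sparse periods:", list(sparse.index))
```

Sparse cells are a warning that the fitted treatment probabilities, and hence the weights, will be unstable in those periods.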

Implementation Checklist

1. Define the temporal structure

Specify time intervals (e.g., monthly, quarterly). At each interval, identify what treatment is being measured, what covariates are measured, and the causal ordering within the interval.

2. Build the treatment model

For each time point, fit a logistic regression predicting treatment from prior treatment and covariate history. Include all measured confounders. Consider machine learning for the treatment model if the covariate space is high-dimensional.

3. Build the numerator model

For stabilized weights, fit a second treatment model using only prior treatment history (no covariates). This provides the marginal probability component.

4. Compute and diagnose weights

Calculate stabilized weights at each time point and multiply them cumulatively. Check that the mean is near 1, the range is not extreme, and the distribution is reasonable. Apply truncation if needed.

5. Fit the MSM

Fit a weighted regression model relating treatment history to the outcome, using robust variance estimators. The model form defines the structural relationship — choose it based on the scientific question, not model fit statistics.

6. Sensitivity analysis

Vary truncation levels, include/exclude borderline patients, and test alternative model forms. For unmeasured confounding, use E-values (see our E-values guide).

Reporting Checklist

Before submitting your manuscript, verify:

  • Clear description of the temporal structure — how treatment and covariates are measured over time
  • DAG or causal diagram showing the time-varying confounding structure
  • Treatment model specification — which covariates at which time points, and why
  • Confirmation that stabilized weights were used, with mean and range reported
  • Weight truncation procedure described (level and rationale)
  • Robust variance estimation used (not standard errors from weighted regression)
  • Positivity assessment — evidence that all treatment-covariate combinations are possible
  • Sensitivity analyses: varying truncation, alternative model forms, unmeasured confounding assessment
  • Comparison with conventional methods (standard regression, unweighted model) to show the impact of time-varying confounding

MSMs vs. Other Causal Methods

| Method | Handles Time-Varying Confounders? | Handles Unmeasured Confounders? | Data Requirements |
|---|---|---|---|
| Standard Regression | No — blocks mediation pathway | No | Cross-sectional or longitudinal |
| PSM / Standard IPW | No — single time point only | No | Baseline data |
| G-computation | Yes — parametric modeling | No | Longitudinal |
| MSM | Yes — weighting approach | No | Longitudinal, large samples |
| Structural Nested Models | Yes | No | Longitudinal |
| G-estimation | Yes | Partially (instrument-based) | Longitudinal |
| DML / Causal Forests | With extensions | No | Large, high-dimensional |

MSMs are the most widely used method for time-varying confounding. G-computation is an alternative that uses modeling instead of weighting; results should agree when both are correctly specified.

Software Implementation

R

# Load packages
library(survey)
library(survival)
library(dplyr)  # for group_by()/mutate() used below

# Step 1: Fit treatment models at each time point
# Denominator model (treatment ~ prior treatment + confounders)
denom_fit <- glm(
  treat_t ~ lag_treat + cd4 + viral_load + age,
  family = binomial,
  data = long_data
)
denom_prob <- predict(denom_fit, type = "response")

# Numerator model (treatment ~ prior treatment only)
num_fit <- glm(
  treat_t ~ lag_treat,
  family = binomial,
  data = long_data
)
num_prob <- predict(num_fit, type = "response")

# Step 2: Compute stabilized weights
long_data$sw <- ifelse(
  long_data$treat_t == 1,
  num_prob / denom_prob,
  (1 - num_prob) / (1 - denom_prob)
)

# Step 3: Cumulative weight (product over time)
long_data <- long_data %>%
  group_by(id) %>%
  mutate(cum_sw = cumprod(sw))

# Step 4: Truncate
long_data$sw_trunc <- pmin(long_data$cum_sw, 10)

# Step 5: Fit weighted Cox model
design <- svydesign(
  ids = ~1,
  weights = ~sw_trunc,
  data = long_data
)
svycoxph(Surv(time, death) ~ treat_cum + age, design = design)

Python

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

# Step 1: Fit treatment models
def fit_treatment_model(data, covariates, treat_col):
    """Fit denominator (full) and numerator (marginal) models."""
    X_denom = data[covariates].values
    y = data[treat_col].values
    denom = LogisticRegression(max_iter=1000)
    denom.fit(X_denom, y)
    
    X_num = data[['lag_treat']].values
    num = LogisticRegression(max_iter=1000)
    num.fit(X_num, y)
    
    denom_prob = denom.predict_proba(X_denom)[:, 1]
    num_prob = num.predict_proba(X_num)[:, 1]
    return denom_prob, num_prob

# Step 2: Compute stabilized weights
denom_prob, num_prob = fit_treatment_model(
    data, ['lag_treat', 'cd4', 'viral_load', 'age'], 'treat_t'
)
sw = np.where(data['treat_t'] == 1,
              num_prob / denom_prob,
              (1 - num_prob) / (1 - denom_prob))

# Step 3: Cumulative weight
data['sw'] = sw
data['cum_sw'] = data.groupby('id')['sw'].cumprod()

# Step 4: Truncate
data['sw_trunc'] = data['cum_sw'].clip(upper=10)

# Step 5: Fit weighted Cox model with robust standard errors
cph = CoxPHFitter()
cph.fit(data[['time', 'death', 'treat_cum', 'age', 'sw_trunc']],
        duration_col='time', event_col='death',
        weights_col='sw_trunc', robust=True)
cph.print_summary()

When to Use MSMs — and When Not To

Ideal Settings

  • Longitudinal data with sequential treatment decisions — HIV cohorts, oncology follow-up, chronic disease management, ICU care
  • Time-varying confounders affected by prior treatment — this is the specific problem MSMs were designed to solve
  • Large sample sizes — MSMs need stable weight estimates; several hundred patients, with adequate numbers receiving each treatment option at each time point, is a reasonable minimum
  • Well-defined treatment initiation points — clear temporal ordering of covariate measurement and treatment decisions

Poor Settings

  • Small samples (<100) — weights will be unstable; consider parametric G-computation or stratification instead
  • No time-varying confounders — if confounders are only at baseline, standard IPW or PSM is simpler and equivalent
  • Positivity violations — if treatment is deterministic for some covariate values, MSMs break down
  • Time-varying treatments without clear intervals — MSMs require discrete time points; continuous treatment requires specialized methods

Key References

  • Robins JM, Hernán MA, Brumback B. Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology. 2000;11(5):550-560.
  • Hernán MA, Brumback B, Robins JM. Marginal Structural Models to Estimate the Joint Causal Effect of Nonrandomized Treatments. JASA. 2001;96(454):440-448.
  • Suarez D, Borras R, Basagana X. Comparison of marginal structural Cox models for the analysis of a time-varying exposure: a simulation study. Emerg Themes Epidemiol. 2007;4:14.
  • Cain LE, Robins JM, Lanoy E, et al. When to initiate combination antiretroviral therapy to reduce mortality and AIDS-defining illness in HIV-infected persons in developed countries: an observational study. PLoS Med. 2011;8(8):e1001084.
  • Robins JM. Marginal Structural Models versus Structural Nested Models as tools for causal inference. In: Statistical Models in Epidemiology, the Environment, and Clinical Trials. Springer; 2000:95-133.
  • Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020. Chapter 12. Freely available online.

Need help choosing between MSMs, G-computation, and nested models?

Aqrab evaluates your study design and recommends the right causal method — with assumptions explicitly stated.

Try Aqrab Free →