Causal Inference · DAGs · Causal Framework

Structural Causal Models & DAGs: A Practical Guide for Clinical Researchers

April 2, 2026 · 18 min read · By Coefficients Health Analytics

Every causal inference method you've ever used — propensity scores, instrumental variables, difference-in-differences, regression discontinuity — relies on assumptions you draw on a napkin before you touch the data. Structural Causal Models (SCMs) and Directed Acyclic Graphs (DAGs) are the formal language for those assumptions. This guide teaches you the framework that Pearl spent 30 years building, in the language clinical researchers actually need: how to draw the graph, read off which variables to adjust for, and understand why your method works (or doesn't).

1. Why causal graphs exist

The fundamental problem of causal inference is simple: we never observe both potential outcomes for the same patient. You see what happened when they got the drug. You don't see what would have happened if they hadn't. Everything in causal inference — every method, every estimator, every sensitivity analysis — exists to fill that gap.

Structural Causal Models (SCMs) provide the formal framework for that gap. An SCM is a set of structural equations where each variable is written as a function of its parents and an independent error term:

X = f_X(Parents(X), U_X);   Y = f_Y(Parents(Y), U_Y)

Each structural equation is a mechanism — a stable relationship that can be manipulated independently. The error terms U are mutually independent. This independence is what gives SCMs their power: it's the structural assumption that lets you reason about interventions.
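The structural-equation view can be made concrete in a few lines of simulation. The sketch below (hypothetical coefficients, NumPy only) builds a three-variable SCM with L → A, A → Y, and L → Y, then implements do(A=a) by replacing the equation for A — the "surgical intervention" that defines a causal effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate(do_a=None):
    # Each variable = function of its parents + an independent error term.
    u_l = rng.normal(size=n)
    u_a = rng.normal(size=n)
    u_y = rng.normal(size=n)
    l = u_l                                               # L has no parents
    a = 0.8 * l + u_a if do_a is None else np.full(n, float(do_a))
    y = 1.5 * a + 2.0 * l + u_y                           # true causal effect of A on Y: 1.5
    return l, a, y

# Observational world: the naive regression of Y on A is confounded by L.
l, a, y = simulate()
naive = np.cov(a, y)[0, 1] / np.var(a)

# Interventional world: overwrite the equation for A (the do-operator).
_, _, y1 = simulate(do_a=1.0)
_, _, y0 = simulate(do_a=0.0)
ate = y1.mean() - y0.mean()                               # recovers the structural coefficient

print(round(naive, 2), round(ate, 2))
```

Replacing one equation while leaving the rest intact is exactly what the independence of the error terms licenses.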

The Directed Acyclic Graph (DAG) is the visual representation of an SCM. Nodes are variables. Arrows go from parents to children. The graph encodes which variables directly influence which other variables. From the graph alone — before looking at data — you can determine:

  • Which variables to adjust for to remove confounding
  • Which variables you must not adjust for (colliders, mediators)
  • Whether causal identification is even possible with your data
  • Which observed association is actually causal and which is spurious

💡 Clinical intuition

Think of a DAG as a study design document that a statistician can read. It says: “Here is what I believe causes what, and here is why I think my analysis will give me a causal answer.” If you can't draw the DAG, you can't justify the analysis.

2. Building your DAG: the three rules

Drawing a DAG is not about being fancy. It's about being explicit. Here are the three rules:

Rule 1: Arrows are mechanisms, not correlations

Draw an arrow from A → B only if changing A would mechanistically change B, not merely because they are correlated. Age causes blood pressure to rise, so that arrow belongs in the graph. But a child's shoe size and reading ability are correlated with no causal arrow between them — a third variable (age) causes both.

Rule 2: Include unmeasured variables you know exist

If socioeconomic status confounds your exposure-outcome relationship but you didn't measure it, draw it as a dashed node. This forces you to confront the fact that your adjustment set is incomplete — which leads directly to sensitivity analysis (E-value, Rosenbaum bounds, etc.).

Rule 3: Include the outcome, and make it acyclic

The DAG must include the outcome node. And there must be no directed cycles — you can't have A → B → C → A. Time flows forward. Feedback loops are real, but they're modeled with time-indexed variables (A_t → B_{t+1} → A_{t+2}), not cycles within a single time point.

🔧 Practical tip

Use DAGitty (dagitty.net) or ggdag in R to draw and analyze your DAG. Both tools can automatically identify adjustment sets. Don't draw DAGs on paper and then guess at the adjustment set — let the software do the d-separation math for you.

3. Path types: the four patterns that matter

Every path between two variables in a DAG is built from three elementary structures — chains, forks (common causes), and colliders. Add one composite pattern, the instrumental variable, and you have the four patterns below. Understanding these is the entire game.

Pattern 1: Causal chain (A → M → Y)

A causes M, M causes Y. This is a mediated causal path. Adjusting for M blocks part of the A → Y effect (you're measuring the direct effect only). Don't adjust for mediators when you want the total effect.

Pattern 2: Common cause / Confounder (A ← L → Y)

L causes both A and Y. This creates a backdoor path that produces non-causal association between A and Y. Adjusting for L closes this path and removes confounding. This is why we do propensity score adjustment, regression adjustment, stratification, etc.

Pattern 3: Collider (A → L ← Y)

L has two arrows pointing into it — one from A, one from Y. L is a collider. By default, A and Y are independent through this path. But if you adjust for L (or condition on it), you open the path and create spurious association. This is collider bias — also called Berkson's paradox or selection bias.

⚠️ The most common DAG mistake

Adjusting for a collider opens a non-causal path. Example: adjusting for “hospital admission” when admission is caused by both disease severity (A) and mortality risk (Y). Among admitted patients, those with low severity tended to be admitted because of high mortality risk, inducing a spurious negative association between severity and death. Never adjust for a collider.
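Collider bias is easy to demonstrate by simulation. In the sketch below (hypothetical data, NumPy only), A and Y are generated independently and L is their common effect; restricting the sample to high values of L — a crude form of conditioning — manufactures a negative association out of nothing:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# A and Y are marginally independent; L is a collider: A -> L <- Y.
a = rng.normal(size=n)
y = rng.normal(size=n)
l = a + y + 0.5 * rng.normal(size=n)

def corr(x, z):
    return np.corrcoef(x, z)[0, 1]

marginal = corr(a, y)                     # ≈ 0: no open path between A and Y
mask = l > 1.0                            # conditioning on the collider (selection)
conditional = corr(a[mask], y[mask])      # clearly negative: the path is now open

print(round(marginal, 3), round(conditional, 3))
```

This is Berkson's paradox in four lines: within the selected stratum, a high A "explains away" the need for a high Y.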

Pattern 4: Instrumental variable (IV → A → Y, with no direct IV → Y arrow)

An instrumental variable (IV) causes the exposure but does not directly cause the outcome. The IV creates a path through which variation in A is quasi-random — because the IV is independent of confounders. This is the structural justification for IV estimation. If you can't draw the IV in the DAG such that all IV→Y paths go through A, the exclusion restriction fails.
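A small simulation (hypothetical coefficients) shows why the structure works. With an unmeasured confounder U, the naive regression of Y on A is biased, but the Wald ratio — the IV's effect on Y divided by the IV's effect on A — recovers the structural coefficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

u = rng.normal(size=n)                      # unmeasured confounder of A and Y
iv = rng.binomial(1, 0.5, size=n)           # instrument: affects A only
a = 1.0 * iv + 1.0 * u + rng.normal(size=n)
y = 2.0 * a + 3.0 * u + rng.normal(size=n)  # true effect of A on Y: 2.0

naive = np.cov(a, y)[0, 1] / np.var(a)      # confounded by U

# Wald estimator: ratio of reduced-form to first-stage effects.
wald = (y[iv == 1].mean() - y[iv == 0].mean()) / (
    a[iv == 1].mean() - a[iv == 0].mean()
)

print(round(naive, 2), round(wald, 2))
```

The estimator only works because every IV → Y path passes through A — exactly the exclusion restriction the DAG makes visible.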

4. d-separation: reading the graph

d-separation is the formal rule for determining whether two variables are independent conditional on a set of other variables. A path is d-separated (blocked) if at least one of the following holds:

  • It contains a non-collider that is in the conditioning set
  • It contains a collider that is not in the conditioning set, and no descendant of that collider is in the set

If all paths between X and Y are d-separated by Z, then X ⊥ Y | Z. This is the Markov condition — the fundamental connection between the graph structure and conditional independence in data.

Worked example: Treatment, Outcome, and Three Covariates

Suppose your DAG is: L1 → A, L2 → A, L2 → Y, L3 → Y, and A → Y. L1 causes A only, with no path to Y except through A — an instrument-like variable. L2 is a confounder. L3 is a predictor of Y only.

Backdoor paths from A to Y: A ← L2 → Y. This is the only confounding path. To block it, adjust for L2. Don't adjust for L1 — it's not a confounder, and conditioning on an exposure-only cause costs precision and can amplify bias from any residual confounding. L3 is irrelevant for identification, but including it improves precision.

Minimum sufficient adjustment set: { L2 }. With L3 included for precision: { L2, L3 }.
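The worked example can be checked numerically. In this sketch (hypothetical linear mechanisms), the crude regression of Y on A is biased by the open backdoor through L2, while adjusting for L2 alone recovers the true coefficient:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

l1 = rng.normal(size=n)                                  # causes A only
l2 = rng.normal(size=n)                                  # confounder: causes A and Y
l3 = rng.normal(size=n)                                  # causes Y only
a = 0.7 * l1 + 0.9 * l2 + rng.normal(size=n)
y = 1.0 * a + 1.2 * l2 + 0.8 * l3 + rng.normal(size=n)   # true effect of A on Y: 1.0

def ols_coef(outcome, *cols):
    # Returns the coefficient on the first regressor after the intercept.
    X = np.column_stack([np.ones(n), *cols])
    beta = np.linalg.lstsq(X, outcome, rcond=None)[0]
    return beta[1]

crude = ols_coef(y, a)           # biased: backdoor A <- L2 -> Y is open
adjusted = ols_coef(y, a, l2)    # {L2} is the minimal sufficient set

print(round(crude, 2), round(adjusted, 2))
```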

5. The backdoor criterion: when confounding is fixable

The backdoor criterion (Pearl, 1995) is the simplest and most important identification result in causal inference. A set of variables Z satisfies the backdoor criterion relative to (A, Y) if:

  1. No node in Z is a descendant of A (don't adjust for mediators)
  2. Z blocks every backdoor path from A to Y (close all confounding paths)

If Z satisfies the backdoor criterion, then the causal effect of A on Y is identified:

P(Y | do(A=a)) = Σ_z P(Y | A=a, Z=z) P(Z=z)

This is the backdoor adjustment formula. It says: stratify by Z, then average over Z. This is exactly what standardization, IPW, and g-computation implement — but the DAG tells you which Z to use.
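The formula translates directly into code. This sketch (hypothetical binary data) computes the standardized risk under do(A=1) and do(A=0) by stratifying on Z and averaging over the marginal distribution of Z — nonparametric backdoor adjustment in a few lines:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

z = rng.binomial(1, 0.4, size=n)                 # binary confounder
a = rng.binomial(1, np.where(z == 1, 0.7, 0.3))  # P(A=1) depends on Z
y = rng.binomial(1, 0.2 + 0.3 * a + 0.3 * z)     # true risk difference: 0.3

# Backdoor formula: P(Y=1 | do(A=a)) = sum_z P(Y=1 | A=a, Z=z) * P(Z=z)
def standardized_risk(a_val):
    total = 0.0
    for z_val in (0, 1):
        stratum = (a == a_val) & (z == z_val)
        total += y[stratum].mean() * (z == z_val).mean()
    return total

rd_adjusted = standardized_risk(1) - standardized_risk(0)  # ≈ 0.30
rd_crude = y[a == 1].mean() - y[a == 0].mean()             # confounded by Z

print(round(rd_crude, 3), round(rd_adjusted, 3))
```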

The DAG replaces guesswork. Instead of “I adjusted for everything in the dataset,” you can say: “The DAG identifies L1, L2, and L3 as the sufficient adjustment set. Adjusting for additional variables (L4, L5) would not change the identification but may affect precision.” This is a much stronger claim than “I controlled for everything.”

6. The frontdoor criterion: when unmeasured confounding is not fatal

Sometimes you have an unmeasured confounder U between A and Y, and you can't adjust for it. Most researchers stop there. But the frontdoor criterion provides a way out — if a mediator M exists that satisfies three conditions:

  1. M fully mediates the effect of A on Y (all A → Y paths go through M)
  2. There is no unmeasured confounding between A and M
  3. There is no unmeasured confounding between M and Y (conditional on A)

Classic example: Smoking → Tar → Cancer

Suppose U (genetics) confounds smoking → cancer. But smoking causes tar buildup, and tar causes cancer. If you can measure tar but not genetics, the frontdoor criterion lets you identify the causal effect of smoking on cancer through two sequential adjustments: (1) smoking → tar (no confounding), and (2) tar → cancer adjusting for smoking (blocks the backdoor path through U). The frontdoor formula gives you P(Cancer | do(Smoking)) without ever measuring the unmeasured confounder.

Frontdoor identification is rare in clinical research but powerful when it applies. Think about it whenever you've measured a mechanistic mediator but couldn't measure the confounders.
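For intuition, here is the frontdoor logic in a linear SCM (hypothetical coefficients). U confounds A and Y but leaves A → M clean, and adjusting for A cleans up M → Y; chaining the two regressions recovers the causal effect without ever observing U:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

u = rng.normal(size=n)                      # unmeasured confounder (e.g. genetics)
a = 0.8 * u + rng.normal(size=n)            # exposure (smoking)
m = 1.1 * a + rng.normal(size=n)            # mediator (tar): fully mediates A -> Y
y = 0.9 * m + 1.5 * u + rng.normal(size=n)  # true A -> Y effect: 1.1 * 0.9 = 0.99

def ols(outcome, *cols):
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, outcome, rcond=None)[0]

naive = ols(y, a)[1]          # biased by U
alpha = ols(m, a)[1]          # A -> M: unconfounded by assumption
beta = ols(y, m, a)[1]        # M -> Y, adjusting for A blocks the backdoor via U
frontdoor = alpha * beta      # chained estimate of the A -> Y effect

print(round(naive, 2), round(frontdoor, 2))
```

The product-of-paths form here is the linear special case; the general frontdoor formula does the same chaining nonparametrically.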

7. do-calculus: the three rules

When neither the backdoor nor frontdoor criterion applies, do-calculus is the general framework for determining whether a causal quantity is identifiable from observational data. It consists of three rules that transform expressions involving do(·) into expressions without do(·) — if the d-separation conditions allow it.

The Three Rules of do-calculus

Rule 1 — Insertion/deletion of observations

P(Y | do(X), Z, W) = P(Y | do(X), W) if Y is d-separated from Z given (X, W) in the graph where all arrows into X are removed.

Rule 2 — Action/observation exchange

P(Y | do(X), do(Z), W) = P(Y | do(X), Z, W) if Y is d-separated from Z given (X, W) in the graph where all arrows into X and all arrows out of Z are removed.

Rule 3 — Insertion/deletion of actions

P(Y | do(X), do(Z), W) = P(Y | do(X), W) if Y is d-separated from Z given (X, W) in the graph where all arrows into X are removed, along with all arrows into those Z-nodes that are not ancestors of W in that X-pruned graph.

In practice, you rarely apply do-calculus by hand. The backdoor and frontdoor criteria cover most clinical applications. But do-calculus is the theoretical foundation: if a causal effect is identifiable, do-calculus will find the identification formula. If it's not identifiable, do-calculus will tell you that too. Tools like DAGitty (for adjustment sets) and DoWhy (for full identification) automate these checks.

8. Mediation analysis: natural direct & indirect effects

One of the most powerful applications of SCMs is formalizing mediation analysis. In a simple A → Y causal question, mediation asks: how much of the total effect operates through a mediator M? The DAG is: A → M → Y and A → Y. The total effect of A on Y decomposes into:

Total Effect = Natural Direct Effect + Natural Indirect Effect

Natural Direct Effect (NDE): The effect of A on Y when M is set to the value it would have taken if A had been at its reference level. In formula: E[Y(a, M(a*)) − Y(a*, M(a*))].

Natural Indirect Effect (NIE): The effect of A on Y through M, when A changes from a* to a and M changes from M(a*) to M(a). In formula: E[Y(a, M(a)) − Y(a, M(a*))].

The NDE answers: “If we gave the drug but blocked its effect on the mediator, how much effect remains?” The NIE answers: “How much of the drug's effect works through the mediator pathway?”
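In an SCM with explicit structural equations, the NDE and NIE can be computed directly by plugging counterfactual mediator values into the outcome equation. The sketch below uses a hypothetical linear model with no A × M interaction, where the NIE reduces to the familiar product of coefficients (0.6 × 0.8):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Shared error terms: the same individuals appear in every counterfactual world.
u_m = rng.normal(size=n)
u_y = rng.normal(size=n)

def m_of(a):                 # structural equation for the mediator
    return 0.6 * a + u_m

def y_of(a, m):              # structural equation for the outcome (no A x M interaction)
    return 1.0 * a + 0.8 * m + u_y

a_star, a_val = 0.0, 1.0
# NDE: change A, but hold M at its untreated value M(a*).
nde = (y_of(a_val, m_of(a_star)) - y_of(a_star, m_of(a_star))).mean()
# NIE: hold A at a, let M move from M(a*) to M(a).
nie = (y_of(a_val, m_of(a_val)) - y_of(a_val, m_of(a_star))).mean()
total = (y_of(a_val, m_of(a_val)) - y_of(a_star, m_of(a_star))).mean()

print(round(nde, 2), round(nie, 2), round(total, 2))
```

Note what the code can do that data cannot: it evaluates M(a) and M(a*) for the same individuals — the cross-world counterfactual discussed next.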

The identification problem

Natural direct and indirect effects require a cross-world counterfactual: you need to know what M would have been under treatment a AND under treatment a*, for the same individual. This is untestable from data — no experiment reveals both potential mediator values for the same person. Identification therefore rests on extra assumptions: sequential ignorability (M is unconfounded given A and pre-treatment covariates, with no mediator-outcome confounder affected by treatment), sometimes paired with a no-interaction assumption (no A × M interaction on the Y scale).

When to use what

| Method | Estimand | Key Assumption | Software |
| --- | --- | --- | --- |
| Baron-Kenny | Product of a-path × b-path | No interaction A×M; linearity | Any regression software |
| SEM (structural equation model) | NDE + NIE via product/difference | Linearity; no unmeasured A→Y confounders; sequential ignorability for M | Mplus, lavaan (R), medflex |
| Parametric g-formula | NDE, NIE, controlled direct effect | Correct outcome model; sequential ignorability | medflex, causalMed (R) |
| Inverse odds ratio weighting | Natural direct effect | No A→M confounders; positivity | mediation (R), SAS proc causalmed |

⚠️ The controlled direct effect is different

The controlled direct effect (CDE) holds M at a fixed value m for everyone. It answers: “If everyone had M = m, what would be the effect of A on Y?” CDE is identified under weaker assumptions than NDE/NIE (no cross-world counterfactuals needed). But CDE ≠ NDE when there is A × M interaction on the Y scale. Researchers often report CDE and call it NDE — this is wrong and reviewers should catch it.
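The distinction is easy to see in a simulated SCM with an A × M interaction (hypothetical coefficients): CDE(m) varies with the chosen m, while the NDE evaluates the direct effect at the natural untreated mediator distribution M(0):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
u_m = rng.normal(size=n)
u_y = rng.normal(size=n)

def m_of(a):
    return 0.6 * a + u_m                             # E[M(0)] = 0

def y_of(a, m):
    return 1.0 * a + 0.8 * m + 0.5 * a * m + u_y     # A x M interaction term

# CDE(m): hold M at a fixed value m for everyone.
cde_at_1 = (y_of(1, np.ones(n)) - y_of(0, np.ones(n))).mean()   # 1.0 + 0.5*1 = 1.5
# NDE: hold M at its natural untreated value M(0).
nde = (y_of(1, m_of(0)) - y_of(0, m_of(0))).mean()              # 1.0 + 0.5*E[M(0)] ≈ 1.0

print(round(cde_at_1, 2), round(nde, 2))
```

With the interaction present, reporting CDE(1) as the "direct effect" would overstate the NDE by 50%.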

9. Seven DAG mistakes that invalidate your analysis

Mistake 1: Drawing the DAG from the data

The DAG must come from subject-matter knowledge before seeing the data. Data-driven DAGs are circular reasoning — you're using the data to determine the adjustment set that will be used on the same data. Write the DAG at the study design stage, in the protocol, before IRB approval.

Mistake 2: Adjusting for everything “just in case”

Throwing all covariates into a regression is not “adjusting for confounding.” You may be adjusting for colliders (biasing your estimate) or mediators (blocking the causal effect you want to measure). The DAG tells you exactly which variables to include.

Mistake 3: Adjusting for post-treatment variables

Variables measured after treatment exposure (e.g., complications, laboratory changes) may be mediators. Adjusting for them blocks part of the causal effect. If you want the total effect, don't adjust for anything caused by the treatment. If you want the direct effect, be explicit that you're blocking the mediated pathway.

Mistake 4: Using the DAG to “prove” unmeasured confounding doesn't exist

“My DAG has no unmeasured confounders” is a claim that requires external justification, not graph topology. Every DAG can be drawn without unmeasured confounders — that doesn't make it true. Always run sensitivity analysis (E-value) regardless of what the DAG looks like.
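The E-value itself is a one-line calculation. This sketch implements VanderWeele & Ding's formula for a point-estimate risk ratio:

```python
from math import sqrt

def e_value(rr):
    """E-value for a risk ratio (VanderWeele & Ding, 2017): the minimum
    strength of association, on the risk-ratio scale, that an unmeasured
    confounder would need with both exposure and outcome to fully
    explain away the observed association."""
    if rr < 1:
        rr = 1 / rr                   # protective effects: invert first
    return rr + sqrt(rr * (rr - 1))

print(round(e_value(2.0), 2))         # → 3.41
```

An E-value of 3.41 means a confounder would need risk ratios of at least 3.41 with both exposure and outcome to reduce an observed RR of 2.0 to the null.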

Mistake 5: Confusing correlation with causation in the graph itself

Drawing A → B just because A and B are correlated in your data. Arrows represent mechanistic influence, not statistical association. A causes B means: if you surgically intervened on A, B would change. Temperature and ice cream sales are correlated; temperature causes ice cream sales (not the other way around).

Mistake 6: Ignoring measurement error

DAGs assume perfect measurement. When exposure or confounders are measured with error, the standard DAG-based identification may fail. Differential misclassification of the exposure can create bias even when the DAG is correct for the true (unobserved) variables. Add a “measurement model” layer or run sensitivity analysis for misclassification.

Mistake 7: Drawing a single DAG for a multi-site study

Different sites may have different causal structures (effect heterogeneity, different confounding patterns). A single DAG assumes the same structural relationships everywhere. If you're pooling data across sites with different treatment assignment mechanisms, you may need site-specific DAGs or explicit interaction terms.

10. What reviewers expect: the reporting checklist

Reviewers increasingly expect a DAG in observational studies. Here's what to include:

DAG Reporting Checklist (10 items)

  1. DAG is included as a figure with all variables (measured and unmeasured) shown
  2. Arrow justifications are provided — each arrow has a citation or mechanistic rationale
  3. Unmeasured confounders are explicitly shown as dashed nodes
  4. Adjustment set is stated, with justification from d-separation criteria
  5. Variables excluded from adjustment are listed, with reasons (mediator, collider, descendant of treatment)
  6. Software used for DAG construction and adjustment set identification is named (DAGitty, ggdag, etc.)
  7. Sensitivity to DAG assumptions — what happens if you add/remove an arrow? An E-value or bias analysis
  8. Multiple DAGs are considered if there is genuine uncertainty about the causal structure
  9. The causal estimand is stated (ATE, ATT, CATE, NDE, etc.) and mapped to the DAG
  10. Instrumental variable assumptions (if using IV) are stated and linked to DAG structure

📝 The “good DAG” test

A good DAG is one where a second researcher, reading only the DAG and your justifications, can reproduce your adjustment set without seeing your data. If they can't, your DAG isn't specific enough.

11. Getting automated critique with Aqrab

Drawing the DAG is the first step. Validating it — checking whether your adjustment set is correct, whether you've missed confounders, whether your IV exclusion restriction holds — is where most studies fail. Aqrab's automated critique engine evaluates your study design against the causal identification framework and flags problems before you run the analysis.

Whether you're using PSM, IV, IPW, DML, or any other method, the critique starts from the DAG and works forward: is the identification valid? Is the estimator consistent for the stated estimand? What sensitivity analysis is needed? Submit your protocol or methods section and get an evidence-based critique in minutes.

Key takeaways

  • DAGs are not decoration — they determine which variables you adjust for and which you must not touch
  • The backdoor criterion identifies the adjustment set; d-separation provides the formal proof
  • Never adjust for mediators (post-treatment variables) or colliders — both create bias
  • Mediation analysis decomposes total effects into direct and indirect pathways via structural equations
  • Always report the DAG, justify every arrow, name the adjustment set, and run sensitivity analysis

Let Aqrab critique your study design

Paste your protocol, methods section, or DAG description and get an automated critique that checks identification assumptions, flags adjustment set errors, and suggests sensitivity analyses. From $29/month for researchers.