The Causal Review

1 What Problem Do These Tools Solve?

Every causal inference study rests on a set of assumptions about the data-generating process: which variables affect which others, which paths are open, and which variables confound the treatment-outcome relationship. These assumptions are often stated in prose (for example, "we assume that conditional on education and age, income is independent of the error term"), but a prose statement can hide logical inconsistencies, overlook collider bias, or leave the minimal sufficient adjustment set unidentified.

Directed acyclic graphs (DAGs) provide a formal, transparent way to encode these assumptions [Pearl, 2009]. A DAG is a set of nodes (variables) and directed edges (arrows representing direct causal relationships), with no cycles. Once drawn, a DAG supports rigorous, algorithmic reasoning about identification: the backdoor criterion [Pearl, 1993], d-separation, minimal adjustment sets, and collider identification.

The dagitty package [Textor et al., 2016] implements Pearl's graphical framework in R, allowing analysts to encode a DAG and query it for adjustment sets, testable implications, and instrument validity. The ggdag package [Barrett, 2021] provides ggplot2-compatible visualisations of dagitty objects.

2 Installation and Setup

# Listing 1: Package installation
install.packages(c("dagitty", "ggdag", "ggplot2"))

library(dagitty)
library(ggdag)
library(ggplot2)

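Both packages are on CRAN. dagitty runs its graph algorithms through the V8 R package (a JavaScript engine), so a source install on Linux may need a V8 system library; binary installs on Windows and macOS typically need nothing beyond a standard R installation.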

3 A Minimal Working Example: Education and Earnings

3.1 Encoding the DAG

# Listing 2: Building a DAG for the education-earnings example
# Define the DAG using dagitty syntax
dag_edu <- dagitty('
  dag {
    Ability [latent, pos="0,1"]
    Family [pos="0,2"]
    School [exposure, pos="1,1.5"]
    Earn [outcome, pos="2,1.5"]
    
    Ability -> School
    Ability -> Earn
    Family -> School
    Family -> Earn
    School -> Earn
  }
')

# Check that it is a valid DAG
isAcyclic(dag_edu) # should return TRUE

The DAG encodes:

Ability (latent/unobserved) affects both School and Earn, creating backdoor confounding.

Family background affects both School and Earn.

School directly affects Earn (the causal effect of interest).
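Before running any identification query, it can help to read the encoded structure back from the object. The sketch below rebuilds the same graph (coordinates omitted for brevity) and uses dagitty's accessor functions to list nodes, roles, and the direct causes of Earn. The variable names on the left-hand side are illustrative only.

```r
library(dagitty)

# Same arrows as the education-earnings DAG above, without pos coordinates
dag_edu <- dagitty('
  dag {
    Ability [latent]
    School [exposure]
    Earn [outcome]
    Ability -> School
    Ability -> Earn
    Family -> School
    Family -> Earn
    School -> Earn
  }
')

node_names   <- names(dag_edu)            # every variable in the graph
latent_vars  <- latents(dag_edu)          # variables tagged [latent]
exposure_var <- exposures(dag_edu)        # variables tagged [exposure]
outcome_var  <- outcomes(dag_edu)         # variables tagged [outcome]
earn_parents <- parents(dag_edu, "Earn")  # direct causes of Earn

print(list(nodes = node_names, latent = latent_vars,
           exposure = exposure_var, outcome = outcome_var,
           parents_of_Earn = earn_parents))
```

If any of these differ from what was intended (for example, a variable missing from latents()), the graph string is the place to fix it before querying adjustment sets.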

3.2 Querying Adjustment Sets

The key query: what set of observed variables do we need to condition on to identify the effect of School on Earn via the backdoor criterion?

# Listing 3: Finding sufficient adjustment sets
adjustmentSets(dag_edu, exposure = "School", outcome = "Earn")

# Returns no adjustment set: the backdoor path School <- Ability -> Earn
# runs through a latent variable and cannot be blocked

The empty result tells us that no set of observed variables satisfies the backdoor criterion. Conditioning on Family blocks the path School <- Family -> Earn, but the path through Ability stays open, and because Ability is latent (unobserved) it cannot enter an adjustment set. Selection on observables therefore cannot identify the effect of schooling when ability is unobserved, which matches what econometricians know from the omitted variable bias formula.
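To see why the latent tag matters, one can list the paths between School and Earn and compare the query against a variant of the graph in which Ability is measured. The following is a minimal sketch: dag_edu_obs is a hypothetical graph introduced only for this comparison, not part of the example above.

```r
library(dagitty)

# The education-earnings DAG with Ability latent (coordinates omitted)
dag_edu <- dagitty('
  dag {
    Ability [latent]
    School [exposure]
    Earn [outcome]
    Ability -> School
    Ability -> Earn
    Family -> School
    Family -> Earn
    School -> Earn
  }
')

# Hypothetical variant: identical arrows, but Ability is observed
dag_edu_obs <- dagitty('
  dag {
    School [exposure]
    Earn [outcome]
    Ability -> School
    Ability -> Earn
    Family -> School
    Family -> Earn
    School -> Earn
  }
')

# All paths between School and Earn, and whether each is open given Family
p_none   <- paths(dag_edu, from = "School", to = "Earn")
p_family <- paths(dag_edu, from = "School", to = "Earn", Z = "Family")
print(data.frame(path = p_family$paths, open_given_Family = p_family$open))

# Adjustment sets with Ability latent versus observed
sets_latent <- adjustmentSets(dag_edu, exposure = "School", outcome = "Earn")
sets_obs    <- adjustmentSets(dag_edu_obs, exposure = "School", outcome = "Earn")
print(sets_obs)
```

Conditioning on Family closes only the path through Family. The path through Ability stays open, which is why the latent version has no adjustment set while the observed version requires both confounders.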

3.3 Checking Testable Implications

Every DAG implies a set of conditional independence relations (d-separations) that can in principle be tested in data:

# Listing 4: Extracting testable implications from the DAG
impliedConditionalIndependencies(dag_edu)
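# Lists the conditional independencies among observed variables that the graph implies.
# For dag_edu the list is empty: Family, School and Earn are all directly connected,
# and the only missing edge (Ability, Family) involves a latent variable.

If the DAG is correctly specified, these independence restrictions should hold approximately in the data. Testing them (e.g., using partial correlations or regression residuals) is a form of DAG specification testing. A graph with few missing edges among observed variables, like this one, yields few or no such restrictions and so cannot be checked this way.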
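As an illustration of such a test, the sketch below uses a hypothetical variant of the education DAG in which Ability is measured, so that one independence (Ability and Family) becomes testable. The data are simulated from the graph with dagitty's simulateSEM(), using an arbitrary path coefficient of 0.3 chosen only for the demonstration; with real data, localTests() would be called on the observed data frame instead.

```r
library(dagitty)

# Hypothetical variant of the education DAG with Ability observed
dag_edu_obs <- dagitty('
  dag {
    Ability -> School
    Ability -> Earn
    Family -> School
    Family -> Earn
    School -> Earn
  }
')

# The single missing edge (Ability, Family) yields one implied independence
implied <- impliedConditionalIndependencies(dag_edu_obs)
print(implied)

# Simulate standardised linear-Gaussian data consistent with the graph
set.seed(1)
sim_data <- simulateSEM(dag_edu_obs, b.default = 0.3, N = 2000)

# Test each implied independence with a (partial) correlation test
test_results <- localTests(dag_edu_obs, data = sim_data, type = "cis")
print(test_results)
```

An estimate close to zero with a confidence interval covering zero is consistent with the graph. A clear violation would suggest a missing arrow or an omitted common cause.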

4 Visualising with ggdag

# Listing 5: Visualising the DAG with ggdag
# Convert dagitty object to tidy format
tidy_dag <- tidy_dagitty(dag_edu)

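# Basic plot, built layer by layer so that edges, nodes and labels are each drawn once
# (ggdag() already adds its own edge, node and text layers)
ggplot(tidy_dag, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges() +
  geom_dag_node(aes(color = name), show.legend = FALSE) +
  geom_dag_text(color = "white") +
  theme_dag() +
  labs(title = "Education and Earnings DAG")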

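# Highlight adjustment sets. For dag_edu there is nothing to highlight:
# Ability is latent, so no valid adjustment set exists for School -> Earn
ggdag_adjustment_set(tidy_dag, exposure = "School", outcome = "Earn") +
  theme_dag()

ggdag_adjustment_set() draws one panel per valid adjustment set and visually distinguishes adjusted from unadjusted nodes, making it easy to communicate the identification strategy. It is informative only for graphs in which at least one valid adjustment set exists.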
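Because Ability is latent in dag_edu, there is no adjustment set for the plot to mark. The sketch below shows what the function produces on a hypothetical variant of the graph in which Ability is observed; dag_edu_obs is introduced here purely for illustration.

```r
library(dagitty)
library(ggdag)

# Hypothetical variant of the education DAG with Ability observed
dag_edu_obs <- dagitty('
  dag {
    Ability [pos="0,1"]
    Family [pos="0,2"]
    School [exposure, pos="1,1.5"]
    Earn [outcome, pos="2,1.5"]
    Ability -> School
    Ability -> Earn
    Family -> School
    Family -> Earn
    School -> Earn
  }
')

# One panel per valid adjustment set; here the only set is {Ability, Family}
p_adjust <- ggdag_adjustment_set(dag_edu_obs,
                                 exposure = "School", outcome = "Earn") +
  theme_dag()
p_adjust
```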

5 Collider Bias: A DAG-Based Warning System

One of the most valuable uses of DAGs is identifying collider bias— the bias introduced by conditioning on a common effect of two variables. Consider a healthcare setting:

# Listing 6: Collider bias example: hospitalisation and mortality
dag_collider <- dagitty('
  dag {
    Disease [pos="0,1"]
    Injury [pos="0,0"]
    Hospital [pos="1,0.5"]
    Death [outcome, pos="2,0.5"]
    
    Disease -> Death
    Disease -> Hospital
    Injury -> Hospital
    Hospital -> Death
  }
')

# Is Disease d-separated from Injury given Hospital?
dseparated(dag_collider, "Disease", "Injury", c("Hospital"))
# Returns FALSE — conditioning on Hospital opens a collider path!

Without conditioning on Hospital, disease and injury are independent (no common cause). Conditioning on hospitalisation opens a collider path: among hospitalised patients, disease and injury are negatively correlated (if you're in hospital, knowing you have a disease makes it less likely you have an injury as the cause). This is the infamous "Berkson's paradox" [Berkson, 1946], and it can create spurious associations or mask real ones.

DAGs make collider bias transparent and mechanical to detect: conditioning on a collider, or on any descendant of a collider, opens the path between its parents that the collider had blocked.
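The graphical claim can be checked against simulated data. The sketch below first compares the d-separation query with and without conditioning on Hospital, then simulates independent disease and injury indicators with arbitrary probabilities (10% prevalence each, 80% admission if either is present, 2% otherwise) that are assumptions chosen only for this demonstration.

```r
library(dagitty)

dag_collider <- dagitty('
  dag {
    Death [outcome]
    Disease -> Death
    Disease -> Hospital
    Injury -> Hospital
    Hospital -> Death
  }
')

# d-separation with and without conditioning on the collider
sep_marginal       <- dseparated(dag_collider, "Disease", "Injury", list())
sep_given_hospital <- dseparated(dag_collider, "Disease", "Injury", "Hospital")

# Simulation: disease and injury are independent in the population
set.seed(42)
n <- 100000
disease <- rbinom(n, 1, 0.1)
injury  <- rbinom(n, 1, 0.1)

# Either condition makes hospitalisation much more likely
p_hospital <- ifelse(disease == 1 | injury == 1, 0.8, 0.02)
hospital   <- rbinom(n, 1, p_hospital)

cor_all      <- cor(disease, injury)                                # whole population
cor_hospital <- cor(disease[hospital == 1], injury[hospital == 1])  # hospitalised only

print(c(separated_marginally     = sep_marginal,
        separated_given_hospital = sep_given_hospital))
print(round(c(population = cor_all, hospitalised = cor_hospital), 3))
```

In the full sample the correlation is close to zero, while restricting to hospitalised patients produces a clearly negative correlation, even though neither variable affects the other.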

6 Instrument Validity

DAGs also formalize IV validity conditions:

# Listing 7: Testing instrument validity in a DAG
# Z = instrument (proximity to college), U = unobservable (ability),
# D = treatment (college), Y = outcome (earnings).
# The annotations are kept outside the graph string, which is parsed as dagitty syntax.
dag_iv <- dagitty('
  dag {
    Z [pos="0,1"]
    U [latent, pos="1,0"]
    D [exposure, pos="1,1"]
    Y [outcome, pos="2,1"]

    Z -> D
    U -> D
    U -> Y
    D -> Y
  }
')

# Which variables are valid instruments for D -> Y?
instrumentalVariables(dag_iv, exposure = "D", outcome = "Y")
# Returns: Z

The instrumentalVariables() function searches the graph for variables that satisfy the graphical conditions for IV validity. A candidate Z must be d-connected to the treatment (relevance) and d-separated from the outcome in the graph with the D -> Y arrow removed (exclusion and exogeneity), possibly after conditioning on covariates that are not descendants of the outcome. Note that Z is not d-separated from Y given the treatment itself: D is a collider on the path Z -> D <- U -> Y, so conditioning on D opens that path.
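The graphical conditions can also be checked by hand with d-separation queries. The sketch below does so for the IV graph above and contrasts it with a hypothetical graph, dag_iv_bad, in which the instrument also affects the outcome directly. Both auxiliary graphs are assumptions introduced for illustration.

```r
library(dagitty)

dag_iv <- dagitty('
  dag {
    U [latent]
    D [exposure]
    Y [outcome]
    Z -> D
    U -> D
    U -> Y
    D -> Y
  }
')

# The same graph with the D -> Y arrow deleted
dag_iv_no_effect <- dagitty('
  dag {
    U [latent]
    Z -> D
    U -> D
    U -> Y
  }
')

# Hypothetical violation: proximity also affects earnings directly
dag_iv_bad <- dagitty('
  dag {
    U [latent]
    D [exposure]
    Y [outcome]
    Z -> D
    Z -> Y
    U -> D
    U -> Y
    D -> Y
  }
')

z_relevant <- dconnected(dag_iv, "Z", "D", list())            # Z is associated with D
z_excluded <- dseparated(dag_iv_no_effect, "Z", "Y", list())  # no path to Y except via D -> Y
z_given_d  <- dseparated(dag_iv, "Z", "Y", "D")               # conditioning on D opens Z -> D <- U -> Y

iv_valid <- instrumentalVariables(dag_iv, exposure = "D", outcome = "Y")
iv_bad   <- instrumentalVariables(dag_iv_bad, exposure = "D", outcome = "Y")

print(c(relevant = z_relevant, excluded = z_excluded,
        separated_given_D = z_given_d))
print(iv_valid)
```

With the direct Z -> Y arrow added, the function finds no instrument, which is the graphical counterpart of a violated exclusion restriction.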

7 Comparison to Alternatives

For the econometrics researcher using R, dagitty + ggdag is the natural choice: dagitty implements the core graphical identification queries (d-separation, adjustment sets, instruments, testable implications), ggdag integrates with ggplot2 for publication-quality figures, and both are available on CRAN.

Tool	Language	Strength	Limitation
dagitty (R)	R	Graphical identification queries, CRAN	Static visualisation
ggdag (R)	R	ggplot2 integration	Plotting only
dagitty.net	Web browser	Interactive GUI	No scripted workflow
causaldag (Py)	Python	DAG + estimation	Less graph querying
DoWhy (Py)	Python	End-to-end pipeline	Heavy dependencies

Table 1: Causal Graph Tools Comparison

8 Key Options and Pitfalls

Mark latent variables: Use [latent] to tag unobserved variables. This prevents adjustmentSets() from including them in adjustment sets.

Use pos for consistent layouts: Specifying node positions in the DAG string ensures figures are reproducible.

DAGs encode qualitative structure, not effect sizes: A DAG arrow means "direct causal relationship exists"; it says nothing about magnitude or sign.

DAGs require completeness: Omitting a node or arrow that exists in the true data-generating process can lead to erroneous identification conclusions. When in doubt, include more structure and check whether the conclusions are robust.
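One way to run such a robustness check is to ask whether an adjustment set derived from a simple graph is still valid under a richer candidate graph. In the sketch below, both graphs are hypothetical: candidate_a has Family as the only confounder, and candidate_b adds an illustrative Region variable that affects both schooling and earnings.

```r
library(dagitty)

# Candidate A: Family is the only common cause
candidate_a <- dagitty('
  dag {
    School [exposure]
    Earn [outcome]
    Family -> School
    Family -> Earn
    School -> Earn
  }
')

# Candidate B: a hypothetical Region variable also confounds the effect
candidate_b <- dagitty('
  dag {
    School [exposure]
    Earn [outcome]
    Family -> School
    Family -> Earn
    Region -> School
    Region -> Earn
    School -> Earn
  }
')

# Is {Family} a valid adjustment set under each candidate graph?
family_ok_in_a <- isAdjustmentSet(candidate_a, "Family",
                                  exposure = "School", outcome = "Earn")
family_ok_in_b <- isAdjustmentSet(candidate_b, "Family",
                                  exposure = "School", outcome = "Earn")

# Does adding Region repair the set under candidate B?
both_ok_in_b <- isAdjustmentSet(candidate_b, c("Family", "Region"),
                                exposure = "School", outcome = "Earn")

print(c(A_family = family_ok_in_a, B_family = family_ok_in_b,
        B_family_region = both_ok_in_b))
```

If a set is valid under every plausible candidate graph, the identification conclusion does not hinge on the disputed arrows. If it fails under some of them, those arrows are the assumptions that need defending.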

References

Barrett, M. (2021). ggdag: Analyse and Create Elegant Directed Acyclic Graphs. R package version 0.2.3. https://CRAN.R-project.org/package=ggdag.
Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometrics Bulletin, 2(3), 47-53.
Pearl, J. (1993). Bayesian analysis in expert systems: Comment: Graphical models, causality and intervention. Statistical Science, 8(3), 266-269.
Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge University Press.
Textor, J., van der Zander, B., Gilthorpe, M. S., Liskiewicz, M., and Ellison, G. T. H. (2016). Robust causal inference using directed acyclic graphs: The R package 'dagitty'. International Journal of Epidemiology, 45(6), 1887-1894.

dagitty and ggdag in R: Drawing and Querying Causal Graphs
