1 What Problem Does grf Solve?
Standard causal inference estimators—DiD, RDD, IV—produce a single number: the average treatment effect (ATE) or the average treatment effect on the treated (ATT). But in many applications, we care about who benefits from treatment, not just the average.
Consider a job training programme. The average effect on earnings might be modest—say, $500 per year. But perhaps the effect is $2,000 for displaced workers over 40 and near zero for young workers with strong labour market prospects. If so, targeting the programme to older displaced workers improves efficiency and welfare substantially. This requires estimating conditional average treatment effects (CATEs): $\tau(x)=\mathbb{E}[Y^{1}-Y^{0}|X=x]$.
Wager and Athey [2018] introduced causal forests as a principled, non-parametric method for estimating CATEs. The method is implemented in the grf (Generalized Random Forests) R package [Athey et al., 2019].
2 Installation and Setup
install.packages("grf")
library(grf)
The grf package requires R 3.5 or later and is available from CRAN. It has no unusual system dependencies.
3 The Causal Forest Algorithm
3.1 Intuition
A causal forest estimates $\tau(x)$ by building many randomised decision trees. Each tree identifies a neighbourhood of observations similar to a given $x$, and the forest estimates the treatment effect in that neighbourhood by locally comparing treated and control observations.
The key innovation over standard random forests is that the trees are built to maximise heterogeneity in treatment effects, not to predict outcomes. A second ingredient is honesty [Athey and Imbens, 2016]: each tree uses one subsample to choose its splits and a disjoint subsample to estimate effects within its leaves, which avoids overfitting and supports valid inference.
3.2 The Estimating Equation
For each target observation $x$, the causal forest estimates:
$$\hat{\tau}(x)=\operatorname*{arg\,min}_{\tau}\sum_{i=1}^{n}\alpha_{i}(x)\left[(Y_{i}-\hat{m}^{(-i)}(X_{i}))-\tau\,(D_{i}-\hat{e}^{(-i)}(X_{i}))\right]^{2}$$
where $\hat{m}^{(-i)}(x)$ and $\hat{e}^{(-i)}(x)$ are cross-fitted (out-of-sample) estimates of the conditional mean $\mathbb{E}[Y|X=x]$ and the propensity score $\mathbb{E}[D|X=x]$, and $\alpha_{i}(x)$ are the forest weights—the fraction of trees in which observation $i$ falls in the same leaf as $x$.
This is a local version of the Robinson (1988) partially linear model—sometimes called the "R-learner" [Nie and Wager, 2021]—applied non-parametrically. Residualising on $\hat{m}$ and $\hat{e}$ removes the main effects, leaving only treatment effect variation to be estimated.
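The minimiser of this weighted objective has a closed form: a weighted least-squares regression of the outcome residuals on the treatment residuals through the origin. A minimal base-R sketch, where `Y_res`, `D_res`, and `alpha` are made-up placeholders standing in for the cross-fitted residuals and the forest weights at one target point $x$:

```r
# Toy illustration of the weighted residual-on-residual minimisation.
# Y_res, D_res, and alpha are placeholders for (Y_i - m_hat(X_i)),
# (D_i - e_hat(X_i)), and the forest weights alpha_i(x) at one point x.
set.seed(1)
n     <- 100
Y_res <- rnorm(n)
D_res <- rbinom(n, 1, 0.5) - 0.5
alpha <- runif(n)
alpha <- alpha / sum(alpha)            # weights sum to one

# Closed-form minimiser of the weighted squared-error objective
tau_hat <- sum(alpha * Y_res * D_res) / sum(alpha * D_res^2)

# Equivalent: weighted least squares without an intercept
fit <- lm(Y_res ~ D_res - 1, weights = alpha)
all.equal(unname(coef(fit)), tau_hat)  # TRUE
```

The equivalence holds because minimising $\sum_i w_i (y_i - \tau x_i)^2$ over $\tau$ gives $\hat{\tau} = \sum_i w_i x_i y_i / \sum_i w_i x_i^2$.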
3.3 Identification Assumption
Causal forests assume unconfoundedness: conditional on covariates $X$, treatment $D$ is independent of potential outcomes: $(Y^{0},Y^{1})\perp\perp D|X.$ This is the same assumption as matching and IPW. Causal forests do not solve unmeasured confounding—they estimate heterogeneous effects under the same observational study assumptions.
4 A Minimal Working Example
We simulate data with heterogeneous treatment effects and estimate a causal forest.
library(grf)
set.seed(42)
n <- 2000
p <- 5
# Simulate covariates
X <- matrix(rnorm(n * p), nrow = n)
# Propensity score (true)
e <- 1 / (1 + exp(-X[, 1]))
D <- rbinom(n, 1, e)
# Treatment effect varies with X[,1]
tau_true <- 2 * X[, 1]
# Outcome
Y <- 3 * X[, 2] + D * tau_true + rnorm(n)
# Fit causal forest
cf <- causal_forest(X, Y, D, num.trees = 2000, honesty = TRUE, seed = 42)
# Predict CATEs for each observation
tau_hat <- predict(cf)$predictions
# Compare to true CATE
cor(tau_hat, tau_true) # should be high
# Average treatment effect estimate
average_treatment_effect(cf)
The function average_treatment_effect() returns a doubly robust (AIPW-style) estimate of the ATE together with a standard error. The predict() method returns individual CATE estimates for all in-sample observations.
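Pointwise confidence intervals for the CATEs come from calling predict() with estimate.variance = TRUE. A sketch continuing the example above (it reuses cf and tau_true defined there):

```r
# Variance estimates accompany the point predictions
pred  <- predict(cf, estimate.variance = TRUE)
sigma <- sqrt(pred$variance.estimates)
lower <- pred$predictions - 1.96 * sigma
upper <- pred$predictions + 1.96 * sigma

# Empirical coverage of the simulated true CATEs
mean(tau_true >= lower & tau_true <= upper)
```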
5 Key Options and Pitfalls
5.1 Important Options
- num.trees: More trees reduce variance but increase computation. 2,000 is a reasonable default for moderate samples; use 4,000-10,000 for publication.
- honesty: Always keep TRUE (the default). Dishonest forests overfit and produce invalid confidence intervals.
- tune.parameters = "all": Let grf tune forest parameters (e.g., min.node.size, mtry, sample.fraction) via cross-validation. Recommended for applied use.
- clusters: If observations are clustered (e.g., students in schools), pass cluster IDs here. Standard errors will account for within-cluster correlation.
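Putting these options together, a sketch that reuses the simulated X, Y, and D from the example above; cluster_id is a hypothetical cluster assignment (e.g., school IDs), simulated here for illustration:

```r
# Hypothetical cluster structure: 100 clusters, simulated for illustration
cluster_id <- sample(100, length(Y), replace = TRUE)

cf_tuned <- causal_forest(
  X, Y, D,
  num.trees       = 4000,
  honesty         = TRUE,         # the default; keep it
  tune.parameters = "all",        # cross-validated tuning
  clusters        = cluster_id,   # cluster-robust inference
  seed            = 42
)
average_treatment_effect(cf_tuned)
```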
5.2 Common Pitfalls
- Interpreting the ATE as causal without unconfoundedness: Causal forests are not magic—they require unconfoundedness just like matching. If treatment is endogenous, use instrumental variable forests (instrumental_forest()).
- Using too few trees: With fewer than 500 trees, estimates are noisy. Increase num.trees for stable results.
- Ignoring calibration: Test whether the estimated CATEs are informative using test_calibration(cf). A significant slope coefficient in the calibration regression indicates that the predicted heterogeneity is real.
6 Testing for Heterogeneity
Before acting on CATE estimates, test whether heterogeneity is statistically meaningful:
# Calibration test (Chernozhukov et al. 2018)
test_calibration(cf)
# Best linear predictor of CATE
blp <- best_linear_projection(cf, X[, 1:2])
print(blp)
The calibration test regresses realised (debiased) outcomes on the mean forest prediction and on the demeaned individual CATE predictions. A coefficient on the CATE component that is significantly greater than zero is evidence of genuine heterogeneity, and a coefficient near one indicates the predictions are well calibrated. The best linear projection identifies which covariates most strongly moderate the treatment effect.
7 Extensions in grf
The grf package provides several related estimators:
- instrumental_forest(): Causal forest with an instrument for the treatment—useful when treatment is endogenous.
- regression_forest(): Non-parametric regression of $Y$ on $X$.
- probability_forest(): For categorical outcomes; estimates conditional class probabilities.
- ll_causal_forest(): Local linear causal forest, better for smooth, low-dimensional CATE functions [Friedberg et al., 2021].
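When treatment is endogenous but an instrument is available, instrumental_forest() estimates heterogeneous local treatment effects. A self-contained sketch on simulated data (the data-generating process here is made up for illustration):

```r
library(grf)
set.seed(7)
n <- 2000
X <- matrix(rnorm(n * 5), nrow = n)
Z <- rbinom(n, 1, 0.5)                 # binary instrument
U <- rnorm(n)                          # unobserved confounder
D <- rbinom(n, 1, plogis(Z + U))       # endogenous treatment
Y <- 2 * X[, 1] * D + U + rnorm(n)     # effect varies with X[, 1]

ivf <- instrumental_forest(X, Y, D, Z, num.trees = 2000, seed = 7)
tau_hat_iv <- predict(ivf)$predictions
cor(tau_hat_iv, 2 * X[, 1])            # should be positive
```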
8 Comparison to Alternatives
9 Conclusion
The grf package makes causal forest estimation accessible and reliable. Its honest sample splitting, built-in variance estimation, and calibration tests make it suitable for applied research, not just prediction. For researchers who want to know who benefits from an intervention, rather than just whether the average effect is positive, causal forests are the current state of the art.
References
- Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228-1242.
- Athey, S., Tibshirani, J., and Wager, S. (2019). Generalized random forests. Annals of Statistics, 47(2):1148-1178.
- Athey, S. and Imbens, G.W. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27):7353-7360.
- Nie, X. and Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299-319.
- Friedberg, R., Tibshirani, J., Athey, S., and Wager, S. (2021). Local linear forests. Journal of Computational and Graphical Statistics, 30(2):503-517.