1 What Problem Does grf Solve?
Standard causal inference estimators—DiD, RDD, IV—produce a single number: the average treatment effect (ATE) or the average treatment effect on the treated (ATT). But in many applications, we care about who benefits from treatment, not just the average.
Consider a job training programme. The average effect on earnings might be modest—say, $500 per year. But perhaps the effect is $2,000 for displaced workers over 40 and near zero for young workers with strong labour market prospects. If so, targeting the programme to older displaced workers improves efficiency and welfare substantially. This requires estimating conditional average treatment effects (CATEs): $\tau(x)=\mathbb{E}[Y^{1}-Y^{0}|X=x]$.
Wager and Athey [2018] introduced causal forests as a principled, non-parametric method for estimating CATEs. The method is implemented in the grf (Generalized Random Forests) R package [Athey et al., 2019].
2 Installation and Setup
install.packages("grf")
library(grf)
The grf package requires R 3.5 or later and is available from CRAN. It has no unusual system dependencies.
3 The Causal Forest Algorithm
3.1 Intuition
A causal forest estimates $\tau(x)$ by building many randomised decision trees. Each tree identifies a neighbourhood of observations similar to a given $x$, and the forest estimates the treatment effect in that neighbourhood by locally comparing treated and control observations.
The key innovation over standard random forests is that the trees are built to maximise heterogeneity in treatment effects, not to predict outcomes. A second ingredient is honesty [Athey and Imbens, 2016]: each tree uses one subsample to choose its splits and a disjoint subsample to estimate effects within its leaves, which avoids overfitting and supports valid inference.
3.2 The Estimating Equation
For each target observation $x$, the causal forest estimates:
$$\hat{\tau}(x)=\operatorname*{arg\,min}_{\tau}\sum_{i=1}^{n}\alpha_{i}(x)\left[(Y_{i}-\hat{m}^{(-i)}(X_{i}))-\tau\,(D_{i}-\hat{e}^{(-i)}(X_{i}))\right]^{2}$$
where $\hat{m}^{(-i)}(x)$ and $\hat{e}^{(-i)}(x)$ are cross-fitted (out-of-sample) estimates of the conditional mean $\mathbb{E}[Y|X=x]$ and the propensity score $\mathbb{E}[D|X=x]$, and $\alpha_{i}(x)$ are the forest weights—the fraction of trees in which observation $i$ falls in the same leaf as $x$.
This is a local version of the Robinson (1988) partially linear model—sometimes called the "R-learner" [Nie and Wager, 2021]—applied non-parametrically. Residualising on $\hat{m}$ and $\hat{e}$ removes the main effects, leaving only treatment effect variation to be estimated.
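The minimiser of this weighted objective has a closed form: a weighted least-squares regression of the outcome residuals on the treatment residuals through the origin. A minimal base-R sketch, where `Y_res`, `D_res`, and `alpha` are made-up placeholders standing in for the cross-fitted residuals and the forest weights at one target point $x$:

```r
# Toy illustration of the weighted residual-on-residual minimisation.
# Y_res, D_res, and alpha are placeholders for (Y_i - m_hat(X_i)),
# (D_i - e_hat(X_i)), and the forest weights alpha_i(x) at one point x.
set.seed(1)
n     <- 100
Y_res <- rnorm(n)
D_res <- rbinom(n, 1, 0.5) - 0.5
alpha <- runif(n)
alpha <- alpha / sum(alpha)            # weights sum to one

# Closed-form minimiser of the weighted squared-error objective
tau_hat <- sum(alpha * Y_res * D_res) / sum(alpha * D_res^2)

# Equivalent: weighted least squares without an intercept
fit <- lm(Y_res ~ D_res - 1, weights = alpha)
all.equal(unname(coef(fit)), tau_hat)  # TRUE
```

The equivalence holds because minimising $\sum_i w_i (y_i - \tau x_i)^2$ over $\tau$ gives $\hat{\tau} = \sum_i w_i x_i y_i / \sum_i w_i x_i^2$.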
3.3 Identification Assumption
Causal forests assume unconfoundedness: conditional on covariates $X$, treatment $D$ is independent of potential outcomes: $(Y^{0},Y^{1})\perp\perp D|X.$ This is the same assumption as matching and IPW. Causal forests do not solve unmeasured confounding—they estimate heterogeneous effects under the same observational study assumptions.
4 A Minimal Working Example
We simulate data with heterogeneous treatment effects and estimate a causal forest.
library(grf)
set.seed(42)
n <- 2000
p <- 5
# Simulate covariates
X <- matrix(rnorm(n * p), nrow = n)
# Propensity score (true)
e <- 1 / (1 + exp(-X[, 1]))
D <- rbinom(n, 1, e)
# Treatment effect varies with X[,1]
tau_true <- 2 * X[, 1]
# Outcome
Y <- 3 * X[, 2] + D * tau_true + rnorm(n)
# Fit causal forest
cf <- causal_forest(X, Y, D, num.trees = 2000, honesty = TRUE, seed = 42)
# Predict CATEs for each observation
tau_hat <- predict(cf)$predictions
# Compare to true CATE
cor(tau_hat, tau_true) # should be high
# Average treatment effect estimate
average_treatment_effect(cf)
The function average_treatment_effect() returns a doubly robust (AIPW-style) estimate of the ATE together with a standard error. The predict() method returns individual CATE estimates for all in-sample observations.
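Pointwise confidence intervals for the CATEs come from calling predict() with estimate.variance = TRUE. A sketch continuing the example above (it reuses cf and tau_true defined there):

```r
# Variance estimates accompany the point predictions
pred  <- predict(cf, estimate.variance = TRUE)
sigma <- sqrt(pred$variance.estimates)
lower <- pred$predictions - 1.96 * sigma
upper <- pred$predictions + 1.96 * sigma

# Empirical coverage of the simulated true CATEs
mean(tau_true >= lower & tau_true <= upper)
```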
5 Key Options and Pitfalls
5.1 Important Options
- num.trees: More trees reduce variance but increase computation. 2,000 is a reasonable default for moderate samples; use 4,000-10,000 for publication.
- honesty: Always keep TRUE (the default). Dishonest forests overfit and produce invalid confidence intervals.
- tune.parameters = "all": Let grf tune forest parameters (e.g., min.node.size, mtry, sample.fraction) via cross-validation. Recommended for applied use.
- clusters: If observations are clustered (e.g., students in schools), pass cluster IDs here. Standard errors will account for within-cluster correlation.
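Putting these options together, a sketch that reuses the simulated X, Y, and D from the example above; cluster_id is a hypothetical cluster assignment (e.g., school IDs), simulated here for illustration:

```r
# Hypothetical cluster structure: 100 clusters, simulated for illustration
cluster_id <- sample(100, length(Y), replace = TRUE)

cf_tuned <- causal_forest(
  X, Y, D,
  num.trees       = 4000,
  honesty         = TRUE,         # the default; keep it
  tune.parameters = "all",        # cross-validated tuning
  clusters        = cluster_id,   # cluster-robust inference
  seed            = 42
)
average_treatment_effect(cf_tuned)
```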
5.2 Common Pitfalls
- Interpreting the ATE as causal without unconfoundedness: Causal forests are not magic—they require unconfoundedness just like matching. If treatment is endogenous, use instrumental variable forests (instrumental_forest()).
- Using too few trees: With fewer than 500 trees, estimates are noisy. Increase num.trees for stable results.
- Ignoring calibration: Test whether the estimated CATEs are informative using test_calibration(cf). A significant slope coefficient in the calibration regression indicates that the predicted heterogeneity is real.
6 Testing for Heterogeneity
Before acting on CATE estimates, test whether heterogeneity is statistically meaningful:
# Calibration test (Chernozhukov et al. 2018)
test_calibration(cf)
# Best linear predictor of CATE
blp <- best_linear_projection(cf, X[, 1:2])
print(blp)
The calibration test regresses realised (debiased) outcomes on the mean forest prediction and on the demeaned individual CATE predictions. A coefficient on the CATE component that is significantly greater than zero is evidence of genuine heterogeneity, and a coefficient near one indicates the predictions are well calibrated. The best linear projection identifies which covariates most strongly moderate the treatment effect.
7 Extensions in grf
The grf package provides several related estimators:
- instrumental_forest(): Causal forest with an instrument for the treatment—useful when treatment is endogenous.
- regression_forest(): Non-parametric regression of $Y$ on $X$.
- probability_forest(): For categorical outcomes; estimates conditional class probabilities.
- ll_causal_forest(): Local linear causal forest, better for smooth, low-dimensional CATE functions [Friedberg et al., 2021].
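When treatment is endogenous but an instrument is available, instrumental_forest() estimates heterogeneous local treatment effects. A self-contained sketch on simulated data (the data-generating process here is made up for illustration):

```r
library(grf)
set.seed(7)
n <- 2000
X <- matrix(rnorm(n * 5), nrow = n)
Z <- rbinom(n, 1, 0.5)                 # binary instrument
U <- rnorm(n)                          # unobserved confounder
D <- rbinom(n, 1, plogis(Z + U))       # endogenous treatment
Y <- 2 * X[, 1] * D + U + rnorm(n)     # effect varies with X[, 1]

ivf <- instrumental_forest(X, Y, D, Z, num.trees = 2000, seed = 7)
tau_hat_iv <- predict(ivf)$predictions
cor(tau_hat_iv, 2 * X[, 1])            # should be positive
```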
8 Comparison to Alternatives
9 Conclusion
The grf package makes causal forest estimation accessible and reliable. Its honest sample splitting, built-in variance estimation, and calibration tests make it suitable for applied research, not just prediction. For researchers who want to know who benefits from an intervention, rather than just whether the average effect is positive, causal forests are the current state of the art.
References
- Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228-1242.
- Athey, S., Tibshirani, J., and Wager, S. (2019). Generalized random forests. Annals of Statistics, 47(2):1148-1178.
- Athey, S. and Imbens, G.W. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27):7353-7360.
- Nie, X. and Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299-319.
- Friedberg, R., Tibshirani, J., Athey, S., and Wager, S. (2021). Local linear forests. Journal of Computational and Graphical Statistics, 30(2):503-517.