Toolbox

Double Machine Learning in Practice: The DoubleML Package (R and Python)

1 What Problem Does DoubleML Solve?

Applied researchers frequently want to estimate the causal effect of a treatment or exposure on an outcome while controlling for a large number of covariates. When the covariate set is large relative to the sample size, standard OLS is unreliable: it overfits, and the resulting standard errors are invalid.

A common response is to use machine learning methods for model selection—LASSO, random forests, gradient boosting—to select relevant controls. But naively using ML predictions for causal inference introduces regularisation bias: LASSO, for example, deliberately shrinks coefficients toward zero, which biases the treatment effect estimate if the treatment variable is also regularised.

Double Machine Learning (DML) [Chernozhukov et al., 2018] solves this problem with two key ingredients: (1) partialling out the treatment from the outcome using residuals from ML predictions, and (2) cross-fitting to ensure the residuals are out-of-sample and free of overfitting bias.

The DoubleML package provides a production-quality implementation in both R and Python.

2 Installation

# R
install.packages("DoubleML")
library(DoubleML)

# Python (pip)
pip install doubleml

The R package requires the mlr3 ecosystem for machine learning. The Python package uses scikit-learn.

3 The DML Estimating Framework

3.1 The Partially Linear Model

The baseline DML model is the partially linear regression:

$$Y_{i}=\theta_{0}D_{i}+g_{0}(X_{i})+\epsilon_{i}$$

$$D_{i}=m_{0}(X_{i})+v_{i}$$

where $Y$ is the outcome, $D$ is the (scalar) treatment, $X$ are high-dimensional controls, $g_{0}$ and $m_{0}$ are unknown functions estimated by ML, and $\epsilon_{i}$, $v_{i}$ have zero mean conditional on $X_{i}$.

The target parameter is $\theta_{0}$, the causal effect of $D$ on $Y$ conditional on $X$.

3.2 The DML Estimator

The DML estimator proceeds in two steps:

  1. Partial out: Use ML to estimate $\hat{\ell}_{0}(x)=\hat{\mathbb{E}}[Y|X=x]$ and $\hat{m}_{0}(x)=\hat{\mathbb{E}}[D|X=x]$. (Note that $\mathbb{E}[Y|X]=\theta_{0}m_{0}(X)+g_{0}(X)$, so the outcome nuisance is $\ell_{0}$ rather than $g_{0}$ itself; the package accordingly calls this learner ml_l.) Form residuals:$$\tilde{Y}_{i}=Y_{i}-\hat{\ell}_{0}(X_{i}), \quad \tilde{D}_{i}=D_{i}-\hat{m}_{0}(X_{i})$$
  2. Regress residual on residual:$$\hat{\theta}_{0}=\Big(\sum_{i}\tilde{D}_{i}^{2}\Big)^{-1}\sum_{i}\tilde{D}_{i}\tilde{Y}_{i}$$

This is the Frisch-Waugh-Lovell theorem applied non-parametrically. By partialling out $X$ from both $Y$ and $D$, the remaining variation in $\tilde{D}$ is orthogonal to $X$, removing the confounding.
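The Frisch-Waugh-Lovell logic is easy to verify numerically in its linear special case: regressing OLS residuals of $Y$ on OLS residuals of $D$ reproduces the coefficient on $D$ from the full regression. A minimal numpy sketch (in DML the linear projections below are replaced by ML fits):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.normal(size=(n, p))
D = 0.5 * X[:, 0] + rng.normal(size=n)
Y = 1.5 * D + X @ np.ones(p) + rng.normal(size=n)  # true theta_0 = 1.5

# Controls with an intercept column
Xc = np.column_stack([np.ones(n), X])

# Full regression: Y on [D, X] -> coefficient on D
beta_full = np.linalg.lstsq(np.column_stack([D, Xc]), Y, rcond=None)[0]
theta_full = beta_full[0]

# FWL: residualise Y and D on X, then regress residual on residual
resid = lambda v: v - Xc @ np.linalg.lstsq(Xc, v, rcond=None)[0]
Y_t, D_t = resid(Y), resid(D)
theta_fwl = (D_t @ Y_t) / (D_t @ D_t)

print(theta_full, theta_fwl)  # identical up to floating-point error
```

Both numbers agree to machine precision, which is exactly the FWL identity the DML estimator generalises.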

Cross-fitting: Orthogonalisation removes regularisation bias, but a second bias arises if the nuisance predictions are evaluated on the same observations used to fit them: the learner's overfitting leaks into the residuals. DML therefore splits the data into $K$ folds, fits the nuisance functions on $K-1$ folds, and predicts on the held-out fold, so that every residual is an out-of-sample prediction. The final estimate averages over the folds.
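The cross-fitting scheme can be sketched in a few lines of numpy. Ridge regression (written out directly) stands in here for an arbitrary ML learner; each observation's nuisance prediction comes from a model fit on the other folds:

```python
import numpy as np

def crossfit_predictions(X, y, n_folds=5, alpha=1.0, seed=0):
    """Out-of-sample predictions of E[y|X] via K-fold cross-fitting.
    Ridge regression is a stand-in for an arbitrary ML learner
    (no intercept; the simulated data below are centred)."""
    n, p = X.shape
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    preds = np.empty(n)
    for hold in folds:
        train = np.setdiff1d(idx, hold)
        Xt, yt = X[train], y[train]
        # Ridge fit on the training folds: (X'X + alpha*I)^{-1} X'y
        beta = np.linalg.solve(Xt.T @ Xt + alpha * np.eye(p), Xt.T @ yt)
        preds[hold] = X[hold] @ beta  # predict on the held-out fold only
    return preds

# Demo: DML-PLR with cross-fitted nuisances
rng = np.random.default_rng(1)
n, p = 500, 10
X = rng.normal(size=(n, p))
D = 0.5 * X[:, 0] + rng.normal(size=n)
Y = 1.5 * D + X[:, 0] + X[:, 1] + rng.normal(size=n)  # true theta_0 = 1.5

Y_t = Y - crossfit_predictions(X, Y)
D_t = D - crossfit_predictions(X, D)
theta_hat = (D_t @ Y_t) / (D_t @ D_t)
print(theta_hat)  # close to 1.5
```

The key line is `preds[hold] = X[hold] @ beta`: predictions for a fold always come from a model that never saw that fold.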

3.3 Properties

Under regularity conditions (roughly, the ML nuisance estimators converge to the truth faster than the rate $n^{-1/4}$), $\hat{\theta}_{0}$ is $\sqrt{n}$-consistent and asymptotically normal:

$$\sqrt{n}(\hat{\theta}_{0} - \theta_{0}) \rightsquigarrow \mathcal{N}(0, V)$$

where $V$ is a variance that can be consistently estimated. This allows standard confidence intervals and t-tests.
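In the PLR case the plug-in variance estimator has a closed sandwich form. Given cross-fitted residuals, it is a few lines of numpy (a sketch of the standard formula, not the package's internal code):

```python
import numpy as np

def plr_inference(Y_t, D_t):
    """Point estimate, standard error, and 95% CI for the PLR coefficient,
    given cross-fitted residuals Y_t = Y - l_hat(X), D_t = D - m_hat(X)."""
    n = len(Y_t)
    theta = (D_t @ Y_t) / (D_t @ D_t)
    eps = Y_t - theta * D_t                   # score residuals
    J = np.mean(D_t ** 2)                     # Jacobian of the score
    sigma2 = np.mean((D_t ** 2) * (eps ** 2)) / J ** 2
    se = np.sqrt(sigma2 / n)
    return theta, se, (theta - 1.96 * se, theta + 1.96 * se)

# Toy check with oracle residuals: Y_t = 1.5 * D_t + noise
rng = np.random.default_rng(2)
D_t = rng.normal(size=2000)
Y_t = 1.5 * D_t + rng.normal(size=2000)
theta, se, ci = plr_inference(Y_t, D_t)
print(theta, se, ci)
```

This is the heteroskedasticity-robust variance for the residual-on-residual regression; `summary()` in the package reports the analogous quantities.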

4 A Minimal Working Example (R)

library(DoubleML)
library(mlr3)
library(mlr3learners)
set.seed(42)

# Simulate partially linear data
n <- 500; p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
D <- 0.5 * X[,1] + rnorm(n)
Y <- 1.5 * D + X[,1] + X[,2] + rnorm(n)
# True theta_0 = 1.5

# Create DoubleML data object (the R package expects a data.table)
df <- data.table::as.data.table(cbind(Y = Y, D = D, X))
dml_data <- DoubleMLData$new(df,
   y_col = "Y", d_cols = "D",
   x_cols = paste0("X", 1:p))

# Specify learners
lasso <- lrn("regr.cv_glmnet", alpha = 1)  # cross-validated LASSO
rf <- lrn("regr.ranger")                   # random forest, as an alternative

# Fit DML-PLR (partially linear regression)
dml_plr <- DoubleMLPLR$new(dml_data,
                           ml_l = lasso,  # for Y ~ X (outcome model)
                           ml_m = lasso,  # for D ~ X (treatment model)
                           n_folds = 5)
dml_plr$fit()
dml_plr$summary()
# Coefficient should be close to 1.5

5 Beyond the Partially Linear Model

DoubleML supports several model classes:

  • PLR (DoubleMLPLR): Partially linear regression for continuous treatment.
  • PLIV (DoubleMLPLIV): Partially linear IV for endogenous continuous treatment with an instrument.
  • IRM (DoubleMLIRM): Interactive regression model for binary treatment, targeting ATE or ATT.
  • IIVM (DoubleMLIIVM): Interactive IV model for binary treatment with a binary instrument.

# Binary treatment: IRM for ATE
# (the data object passed here must have a binary treatment column,
# unlike the continuous D simulated in the example above)
# ml_m uses a classification learner (classif.ranger) because the
# propensity score P(D=1|X) is a probability, so it needs a classifier.
# ml_g uses a regression learner for the outcome model E[Y|D,X].
dml_irm <- DoubleMLIRM$new(dml_data,
                           ml_g = lrn("regr.ranger"),
                           ml_m = lrn("classif.ranger"),
                           score = "ATE",
                           n_folds = 5)
dml_irm$fit()
dml_irm$summary()
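For intuition, the ATE score that the IRM model averages is the doubly robust (AIPW) expression. A numpy sketch with known (oracle) nuisance functions, so the moving parts are visible without any ML fitting:

```python
import numpy as np

def aipw_ate(Y, D, g1, g0, m):
    """Doubly robust ATE score, averaged over the sample.
    g1, g0: predictions of E[Y|D=1,X] and E[Y|D=0,X]; m: P(D=1|X)."""
    psi = (g1 - g0
           + D * (Y - g1) / m
           - (1 - D) * (Y - g0) / (1 - m))
    return psi.mean()

# Toy check with oracle nuisances: Y = 2*D + X + noise, so ATE = 2
rng = np.random.default_rng(3)
n = 5000
X = rng.normal(size=n)
m = 1 / (1 + np.exp(-X))          # true propensity P(D=1|X)
D = rng.binomial(1, m)
Y = 2 * D + X + rng.normal(size=n)
ate = aipw_ate(Y, D, g1=2 + X, g0=X, m=m)
print(ate)  # close to 2
```

The two inverse-propensity correction terms are what make the score Neyman-orthogonal: small errors in the nuisance estimates perturb the average only at second order.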

6 Key Options and Pitfalls

6.1 Choosing Learners

The DML estimator is valid for any ML learner that converges fast enough. Practical guidance:

  • LASSO/Elastic net: Good when the true model is sparse. Use lrn("regr.cv_glmnet").
  • Random forest: Good for nonlinear, high-dimensional settings. Use lrn("regr.ranger").
  • Stacking (ensemble): Combining multiple learners often outperforms any single one. Use mlr3pipelines for stacking.

6.2 Common Pitfalls

  1. Too few folds: Use at least 5-fold cross-fitting ($K=5$). With $K=2$, each nuisance model is trained on only half the data, which inflates variance.
  2. Using the same learner for $g$ and $m$: If $Y$ and $D$ have different functional forms (e.g., $Y$ is binary, $D$ is continuous), use different learner types.
  3. Ignoring clustering: If observations are clustered, set cluster_cols in the data object and use clustered standard errors.

7 Comparison to Alternatives

| Method | Strength | Limitation |
| --- | --- | --- |
| DoubleML | Valid inference, flexible ML | Assumes unconfoundedness |
| grf (causal forest) | CATE estimation | Same assumption |
| PDS LASSO (hdm) | Sparse settings | Parametric |
| econml (Python) | Many CATE estimators | Less focus on ATE inference |

8 Conclusion

The DoubleML package makes the Chernozhukov et al. (2018) DML framework accessible to applied researchers. By combining flexible ML for nuisance function estimation with cross-fitting and Neyman-orthogonal score functions, it achieves valid $\sqrt{n}$ inference for treatment effects even in high-dimensional settings. For researchers working with many controls but a single or small number of treatments, DoubleML is the recommended tool.

References

  1. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1):C1-C68.
  2. Bach, P., Chernozhukov, V., Kurz, M.S., and Spindler, M. (2022). DoubleML—an object-oriented implementation of double machine learning in R. Journal of Statistical Software, 103(3):1-45.
  3. Belloni, A., Chernozhukov, V., and Hansen, C. (2014). Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies, 81(2):608-650.
  4. Robinson, P.M. (1988). Root-N-consistent semiparametric regression. Econometrica, 56(4):931-954.
