1 What Problem Does DoubleML Solve?
Applied researchers frequently want to estimate the causal effect of a treatment or exposure on an outcome while controlling for a large number of covariates. When the covariate set is large relative to the sample size, standard OLS is unreliable: it overfits, and the resulting standard errors are invalid.
A common response is to use machine learning methods for model selection—LASSO, random forests, gradient boosting—to select relevant controls. But naively using ML predictions for causal inference introduces regularisation bias: LASSO, for example, deliberately shrinks coefficients toward zero, which biases the treatment effect estimate if the treatment variable is also regularised.
Double Machine Learning (DML) [Chernozhukov et al., 2018] solves this problem with two key ingredients: (1) partialling out the treatment from the outcome using residuals from ML predictions, and (2) cross-fitting to ensure the residuals are out-of-sample and free of overfitting bias.
The DoubleML package provides a production-quality implementation in both R and Python.
2 Installation
# R
install.packages("DoubleML")
library(DoubleML)
# Python (pip)
pip install doubleml
The R package requires the mlr3 ecosystem for machine learning. The Python package uses scikit-learn.
3 The DML Estimating Framework
3.1 The Partially Linear Model
The baseline DML model is the partially linear regression:
$$Y_{i}=\theta_{0}D_{i}+g_{0}(X_{i})+\epsilon_{i}$$
$$D_{i}=m_{0}(X_{i})+v_{i}$$
where $Y$ is the outcome, $D$ is the (scalar) treatment, $X$ are high-dimensional controls, $g_{0}$ and $m_{0}$ are unknown functions estimated by ML, and $\epsilon_{i}$, $v_{i}$ have zero mean conditional on $X_{i}$.
The target parameter is $\theta_{0}$, the causal effect of $D$ on $Y$ conditional on $X$.
3.2 The DML Estimator
The DML estimator proceeds in two steps:
- Partial out: Use ML to estimate $\hat{\ell}_{0}(x)=\hat{\mathbb{E}}[Y|X=x]$ and $\hat{m}_{0}(x)=\hat{\mathbb{E}}[D|X=x]$. (Note that $\mathbb{E}[Y|X]=\theta_{0}m_{0}(X)+g_{0}(X)$, so this conditional mean, written $\ell_{0}$, is not $g_{0}$ itself; this is why the package calls the outcome learner ml_l.) Form residuals: $$\tilde{Y}_{i}=Y_{i}-\hat{\ell}_{0}(X_{i}), \quad \tilde{D}_{i}=D_{i}-\hat{m}_{0}(X_{i})$$
- Regress residual on residual: $$\hat{\theta}_{0}=\Big(\sum_{i}\tilde{D}_{i}^{2}\Big)^{-1}\sum_{i}\tilde{D}_{i}\tilde{Y}_{i}$$
This is the Frisch-Waugh-Lovell theorem applied non-parametrically. By partialling out $X$ from both $Y$ and $D$, the remaining variation in $\tilde{D}$ is orthogonal to $X$, removing the confounding.
Cross-fitting: To avoid overfitting bias, the nuisance functions must be estimated on different observations than those used to compute residuals. DML splits the data into $K$ folds, estimates the nuisance functions on $K-1$ folds, and predicts on the held-out fold; the final estimate averages over folds. This ensures the residuals are out-of-sample predictions, free of the overfitting bias that in-sample fits would introduce.
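The two steps plus cross-fitting can be sketched in a few lines of dependency-free Python. This is a toy illustration, not the package: it uses a single covariate, and plain least squares stands in for the ML learners.

```python
# Toy DML for the partially linear model Y = theta*D + g(X) + eps,
# D = m(X) + v, with simple linear regression as the nuisance "learner".
import random

random.seed(0)
n, theta_true = 2000, 1.5
X = [random.gauss(0, 1) for _ in range(n)]
D = [0.5 * x + random.gauss(0, 1) for x in X]
Y = [theta_true * d + x + random.gauss(0, 1) for d, x in zip(D, X)]

def fit_slr(x, y):
    """Least-squares fit of y = a + b*x (stand-in for an ML learner)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Cross-fitting with K = 2: nuisances are fit on one fold and used to
# residualise the *other* fold, so all residuals are out-of-sample.
idx = list(range(n))
random.shuffle(idx)
folds = [idx[: n // 2], idx[n // 2:]]
num = den = 0.0
for k in (0, 1):
    train, test = folds[1 - k], folds[k]
    ag, bg = fit_slr([X[i] for i in train], [Y[i] for i in train])  # E[Y|X]
    am, bm = fit_slr([X[i] for i in train], [D[i] for i in train])  # E[D|X]
    for i in test:
        y_res = Y[i] - (ag + bg * X[i])  # outcome residual
        d_res = D[i] - (am + bm * X[i])  # treatment residual
        num += d_res * y_res
        den += d_res ** 2

theta_hat = num / den  # residual-on-residual regression
print(theta_hat)       # close to the true theta of 1.5
```

With $n=2000$ the estimate lands within a few hundredths of the true $\theta_0=1.5$; swapping the linear fits for any sufficiently accurate learner leaves the recipe unchanged.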
3.3 Properties
Under regularity conditions (in particular, that the ML estimates of the nuisance functions converge to the truth faster than the $n^{-1/4}$ rate), $\hat{\theta}_{0}$ is $\sqrt{n}$-consistent and asymptotically normal:
$$\sqrt{n}(\hat{\theta}_{0} - \theta_{0}) \xrightarrow{d} \mathcal{N}(0, V)$$
where $V$ is a variance that can be consistently estimated. This allows standard confidence intervals and t-tests.
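Concretely, for the partially linear model a standard plug-in (sandwich) estimator of $V$ uses the cross-fitted residuals of Section 3.2; this is a sketch of the usual formula, not necessarily the package's exact internals:

$$\hat{V}=\Big(\frac{1}{n}\sum_{i}\tilde{D}_{i}^{2}\Big)^{-2}\frac{1}{n}\sum_{i}\tilde{D}_{i}^{2}\,\hat{u}_{i}^{2},\qquad \hat{u}_{i}=\tilde{Y}_{i}-\hat{\theta}_{0}\tilde{D}_{i}$$

A 95% confidence interval is then $\hat{\theta}_{0}\pm 1.96\sqrt{\hat{V}/n}$.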
4 A Minimal Working Example (R)
library(DoubleML)
library(mlr3)
library(mlr3learners)
set.seed(42)
# Simulate partially linear data
n <- 500; p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
D <- 0.5 * X[,1] + rnorm(n)
Y <- 1.5 * D + X[,1] + X[,2] + rnorm(n)
# True theta_0 = 1.5
# Create DoubleML data object
data <- data.table::as.data.table(cbind(Y=Y, D=D, X))  # DoubleMLData expects a data.table
dml_data <- DoubleMLData$new(data,
y_col="Y", d_cols="D",
x_cols=paste0("X", 1:p))
# Specify learners (rf is an alternative that can be swapped in for lasso)
lasso <- lrn("regr.cv_glmnet", alpha=1)
rf <- lrn("regr.ranger")
# Fit DML-PLR (partially linear regression)
dml_plr <- DoubleMLPLR$new(dml_data,
ml_l = lasso, # for Y ~ X (outcome model)
ml_m = lasso, # for D ~ X (treatment model)
n_folds = 5)
dml_plr$fit()
dml_plr$summary()
# Coefficient should be close to 1.5
5 Beyond the Partially Linear Model
DoubleML supports several model classes:
- PLR (DoubleMLPLR): Partially linear regression for continuous treatment.
- PLIV (DoubleMLPLIV): Partially linear IV for endogenous continuous treatment with an instrument.
- IRM (DoubleMLIRM): Interactive regression model for binary treatment, targeting ATE or ATT.
- IIVM (DoubleMLIIVM): Interactive IV model for binary treatment with a binary instrument.
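For orientation, the IRM model with score "ATE" is built on the doubly robust (AIPW) score. Writing $\hat{g}(d,x)$ for the estimate of $\mathbb{E}[Y|D=d,X=x]$ and $\hat{m}(x)$ for the estimated propensity score $P(D=1|X=x)$, the point estimate takes the form (a sketch of the standard formula, up to the package's internal details):

$$\hat{\theta}_{\mathrm{ATE}}=\frac{1}{n}\sum_{i=1}^{n}\left[\hat{g}(1,X_{i})-\hat{g}(0,X_{i})+\frac{D_{i}\,(Y_{i}-\hat{g}(1,X_{i}))}{\hat{m}(X_{i})}-\frac{(1-D_{i})\,(Y_{i}-\hat{g}(0,X_{i}))}{1-\hat{m}(X_{i})}\right]$$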
# Binary treatment: IRM for ATE.
# IRM requires a binary treatment, so dml_data here must be built from
# data with D in {0, 1} (unlike the continuous D simulated above).
# ml_m uses a classification learner (classif.ranger) because the
# propensity score P(D=1|X) is a probability requiring a classifier;
# ml_g uses a regression learner for the outcome model E[Y|D,X].
dml_irm <- DoubleMLIRM$new(dml_data,
ml_g = lrn("regr.ranger"),
ml_m = lrn("classif.ranger"),
score = "ATE",
n_folds = 5)
dml_irm$fit()
dml_irm$summary()
6 Key Options and Pitfalls
6.1 Choosing Learners
The DML estimator is valid for any ML learner that converges fast enough. Practical guidance:
- LASSO/Elastic net: Good when the true model is sparse. Use lrn("regr.cv_glmnet").
- Random forest: Good for nonlinear, high-dimensional settings. Use lrn("regr.ranger").
- Stacking (ensemble): Combining multiple learners often outperforms any single one. Use mlr3pipelines for stacking.
6.2 Common Pitfalls
- Too few folds: Use at least 5-fold cross-fitting ($K=5$). With $K=2$, variance is high.
- Using the same learner for $g$ and $m$: If $Y$ and $D$ have different functional forms (e.g., $Y$ is binary, $D$ is continuous), use different learner types.
- Ignoring clustering: If observations are clustered, set cluster_cols in the data object and use clustered standard errors.
7 Comparison to Alternatives
8 Conclusion
The DoubleML package makes the Chernozhukov et al. (2018) DML framework accessible to applied researchers. By combining flexible ML for nuisance function estimation with cross-fitting and Neyman-orthogonal score functions, it achieves valid $\sqrt{n}$ inference for treatment effects even in high-dimensional settings. For researchers working with many controls but a single or small number of treatments, DoubleML is the recommended tool.
References
- Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1):C1-C68.
- Bach, P., Chernozhukov, V., Kurz, M.S., and Spindler, M. (2022). DoubleML—an object-oriented implementation of double machine learning in R. Journal of Statistical Software, 103(3):1-45.
- Belloni, A., Chernozhukov, V., and Hansen, C. (2014). Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies, 81(2):608-650.
- Robinson, P.M. (1988). Root-N-consistent semiparametric regression. Econometrica, 56(4):931-954.