1 What Problem Does DoubleML Solve?
Applied researchers frequently want to estimate the causal effect of a treatment or exposure on an outcome while controlling for a large number of covariates. When the covariate set is large relative to the sample size, standard OLS is unreliable: it overfits, and the resulting standard errors are invalid.
A common response is to use machine learning methods (LASSO, random forests, gradient boosting) to select relevant controls. But naively plugging ML predictions into a causal estimate introduces regularisation bias: LASSO, for example, deliberately shrinks coefficients toward zero, which biases the treatment effect estimate both when the treatment coefficient itself is penalised and when a relevant confounder is shrunk or dropped.
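The effect of a dropped confounder is easy to reproduce. The sketch below (plain Python, standard library only) simulates one control X that drives both D and Y, then compares a naive regression of Y on D that leaves X out, as happens when a penalty shrinks its coefficient to zero, with a regression that partials X out of both variables. The data-generating process, the true effect of 1.5, and the helper names `slope` and `residuals` are illustrative assumptions and are not part of the DoubleML package.

```python
import random

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after a linear fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

random.seed(1)
n = 5000
theta_true = 1.5  # assumed true treatment effect
X = [random.gauss(0, 1) for _ in range(n)]
D = [0.5 * x + random.gauss(0, 1) for x in X]
Y = [theta_true * d + x + random.gauss(0, 1) for d, x in zip(D, X)]

# Naive estimate: the confounder X has been dropped from the model.
theta_naive = slope(D, Y)

# Controlled estimate: X is partialled out of both D and Y first.
theta_controlled = slope(residuals(X, D), residuals(X, Y))

print("naive:", round(theta_naive, 3))
print("controlled:", round(theta_controlled, 3))
```

In this design the naive slope converges to 1.5 + Cov(D, X)/Var(D) = 1.5 + 0.5/1.25 = 1.9, so the bias does not shrink as the sample grows.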
Double Machine Learning (DML) [Chernozhukov et al., 2018] solves this problem with two key ingredients: (1) partialling out the controls from both the outcome and the treatment using ML predictions, so that the treatment effect is estimated from residuals, and (2) cross-fitting to ensure the residuals are out-of-sample and free of overfitting bias.
The DoubleML package provides a production-quality implementation in both R and Python.
2 Installation
# R
install.packages("DoubleML")
library(DoubleML)
# Python (pip)
# pip install doubleml
The R package requires the mlr3 ecosystem for machine learning. The Python package uses scikit-learn.
3 The DML Estimating Framework
3.1 The Partially Linear Model
The baseline DML model is the partially linear regression:
Yᵢ = Dᵢθ₀ + g₀(Xᵢ) + εᵢ
Dᵢ = m₀(Xᵢ) + vᵢ
where Y is the outcome, D is the (scalar) treatment, X are high-dimensional controls, g₀ and m₀ are unknown functions estimated by ML, and the errors satisfy E[εᵢ | Dᵢ, Xᵢ] = 0 and E[vᵢ | Xᵢ] = 0.
The target parameter is θ₀, the causal effect of D on Y conditional on X.
3.2 The DML Estimator
The DML estimator proceeds in two steps:
- Partial out: Use ML to estimate the conditional means ℓ̂₀(x) = Ê[Y | X = x] and m̂₀(x) = Ê[D | X = x], then form the residuals Ỹᵢ = Yᵢ − ℓ̂₀(Xᵢ) and D̃ᵢ = Dᵢ − m̂₀(Xᵢ). Note that ℓ₀(x) = θ₀m₀(x) + g₀(x) is the conditional mean of Y given X alone, which is not the same function as g₀.
- Regress residual on residual:
θ̂₀ = (∑ D̃ᵢ²)⁻¹ ∑ D̃ᵢ Ỹᵢ
This is the Frisch–Waugh–Lovell theorem applied non-parametrically. By partialling out X from both Y and D, the remaining variation in D̃ is orthogonal to X, removing the confounding.
Cross-fitting: To avoid overfitting bias, ℓ̂₀ and m̂₀ must be estimated on different observations from the ones used to compute residuals. DML splits the data into K folds; for each fold, the nuisance functions are fitted on the other K−1 folds and used to predict on the held-out fold. The final estimate aggregates over the K held-out folds. Because every residual is then an out-of-sample prediction error, overfitting in the ML step does not carry over into θ̂₀.
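The two steps and the cross-fitting scheme above can be sketched in plain Python without any package. This is a minimal illustration under stated assumptions, not the DoubleML implementation: the "learner" is a toy binned-mean regression on a single control, the simulated functions sin(x) and x² are arbitrary choices, and the names `fit_binned_mean` and `dml_plr` are hypothetical. The sketch pools the held-out residuals from all folds before the final residual-on-residual regression, which is one of the possible ways to aggregate over folds.

```python
import math
import random

def fit_binned_mean(x, y, width=0.25, lo=-3.0, hi=3.0):
    """Toy nonparametric learner: predict the mean of y within bins of x."""
    n_bins = int((hi - lo) / width)

    def bin_of(v):
        # Values outside [lo, hi) are assigned to the nearest edge bin.
        return min(n_bins - 1, max(0, int((v - lo) / width)))

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        b = bin_of(xi)
        sums[b] += yi
        counts[b] += 1
    overall = sum(y) / len(y)  # fallback for empty bins
    means = [s / c if c > 0 else overall for s, c in zip(sums, counts)]
    return lambda v: means[bin_of(v)]

def dml_plr(x, d, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in the PLR model."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Nuisance functions are fitted on the other K-1 folds only.
        l_hat = fit_binned_mean([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_binned_mean([x[i] for i in train], [d[i] for i in train])
        # Residuals are computed on the held-out fold (out-of-sample).
        for i in fold:
            y_res[i] = y[i] - l_hat(x[i])
            d_res[i] = d[i] - m_hat(x[i])
    # Residual-on-residual regression, pooled over all folds.
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res)

random.seed(42)
n = 2000
X = [random.gauss(0, 1) for _ in range(n)]
D = [math.sin(x) + random.gauss(0, 1) for x in X]                 # m0(x) = sin(x)
Y = [1.5 * d + x ** 2 + random.gauss(0, 1) for d, x in zip(D, X)]  # g0(x) = x^2

theta_hat = dml_plr(X, D, Y)
print("theta_hat:", round(theta_hat, 3))  # true value in this simulation is 1.5
```

Swapping `fit_binned_mean` for any other regression routine with the same fit-then-predict shape leaves the rest of the procedure unchanged, which is the sense in which DML is agnostic about the learner.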
3.3 Properties
Under regularity conditions (in particular, the ML nuisance estimates converge to the truth at a rate faster than n⁻¹ᐟ⁴), θ̂₀ is √n-consistent and asymptotically normal:
√n (θ̂₀ − θ₀) → N(0, V) in distribution,
where V is a variance that can be consistently estimated. This allows standard confidence intervals and t-tests.
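To show how the asymptotic result turns into a standard error, the sketch below computes a sandwich-type variance estimate for the residual-on-residual estimator from a set of already cross-fitted residuals. It assumes the partialling-out score ψᵢ = (Ỹᵢ − θ̂₀D̃ᵢ)D̃ᵢ and estimates V by the mean of ψᵢ² divided by the squared mean of D̃ᵢ². The function name `plr_inference` and the four-observation input are hypothetical and only serve to make the arithmetic checkable by hand.

```python
import math

def plr_inference(y_res, d_res, z=1.96):
    """Point estimate, standard error and normal-approximation CI from residuals."""
    n = len(y_res)
    theta = sum(d * y for d, y in zip(d_res, y_res)) / sum(d * d for d in d_res)
    # Score evaluated at the estimate, one value per observation.
    psi = [(y - theta * d) * d for d, y in zip(d_res, y_res)]
    j = sum(d * d for d in d_res) / n              # mean of squared D residuals
    v_hat = (sum(p * p for p in psi) / n) / (j * j)  # estimate of V
    se = math.sqrt(v_hat / n)
    return theta, se, (theta - z * se, theta + z * se)

# Hypothetical residuals, chosen so the numbers are easy to verify by hand.
d_res = [1.0, -1.0, 2.0, -2.0]
y_res = [2.5, -1.5, 3.5, -4.5]

theta, se, ci = plr_inference(y_res, d_res)
print("theta:", theta)
print("se:", round(se, 4))
print("95% CI:", tuple(round(c, 4) for c in ci))
```

With real data the residuals would come from the cross-fitting step, and n would be large enough for the normal approximation to be reasonable.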
4 A Minimal Working Example (R)
library(DoubleML)
library(mlr3)
library(mlr3learners)
set.seed(42)
# Simulate partially linear data
n <- 500; p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
D <- 0.5 * X[,1] + rnorm(n)
Y <- 1.5 * D + X[,1] + X[,2] + rnorm(n)
# True theta_0 = 1.5
# Create DoubleML data object
data <- cbind(Y=Y, D=D, as.data.frame(X))
dml_data <- DoubleMLData$new(data,
                             y_col="Y", d_cols="D",
                             x_cols=paste0("X", 1:p))
# Specify learners
lasso <- lrn("regr.cv_glmnet", alpha=1)
rf <- lrn("regr.ranger")
# Fit DML-PLR (partially linear regression)
dml_plr <- DoubleMLPLR$new(dml_data,
                           ml_l = lasso,  # for Y ~ X (outcome model)
                           ml_m = lasso,  # for D ~ X (treatment model)
                           n_folds = 5)
dml_plr$fit()
dml_plr$summary()
# Coefficient should be close to 1.5
5 Beyond the Partially Linear Model
DoubleML supports several model classes:
- PLR (DoubleMLPLR): Partially linear regression for continuous treatment.
- PLIV (DoubleMLPLIV): Partially linear IV for endogenous continuous treatment with an instrument.
- IRM (DoubleMLIRM): Interactive regression model for binary treatment, targeting ATE or ATT.
- IIVM (DoubleMLIIVM): Interactive IV model for binary treatment with a binary instrument.
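# Binary treatment: IRM for ATE
# DoubleMLIRM needs a 0/1 treatment, so the continuous-D dml_data from Section 4
# cannot be reused. Simulate a binary treatment from the same X (illustrative).
D_bin <- rbinom(n, 1, plogis(0.5 * X[,1]))
Y_bin <- 1.5 * D_bin + X[,1] + X[,2] + rnorm(n)
dml_data_bin <- DoubleMLData$new(data.frame(Y=Y_bin, D=D_bin, X),
                                 y_col="Y", d_cols="D",
                                 x_cols=paste0("X", 1:p))
# ml_m uses a classification learner (classif.ranger) because D is binary:
# the propensity score P(D=1|X) is a probability requiring a classifier.
# ml_g uses a regression learner for the outcome model E[Y|D,X].
dml_irm <- DoubleMLIRM$new(dml_data_bin,
                           ml_g = lrn("regr.ranger"),
                           ml_m = lrn("classif.ranger"),
                           score = "ATE",
                           n_folds = 5)
dml_irm$fit()
dml_irm$summary()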
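For intuition about what the interactive regression model estimates, the sketch below evaluates the doubly robust (augmented inverse probability weighting) form of the ATE score described in Chernozhukov et al. (2018), given predictions for the two outcome regressions and the propensity score. It is a plain-Python illustration rather than the package's code: the function name `aipw_ate` and all input numbers are hypothetical, and in practice the predictions would be cross-fitted rather than supplied by hand.

```python
def aipw_ate(y, d, g0_hat, g1_hat, m_hat):
    """Average of the doubly robust ATE score over all observations.

    y      : observed outcomes
    d      : binary treatment indicators (0 or 1)
    g0_hat : predictions of E[Y | D=0, X]
    g1_hat : predictions of E[Y | D=1, X]
    m_hat  : predicted propensity scores P(D=1 | X), strictly between 0 and 1
    """
    scores = []
    for yi, di, g0, g1, m in zip(y, d, g0_hat, g1_hat, m_hat):
        # Regression contrast plus inverse-probability-weighted residual corrections.
        s = (g1 - g0
             + di * (yi - g1) / m
             - (1 - di) * (yi - g0) / (1 - m))
        scores.append(s)
    return sum(scores) / len(scores)

# Hypothetical data and predictions for four observations.
y = [3.5, 1.0, 4.0, 1.5]
d = [1, 0, 1, 0]
g1_hat = [3.0, 3.0, 4.0, 4.0]
g0_hat = [1.0, 1.0, 2.0, 2.0]
m_hat = [0.5, 0.5, 0.5, 0.5]

ate_hat = aipw_ate(y, d, g0_hat, g1_hat, m_hat)
print("ATE estimate:", ate_hat)
```

The correction terms divide by m̂ and 1 − m̂, which is why propensity scores close to 0 or 1 make the estimate unstable.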
6 Key Options and Pitfalls
6.1 Choosing Learners
The DML estimator is valid for any ML learner that converges fast enough. Practical guidance:
- LASSO/Elastic net: Good when the true model is sparse. Use lrn("regr.cv_glmnet").
- Random forest: Good for nonlinear, high-dimensional settings. Use lrn("regr.ranger").
- Stacking (ensemble): Combining multiple learners often outperforms any single one. Use mlr3pipelines for stacking.
6.2 Common Pitfalls
- Too few folds: Use at least 5-fold cross-fitting (K=5). With K=2, variance is high.
- Using the same learner for g and m: If Y and D have different functional forms (e.g., Y is binary, D is continuous), use different learner types.
- Ignoring clustering: If observations are clustered, set cluster_cols in the data object and use clustered standard errors.
7 Comparison to Alternatives
8 Conclusion
The DoubleML package makes the Chernozhukov et al. (2018) DML framework accessible to applied researchers. By combining flexible ML for nuisance function estimation with cross-fitting and Neyman-orthogonal score functions, it achieves valid √n-inference for treatment effects even in high-dimensional settings. For researchers working with many controls but a single or small number of treatments, DoubleML is the recommended tool.
References
- Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1):C1-C68.
- Bach, P., Chernozhukov, V., Kurz, M.S., and Spindler, M. (2022). DoubleML—an object-oriented implementation of double machine learning in R. Journal of Statistical Software, 103(3):1-45.
- Belloni, A., Chernozhukov, V., and Hansen, C. (2014). Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies, 81(2):608-650.
- Robinson, P.M. (1988). Root-N-consistent semiparametric regression. Econometrica, 56(4):931-954.