The Causal Review

1 Motivation

Double machine learning (DML) (Chernozhukov et al., 2018) has become the leading approach for estimating causal effects with high-dimensional controls: it achieves √n-consistent, asymptotically normal inference for the treatment effect in a partially linear model Y = Dθ + g(X) + ε by using cross-fitting and Neyman orthogonality to decouple the estimation of the nuisance functions from the target parameter.

The standard DML framework handles a scalar treatment D. Many empirical questions, however, involve multiple simultaneous treatments. Consider the following examples:

Wage determination: estimating returns to both years of education D₁ and years of work experience D₂ jointly, when the two are correlated and both potentially endogenous.

Air pollution: the joint effect of PM₂.₅ (D₁), ozone (D₂) and NO₂ (D₃) on health outcomes, when pollutant concentrations are strongly correlated.

Multiple policy instruments: estimating the effect of minimum wage (D₁) and earned income tax credit (D₂) on employment, when both policies often change simultaneously.

Applying scalar DML to each treatment separately in these settings produces biased estimates, because it fails to partial out the cross-treatment correlations. This article describes the multivariate extension of DML that addresses this problem.

2 Standard DML: Scalar Treatment

Recall the partially linear regression (PLR) model:

Y = Dθ₀ + g₀(X) + ε, E[ε | D, X] = 0, (1)

D = ℓ₀(X) + ν, E[ν | X] = 0, (2)

where X ∈ ℝᵖ is a high-dimensional vector of controls, g₀ and ℓ₀ are unknown nuisance functions, and θ₀ ∈ ℝ is the target causal parameter. DML proceeds in three steps:

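Cross-fit nuisance estimation: split the data into K folds. For each fold k, estimate ĝ⁽⁻ᵏ⁾ and ℓ̂⁽⁻ᵏ⁾ using all data except fold k. Here ĝ is fitted by regressing Y on X, so it estimates the conditional mean E[Y | X] = ℓ₀(X)θ₀ + g₀(X), and ℓ̂ is fitted by regressing D on X.
Residualise: form Ṽᵢ = Dᵢ - ℓ̂⁽⁻ᵏ⁽ⁱ⁾⁾(Xᵢ) and Ũᵢ = Yᵢ - ĝ⁽⁻ᵏ⁽ⁱ⁾⁾(Xᵢ), where k(i) is the fold containing observation i.
Estimate: θ̂_DML = (Σᵢ Ṽᵢ²)⁻¹ Σᵢ ṼᵢŨᵢ.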
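The three steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the nuisance learner is a plain least-squares fit on a quadratic basis standing in for an ML method, and the function names fit_predict and dml_plr_scalar, the simulated data, and the true effect of 1.5 are all assumptions made for the example.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Stand-in nuisance learner: least squares on the basis [1, X, X^2].
    Any flexible regressor (forest, boosting, lasso) could be swapped in."""
    def basis(X):
        return np.column_stack([np.ones(len(X)), X, X ** 2])
    coef, *_ = np.linalg.lstsq(basis(X_train), y_train, rcond=None)
    return basis(X_test) @ coef

def dml_plr_scalar(Y, D, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = D*theta + g(X) + eps."""
    n = len(Y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    U = np.empty(n)  # outcome residuals  Y - E[Y | X]
    V = np.empty(n)  # treatment residuals D - E[D | X]
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Step 1: fit nuisances on all folds except the current one
        # Step 2: residualise the held-out fold
        U[test] = Y[test] - fit_predict(X[train], Y[train], X[test])
        V[test] = D[test] - fit_predict(X[train], D[train], X[test])
    # Step 3: regress outcome residuals on treatment residuals
    return (V @ U) / (V @ V)

# Hypothetical simulated data with a true effect of 1.5
rng = np.random.default_rng(42)
n = 4000
X = rng.normal(size=(n, 3))
D = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)
Y = 1.5 * D + X[:, 0] ** 2 - X[:, 2] + rng.normal(size=n)

theta_hat = dml_plr_scalar(Y, D, X)
print(round(theta_hat, 3))
```

The quadratic basis happens to contain the true nuisance functions in this simulation; with real data the learner would be chosen and tuned to the problem.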

The key property is Neyman orthogonality: the moment condition E[ψ(W; θ₀, η)] = 0 based on the score ψ(W; θ, η) = Ṽ(Ũ - Ṽθ) has zero Gateaux derivative with respect to η = (g, ℓ) at the true values. As a result, nuisance estimation errors enter the estimate of θ₀ only through their products: if each nuisance estimator converges faster than n⁻¹/⁴, the resulting error in θ̂_DML is of smaller order than n⁻¹/², which is what permits the use of flexible ML estimators.
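The orthogonality claim can be checked directly. The derivation below is a sketch under the assumption that the outcome nuisance is the conditional mean, written here as ḡ₀(X) = E[Y | X] = ℓ₀(X)θ₀ + g₀(X), so that the score evaluated at the truth equals νε. The perturbation directions δ_g and δ_ℓ are arbitrary functions of X introduced only for this illustration.

```latex
\begin{align*}
\psi(W;\theta,\eta) &= \bigl(D-\ell(X)\bigr)\bigl(Y-g(X)-(D-\ell(X))\,\theta\bigr),\\
\eta_r &= \bigl(\bar g_0 + r\,\delta_g,\ \ell_0 + r\,\delta_\ell\bigr),\\
\mathbb{E}\,\psi(W;\theta_0,\eta_r)
  &= \mathbb{E}\bigl[(\nu - r\,\delta_\ell(X))\,
     (\varepsilon - r\,\delta_g(X) + r\,\delta_\ell(X)\,\theta_0)\bigr],\\
\frac{\partial}{\partial r}\,\mathbb{E}\,\psi(W;\theta_0,\eta_r)\Big|_{r=0}
  &= -\,\mathbb{E}\bigl[\delta_\ell(X)\,\varepsilon\bigr]
     + \mathbb{E}\bigl[\nu\,\bigl(\delta_\ell(X)\,\theta_0-\delta_g(X)\bigr)\bigr] = 0.
\end{align*}
```

Both expectations vanish because E[ε | X] = 0 and E[ν | X] = 0. What remains is a term of order r² that involves only products of the perturbations, which is why it suffices for each nuisance error to shrink faster than n⁻¹/⁴.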

3 The Multivariate Treatment Setting

Now let D = (D₁, ..., D_K)' ∈ ℝᴷ be a K-vector of treatments (from here on, K denotes the number of treatments rather than the number of folds). The multivariate PLR model is:

Y = D'θ₀ + g₀(X) + ε, E[ε | D, X] = 0, (3)

Dₖ = ℓ₀,ₖ(X) + νₖ, E[νₖ | X] = 0, k = 1, ..., K. (4)

The target is θ₀ = (θ₀,₁, ..., θ₀,_K)' ∈ ℝᴷ, the vector of partial treatment effects. Each θ₀,ₖ is the ceteris paribus effect of Dₖ on Y, holding all other treatments constant and controlling for X.

Why scalar DML fails. Suppose one applies scalar DML to treatment D₁ alone, residualising Y and D₁ on X using ML. The residualised regression is:

Ũ = Ṽ₁θ₀,₁ + (D₂ − ℓ₀,₂(X))θ₀,₂ + ... + ε. (5)

The terms (Dₖ - ℓ₀,ₖ(X))θ₀,ₖ for k ≥ 2 remain in the error term and act as omitted variables. If Cov(Ṽ₁, Ṽₖ) ≠ 0 for some k ≥ 2 with θ₀,ₖ ≠ 0, which is typical when the treatments remain correlated after conditioning on X, the scalar DML estimator for θ₀,₁ is biased and inconsistent.
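The size of the problem follows from the probability limit of the scalar estimator implied by (5). The sketch below assumes the nuisance functions are estimated well enough that each Ṽₖ can be replaced by νₖ in the limit; the numbers in the two-treatment example are hypothetical.

```latex
\begin{align*}
\hat\theta_{1}^{\text{scalar}}
  \;\xrightarrow{p}\;
  \frac{\mathbb{E}[\nu_1 \tilde U]}{\mathbb{E}[\nu_1^2]}
  &= \theta_{0,1}
     + \sum_{k=2}^{K}\theta_{0,k}\,\frac{\mathbb{E}[\nu_1\nu_k]}{\mathbb{E}[\nu_1^2]}
     + \frac{\mathbb{E}[\nu_1\varepsilon]}{\mathbb{E}[\nu_1^2]}
   = \theta_{0,1}
     + \sum_{k=2}^{K}\theta_{0,k}\,\frac{\mathbb{E}[\nu_1\nu_k]}{\mathbb{E}[\nu_1^2]},\\[4pt]
\text{e.g. } K=2,\ \theta_{0,2}=-0.5,\ \mathbb{E}[\nu_1\nu_2]=0.6,\ \mathbb{E}[\nu_1^2]=1:
  \quad \text{bias} &= -0.5\times 0.6 = -0.3 .
\end{align*}
```

The last term drops out because E[ε | D, X] = 0. The bias is the familiar omitted-variable formula applied to the residualised treatments, and it does not shrink as n grows.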

4 Multivariate DML

The solution is straightforward: residualise all K treatments jointly and run a multivariate least squares in the final step.

Algorithm.

Cross-fit nuisance estimation: for each fold k and each treatment j = 1, ..., K, estimate ℓ̂ⱼ⁽⁻ᵏ⁾(X). Also estimate ĝ⁽⁻ᵏ⁾(X).
Residualise: form Ṽⱼ,ᵢ = Dⱼ,ᵢ - ℓ̂ⱼ⁽⁻ᵏ⁽ⁱ⁾⁾(Xᵢ) for each j and Ũᵢ = Yᵢ - ĝ⁽⁻ᵏ⁽ⁱ⁾⁾(Xᵢ).
Multivariate OLS: regress Ũ on (Ṽ₁, ..., Ṽ_K):

θ̂_DML = (Ṽ'Ṽ)⁻¹ Ṽ'Ũ, (6)

where Ṽ is the n × K matrix of treatment residuals and Ũ is the n-vector of outcome residuals. For a single observation with treatment-residual vector Ṽᵢ ∈ ℝᴷ, the multivariate score is ψ(Wᵢ; θ, η) = Ṽᵢ(Ũᵢ - Ṽᵢ'θ). Neyman orthogonality continues to hold: the Gateaux derivative of the moment condition 𝔼[ψ] = 0 with respect to the nuisance η = (g, ℓ₁, ..., ℓ_K) is zero at the truth. Hence θ̂_DML retains the √n rate and asymptotic normality of the scalar case.
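The algorithm can be sketched as follows, together with a small simulation contrasting the joint estimator (6) with scalar DML applied to the first treatment alone. This is an illustrative sketch, not a production implementation: the least-squares learner on a quadratic basis is a stand-in for an ML method, and the helper names, the residual correlation of 0.6, and the true effects (1.0, -0.5) are assumptions chosen for the example.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Stand-in learner: least squares on [1, X, X^2]; y may have several columns."""
    def basis(X):
        return np.column_stack([np.ones(len(X)), X, X ** 2])
    coef, *_ = np.linalg.lstsq(basis(X_train), y_train, rcond=None)
    return basis(X_test) @ coef

def cross_fit_residuals(Y, D, X, n_folds=5, seed=0):
    """Return outcome residuals U (n,) and treatment residuals V (n, K)."""
    n = len(Y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    U = np.empty(n)
    V = np.empty(D.shape)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        U[test] = Y[test] - fit_predict(X[train], Y[train], X[test])
        V[test] = D[test] - fit_predict(X[train], D[train], X[test])
    return U, V

# Hypothetical data: two treatments whose residuals are correlated (0.6)
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
nu = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
D = np.column_stack([X[:, 0] + nu[:, 0], X[:, 1] ** 2 + nu[:, 1]])
theta0 = np.array([1.0, -0.5])
Y = D @ theta0 + X[:, 0] ** 2 + X[:, 1] + rng.normal(size=n)

U, V = cross_fit_residuals(Y, D, X)

# Joint final stage: regress U on all treatment residuals at once, as in (6)
theta_joint = np.linalg.solve(V.T @ V, V.T @ U)

# Scalar DML for D1 only: ignores the residual of D2
theta_scalar_1 = (V[:, 0] @ U) / (V[:, 0] @ V[:, 0])

print("joint :", np.round(theta_joint, 3))
print("scalar:", round(theta_scalar_1, 3))
```

In this design the joint estimate should land near (1.0, -0.5), while the scalar estimate for the first treatment is pulled towards 1.0 + (-0.5)(0.6) = 0.7, in line with the omitted-variable argument of Section 3.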

5 Asymptotic Inference

Under standard regularity conditions, θ̂_DML is asymptotically normal:

√n (θ̂_DML − θ₀) →d 𝒩(0, Ω), (7)

with variance-covariance matrix:

Ω = E[ṼṼ']⁻¹ E[ε²ṼṼ'] E[ṼṼ']⁻¹. (8)

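The sandwich form (8) is heteroskedasticity-robust and is estimated by its sample analogue, with ε replaced by the final-stage residuals Ũᵢ - Ṽᵢ'θ̂_DML. A joint hypothesis that fixes all K elements of θ₀ is tested with a Wald statistic, which is asymptotically χ² with K degrees of freedom under the null; individual coefficients are tested with t-statistics compared against the standard normal distribution.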
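A sample-analogue version of (8) and the associated tests might look like the sketch below. It takes the residuals Ũ and Ṽ as given; here they are simulated directly with heteroskedastic errors for illustration, and the function names dml_inference and wald_statistic are hypothetical.

```python
import numpy as np

def dml_inference(U, V):
    """Final-stage OLS with the heteroskedasticity-robust sandwich variance (8)."""
    n = V.shape[0]
    theta = np.linalg.solve(V.T @ V, V.T @ U)
    eps_hat = U - V @ theta                      # final-stage residuals
    J = V.T @ V / n                              # sample analogue of E[V V']
    S = (V * eps_hat[:, None] ** 2).T @ V / n    # sample analogue of E[eps^2 V V']
    J_inv = np.linalg.inv(J)
    Omega = J_inv @ S @ J_inv
    se = np.sqrt(np.diag(Omega) / n)             # standard errors of theta
    return theta, Omega, se

def wald_statistic(theta, Omega, n, theta_null):
    """n * (theta - theta_null)' Omega^{-1} (theta - theta_null)."""
    diff = theta - theta_null
    return float(n * diff @ np.linalg.solve(Omega, diff))

# Hypothetical residuals with heteroskedastic errors
rng = np.random.default_rng(7)
n = 5000
V = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
eps = (1.0 + 0.5 * np.abs(V[:, 0])) * rng.normal(size=n)
theta0 = np.array([1.0, -0.5])
U = V @ theta0 + eps

theta, Omega, se = dml_inference(U, V)
t_stats = theta / se                                   # compare with N(0, 1)
W_true = wald_statistic(theta, Omega, n, theta0)       # null equal to the truth
W_zero = wald_statistic(theta, Omega, n, np.zeros(2))  # null of no effect
print(np.round(theta, 3), np.round(se, 4), round(W_true, 2), round(W_zero, 1))
```

With K = 2 the 5% critical value of the χ² distribution is about 5.99, so the null of no effect is rejected when W_zero exceeds that threshold.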

6 Identifying Assumptions

Multivariate DML requires:

Strict exogeneity: 𝔼[ε | D, X] = 0. All K treatments are unconfounded given the same X. This rules out unobserved confounders of any Dₖ-Y relationship that are not captured by X.
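Linear in treatments: the model is partially linear in D, so each treatment has a constant additive effect. Non-linear or heterogeneous treatment effects require a different formulation, such as the interactive regression model (IRM), which DML covers for binary treatments.
Overlap: Var(D | X) is positive definite (non-degenerate treatment variation conditional on X), so that E[ṼṼ'] is invertible. In particular, no treatment may be an exact linear combination of the others once X has been partialled out.
Nuisance rate conditions: each ℓ̂ⱼ and ĝ converges at a rate faster than n⁻¹/⁴ in L₂-norm.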

7 Application: Returns to Education and Experience

Consider a Mincer wage equation in which years of schooling D₁ and years of work experience D₂ are both confounded by worker characteristics X. In a cross-section of workers, schooling and experience are negatively correlated (more schooling means fewer years of work at a given age). Scalar DML for schooling leaves the experience term in the error, so the omitted-variable bias described in Section 3 contaminates θ̂₁.

Multivariate DML with K=2 treatments and X consisting of demographics, region, occupation, and industry controls—estimated via random forests or gradient boosting—partials out both the schooling-control and experience-control correlations simultaneously and delivers a consistent estimate of each partial effect.

8 Software

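The DoubleML package (Bach et al., 2022) in R and Python accepts multiple treatments in its partially linear model class DoubleMLPLR. In Python, the treatment columns are listed in the d_cols argument of the DoubleMLData container, which is then passed to DoubleMLPLR. Note that, by default, DoubleML estimates the coefficients one treatment at a time, adding the remaining treatments to the control set, rather than running the single joint regression in (6).

from doubleml import DoubleMLData, DoubleMLPLR
# df: pandas DataFrame; "lwage" and x_cols are placeholder names for the outcome and controls
data = DoubleMLData(df, y_col="lwage", d_cols=["educ", "exper"], x_cols=x_cols)
dml_obj = DoubleMLPLR(data, ml_l, ml_m)  # ml_l: learner for E[Y | X] (ml_g in older releases); ml_m: E[D | X]
dml_obj.fit()
print(dml_obj.summary)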

Microsoft's EconML library also supports multivariate treatments in the LinearDML class, with additional support for heterogeneous treatment effects.

9 Conclusion

Multivariate DML extends the scalar DML framework to settings with multiple simultaneous treatments by residualising all treatments jointly and running multivariate OLS in the final step. The key insight is that scalar DML applied to each treatment separately generates omitted-variable bias when treatments are correlated. Neyman orthogonality carries through to the multivariate case, preserving √n consistency and asymptotic normality under the same rate conditions as scalar DML. As economic questions increasingly involve multiple interacting policy instruments, multivariate DML provides a principled and practically accessible solution.

References

Bach, P., Chernozhukov, V., Kurz, M. S., and Spindler, M. (2022). DoubleML—An object-oriented implementation of double machine learning in R. Journal of Statistical Software, 108(3):1-56.
Belloni, A., Chernozhukov, V., and Hansen, C. (2014). Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies, 81(2):608-650.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1):C1-C68.
Newey, W. K. (1994). The asymptotic variance of semiparametric estimators. Econometrica, 62(6):1349-1382.
Semenova, V. and Chernozhukov, V. (2021). Debiased machine learning of conditional average treatment effects and other causal functions. Econometrics Journal, 24(2):264-289.
Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228-1242.

Causal Inference with Multiple Simultaneous Treatments: Extending Double Machine Learning
