The Causal Review

1. Athey and Wager (2021): Policy Learning with Observational Data

Citation: Athey, S. and Wager, S. (2021). Policy learning with observational data. Econometrica, 89(1):133-161.

Research question: How should a policymaker allocate a limited treatment (job training, medical intervention, subsidy) to maximise welfare, given observational data on past outcomes?

Methods: The authors propose a framework for policy learning: using estimated CATEs to assign treatment to the individuals most likely to benefit. They prove that the expected welfare loss (regret) from using an estimated policy rather than the true optimal one converges to zero at the minimax-optimal rate. Key insight: doubly robust augmented IPW (AIPW) scores stabilise policy value estimation, reducing the sample size required for meaningful policy learning.
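To make the role of the doubly robust scores concrete, here is a minimal sketch of how AIPW scores can be used to compare two candidate policies. It is an illustration under stated assumptions, not the authors' implementation: the data are simulated, the propensity is known (0.5), the nuisance models are simple per-arm linear fits without cross-fitting, and the helper names `aipw_scores`, `policy_advantage`, and `fit_line` are hypothetical. It only evaluates two fixed policies rather than searching over a policy class.

```python
import numpy as np

def aipw_scores(y, d, mu0, mu1, e):
    """Doubly robust (AIPW) score for the treatment effect, one per unit."""
    return (mu1 - mu0
            + d * (y - mu1) / e
            - (1 - d) * (y - mu0) / (1 - e))

def policy_advantage(scores, policy):
    """Estimated welfare gain of a 0/1 policy relative to treating no one."""
    return float(np.mean(policy * scores))

# Toy data (assumption): randomised treatment with known propensity 0.5
# and true effect tau(x) = x, so treating only x > 0 is optimal.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1, 1, n)
d = rng.binomial(1, 0.5, n)
y = x * d + rng.normal(0, 1, n)

def fit_line(xs, ys):
    """Least-squares line; a stand-in for any ML outcome model."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return lambda z: intercept + slope * z

# Outcome regressions fitted separately in each treatment arm.
mu0 = fit_line(x[d == 0], y[d == 0])(x)
mu1 = fit_line(x[d == 1], y[d == 1])(x)
gamma = aipw_scores(y, d, mu0, mu1, e=np.full(n, 0.5))

treat_all = np.ones(n)
targeted = (x > 0).astype(float)
gain_all = policy_advantage(gamma, treat_all)
gain_targeted = policy_advantage(gamma, targeted)
print(f"treat everyone: {gain_all:.3f}, treat x > 0: {gain_targeted:.3f}")
```

In this toy design the true gain from treating everyone is 0 and the true gain from treating only x > 0 is 0.25, so the targeted policy should come out clearly ahead.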

Key results: In a simulation calibrated to the National JTPA Study (a job-training RCT), welfare-maximising policies based on estimated CATEs substantially outperform uniform treatment and simple demographic-based targeting. The welfare gain from personalised targeting is equivalent to treating 20-30% more individuals with no increase in budget.

Takeaway: Heterogeneous treatment effect estimation is not just an academic exercise. When combined with a policy learning framework, CATE estimates can guide resource allocation in ways that significantly improve welfare at no additional cost.

2. Chernozhukov, Newey, and Singh (2022): Automatic Debiased Machine Learning

Citation: Chernozhukov, V., Newey, W.K., and Singh, R. (2022). Automatic debiased machine learning of causal and structural effects. Econometrica, 90(3):967-1027.

Research question: Can double machine learning be extended automatically to a broad class of nonlinear causal estimands, without deriving case-specific influence functions?

Methods: The authors propose automatic DML: a general recipe for constructing Neyman-orthogonal score functions for a wide range of parameters (including average partial effects, quantile treatment effects, and local average treatment effects) using automated differentiation and Riesz representation. This extends the Chernozhukov et al. (2018) DML framework beyond the partially linear model.

Key results: Under mild rate conditions on the ML nuisance estimators (convergence rates faster than $n^{-1/4}$), the proposed estimators are $\sqrt{n}$-consistent and asymptotically normal. Simulations confirm near-optimal finite-sample performance across a range of data-generating processes.
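The role of the Riesz representer can be sketched as follows. The notation ($m$, $\gamma$, $\alpha$, $\psi$) is chosen here for illustration and may differ from the paper's. Suppose the target is $\theta_0 = E[m(W, \gamma_0)]$, where $\gamma_0(X) = E[Y \mid X]$ is a regression function and $m$ is linear in $\gamma$. Under a continuity condition there is a function $\alpha_0$ such that, for all square-integrable $\gamma$,

$$
E[m(W, \gamma)] = E[\alpha_0(X)\,\gamma(X)],
$$

and the debiased (Neyman-orthogonal) score is

$$
\psi(W; \theta, \gamma, \alpha) = m(W, \gamma) - \theta + \alpha(X)\,\bigl(Y - \gamma(X)\bigr).
$$

For the average treatment effect with $X = (D, Z)$, $m(W, \gamma) = \gamma(1, Z) - \gamma(0, Z)$ and

$$
\alpha_0(D, Z) = \frac{D}{\pi(Z)} - \frac{1 - D}{1 - \pi(Z)}, \qquad \pi(Z) = P(D = 1 \mid Z),
$$

which recovers the familiar augmented IPW score. The "automatic" part is that $\alpha_0$ is estimated directly from the data rather than derived analytically for each parameter.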

Takeaway: The principle of double debiasing generalises much further than the linear model. Researchers can now apply DML-style inference to curved (nonlinear) parameters routinely, without case-by-case derivation of influence functions.

3. Semenova and Chernozhukov (2021): Debiased Machine Learning of Conditional Average Treatment Effects

Citation: Semenova, V. and Chernozhukov, V. (2021). Debiased machine learning of conditional average treatment effects and other causal functions. Econometrics Journal, 24(2):264-289.

Research question: How can we construct valid uniform confidence bands for the CATE function τ(x) when using ML for estimation?

Methods: The paper proposes a debiased ML estimator for τ(x) as a function of a low-dimensional argument. The approach combines a Neyman-orthogonal score with a series or local regression basis expansion for τ(x). Cross-fitting handles the nuisance functions. Uniform (sup-norm) confidence bands are derived, enabling tests of the form "is τ(x)>0 for all x in this region?"
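In outline, and in notation chosen here for illustration (a binary treatment $D$, full covariates $Z$, a low-dimensional argument $x$, basis functions $p(x)$, and cross-fitted nuisance estimates $\hat\mu_0, \hat\mu_1, \hat e$ are all assumptions of this sketch), the estimator regresses an orthogonal signal on the basis:

$$
\hat{Y}_i = \hat\mu_1(Z_i) - \hat\mu_0(Z_i) + \frac{D_i\,(Y_i - \hat\mu_1(Z_i))}{\hat e(Z_i)} - \frac{(1 - D_i)\,(Y_i - \hat\mu_0(Z_i))}{1 - \hat e(Z_i)},
$$

$$
\hat\beta = \Bigl(\sum_{i} p(x_i)\,p(x_i)^\top\Bigr)^{-1} \sum_{i} p(x_i)\,\hat{Y}_i, \qquad \hat\tau(x) = p(x)^\top \hat\beta .
$$

A band then takes the form

$$
\hat\tau(x) \pm c_{1-\alpha}\,\frac{\hat\sigma(x)}{\sqrt{n}},
$$

where $\hat\sigma(x)$ is the estimated standard deviation of $\sqrt{n}\,\hat\tau(x)$. For a pointwise interval $c_{1-\alpha}$ is a normal quantile; for a uniform band it is a larger critical value calibrated to the supremum of the t-statistic over the region of interest.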

Key results: In an application to a welfare programme evaluation, the estimator reveals that treatment effects are positive for low-income subgroups and near zero for high-income subgroups. This heterogeneity is masked by the ATE but revealed by the CATE function.

Takeaway: This paper provides the theoretical tools for making causal statements about the CATE function as a whole (not just point estimates of τ(xᵢ)), enabling rigorous heterogeneity analysis.

4. Nie and Wager (2021): Quasi-Oracle Estimation of Heterogeneous Treatment Effects

Citation: Nie, X. and Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299-319.

Research question: What is the minimax-optimal rate for estimating the CATE function, and what estimator achieves it?

Methods: The authors propose the R-learner: a two-step procedure that first residualises Y and D on X using any ML method, then estimates $\tau(x)$ by minimising a weighted loss function. The R-learner is "quasi-oracle" in the sense that its convergence rate matches the rate achievable if the nuisance functions were known exactly. This remarkable property is shared by cross-fitted DML, but is derived here for the CATE specifically.
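The two steps can be sketched with numpy alone. This is a minimal illustration under stated assumptions, not the authors' code or the API of any package: the data are simulated, polynomial least squares stands in for the ML learners, the second stage restricts τ(x) to a line a + b·x, and the function names `poly_fit_predict` and `r_learner_linear` are hypothetical.

```python
import numpy as np

def poly_fit_predict(x_train, t_train, x_test, degree=5):
    """Least-squares polynomial regression; a stand-in for any ML learner."""
    coefs = np.polyfit(x_train, t_train, degree)
    return np.polyval(coefs, x_test)

def r_learner_linear(x, d, y, n_folds=2, seed=0):
    """R-learner with a linear second stage tau(x) = a + b * x."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    m_hat = np.empty(n)  # cross-fitted estimate of E[Y | X]
    e_hat = np.empty(n)  # cross-fitted estimate of E[D | X]
    for k in range(n_folds):
        test, train = folds == k, folds != k
        m_hat[test] = poly_fit_predict(x[train], y[train], x[test])
        e_hat[test] = poly_fit_predict(x[train], d[train], x[test])
    y_res = y - m_hat  # step 1: residualise the outcome
    d_res = d - e_hat  # step 1: residualise the treatment
    # Step 2: minimising sum((y_res - d_res * (a + b * x)) ** 2) is OLS
    # of y_res on the two columns d_res and d_res * x.
    design = np.column_stack([d_res, d_res * x])
    (a, b), *_ = np.linalg.lstsq(design, y_res, rcond=None)
    return a, b

# Toy data (assumption): confounded treatment, true tau(x) = 1 + 2x.
rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(-1, 1, n)
e = 0.5 + 0.3 * x  # propensity score varies with x
d = rng.binomial(1, e)
y = np.sin(2 * x) + d * (1 + 2 * x) + rng.normal(0, 1, n)

a_hat, b_hat = r_learner_linear(x, d, y)
print(f"tau_hat(x) = {a_hat:.2f} + {b_hat:.2f} * x  (truth: 1 + 2x)")
```

The second stage never sees the baseline sin(2x) directly; residualisation removes it, which is why the effect heterogeneity can be fitted with a much simpler model than the nuisance functions.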

Key results: In simulations with n=5,000 and p=20 covariates, the R-learner (using gradient-boosted trees for residualisation) substantially outperforms T-learner, S-learner, and X-learner baselines in terms of CATE estimation error, especially in the presence of heterogeneous propensity scores.

Takeaway: The R-learner is a principled, efficient CATE estimator that separates the problem of estimating confounding (residualisation) from the problem of estimating effect heterogeneity (second stage). Its implementation is available in the rlearner R package and the grf toolkit.

5. Kennedy (2023): Towards Optimal Doubly Robust Estimation of Heterogeneous Causal Effects

Citation: Kennedy, E.H. (2023). Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17(2):3008-3049.

Research question: Can doubly-robust CATE estimators achieve the nonparametric efficiency bound, and what is the role of smoothness in the efficiency bound?

Methods: The paper derives the semiparametric efficiency bound for the CATE function and proposes an estimator based on undersmoothed local polynomial regression applied to doubly-robust pseudo-outcomes (the augmented IPW scores). The estimator achieves the bound when both the propensity score and outcome regressions converge fast enough.

Key results: Under Hölder smoothness conditions on τ(x), the proposed estimator achieves the minimax-optimal rate $n^{-2s/(2s+d)}$, where s is the smoothness order and d is the dimension of x. This rate is faster than that of plug-in estimators and matches the oracle lower bound.
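A short calculation shows how strongly this rate depends on the dimension d. The sketch below simply evaluates the exponent 2s/(2s+d) and the sample size at which n raised to minus that exponent falls below a target value. It ignores all constants, so the numbers are orders of magnitude only, and the function names are hypothetical.

```python
import math

def minimax_exponent(s, d):
    """Exponent r in the rate n ** (-r), with r = 2s / (2s + d)."""
    return 2 * s / (2 * s + d)

def sample_size_for(target, s, d):
    """Value of n at which n ** (-r) equals `target` (constants ignored)."""
    return target ** (-1 / minimax_exponent(s, d))

# Smoothness s = 2, target error 0.01, increasing covariate dimension.
for dim in (1, 5, 20):
    r = minimax_exponent(s=2, d=dim)
    n_needed = sample_size_for(0.01, s=2, d=dim)
    print(f"d = {dim:2d}: exponent = {r:.3f}, n of order {n_needed:.3g}")
```

With s fixed, the exponent shrinks towards zero as d grows, which is the usual curse of dimensionality and one reason low-dimensional CATE summaries are attractive.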

Takeaway: Doubly-robust pseudo-outcomes combined with local polynomial smoothing provide a theoretically optimal route to CATE estimation. This paper establishes the gold standard for future method development in heterogeneous effects estimation.

References

Athey, S. and Wager, S. (2021). Policy learning with observational data. Econometrica, 89(1):133-161.
Chernozhukov, V., Newey, W.K., and Singh, R. (2022). Automatic debiased machine learning of causal and structural effects. Econometrica, 90(3):967-1027.
Semenova, V. and Chernozhukov, V. (2021). Debiased machine learning of conditional average treatment effects and other causal functions. Econometrics Journal, 24(2):264-289.
Nie, X. and Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299-319.
Kennedy, E.H. (2023). Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17(2):3008-3049.

Recent Results: Causal Machine Learning and Heterogeneous Treatment Effects (2022-2025)
