1 The Central Tension

How do we predict the effects of policies that have never been tried? Two broad approaches dominate modern empirical economics, and they rest on fundamentally different epistemological premises.

The structural approach builds an economic model, estimates its deep parameters (preferences, technology, information sets) from data, and uses the model to simulate the outcomes of counterfactual policies. Because the deep parameters are assumed to be invariant to the policy, the simulation is valid even for policies far outside the range of observed variation. The structural model is explicit about mechanism and can answer welfare questions.

The machine learning approach (inclusive here of reduced-form causal inference methods such as double/debiased machine learning and causal forests) estimates heterogeneous treatment effects from quasi-experimental or experimental variation, using flexible ML algorithms to avoid parametric assumptions. It does not require a complete economic model and is more agnostic about mechanism. But its predictions are local: they apply only near the range of variation identified in the data.

The debate between these approaches is not new (Heckman [1997] and Angrist and Krueger [1999] disagreed about the value of local average treatment effects in the 1990s), but it has acquired new urgency as ML methods have become powerful enough to challenge structural approaches on their own turf.

2 Side A: Structural Models Are Necessary for Policy Counterfactuals

The Lucas critique

The deepest argument for structural modelling is the Lucas critique [Lucas, 1976]. Reduced-form relationships estimated from historical data reflect the equilibrium behaviour of agents under the prevailing policy regime. When the policy changes, agents update their behaviour, and the old reduced-form relationship breaks down. Only a model of the structural parameters governing agent behaviour, which are assumed to be policy-invariant, provides valid out-of-sample predictions.

This critique applies with particular force when the proposed policy is large (e.g., replacing the entire UI system), unfamiliar (no historical variation to identify the reduced form), or when equilibrium effects are important (changes in prices, wages, and market structure that feed back to individual behaviour).
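The logic can be illustrated with a deliberately simple simulation. Everything in it is an assumption made for illustration: the behavioural rule y = theta * (1 - tau) * x, the parameter values, and the names `simulate` and `slope_through_origin` are hypothetical and do not come from Lucas [1976]. The sketch fits a reduced-form slope under one policy regime and then uses it, alongside a structural prediction, to forecast outcomes under a new regime.

```python
import random


def simulate(theta, tau, n, seed):
    """Draw (x, y) pairs where behaviour follows y = theta * (1 - tau) * x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 2.0) for _ in range(n)]
    ys = [theta * (1.0 - tau) * x + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


def slope_through_origin(xs, ys):
    """Least-squares slope of y on x with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mean(values):
    return sum(values) / len(values)


THETA = 2.0                  # deep parameter, assumed invariant to policy
TAU_OLD, TAU_NEW = 0.2, 0.5  # policy parameter before and after the reform

# Historical data are generated under the old regime only.
xs_old, ys_old = simulate(THETA, TAU_OLD, n=5000, seed=0)
beta_reduced_form = slope_through_origin(xs_old, ys_old)  # estimates theta * (1 - TAU_OLD)
theta_hat = beta_reduced_form / (1.0 - TAU_OLD)           # the model backs out theta

# Outcomes realised once the new regime is in place.
xs_new, ys_new = simulate(THETA, TAU_NEW, n=5000, seed=1)
actual = mean(ys_new)

# Reduced form: reuse the old slope. Structural: re-solve with the new policy.
pred_reduced_form = beta_reduced_form * mean(xs_new)
pred_structural = theta_hat * (1.0 - TAU_NEW) * mean(xs_new)

print(f"actual mean outcome under new policy: {actual:.3f}")
print(f"reduced-form prediction:              {pred_reduced_form:.3f}")
print(f"structural prediction:                {pred_structural:.3f}")
```

The reduced-form slope is a mixture of the deep parameter and the old policy, so it overpredicts once the policy changes. The structural prediction is accurate here only because the simulated world matches the assumed model exactly, which is precisely the assumption Side B contests.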

Welfare analysis requires a model

Predicting the effect of a policy on GDP, employment, or some other outcome variable may be possible from reduced-form estimates. But welfare analysis, which evaluates whether the policy makes people better off and by how much, requires comparing utility levels in the policy-on and policy-off worlds. This comparison requires knowledge of preferences, which must be estimated using a structural model. A policy that raises employment by 10 percentage points but involves large distortions might be welfare-reducing; a reduced-form estimate alone cannot reveal this [Wolpin, 2013].
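A toy calculation shows how employment and welfare can move in opposite directions. The setup is an assumption chosen for simplicity, not a model from the cited literature: each person produces output worth `wage` if they work, bears a private disutility of work drawn uniformly from [0, 2], and a wage subsidy is financed by lump-sum taxes so that it is a pure transfer. The function name `evaluate` and all parameter values are hypothetical.

```python
import random


def evaluate(subsidy, wage=1.0, n=10000, seed=0):
    """Return (employment rate, social surplus per person) under a wage subsidy."""
    rng = random.Random(seed)  # same seed, so both policies face the same population
    disutility = [rng.uniform(0.0, 2.0) for _ in range(n)]
    # A person works when the subsidised wage exceeds their disutility of work.
    works = [wage + subsidy > d for d in disutility]
    employment_rate = sum(works) / n
    # The subsidy is a transfer and nets out of social surplus, so each
    # worker contributes output minus their disutility of work.
    surplus = sum(wage - d for d, w in zip(disutility, works) if w) / n
    return employment_rate, surplus


emp_off, welfare_off = evaluate(subsidy=0.0)
emp_on, welfare_on = evaluate(subsidy=0.2)

print(f"employment: {emp_off:.3f} -> {emp_on:.3f}")
print(f"surplus:    {welfare_off:.4f} -> {welfare_on:.4f}")
```

In this sketch the subsidy raises employment by roughly 10 percentage points, yet surplus falls because every newly employed person values their time above the output they produce. The welfare comparison is only possible because the disutility distribution, a preference object, is known; an analyst observing employment alone would see only the first line of output.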

External validity via theory

Structural models provide a disciplined framework for external validity: the analyst can simulate the policy in a new context by re-solving the model with the new context's parameters, without requiring a separate natural experiment in that context. This is essential for policy analysis in developing countries or novel institutional environments where natural experiments may be rare.

3 Side B: Machine Learning Methods Provide More Credible Causal Estimates

Structural models make strong, often wrong, functional form assumptions

Structural economic models require the analyst to specify the functional form of utility functions, production functions, and information structures. These choices are rarely well-identified by the data; instead, they are imposed by the analyst's prior and can drive the results. Heckman [1997] himself acknowledged that "structural" need not mean "good" if the structure is badly misspecified.

ML-based causal inference methods such as Double Machine Learning [Chernozhukov et al., 2018] and causal forests [Wager and Athey, 2018] are designed to be robust to estimation error in the nuisance functions. DML combines Neyman-orthogonal moment conditions with cross-fitting, and causal forests rely on honest sample splitting, so regularised learners can estimate complex conditional means without imposing parametric constraints. This flexibility means that ML estimates are less likely to be driven by functional form assumptions that cannot be tested.
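For concreteness, the partially linear model studied in Chernozhukov et al. [2018] can be written as follows. The notation (outcome Y, treatment D, covariates X, observation W = (Y, D, X)) follows the usual presentation of that model and is introduced here for illustration.

```latex
\begin{align*}
Y &= \theta_0 D + g_0(X) + \varepsilon, & \mathbb{E}[\varepsilon \mid D, X] &= 0,\\
D &= m_0(X) + v, & \mathbb{E}[v \mid X] &= 0,\\
\psi(W; \theta, \ell, m) &= \bigl(Y - \ell(X) - \theta\,(D - m(X))\bigr)\,\bigl(D - m(X)\bigr), & \ell_0(X) &= \mathbb{E}[Y \mid X],\\
\hat{\theta} &= \frac{\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)\bigl(Y_i - \hat{\ell}(X_i)\bigr)}{\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)^{2}}.
\end{align*}
```

The score \psi is Neyman-orthogonal: its expectation has zero derivative with respect to the nuisance functions at their true values, so small errors in \hat{\ell} and \hat{m} affect \hat{\theta} only to second order. Cross-fitting means that \hat{\ell}(X_i) and \hat{m}(X_i) are computed from a fold of the data that excludes observation i.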

Data speaks; models can lie

Angrist and Krueger [1999] argued that IV estimates from well-designed natural experiments answer narrow but credible causal questions: they tell us the effect of a real-world intervention on the complier population, without making assumptions about the rest of the population or about preferences. This "credibility" has enormous value even if the estimand is local: a reliable LATE is more informative than a misspecified structural simulation that appears to answer a broader question.
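What "local" means can be seen in a small simulation of the Wald (IV) estimator with a binary instrument. The population below is an assumption made for illustration: half are compliers with a treatment effect of 2.0, half are never-takers whose effect would be 0.5 if they were ever treated, and the instrument is randomly assigned. None of these numbers come from the cited papers.

```python
import random


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
N = 20000
EFFECT_COMPLIER, EFFECT_NEVER_TAKER = 2.0, 0.5  # hypothetical treatment effects

rows = []
for _ in range(N):
    complier = rng.random() < 0.5   # unobserved type
    z = rng.random() < 0.5          # randomly assigned instrument
    d = int(complier and z)         # only compliers take treatment, and only if z = 1
    effect = EFFECT_COMPLIER if complier else EFFECT_NEVER_TAKER
    y = rng.gauss(0.0, 1.0) + effect * d
    rows.append((z, d, y))

# Reduced form (intention-to-treat effect) and first stage, both by instrument status.
itt = mean([y for z, d, y in rows if z]) - mean([y for z, d, y in rows if not z])
first_stage = mean([d for z, d, y in rows if z]) - mean([d for z, d, y in rows if not z])

wald = itt / first_stage  # Wald estimator of the LATE
ate = 0.5 * EFFECT_COMPLIER + 0.5 * EFFECT_NEVER_TAKER  # population average effect

print(f"Wald / LATE estimate:      {wald:.2f}")
print(f"complier effect (truth):   {EFFECT_COMPLIER:.2f}")
print(f"population ATE (truth):    {ate:.2f}")
```

The Wald ratio recovers the complier effect without any model of behaviour, but it is silent about the never-takers: their effect never appears in the data, so the population average cannot be learned from this design alone.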

The growth of causal ML tools extends this logic: causal forests allow the analyst to recover heterogeneous treatment effects across the covariate distribution non-parametrically [Wager and Athey, 2018], while DML provides valid inference on structural parameters in partially linear models without specifying the functional form of the nuisance functions [Chernozhukov et al., 2018]. Both methods use experimental or quasi-experimental variation and rely on much weaker assumptions than full structural identification.
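A minimal sketch of cross-fitted DML for a partially linear model, using only the Python standard library, is given below. It is not the authors' implementation: the nuisance learner `fit_bin_means` is a crude piecewise-constant regression standing in for a real ML method, the data-generating process is invented, and all function names are hypothetical.

```python
import math
import random


def fit_bin_means(xs, targets, n_bins=20):
    """Piecewise-constant regression on x in [0, 1): a stand-in for an ML learner."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, t in zip(xs, targets):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += t
        counts[b] += 1
    overall = sum(targets) / len(targets)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_partially_linear(xs, ds, ys, n_folds=2):
    """Cross-fitted residual-on-residual estimate of theta in Y = theta*D + g(X) + e."""
    n = len(xs)
    d_res, y_res = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        held_out = [i for i in range(n) if i % n_folds == k]
        # Nuisance functions are fitted on the other fold(s) only.
        m_hat = fit_bin_means([xs[i] for i in train], [ds[i] for i in train])
        l_hat = fit_bin_means([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            d_res[i] = ds[i] - m_hat(xs[i])
            y_res[i] = ys[i] - l_hat(xs[i])
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res)


THETA = 1.0  # true effect in the simulated data (assumption)
rng = random.Random(0)
xs = [rng.random() for _ in range(4000)]
ds = [x ** 2 + rng.gauss(0.0, 0.5) for x in xs]
ys = [THETA * d + math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.5) for x, d in zip(xs, ds)]

theta_dml = dml_partially_linear(xs, ds, ys)

# Naive comparison: OLS slope of Y on D, ignoring the confounder X.
d_bar, y_bar = sum(ds) / len(ds), sum(ys) / len(ys)
theta_naive = (sum((d - d_bar) * (y - y_bar) for d, y in zip(ds, ys))
               / sum((d - d_bar) ** 2 for d in ds))

print(f"DML estimate:   {theta_dml:.3f}")
print(f"naive estimate: {theta_naive:.3f}")
```

The estimator never specifies the form of the confounding function; it only requires that the learner approximate the two conditional means well enough. In applied work one would use an established implementation with proper standard errors rather than this sketch.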

Large-scale policy rollouts can be evaluated

Critics of ML methods argue that large-scale general equilibrium effects cannot be captured by local reduced-form estimates. But several recent papers have used variation across regions, time periods, or dose levels to recover general equilibrium estimates non-parametrically. Others combine the two traditions: Donaldson [2018] uses a structural model motivated by reduced-form IV estimates, where the IV step provides the credible variation and the structural model extends it to welfare analysis.

4 Points of Agreement and the Path Forward

The dichotomy between structural and ML approaches is partly false. The most influential recent empirical papers combine elements of both:

Reduced-form credibility for structural identification. Use well-identified IV or RD variation to estimate key structural parameters (trade elasticities, labour supply elasticities) and embed them in a structural model for counterfactual simulation. This hybrid strategy, exemplified by Donaldson [2018], draws credibility from the natural experiment and extrapolation from the structural model.
ML for heterogeneity and nuisance. Use ML to flexibly estimate nuisance functions (conditional means, propensity scores) in a causal inference procedure, while retaining economic theory for the causal identification assumptions. This is the spirit of DML [Chernozhukov et al., 2018].
Structural models of limited scope. Rather than specifying a full general equilibrium model, use structural models of limited scope (a single market, a single agent decision problem) with credible identification. This limits the exposure to misspecification while enabling welfare analysis.
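The first of these strategies can be sketched in a few lines. The example assumes a constant-elasticity labour supply rule, hours = A * (1 - tau)^epsilon, a randomly assigned tax cut for a treated group, and invented parameter values; none of this is taken from Donaldson [2018] or any other cited paper. The elasticity is estimated from the quasi-experimental contrast and then plugged into the model to predict a reform well outside the observed tax rates.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def simulate_log_hours(tau, epsilon, log_a, n, rng):
    """Log hours under the assumed rule hours = A * (1 - tau)**epsilon, with noise."""
    return [log_a + epsilon * math.log(1.0 - tau) + rng.gauss(0.0, 0.1) for _ in range(n)]


EPSILON, LOG_A = 0.5, math.log(40.0)  # true values, unknown to the analyst
TAU_CONTROL, TAU_TREATED = 0.30, 0.20  # observed tax rates
TAU_REFORM = 0.60                      # proposed reform, outside observed variation

rng = random.Random(0)
log_h_control = simulate_log_hours(TAU_CONTROL, EPSILON, LOG_A, 5000, rng)
log_h_treated = simulate_log_hours(TAU_TREATED, EPSILON, LOG_A, 5000, rng)

# Step 1 (reduced form): treated-versus-control contrast in mean log hours.
contrast = mean(log_h_treated) - mean(log_h_control)

# Step 2 (structure): read the contrast as epsilon * change in log(1 - tau).
epsilon_hat = contrast / (math.log(1.0 - TAU_TREATED) - math.log(1.0 - TAU_CONTROL))

# Step 3 (counterfactual): extrapolate from the control group to the reform.
pred_log_hours = mean(log_h_control) + epsilon_hat * (
    math.log(1.0 - TAU_REFORM) - math.log(1.0 - TAU_CONTROL))
true_log_hours = LOG_A + EPSILON * math.log(1.0 - TAU_REFORM)

print(f"estimated elasticity: {epsilon_hat:.3f}")
print(f"predicted hours under reform: {math.exp(pred_log_hours):.2f}")
print(f"true hours under reform:      {math.exp(true_log_hours):.2f}")
```

The identification in steps 1 and 2 rests on random assignment; the extrapolation in step 3 rests entirely on the constant-elasticity functional form. If the elasticity varied with the tax rate, the estimate would remain credible locally while the counterfactual could be badly wrong.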

Unresolved questions. The debate will not be fully resolved until: (i) structural and ML methods can be formally compared in terms of mean squared prediction error for out-of-sample counterfactuals; (ii) there are agreed standards for what "structural" identification means in the presence of equilibrium effects; and (iii) the communities share enough common vocabulary to agree on what counts as "credible" identification.

5 Conclusion

Both structural and ML approaches to policy counterfactuals have genuine strengths and genuine limitations. Structural models offer welfare analysis, external validity through theory, and the ability to handle policies outside the support of observed variation, but at the cost of strong and often untestable model assumptions. ML methods offer credible, assumption-lean estimates that are robust to misspecification, but their predictions are local and cannot easily address welfare or out-of-distribution questions. The most productive empirical research programmes draw pragmatically on both, using the natural experiment to provide identification and the structural model to extend the result to policy-relevant counterfactuals.

References

Angrist, J. D. and Krueger, A. B. (1999). Empirical strategies in labor economics. In O. Ashenfelter and D. Card, eds., Handbook of Labor Economics, Vol. 3A, pp. 1277-1366. Elsevier.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1):C1-C68.
Donaldson, D. (2018). Railroads of the Raj: estimating the impact of transportation infrastructure. American Economic Review, 108(4-5):899-934.
Heckman, J. J. (1997). Instrumental variables: a study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources, 32(3):441-462.
Lucas, R. E. (1976). Econometric policy evaluation: a critique. In K. Brunner and A. Meltzer, eds., The Phillips Curve and Labor Markets, Carnegie-Rochester Conference Series on Public Policy, Vol. 1, pp. 19-46. North-Holland.
Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228-1242.
Wolpin, K. I. (2013). The Limits of Inference Without Theory. MIT Press.

Machine Learning versus Structural Models: Two Visions of Policy Counterfactuals
