Debates & Controversies

Reduced Form vs Structural Models: Do We Need Theory for External Validity?

Introduction

For the past three decades, a fault line has run through empirical economics. On one side: the credibility revolution, championed by Angrist and Pischke [2010], which holds that causal claims should rest on clean identification designs—randomised experiments, instrumental variables, regression discontinuity, difference-in-differences—rather than on untestable structural assumptions. On the other: structural economists, following Heckman [1997] and Wolpin [2013], who argue that reduced-form estimates are useful locally but cannot be used to predict the effects of policies that differ from the ones studied, without a behavioural model that can extrapolate.  

This debate goes by many names: reduced form vs structural, design-based vs model-based, "pure" causal inference vs "deep structural" inference. It is not merely academic: funding agencies, journals, and policy institutions give different weight to evidence based on each approach. This article presents the strongest version of each side.  

1 The Credibility Revolution's Case for Reduced Form

1.1 The Critique of Structural Models

The credibility critique of structural models has several prongs. First, structural models impose many functional form and distributional assumptions that are not directly testable. A discrete-choice model of educational investment may assume that utility is additively separable, that discount rates are constant, and that the distribution of unobserved ability is log-normal. Each assumption constrains the model and shapes its predictions, yet none can be tested separately from the data used to fit the model.  

Angrist and Pischke [2010] make the specific point that structural estimates from the pre-credibility revolution often relied on exclusion restrictions that were indistinguishable from the identifying assumptions needed in reduced-form IV. If the structural model requires the same untestable assumptions as IV, while adding functional form restrictions, then IV dominates on robustness grounds.  

Second, the structural approach to identification is often circular: the model's parameters are identified precisely because the model imposes strong restrictions on the data. If those restrictions are wrong, the identified parameters are meaningless. Reduced-form methods avoid this circularity by identifying only well-defined statistical objects (the treatment effect at a threshold, or the average effect for compliers) without committing to a full model.  

1.2 The Policy Relevance of LATE

Reduced-form defenders also push back on the claim that LATEs are too local to be policy-relevant. Imbens [2010] argues that a well-estimated LATE provides useful information even if it does not recover the ATE: it measures the effect of treatment on compliers, the units whose treatment status the instrument shifts, and this is often the exact margin where policy operates.  

Consider the returns to college education. The LATE from a college proximity instrument [Card, 1995] estimates the returns for students who attend college because they live near one: students from families without college traditions, often first-generation students facing tight credit constraints. This is precisely the population targeted by college access programmes. On this view the LATE is not merely "locally" relevant; it is the most policy-relevant parameter for that specific policy context.  
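The gap between a LATE and the ATE can be made concrete with a small simulation. The sketch below is illustrative only: the population shares, the effect sizes, and the names `TYPES`, `draw_type` and `simulate` are assumptions chosen for the example, not estimates from Card [1995] or any other study. It computes the Wald ratio (reduced form divided by first stage) for a binary instrument and compares it with the population average effect.

```python
import random

# Hypothetical population shares and treatment effects (illustrative only).
TYPES = [
    ("complier", 0.3, 0.12),  # treated only when the instrument is switched on
    ("always", 0.3, 0.06),    # treated regardless of the instrument
    ("never", 0.4, 0.04),     # never treated; 0.04 is what they would gain
]

def draw_type(rng):
    """Draw a compliance type and its individual treatment effect."""
    r = rng.random()
    cumulative = 0.0
    for name, share, effect in TYPES:
        cumulative += share
        if r < cumulative:
            return name, effect
    return TYPES[-1][0], TYPES[-1][2]

def simulate(n=200_000, seed=0):
    """Return (Wald estimate, average individual effect in the sample)."""
    rng = random.Random(seed)
    sum_y = [0.0, 0.0]   # outcome totals by instrument value
    sum_d = [0.0, 0.0]   # treatment totals by instrument value
    count = [0, 0]
    total_effect = 0.0
    for _ in range(n):
        z = rng.randint(0, 1)  # binary instrument, e.g. living near a college
        kind, effect = draw_type(rng)
        d = 1 if kind == "always" or (kind == "complier" and z == 1) else 0
        y = 1.0 + effect * d + rng.gauss(0.0, 0.5)
        sum_y[z] += y
        sum_d[z] += d
        count[z] += 1
        total_effect += effect
    reduced_form = sum_y[1] / count[1] - sum_y[0] / count[0]
    first_stage = sum_d[1] / count[1] - sum_d[0] / count[0]
    return reduced_form / first_stage, total_effect / n

wald_estimate, true_ate = simulate()
print(f"Wald (LATE) estimate: {wald_estimate:.3f}")
print(f"Population ATE:       {true_ate:.3f}")
```

By construction the complier effect is 0.12 and the population average is 0.07, so the Wald ratio should sit near the former. Whether that is a drawback or exactly the parameter of interest depends on whether the policy in question targets the compliers, which is the point at issue in this section.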

2 The Structural Case for Theory

2.1 The Lucas Critique

Lucas [1976] famously argued that reduced-form relationships estimated under one policy regime will not hold under a new regime, because optimising agents change their behaviour in response to policy changes. A reduced-form regression of consumption on income estimated under stable policy will not correctly predict the consumption response to a large, sustained income tax change.  

Structural models are immune to the Lucas critique by design: they model the invariant parameters of preferences and technology, not regime-specific reduced forms. When the policy environment changes, the model's equilibrium changes, but the deep parameters are stable.  

Applied structural economists argue that this matters enormously for policy evaluation. Estimating the effect of a modest minimum wage increase from a natural experiment cannot inform the effect of doubling the minimum wage, because the economy's production function, labour demand curves, and substitution possibilities are operating in a very different regime. A structural model that identifies these underlying relationships can extrapolate.  
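The extrapolation argument can be sketched numerically. The example below assumes, purely for illustration, a constant-elasticity labour demand curve with made-up parameters; `true_employment` and all numbers are hypothetical. It compares a linear extrapolation of a locally estimated slope with a prediction from a model that recovers the elasticity, when the minimum wage is doubled.

```python
import math

def true_employment(wage, scale=1000.0, elasticity=0.5):
    """Hypothetical labour demand: employment = scale * wage^(-elasticity)."""
    return scale * wage ** (-elasticity)

# A "natural experiment": the minimum wage rises modestly from 10 to 11.
w0, w1 = 10.0, 11.0
e0, e1 = true_employment(w0), true_employment(w1)

# Policy of interest: doubling the minimum wage.
w_new = 20.0

# Reduced-form style extrapolation: carry the local slope forward linearly.
slope = (e1 - e0) / (w1 - w0)
linear_prediction = e0 + slope * (w_new - w0)

# Structural style extrapolation: recover the elasticity, then use the
# assumed functional form to predict employment at the new wage.
elasticity_hat = -(math.log(e1) - math.log(e0)) / (math.log(w1) - math.log(w0))
structural_prediction = e0 * (w_new / w0) ** (-elasticity_hat)

truth = true_employment(w_new)
print(f"truth:      {truth:.1f}")
print(f"linear:     {linear_prediction:.1f}")
print(f"structural: {structural_prediction:.1f}")
```

The structural prediction matches the truth here only because the assumed functional form is the one that generated the data. If the true demand curve had a different shape, the structural extrapolation would be wrong as well, which is the credibility critique from Section 1.1 in miniature.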

2.2 Counterfactuals and Welfare

Wolpin [2013] identifies a specific class of policy questions that are unanswerable by reduced-form methods: questions that require counterfactuals involving policies never observed. What would happen to educational outcomes if we eliminated all school fees? What would the labour market equilibrium look like if we extended compulsory schooling by two years? No natural experiment has ever been run at these scales, and extrapolation from smaller quasi-experiments requires a structural model to discipline the extrapolation.  

Similarly, welfare calculations require utility functions. A reduced-form estimate of how a cash transfer affects consumption does not tell us the welfare gain to the recipient—that requires knowing the utility function and budget constraint. Structural models allow welfare-relevant cost-benefit analysis that is impossible without a behavioural framework.  
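A small calculation shows why the utility function matters. The sketch below assumes constant relative risk aversion (CRRA) utility, a common but by no means unique choice; the function names, consumption levels and transfer size are hypothetical. The same consumption increase is valued very differently across households depending on the assumed curvature parameter `gamma`.

```python
import math

def crra_utility(consumption, gamma):
    """CRRA utility; gamma = 1 is the log-utility limit."""
    if gamma == 1.0:
        return math.log(consumption)
    return consumption ** (1.0 - gamma) / (1.0 - gamma)

def utility_gain(consumption, transfer, gamma):
    """Utility gain from raising consumption by the size of the transfer."""
    return crra_utility(consumption + transfer, gamma) - crra_utility(consumption, gamma)

def relative_value(gamma, poor=50.0, rich=200.0, transfer=10.0):
    """Utility gain for the poorer household relative to the richer one."""
    return utility_gain(poor, transfer, gamma) / utility_gain(rich, transfer, gamma)

for gamma in (0.0, 1.0, 2.0):
    print(f"gamma = {gamma}: relative value = {relative_value(gamma):.2f}")
```

With `gamma = 0` (linear utility) a unit of consumption is worth the same to everyone, while with `gamma = 2` the transfer is worth fourteen times as much to the poorer household in utility terms. A reduced-form estimate of the consumption response is identical in all three cases, so the welfare conclusion is driven by the assumed preferences.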

2.3 Identification vs Credibility

Structural economists also push back on the credibility critique by distinguishing between what is identified and what is identified credibly. Reduced-form methods identify narrow parameters very credibly. Structural methods identify broader parameters less credibly, but those broader parameters are what policymakers need. A highly credible LATE is of little use if the question requires an equilibrium effect.  

Keane and Wolpin [2010] illustrate this with the returns to education. Structural estimates that account for ability sorting, selection into schooling, and the sequential nature of human capital investment yield estimates that differ systematically from IV-based LATEs. The structural estimates may be less "credible" by the standards of the credibility revolution, but they answer a different and arguably more policy-relevant question.  

3 Points of Convergence

3.1 External Validity via Structural Insight

Andresen and Huber [2019] and ? show that understanding who the compliers are in an IV design—their observable characteristics, their location in the ability distribution, their treatment-taking propensity—allows partial extrapolation from LATE toward ATE using structural insights, without requiring a full structural model. This "marginal treatment effect" (MTE) framework of Heckman and Vytlacil [2005] is precisely such a bridge: it recovers the treatment effect as a function of unobserved heterogeneity and allows targeted extrapolation.  
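The logic of MTE-based extrapolation can be sketched under one strong and purely illustrative assumption: that the marginal treatment effect is linear in the unobserved resistance to treatment, MTE(u) = a + b*u with u uniform on [0, 1]. Under that assumption a LATE is the average of the MTE over the interval of u that the instrument shifts, and two LATEs at different margins pin down a and b, and therefore the ATE. The function names and numbers below are hypothetical.

```python
def late_from_linear_mte(a, b, p_low, p_high):
    """Average of MTE(u) = a + b*u over u in [p_low, p_high]."""
    return a + b * (p_low + p_high) / 2.0

def fit_linear_mte(late_1, interval_1, late_2, interval_2):
    """Solve for (a, b) from two LATEs over different propensity intervals."""
    mid_1 = (interval_1[0] + interval_1[1]) / 2.0
    mid_2 = (interval_2[0] + interval_2[1]) / 2.0
    b = (late_2 - late_1) / (mid_2 - mid_1)
    a = late_1 - b * mid_1
    return a, b

def ate_from_linear_mte(a, b):
    """Average of a + b*u over the whole unit interval."""
    return a + b / 2.0

# Hypothetical truth: returns decline with resistance to treatment.
true_a, true_b = 0.15, -0.10

# Two instruments shift treatment at different margins (assumed intervals).
interval_1, interval_2 = (0.2, 0.4), (0.5, 0.7)
late_1 = late_from_linear_mte(true_a, true_b, *interval_1)
late_2 = late_from_linear_mte(true_a, true_b, *interval_2)

a_hat, b_hat = fit_linear_mte(late_1, interval_1, late_2, interval_2)
ate_hat = ate_from_linear_mte(a_hat, b_hat)
print(f"LATEs: {late_1:.3f}, {late_2:.3f}; implied ATE: {ate_hat:.3f}")
```

Neither LATE equals the ATE, but together with the linearity assumption they imply it. The extrapolation is only as good as that shape restriction, which is why this counts as partial rather than assumption-free extrapolation.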

3.2 Structural Models for External Validity of Reduced-Form Results

Todd and Wolpin [2006] provide a systematic comparison of structural and reduced-form estimates for the PROGRESA conditional cash transfer programme in Mexico. They show that a structural model estimated without post-programme data on treated households can predict the programme's effects on school enrolment. When the model predictions match the experimental evidence, confidence in both the model and the extrapolation increases. When they diverge, the discrepancy points to structural assumptions that need revisiting.  

This approach—using structural models to assess the external validity of reduced-form estimates—is more modest than claiming structural models are independently credible, and more productive than dismissing structural models entirely.  

3.3 The Rise of Hybrid Methods

Double machine learning [Chernozhukov et al., 2018] and causal forests [Wager and Athey, 2018] represent a middle path: they are model-free in the sense of not imposing parametric functional forms, but they explicitly model heterogeneity and can incorporate structural constraints as regularisation. They achieve the flexibility of non-parametric estimation while respecting the identifying structure of a design-based strategy.  
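The core mechanics of double machine learning for a partially linear model can be sketched with the standard library alone. This is a minimal illustration, not the estimator of Chernozhukov et al. [2018] as implemented in any package: a crude histogram (bin-mean) regressor stands in for the machine learning step, the data-generating process is invented, and all names are hypothetical. What it keeps is the two ingredients the method relies on: residualising both outcome and treatment on the covariate, and cross-fitting so that each observation's nuisance predictions come from the other fold.

```python
import math
import random

def bin_index(x, n_bins):
    """Map x in [0, 1) to one of n_bins equal-width bins."""
    return min(int(x * n_bins), n_bins - 1)

def fit_bin_means(xs, values, n_bins):
    """Histogram regressor: mean of values within each bin of x."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, v in zip(xs, values):
        k = bin_index(x, n_bins)
        sums[k] += v
        counts[k] += 1
    overall = sum(values) / len(values)  # fallback for empty bins
    return [s / c if c else overall for s, c in zip(sums, counts)]

def cross_fitted_estimate(xs, ds, ys, n_bins=20):
    """Partialling-out estimate of theta with two-fold cross-fitting."""
    n = len(xs)
    folds = [list(range(0, n, 2)), list(range(1, n, 2))]
    res_d, res_y = [0.0] * n, [0.0] * n
    for k, held_out in enumerate(folds):
        train = folds[1 - k]
        train_x = [xs[i] for i in train]
        d_means = fit_bin_means(train_x, [ds[i] for i in train], n_bins)
        y_means = fit_bin_means(train_x, [ys[i] for i in train], n_bins)
        for i in held_out:
            b = bin_index(xs[i], n_bins)
            res_d[i] = ds[i] - d_means[b]
            res_y[i] = ys[i] - y_means[b]
    return sum(rd * ry for rd, ry in zip(res_d, res_y)) / sum(rd * rd for rd in res_d)

def ols_slope(ds, ys):
    """Slope from regressing y on d alone, ignoring the covariate."""
    mean_d, mean_y = sum(ds) / len(ds), sum(ys) / len(ys)
    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(ds, ys))
    return cov / sum((d - mean_d) ** 2 for d in ds)

# Invented data: x confounds treatment d and outcome y non-linearly.
rng = random.Random(42)
THETA = 0.5
xs, ds, ys = [], [], []
for _ in range(20_000):
    x = rng.random()
    confounder = math.sin(2.0 * math.pi * x)
    d = confounder + rng.gauss(0.0, 1.0)
    y = THETA * d + 2.0 * confounder + rng.gauss(0.0, 1.0)
    xs.append(x)
    ds.append(d)
    ys.append(y)

naive_estimate = ols_slope(ds, ys)
dml_estimate = cross_fitted_estimate(xs, ds, ys)
print(f"naive OLS slope: {naive_estimate:.3f}")
print(f"cross-fitted partialling-out: {dml_estimate:.3f} (true theta = {THETA})")
```

The naive slope absorbs the confounding through `x`, while the residualised estimate should land close to the true coefficient. Note that identification still comes from the assumption that `x` captures all confounding; the flexible first stage relaxes functional form, not the identifying design.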

4 What Empirical Evidence Would Resolve the Debate?

The debate ultimately turns on empirical questions about when structural assumptions are approximately correct, when LATEs generalise to ATEs, and when general equilibrium effects invalidate reduced-form estimates. Several types of evidence are informative:  

  • Within-study comparisons (LaLonde-style): compare structural and reduced-form estimates against a randomised benchmark [LaLonde, 1986]. This tests whether structural models reliably reproduce experimental estimates.  
  • Replication across contexts: if a reduced-form estimate of a job training programme replicates across many countries and time periods, this strengthens belief in external validity without structural modelling.  
  • Equilibrium tests: look for cases where general equilibrium effects have been documented, and check whether they are large enough to invalidate reduced-form estimates [Heckman et al., 1998].  
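The first item in the list above, the within-study comparison, can be sketched as a simulation. Everything here is invented for illustration: the selection rule, the effect size and the function names are assumptions, and a naive comparison of participants with non-participants stands in for whatever non-experimental or structural estimator is being assessed. The idea is only to show how a candidate estimator is scored against a randomised benchmark.

```python
import random

TRUE_EFFECT = 1.0  # hypothetical programme effect on the outcome

def difference_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    y_treated = [y for y, d in zip(outcomes, treated) if d == 1]
    y_control = [y for y, d in zip(outcomes, treated) if d == 0]
    return sum(y_treated) / len(y_treated) - sum(y_control) / len(y_control)

def simulate_sample(n, rng, randomised):
    """Draw a sample with randomised or self-selected treatment."""
    outcomes, treated = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        if randomised:
            d = rng.randint(0, 1)  # coin-flip assignment
        else:
            # Self-selection: higher-ability people are more likely to enrol.
            d = 1 if ability + rng.gauss(0.0, 1.0) > 0 else 0
        y = ability + TRUE_EFFECT * d + rng.gauss(0.0, 1.0)
        outcomes.append(y)
        treated.append(d)
    return outcomes, treated

rng = random.Random(1)
experimental_benchmark = difference_in_means(*simulate_sample(20_000, rng, randomised=True))
observational_estimate = difference_in_means(*simulate_sample(20_000, rng, randomised=False))

print(f"experimental benchmark: {experimental_benchmark:.2f}")
print(f"naive observational:    {observational_estimate:.2f}")
print(f"implied bias:           {observational_estimate - experimental_benchmark:.2f}")
```

In this setup the naive comparison overstates the effect because enrolment is correlated with ability. The same scoring applies to any candidate method: a selection-corrected or structural estimator passes the test to the extent that its estimate lands close to the experimental benchmark.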

5 Conclusion

The reduced-form/structural debate is not a binary choice between rigour and relevance. Reduced-form methods provide the most credible estimates of specific, locally-defined causal effects. Structural methods extend the reach of inference to novel counterfactuals and welfare analysis, at the cost of stronger assumptions. The empirically most productive strategy is typically to use both: establish a credible reduced-form baseline, then use structural modelling to assess external validity and support extrapolation. Neither approach alone is sufficient; together, they form a more complete evidence base for policy.  

References

  1. Andresen, M. E. and Huber, M. (2019). Instrument-based estimation with binarized treatments: Issues and tests for the exclusion restriction. Econometrics Journal, 24(3), 536-558.  
  2. Angrist, J. D. and Pischke, J.-S. (2010). The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives, 24(2), 3-30.  
  3. Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In L. N. Christofides, E. K. Grant, and R. Swidinsky (Eds.), Aspects of Labour Market Behaviour. University of Toronto Press.  
  4. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1), C1-C68.  
  5. Heckman, J. J. (1997). Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources, 32(3), 441-462.  
  6. Heckman, J. J., Lochner, L., and Taber, C. (1998). General-equilibrium treatment effects: A study of tuition policy. American Economic Review (P&P), 88(2), 381-386.  
  7. Heckman, J. J. and Vytlacil, E. (2005). Structural equations, treatment effects, and econometric policy evaluation. Econometrica, 73(3), 669-738.  
  8. Imbens, G. W. (2010). Better LATE than nothing: Some comments on Deaton (2009) and Heckman and Urzua (2009). Journal of Economic Literature, 48(2), 399-423.  
  9. Keane, M. P. and Wolpin, K. I. (2010). The role of labor and marriage markets, preference heterogeneity, and the welfare system in the life cycle decisions of black, hispanic, and white women. International Economic Review, 51(3), 851-892.  
  10. LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76(4), 604-620.  
  11. Lucas, R. E. (1976). Econometric policy evaluation: A critique. Carnegie-Rochester Conference Series on Public Policy, 1, 19-46.  
  12. Todd, P. E. and Wolpin, K. I. (2006). Assessing the impact of a school subsidy program in Mexico: Using a social experiment to validate a dynamic behavioral model of child schooling and fertility. American Economic Review, 96(5), 1384-1417.  
  13. Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.  
  14. Wolpin, K. I. (2013). The Limits of Inference Without Theory. MIT Press.  
