1 Introduction
The potential outcomes framework gives us clean estimands (average treatment effects, treatment effects on the treated, local average treatment effects), but these estimands do not always answer the questions that policymakers care about. When a government considers expanding a college subsidy programme, it does not want to know the effect for the entire population, nor the effect for those who would attend college regardless. It wants to know the effect for those who would be induced to attend by the subsidy: the marginal entrants. This is precisely what the marginal treatment effect (MTE), developed by Heckman and Vytlacil [2005], is designed to reveal.
The MTE framework unifies the seemingly disparate worlds of instrumental variables (IV), matching, and structural selection models. It shows that estimands such as the average treatment effect (ATE), the average treatment effect on the treated (ATT), and the local average treatment effect (LATE) are all weighted averages of the same underlying object: the MTE function. Different identification strategies and different policies simply place different weights on that function.
This article explains the MTE framework from first principles, shows how it connects to the estimands economists already use, and discusses how recent advances in partial identification, particularly Mogstad et al. [2018], have made the framework practically accessible.
2 The Selection Model Setup
Consider a binary treatment D ∈ {0, 1} (e.g. attending college) and a continuous instrument Z (e.g. proximity to a college). The potential outcomes are Y(1) and Y(0), and the observed outcome is Y = DY(1) + (1-D)Y(0).
Heckman and Vytlacil [2005] adopt a threshold-crossing model for treatment selection:
$$D = \mathbf{1}\{U_D \le \nu(X, Z)\},$$
where ν(·) is the propensity score, ν(X, Z) = p(X, Z) = Pr(D = 1 | X, Z), and UD is an unobserved index of resistance to treatment (the net cost of participating), normalised to be uniformly distributed on [0, 1] conditional on X. Individuals with UD ≤ p(X, Z) select into treatment. The key insight is that UD captures the individual's unobserved resistance to selecting into treatment. A low UD corresponds to someone who would take treatment even at instrument values for which the propensity score is low (an "always-taker" in the Angrist-Imbens-Rubin sense); a high UD corresponds to a "never-taker."
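
The selection rule can be illustrated with a small simulation. This is only a sketch under assumed values: the instrument distribution, the linear propensity score on [0.2, 0.8], and the absence of covariates X are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical design: a continuous instrument that moves the
# propensity score over [0.2, 0.8] (no covariates X, for simplicity).
z = rng.uniform(0.0, 1.0, n)
p = 0.2 + 0.6 * z

# Unobserved resistance to treatment, normalised to Uniform[0, 1].
u_d = rng.uniform(0.0, 1.0, n)

# Threshold-crossing rule: treated when resistance is at or below p(Z).
d = (u_d <= p).astype(int)

# Within instrument quartiles, the take-up rate should track the mean of p(Z).
quartile = np.digitize(z, [0.25, 0.5, 0.75])
for q in range(4):
    mask = quartile == q
    print(q, round(d[mask].mean(), 3), round(p[mask].mean(), 3))

# With p(Z) confined to [0.2, 0.8], units with UD < 0.2 are always treated
# and units with UD > 0.8 are never treated, whatever the instrument value.
always = d[u_d < 0.2]
never = d[u_d > 0.8]
print(always.min(), never.max())  # 1 and 0 by construction
```

In this design the instrument only reveals behaviour for UD between 0.2 and 0.8, which is the support limitation that matters for identification of the MTE.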
3 Defining the MTE
The marginal treatment effect for covariate value x at propensity score value u is:
$$\mathrm{MTE}(x, u) = E[\,Y(1) - Y(0) \mid X = x,\; U_D = u\,].$$
This is the average treatment effect for individuals with observed covariates X = x who are exactly indifferent between treatment and non-treatment at a propensity score of u—in other words, the individuals who would be just induced into treatment if p were raised to u. The MTE function MTE(x, u) traces out how treatment effects vary with individuals' unobserved propensity to select into treatment. A downward-sloping MTE (as u increases, the MTE falls) means that individuals with high unobserved resistance to treatment have lower gains—a pattern consistent with positive selection, where those most likely to be treated benefit most.
3.1 Identification of the MTE
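Under the standard IV assumptions (independence of (Y(1), Y(0), UD) and Z conditional on X, and the monotonicity/support condition), the MTE is identified as a derivative of the conditional mean outcome with respect to the propensity score:
$$\mathrm{MTE}(x, p) = \frac{\partial\, E[\,Y \mid X = x,\; p(X, Z) = p\,]}{\partial p}.$$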
This is the local IV (LIV) estimator of Heckman and Vytlacil [1999]. Intuitively, as the propensity score moves from p to p + dp, the individuals newly induced into treatment are those with UD ∈ (p, p + dp); the response of E[Y] to that marginal shift identifies the treatment effect for those individuals.
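
The step from the selection model to this derivative can be spelled out. The following is a sketch of the standard argument under the independence assumption stated above, writing U_D for UD and P for p(X, Z):

```latex
\begin{align*}
E[Y \mid X = x, P = p]
  &= E[Y(0) \mid X = x] + E[D\,(Y(1) - Y(0)) \mid X = x, P = p] \\
  &= E[Y(0) \mid X = x] + \int_0^p E[Y(1) - Y(0) \mid X = x, U_D = u]\, du \\
  &= E[Y(0) \mid X = x] + \int_0^p \mathrm{MTE}(x, u)\, du, \\
\frac{\partial}{\partial p}\, E[Y \mid X = x, P = p]
  &= \mathrm{MTE}(x, p).
\end{align*}
```

The second line uses D = 1{U_D ≤ p} together with the uniform normalisation of U_D, so that conditioning on P = p simply truncates the integral at p.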
In practice, one estimates a flexible regression of Y on X and p(X, Z)—usually a polynomial or a series estimator—and then differentiates with respect to p.
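
A minimal sketch of this two-step procedure on simulated data follows. Everything about the data-generating process is an assumption chosen for illustration: there are no covariates, the propensity score is linear in the instrument (so a linear probability model is correctly specified), and the true MTE is 0.3 - 0.2u, which makes E[Y | p] exactly quadratic in p.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Hypothetical selection model: p(Z) = 0.2 + 0.6 Z, D = 1{UD <= p(Z)}.
z = rng.uniform(0.0, 1.0, n)
p_true = 0.2 + 0.6 * z
u_d = rng.uniform(0.0, 1.0, n)
d = (u_d <= p_true).astype(float)

# Hypothetical potential outcomes, so that MTE(u) = 0.3 - 0.2 * u.
y1 = 0.8 - 0.4 * u_d + rng.normal(0.0, 0.1, n)
y0 = 0.5 - 0.2 * u_d + rng.normal(0.0, 0.1, n)
y = d * y1 + (1.0 - d) * y0

# Step 1: estimate the propensity score (linear probability model,
# which is correctly specified in this simulated design).
p_hat = np.polyval(np.polyfit(z, d, 1), z)

# Step 2: flexible regression of Y on the estimated propensity score.
# A quadratic is enough here because the true E[Y | p] is quadratic.
outcome_fit = np.poly1d(np.polyfit(p_hat, y, 2))

# Step 3: differentiate with respect to p to obtain the local IV estimate.
mte_hat = outcome_fit.deriv()

for u in (0.3, 0.5, 0.7):
    print(u, round(float(mte_hat(u)), 3), round(0.3 - 0.2 * u, 3))
```

The estimate is only meaningful for u inside the support of the estimated propensity score, here roughly [0.2, 0.8]; evaluating the fitted derivative outside that range is extrapolation driven by the polynomial form.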
4 How LATE, ATE, and ATT Relate to the MTE
The key unifying result of Heckman and Vytlacil [2005] is that virtually every treatment effect estimand Δj can be written as a weighted integral of the MTE:
$$\Delta_j(x) = \int_0^1 \mathrm{MTE}(x, u)\, \omega_j(x, u)\, du,$$
where ωj(·) are weights that differ by estimand j. Table 1 summarises the three main cases.
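
For reference, the weights for the three cases as they are usually stated in this literature can be written out as follows. The notation z and z′ for two instrument values with p(x, z) < p(x, z′) is introduced here for illustration.

```latex
\begin{align*}
\omega_{\mathrm{ATE}}(x, u)  &= 1, \\
\omega_{\mathrm{ATT}}(x, u)  &= \frac{\Pr\big(p(X, Z) \ge u \mid X = x\big)}{E\big[p(X, Z) \mid X = x\big]}, \\
\omega_{\mathrm{LATE}}(x, u) &= \frac{\mathbf{1}\{p(x, z) < u \le p(x, z')\}}{p(x, z') - p(x, z)}.
\end{align*}
```

Each set of weights integrates to one over [0, 1]. The ATE weights every u equally, the ATT puts more weight on low u (low resistance), and the LATE weights only the interval of u between the two propensity score values.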
5 The Policy-Relevant Treatment Effect
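For policy analysis, Heckman and Vytlacil [2005] define the policy-relevant treatment effect (PRTE) of moving from a baseline policy a to an alternative policy a′:
$$\mathrm{PRTE} = \frac{E[\,Y \mid \text{policy } a'\,] - E[\,Y \mid \text{policy } a\,]}{E[\,D \mid \text{policy } a'\,] - E[\,D \mid \text{policy } a\,]}.$$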
The PRTE answers: for a specific policy that shifts the propensity score from one distribution to another, what is the average effect per net person induced into treatment? This is critical because LATE answers a question about the existing instrument, not the policy under evaluation. A college proximity instrument identifies treatment effects for people induced by proximity—a group that may differ substantially from those who would be induced by, say, a tuition subsidy. The MTE framework makes this explicit: different policies correspond to different weighting functions ωj, and whether an observed LATE approximates the PRTE depends on how similar those weight functions are.
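
To make the role of the weights concrete, the sketch below computes a PRTE as a weighted integral of a known MTE function. It assumes the standard weight formula, in which the weight at u is the difference between the baseline and policy CDFs of the propensity score divided by the change in mean take-up. The MTE function, the baseline distribution, and the policy (a 0.1 increase in everyone's propensity score) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical baseline propensity scores and a policy that raises them by 0.1.
p_base = 0.2 + 0.6 * rng.uniform(0.0, 1.0, n)
p_policy = np.minimum(p_base + 0.1, 1.0)

def mte(u):
    """Hypothetical MTE function, decreasing in unobserved resistance u."""
    return 0.3 - 0.2 * u

# Midpoint grid on [0, 1] for numerical integration over u.
m = 1_000
u = (np.arange(m) + 0.5) / m

def ecdf(sample, points):
    """Empirical CDF of `sample`, evaluated at `points`."""
    return np.searchsorted(np.sort(sample), points, side="right") / sample.size

# PRTE weights: (F_base(u) - F_policy(u)) / (E[p_policy] - E[p_base]).
weights = (ecdf(p_base, u) - ecdf(p_policy, u)) / (p_policy.mean() - p_base.mean())

prte = np.mean(mte(u) * weights)  # approximates the integral of MTE * weight
ate = np.mean(mte(u))             # the ATE uses a weight of 1 for every u

print(round(float(np.mean(weights)), 3), round(float(prte), 3), round(float(ate), 3))
```

In this design the policy weights are concentrated on the range of u that the shift actually moves into treatment, so the PRTE differs from the ATE even though both are averages of the same MTE function.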
6 Partial Identification via the MTE: Mogstad, Santos, and Torgovitsky
A limitation of the classical MTE approach is that MTE(x, u) is identified only over the support of the propensity score, where the instrument provides variation. For u values outside that support, the MTE is unidentified.
Mogstad et al. [2018] show how to use the MTE framework for partial identification of policy-relevant parameters when the instrument's support is limited. Their key insight is that identified IV moments (IV slopes, OLS slopes, second moments) all constrain the same underlying MTE function. By imposing weak shape restrictions on the MTE (monotonicity in u, or an upper/lower bound on the treatment effect magnitude), they can construct sharp bounds on the PRTE for any hypothetical policy.
The practical implication is striking: even when the instrument's support covers only part of [0, 1], one can still bound the effect of a policy that shifts propensity scores across the full distribution. The bounds tighten as the instrument's support widens or as more shape restrictions are imposed.
6.1 The ivmte Package
The ivmte package in R implements the Mogstad-Santos-Torgovitsky approach [Mogstad et al., 2018; see also Mogstad and Torgovitsky, 2018]. The user specifies:
- Basis functions for E[Y(0) | X, UD = u] and E[Y(1) | X, UD = u]
- IV-like moments to match (OLS, IV slopes, second moments)
- A target parameter (ATE, ATT, LATE, or PRTE)
- Optional shape restrictions (monotonicity, non-negativity)
The package then solves a linear programme to compute sharp bounds on the target parameter.
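
The structure of that linear programme can be sketched in Python. This is not the ivmte interface or its implementation; it is a stylised population version under assumed inputs: a binary instrument with hypothetical propensity scores 0.3 and 0.6, an outcome bounded in [0, 1], piecewise-constant basis functions on ten bins, and moments E[YD | Z = z] and E[Y(1 - D) | Z = z] computed from hypothetical true response functions. The helper names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

K = 10                                 # piecewise-constant basis: K equal bins
edges = np.linspace(0.0, 1.0, K + 1)
p_z = (0.3, 0.6)                       # hypothetical propensity scores p(0), p(1)

def overlap(lo, hi):
    """Length of each bin's overlap with the interval [lo, hi]."""
    return np.clip(np.minimum(edges[1:], hi) - np.maximum(edges[:-1], lo), 0.0, None)

def linear_integral(a, b, lo, hi):
    """Integral of a + b*u over [lo, hi]; used to build population moments."""
    return a * (hi - lo) + 0.5 * b * (hi ** 2 - lo ** 2)

def ate_bounds(decreasing=False):
    """Bounds on the ATE; theta stacks the K bin values of m0, then of m1."""
    A_eq, b_eq = [], []
    for p in p_z:
        # E[Y D | Z = z] equals the integral of m1(u) over [0, p(z)];
        # the hypothetical truth is m1(u) = 0.8 - 0.4u.
        A_eq.append(np.concatenate([np.zeros(K), overlap(0.0, p)]))
        b_eq.append(linear_integral(0.8, -0.4, 0.0, p))
        # E[Y (1 - D) | Z = z] equals the integral of m0(u) over [p(z), 1];
        # the hypothetical truth is m0(u) = 0.5 - 0.2u.
        A_eq.append(np.concatenate([overlap(p, 1.0), np.zeros(K)]))
        b_eq.append(linear_integral(0.5, -0.2, p, 1.0))
    A_ub = b_ub = None
    if decreasing:
        # Shape restriction: m0 and m1 are both non-increasing in u.
        diff = np.diff(np.eye(K), axis=0)      # rows give theta[k+1] - theta[k]
        zeros = np.zeros_like(diff)
        A_ub = np.block([[diff, zeros], [zeros, diff]])
        b_ub = np.zeros(A_ub.shape[0])
    # Target parameter: ATE = integral of m1 minus integral of m0.
    c = np.concatenate([-np.ones(K), np.ones(K)]) / K
    kwargs = dict(A_ub=A_ub, b_ub=b_ub, A_eq=np.array(A_eq),
                  b_eq=np.array(b_eq), bounds=(0.0, 1.0))
    lower = linprog(c, **kwargs).fun
    upper = -linprog(-c, **kwargs).fun
    return lower, upper

print(ate_bounds())                  # outcome bounds only
print(ate_bounds(decreasing=True))   # with the monotonicity restriction
```

In this design the true ATE is 0.2. With only the outcome bounds, the programme returns roughly [-0.151, 0.549]; adding the monotonicity restriction narrows this to roughly [-0.151, 0.274], which illustrates how shape restrictions tighten the bounds. In practice the moments are estimated rather than known, and the basis and restrictions are modelling choices.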
7 Empirical Applications
7.1 Returns to Schooling
The canonical application is returns to schooling. Carneiro et al. [2011] apply the MTE framework to U.S. data using college proximity and local labour market conditions as instruments. They find a downward-sloping MTE in u: individuals most likely to attend college (low UD) have the highest returns, and individuals at the margin of attendance have returns close to the opportunity cost of schooling. This finding has important welfare implications: because the marginal attendees benefit least, expanding college enrolment has diminishing returns, at least as measured by earnings.
7.2 Charter Schools
Walters [2018] applies MTE methods to charter school choice in the U.S., using charter lottery offers as instruments. He finds that students who are most resistant to attending charter schools (high UD) have the largest gains from attending them, a pattern of negative selection on gains, opposite to the college case. Lottery-based LATE estimates, which apply to students who chose to enter the lottery, are therefore not representative of the full distribution of effects in this setting and tend to understate the gains for students who do not apply.
8 The MTE and Heterogeneous Treatment Effects
The MTE is one of several approaches to understanding treatment effect heterogeneity. It differs from the causal forest approach [Wager and Athey, 2018] in that it models heterogeneity along the selection margin (the UD axis) rather than along observable covariates alone. The two approaches are complementary: a researcher who suspects that selection into treatment is correlated with treatment effect gains should think carefully about the MTE; a researcher who suspects observable heterogeneity is the main driver might prefer causal forests.
The "R-learner" of Nie and Wager [2021] and the DML framework of Chernozhukov et al. [2018] estimate conditional average treatment effects τ(x) = E[Y(1) - Y(0) | X = x], which integrate over UD conditional on X. The MTE, by contrast, conditions on both X and UD, and is therefore richer but also requires stronger instrument-based identification.
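
The link between the two objects can be written in one line. Because UD is normalised to be uniform on [0, 1] conditional on X, the conditional average treatment effect is the unweighted integral of the MTE over u (with U_D denoting UD):

```latex
\tau(x) = E[\,Y(1) - Y(0) \mid X = x\,]
        = \int_0^1 E[\,Y(1) - Y(0) \mid X = x,\; U_D = u\,]\, du
        = \int_0^1 \mathrm{MTE}(x, u)\, du .
```

Two populations can therefore share the same τ(x) while having very different MTE curves, which is the heterogeneity that covariate-based methods do not separate.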
9 Practical Guidance
- Use MTE when: You have a valid instrument but suspect the LATE is not policy-relevant; you want to extrapolate from the complier range.
- Use standard LATE when: The instrument variation closely mimics the policy of interest.
10 Conclusion
The marginal treatment effect framework is one of the most intellectually rich contributions to causal inference econometrics. It reveals that the three major treatment effect estimands (ATE, ATT, LATE) are not competing answers to the same question but are instead answers to different questions, derived from different weightings of a single MTE function. For policy analysis, the PRTE concept clarifies what we can and cannot learn from existing instruments about hypothetical interventions. The practical toolkit has matured considerably. The ivmte package makes the Mogstad-Santos-Torgovitsky partial identification approach accessible. For researchers using IV estimates to advise on policy, the MTE framework provides an honest accounting of what the data can and cannot say.
References
- Carneiro, P., Heckman, J. J., and Vytlacil, E. J. Estimating marginal returns to education. American Economic Review, 101(6):2754-2781, 2011.
- Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1-C68, 2018.
- Heckman, J. J. and Vytlacil, E. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proceedings of the National Academy of Sciences, 96(8):4730-4734, 1999.
- Heckman, J. J. and Vytlacil, E. Structural equations, treatment effects, and econometric policy evaluation. Econometrica, 73(3):669-738, 2005.
- Imbens, G. W. and Angrist, J. D. Identification and estimation of local average treatment effects. Econometrica, 62(2):467-475, 1994.
- Mogstad, M., Santos, A., and Torgovitsky, A. Using instrumental variables for inference about policy relevant treatment parameters. Econometrica, 86(5):1589-1619, 2018.
- Mogstad, M. and Torgovitsky, A. Identification and extrapolation of causal effects with instrumental variables. Annual Review of Economics, 10:577-613, 2018.
- Nie, X. and Wager, S. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299-319, 2021.
- Wager, S. and Athey, S. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228-1242, 2018.
- Walters, C. R. The demand for effective charter schools. Journal of Political Economy, 126(6):2179-2223, 2018.