The Causal Review

Partial Identification and Manski Bounds: How Much Can We Learn Without Strong Assumptions?

1 The Problem with Point Identification

Standard causal inference strategies aim at point identification: a set of assumptions under which the average treatment effect (or some other estimand) is uniquely determined by the distribution of observables. Unconfoundedness says that treatment is as good as randomly assigned conditional on covariates. The exclusion restriction in IV says the instrument affects outcomes only through treatment. Parallel trends in DiD says treated and control groups would have followed the same time path absent treatment.

These assumptions are often plausible, sometimes implausible, and almost always untestable. What happens when we refuse to make them? Manski [1990] and the literature he founded answer this question rigorously: the data alone, without identifying assumptions, typically do not pin down the treatment effect to a single number. But they do restrict it to a set, the identification region. This is partial identification.

2 The No-Assumptions Bound

Let Y_i(0) and Y_i(1) denote potential outcomes under control and treatment respectively, both taking values in a known bounded interval [y_L, y_U]. The average treatment effect is τ = E[Y_i(1) - Y_i(0)].

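With a binary treatment D_i ∈ {0, 1}, observed outcome Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0), and i.i.d. data, the law of iterated expectations splits each mean potential outcome into a part identified by the observables and a counterfactual part: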

$$\mathbb{E}[Y_i(1)] = \mathbb{E}[Y_i|D_i = 1] \Pr(D_i = 1) + \mathbb{E}[Y_i(1)|D_i = 0] \Pr(D_i = 0),$$

(1)

$$\mathbb{E}[Y_i(0)] = \mathbb{E}[Y_i|D_i = 0] \Pr(D_i = 0) + \mathbb{E}[Y_i(0)|D_i = 1] \Pr(D_i = 1).$$

(2)

The terms E[Y_i(1) | D_i = 0] and E[Y_i(0) | D_i = 1] are counterfactual quantities not observed in the data. Without assumptions, they can take any value in [y_L, y_U]. Substituting the extreme values yields the Manski [1990] no-assumptions bounds on the ATE:

$$\tau \in \left[ \mathbb{E}[Y_i|D_i = 1] \Pr(D_i = 1) + y_L \Pr(D_i = 0) - \mathbb{E}[Y_i|D_i = 0] \Pr(D_i = 0) - y_U \Pr(D_i = 1), \, \mathbb{E}[Y_i|D_i = 1] \Pr(D_i = 1) + y_U \Pr(D_i = 0) - \mathbb{E}[Y_i|D_i = 0] \Pr(D_i = 0) - y_L \Pr(D_i = 1) \right]$$

(3)

The width of this interval is y_U - y_L, the full range of the outcome, regardless of the data. The interval also always contains zero, so the no-assumptions bound alone can never establish the sign of the treatment effect. If the outcome is binary (y_L = 0, y_U = 1), for example, the bound always has width 1.
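Both properties follow from regrouping the two endpoints of equation (3). Subtracting the lower endpoint from the upper one gives

$$(y_U - y_L)\Pr(D_i = 0) + (y_U - y_L)\Pr(D_i = 1) = y_U - y_L,$$

and rewriting each endpoint as

$$\text{lower} = \Pr(D_i = 1)\left(\mathbb{E}[Y_i|D_i = 1] - y_U\right) + \Pr(D_i = 0)\left(y_L - \mathbb{E}[Y_i|D_i = 0]\right) \le 0,$$

$$\text{upper} = \Pr(D_i = 1)\left(\mathbb{E}[Y_i|D_i = 1] - y_L\right) + \Pr(D_i = 0)\left(y_U - \mathbb{E}[Y_i|D_i = 0]\right) \ge 0$$

shows that zero always lies inside, because every conditional mean lies in [y_L, y_U].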

A Numerical Example

Suppose a job training programme (D = 1 vs D = 0) has the following observed data: Pr(D = 1) = 0.4, E[Y | D = 1] = 0.7 (employment rate among trainees), E[Y | D = 0] = 0.5 (employment rate among non-participants), and Y ∈ {0, 1}. Then:

Lower bound = 0.7 × 0.4 + 0 × 0.6 - 0.5 × 0.6 - 1 × 0.4 = 0.28 - 0.30 - 0.40 = -0.42

Upper bound = 0.7 × 0.4 + 1 × 0.6 - 0.5 × 0.6 - 0 × 0.4 = 0.28 + 0.60 - 0.30 = 0.58

The data are consistent with a treatment effect anywhere in [-0.42, 0.58]. The interval is wide, but informative: compared with the a priori range [-1, 1] for a binary outcome, it rules out effects below -0.42 or above 0.58 without any identifying assumptions.
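The arithmetic above is easy to reproduce. Below is a minimal Python sketch of equation (3), assuming the three sample summaries Pr(D = 1), E[Y | D = 1] and E[Y | D = 0] are already in hand; the function name `manski_bounds` is illustrative and not taken from any package.

```python
def manski_bounds(p1, mean_y1, mean_y0, y_lo, y_hi):
    """No-assumptions (worst-case) bounds on the ATE, equation (3).

    p1         -- Pr(D = 1)
    mean_y1    -- E[Y | D = 1]
    mean_y0    -- E[Y | D = 0]
    y_lo, y_hi -- known outcome bounds y_L and y_U
    """
    p0 = 1.0 - p1
    # Fill the unobserved counterfactual means with their worst-case extremes.
    lower = mean_y1 * p1 + y_lo * p0 - mean_y0 * p0 - y_hi * p1
    upper = mean_y1 * p1 + y_hi * p0 - mean_y0 * p0 - y_lo * p1
    return lower, upper


lo, hi = manski_bounds(p1=0.4, mean_y1=0.7, mean_y0=0.5, y_lo=0.0, y_hi=1.0)
print(f"ATE in [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f}")
# prints: ATE in [-0.42, 0.58], width 1.00
```

Trying other inputs is a quick way to confirm that the width always equals y_hi - y_lo, whatever the observed means.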

3 Tightening the Bounds with Weak Assumptions

Manski [1990] and the work that followed it show that economically motivated restrictions, weaker than full unconfoundedness, can narrow the identification region substantially.

Monotone Treatment Response (MTR)

The MTR assumption states that treatment can only help: Y_i(1) ≥ Y_i(0) for all i (the mirror-image assumption, Y_i(1) ≤ Y_i(0), covers treatments that can only harm). This is a plausible prior for programmes like education, job training, and medical treatment. Under MTR, the lower bound on the ATE rises to 0, since no individual is harmed by treatment, while the upper bound remains the no-assumptions upper bound from equation (3):

$$\tau \in [0, \, \mathbb{E}[Y_i|D_i = 1] \Pr(D_i = 1) + y_U \Pr(D_i = 0) - \mathbb{E}[Y_i|D_i = 0] \Pr(D_i = 0) - y_L \Pr(D_i = 1)].$$

(4)
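A minimal sketch of equation (4), using the job training numbers as illustrative inputs (the function name `mtr_bounds` is hypothetical). Relative to the no-assumptions bound, only the lower endpoint moves.

```python
def mtr_bounds(p1, mean_y1, mean_y0, y_lo, y_hi):
    """ATE bounds under monotone treatment response, equation (4)."""
    p0 = 1.0 - p1
    # MTR (Y_i(1) >= Y_i(0) for every i) rules out negative individual effects,
    # so the ATE cannot be negative.
    lower = 0.0
    # The upper endpoint is unchanged from the no-assumptions bound (3).
    upper = mean_y1 * p1 + y_hi * p0 - mean_y0 * p0 - y_lo * p1
    return lower, upper


lo, hi = mtr_bounds(p1=0.4, mean_y1=0.7, mean_y0=0.5, y_lo=0.0, y_hi=1.0)
print(f"MTR: ATE in [{lo:.2f}, {hi:.2f}]")  # prints: MTR: ATE in [0.00, 0.58]
```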

Monotone Treatment Selection (MTS)

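The MTS assumption states that treated individuals have weakly higher mean potential outcomes than untreated ones: E[Y_i(d) | D_i = 1] ≥ E[Y_i(d) | D_i = 0] for d ∈ {0, 1}. This is a one-sided selection assumption: self-selection into treatment is positive. Applying it with d = 0 and with d = 1 tightens the lower bound on E[Y_i(0)] and the upper bound on E[Y_i(1)]:

$$\mathbb{E}[Y_i(0)] \ge \mathbb{E}[Y_i | D_i = 0], \qquad \mathbb{E}[Y_i(1)] \le \mathbb{E}[Y_i | D_i = 1],$$

(5)

since participants have weakly higher potential outcomes, the observed mean among non-participants is a lower bound for the population average of Y_i(0), and the observed mean among participants is an upper bound for the population average of Y_i(1). MTS alone therefore caps the ATE at the naive difference E[Y_i | D_i = 1] - E[Y_i | D_i = 0]. Combining MTR and MTS yields substantially tighter bounds than either assumption alone [Manski and Pepper, 2000]: τ ∈ [0, E[Y_i | D_i = 1] - E[Y_i | D_i = 0]], which in the job training example is [0, 0.20].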
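Applying the MTS inequality separately for d = 1 and d = 0 gives E[Y_i(1)] ≤ E[Y_i | D_i = 1] and E[Y_i(0)] ≥ E[Y_i | D_i = 0], so MTS caps the ATE at the naive difference in means while leaving the lower endpoint at its no-assumptions value. The sketch below (hypothetical function names, same illustrative inputs as before) computes MTS alone and MTR plus MTS. It also encodes a testable implication: the two assumptions together require E[Y | D = 1] ≥ E[Y | D = 0].

```python
def mts_bounds(p1, mean_y1, mean_y0, y_lo, y_hi):
    """ATE bounds under monotone treatment selection (MTS) alone."""
    p0 = 1.0 - p1
    # d = 1: E[Y(1) | D = 0] <= E[Y | D = 1], hence E[Y(1)] <= E[Y | D = 1].
    # d = 0: E[Y(0) | D = 1] >= E[Y | D = 0], hence E[Y(0)] >= E[Y | D = 0].
    upper = mean_y1 - mean_y0
    # MTS leaves the other two worst-case bounds untouched, so the lower
    # endpoint is the no-assumptions lower bound from equation (3).
    lower = mean_y1 * p1 + y_lo * p0 - mean_y0 * p0 - y_hi * p1
    return lower, upper


def mtr_mts_bounds(p1, mean_y1, mean_y0, y_lo, y_hi):
    """ATE bounds under MTR and MTS combined."""
    _, upper = mts_bounds(p1, mean_y1, mean_y0, y_lo, y_hi)
    if upper < 0.0:
        # MTR forces ATE >= 0, so a negative MTS upper bound leaves an empty set.
        raise ValueError("MTR and MTS are jointly refuted: E[Y|D=1] < E[Y|D=0]")
    return 0.0, upper  # MTR supplies the lower bound of zero


for name, bound_fn in [("MTS", mts_bounds), ("MTR+MTS", mtr_mts_bounds)]:
    lo, hi = bound_fn(0.4, 0.7, 0.5, 0.0, 1.0)
    print(f"{name}: [{lo:.2f}, {hi:.2f}]")
# prints:
# MTS: [-0.42, 0.20]
# MTR+MTS: [0.00, 0.20]
```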

4 IV Bounds

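When a binary instrument Z_i is available (satisfying independence and the exclusion restriction, but not necessarily first-stage monotonicity), the Manski [1990] approach yields IV bounds that are at least as tight as the no-assumptions bound. The key insight is that, because the instrument is independent of potential outcomes, E[Y_i(d)] takes the same value in every subpopulation Z_i = z. The no-assumptions bounds computed separately within each subpopulation must therefore all contain it, and their intersection is the IV bound.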
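A minimal sketch of this intersection argument, assuming mean independence of the instrument (implied by full independence) and cell summaries already estimated for each value of Z. The function name `iv_intersection_bounds` and the input numbers are hypothetical and purely illustrative. If the per-z intervals fail to overlap, the IV assumptions are refuted by the data.

```python
def iv_intersection_bounds(cells, y_lo, y_hi):
    """Manski-style IV bounds on the ATE under mean independence of the instrument.

    cells maps each instrument value z to a tuple
    (Pr(D = 1 | Z = z), E[Y | D = 1, Z = z], E[Y | D = 0, Z = z]).
    """
    low1, up1, low0, up0 = [], [], [], []
    for p1, m1, m0 in cells.values():
        p0 = 1.0 - p1
        # No-assumptions bounds on E[Y(1)] and E[Y(0)] within the subpopulation Z = z.
        low1.append(m1 * p1 + y_lo * p0)
        up1.append(m1 * p1 + y_hi * p0)
        low0.append(m0 * p0 + y_lo * p1)
        up0.append(m0 * p0 + y_hi * p1)
    # E[Y(d)] is the same for every z, so it must lie in every per-z interval.
    ey1_lo, ey1_hi = max(low1), min(up1)
    ey0_lo, ey0_hi = max(low0), min(up0)
    if ey1_lo > ey1_hi or ey0_lo > ey0_hi:
        raise ValueError("empty intersection: the data refute the IV assumptions")
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo


# Hypothetical cell summaries for a binary instrument (illustrative numbers only).
cells = {0: (0.2, 0.60, 0.50), 1: (0.7, 0.75, 0.45)}
lo, hi = iv_intersection_bounds(cells, y_lo=0.0, y_hi=1.0)
print(f"IV bounds on the ATE: [{lo:.3f}, {hi:.3f}]")
```

With these made-up inputs the intersection gives [-0.075, 0.425], half the width of the no-assumptions bound for a binary outcome.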

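Balke and Pearl [1997] derived the sharp IV bounds for binary outcome, binary treatment, and binary instrument: the tightest possible bounds consistent with the observed distribution of (Y, D, Z) and the IV restrictions. These bounds are sometimes called Balke-Pearl bounds. Writing p_{yd·z} = Pr(Y = y, D = d | Z = z) for binary Y, intersecting the no-assumptions bounds across the two values of Z gives a simpler bound, valid but generally not sharp, that the Balke-Pearl bounds refine:

$$\tau \in \left[ \max_{z \in \{0,1\}} p_{11 \cdot z} + \max_{z \in \{0,1\}} p_{00 \cdot z} - 1, \; 1 - \max_{z \in \{0,1\}} p_{01 \cdot z} - \max_{z \in \{0,1\}} p_{10 \cdot z} \right].$$

(6)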

The exact sharp bounds are more complex, involving the joint distribution of (Y, D) at each value of Z, but they are computable from the data. Mogstad et al. [2018] extend the sharp IV bound approach to continuous instruments and general marginal treatment effects using the IV-like estimand framework, providing a computational algorithm that delivers the sharp identified set for any linear functional of the MTE.

5 Inference on Identified Sets

A fundamental shift from standard econometrics: uncertainty about an assumption shows up in the width of the identified set, not in the sampling precision of a point estimate. Sampling uncertainty still matters, because the endpoints of the set must themselves be estimated. Imbens and Manski [2004], Stoye [2009], and Romano and Shaikh [2010] develop confidence regions for identified sets and for the parameters they contain; Tamer [2010] surveys this literature. Two quantities of interest are:

Confidence interval for the identified set. A set CS_n such that Pr(identified set ⊂ CS_n) ≥ 1 - α.
Confidence interval for the true parameter. A set CI_n such that Pr(τ ∈ CI_n) ≥ 1 - α uniformly over all data-generating processes compatible with the maintained assumptions.

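The second is the more standard object reported in applied work. It is typically narrower than the first, because covering a single point of the identified set is easier than covering the whole set. Inference on the parameter is also conservative in a specific sense: for the Imbens and Manski [2004] interval, coverage is close to the nominal level when the true parameter sits at an endpoint of the identified set and exceeds it at interior points.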
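To make the two objects concrete, the sketch below computes both from hypothetical microdata constructed to match the job training example (1,000 people). It assumes a binary outcome and estimates each endpoint of equation (3) as a sample mean of a per-unit term. For the identified-set confidence set it uses a Bonferroni-style critical value z_{1-α/2} on each side; for the parameter it uses the Imbens and Manski [2004] critical value C_n, which solves Φ(C_n + √n Δ̂ / max(σ̂_l, σ̂_u)) − Φ(−C_n) = 1 − α, where Δ̂ is the estimated width. Function names are illustrative, and the sketch ignores the refinements in Stoye [2009].

```python
import math
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def im_critical_value(delta_hat, sigma_max, n, alpha=0.05):
    """Solve Phi(c + sqrt(n) * delta_hat / sigma_max) - Phi(-c) = 1 - alpha by bisection."""
    shift = math.sqrt(n) * delta_hat / sigma_max
    lo, hi = 0.0, 10.0  # for delta_hat >= 0 the root lies between z_{1-alpha} and z_{1-alpha/2}
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if STD_NORMAL.cdf(mid + shift) - STD_NORMAL.cdf(-mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bound_intervals(y, d, y_lo=0.0, y_hi=1.0, alpha=0.05):
    """Estimate the Manski bounds and both confidence intervals from microdata."""
    n = len(y)
    # Each endpoint of equation (3) is the population mean of a per-unit term.
    l_terms = [yi * di + y_lo * (1 - di) - yi * (1 - di) - y_hi * di for yi, di in zip(y, d)]
    u_terms = [yi * di + y_hi * (1 - di) - yi * (1 - di) - y_lo * di for yi, di in zip(y, d)]
    lo_hat, hi_hat = fmean(l_terms), fmean(u_terms)
    sd_lo, sd_hi = stdev(l_terms), stdev(u_terms)  # assumed nonzero
    se_lo, se_hi = sd_lo / math.sqrt(n), sd_hi / math.sqrt(n)

    # Confidence set for the identified set: alpha/2 in each tail (Bonferroni).
    z_two_sided = STD_NORMAL.inv_cdf(1.0 - alpha / 2)
    cs = (lo_hat - z_two_sided * se_lo, hi_hat + z_two_sided * se_hi)

    # Imbens-Manski confidence interval for the parameter tau itself.
    c = im_critical_value(hi_hat - lo_hat, max(sd_lo, sd_hi), n, alpha)
    ci = (lo_hat - c * se_lo, hi_hat + c * se_hi)
    return (lo_hat, hi_hat), cs, ci, c


# Hypothetical microdata matching the job training example:
# 400 trainees (280 employed) and 600 non-participants (300 employed).
d = [1] * 400 + [0] * 600
y = [1] * 280 + [0] * 120 + [1] * 300 + [0] * 300
est, cs, ci, c = bound_intervals(y, d)
print(f"estimated bounds: [{est[0]:.3f}, {est[1]:.3f}]")
print(f"95% CS for the identified set: [{cs[0]:.3f}, {cs[1]:.3f}]")
print(f"95% Imbens-Manski CI for tau:  [{ci[0]:.3f}, {ci[1]:.3f}] (C_n = {c:.3f})")
```

Because the estimated width here is large relative to its standard error, C_n ends up close to the one-sided value z_{0.95} ≈ 1.645, whereas the set CS uses z_{0.975} ≈ 1.96 on each side.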

6 Applications

Returns to schooling. Manski and Pepper [2000] applied MTR and MTS bounds to the returns to schooling. Even without instruments, the MTR assumption (education cannot reduce earnings) and MTS (those who choose more education would have higher earnings regardless) together bound the return to an additional year of education: MTR supplies a lower bound of zero and MTS an informative upper bound.

Treatment effects under non-compliance. In a randomised experiment with one-sided non-compliance (some assigned to treatment do not take it, but none assigned to control do), Manski [1990]-style IV bounds on the ATE, using random assignment as the instrument, are substantially tighter than in the observational case because the randomised assignment itself restricts the counterfactual distribution.
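The gain can be made explicit. Assume, as in the text, that assignment Z_i is randomised, affects outcomes only through D_i, and satisfies Pr(D_i = 1 | Z_i = 0) = 0. Then the control arm identifies E[Y_i(0)] exactly, and only the never-takers in the treatment arm leave E[Y_i(1)] unknown:

$$\mathbb{E}[Y_i(0)] = \mathbb{E}[Y_i(0) \mid Z_i = 0] = \mathbb{E}[Y_i \mid Z_i = 0],$$

$$\mathbb{E}[Y_i(1)] = \mathbb{E}[Y_i(1) \mid Z_i = 1] \in \left[ \mathbb{E}[Y_i D_i \mid Z_i = 1] + y_L \Pr(D_i = 0 \mid Z_i = 1), \; \mathbb{E}[Y_i D_i \mid Z_i = 1] + y_U \Pr(D_i = 0 \mid Z_i = 1) \right].$$

The resulting ATE bound has width (y_U - y_L) Pr(D_i = 0 | Z_i = 1): the outcome range scaled by the non-compliance rate, rather than the full range y_U - y_L of the no-assumptions bound. With a binary outcome and 20% non-compliance, for instance, the width falls from 1 to 0.2.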

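Regression discontinuity with treatment effect heterogeneity. Dong and Lewbel [2015] extend RDD analysis beyond the local ATE at the cutoff. They show that the derivative of the treatment effect with respect to the running variable is identified at the threshold and, under a local policy invariance assumption, use it to recover the effect of marginally changing the threshold. The identifying assumptions needed to learn about effects away from the cutoff are thereby made explicit.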

7 Relation to Sensitivity Analysis

Partial identification and sensitivity analysis are closely related. In the honest DiD framework of Rambachan and Roth [2023], the identified set for the treatment effect widens as the researcher relaxes the parallel trends assumption. This is exactly the partial identification logic: as an assumption weakens, the identified region expands. The break-even value (the assumption strength at which the identified set first includes zero) corresponds to the breakdown value in the partial identification literature [Manski, 1990].
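A stylised sketch of this logic follows. It assumes, hypothetically, that the post-period violation of parallel trends is at most M times the largest absolute pre-period event-study coefficient. This restriction is loosely inspired by, but simpler than, the Rambachan-Roth restrictions, and the sketch ignores sampling uncertainty, which the HonestDiD package handles. Function names and numbers are illustrative.

```python
def sensitivity_set(beta_post, pre_coefs, m_bar):
    """Stylised identified set for the post-period effect.

    Assumes (hypothetically) that the post-period violation of parallel trends is
    at most m_bar times the largest absolute pre-period coefficient, and ignores
    sampling uncertainty.
    """
    v = max(abs(b) for b in pre_coefs)
    return beta_post - m_bar * v, beta_post + m_bar * v


def breakdown_value(beta_post, pre_coefs):
    """Smallest m_bar at which the stylised identified set includes zero."""
    v = max(abs(b) for b in pre_coefs)
    if v == 0.0:
        raise ValueError("no pre-period violation to scale: breakdown value undefined")
    return abs(beta_post) / v


# Hypothetical event-study estimates (illustrative numbers only).
beta_post = 0.10
pre_coefs = [0.02, -0.03, 0.01]
for m_bar in (0.0, 1.0, 2.0, 4.0):
    lo, hi = sensitivity_set(beta_post, pre_coefs, m_bar)
    print(f"M = {m_bar:.0f}: [{lo:.2f}, {hi:.2f}]")
print(f"breakdown value: {breakdown_value(beta_post, pre_coefs):.2f}")
```

At M = 0 (exact parallel trends) the set collapses to the point estimate. Here it first includes zero once M exceeds about 3.33, so a conclusion of a positive effect survives violations up to roughly three times the largest pre-trend.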

8 Available Software

The bpbounds package in R computes the Balke-Pearl bounds for binary instruments, treatments, and outcomes. The ivmte package implements the Mogstad et al. [2018] framework, computing bounds on target parameters defined through the marginal treatment effect. The HonestDiD package implements sensitivity bounds in the DiD context.

9 Conclusion

Partial identification provides a disciplined framework for learning from data when identifying assumptions are uncertain or contested. By making the assumption-dependence of causal conclusions transparent, replacing a single point estimate with an interval that shrinks as assumptions strengthen, the approach enforces intellectual honesty about what the data can and cannot tell us. The trade-off is clear: weaker assumptions give wider bounds; stronger assumptions give tighter estimates. Whether a point estimate or an honest range better serves scientific communication depends on context, but the tools for both are now mature and accessible.

References

Balke, A. and Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439):1171-1176.
Dong, Y. and Lewbel, A. (2015). Identifying the effect of changing the policy threshold in regression discontinuity models. Review of Economics and Statistics, 97(5):1172-1185.
Imbens, G. W. and Manski, C. F. (2004). Confidence intervals for partially identified parameters. Econometrica, 72(6):1845-1857.
Manski, C. F. (1990). Nonparametric bounds on treatment effects. American Economic Review: Papers and Proceedings, 80(2):319-323.
Manski, C. F. and Pepper, J. V. (2000). Monotone instrumental variables: with an application to the returns to schooling. Econometrica, 68(4):997-1010.
Mogstad, M., Santos, A., and Torgovitsky, A. (2018). Using instrumental variables for inference about policy relevant treatment parameters. Econometrica, 86(5):1589-1619.
Rambachan, A. and Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies, 90(5):2555-2591.
Romano, J. P. and Shaikh, A. M. (2010). Inference for the identified set in partially identified econometric models. Econometrica, 78(1):169-211.
Stoye, J. (2009). More on confidence intervals for partially identified parameters. Econometrica, 77(4):1299-1315.
Tamer, E. (2010). Partial identification in econometrics. Annual Review of Economics, 2:167-195.
