1 The Problem of Hidden Confounding
Every observational study rests on an unverifiable assumption: conditional on the observed covariates, treatment assignment is independent of potential outcomes. This assumption, variously called "selection on observables," "conditional independence," or "ignorability," may be plausible, but it can never be tested from the data alone. A hidden confounder, unmeasured in the data, could in principle explain away any finding.

The traditional response to this problem has been to add more controls. But adding controls addresses only observed sources of confounding; it cannot rule out unobserved confounders. Two tools have been developed to quantify how much hidden confounding would be needed to overturn a study's conclusions: Rosenbaum bounds for matched observational studies [Rosenbaum, 2002] and Oster's delta for regression-based analyses [Oster, 2019]. This article explains both methods and illustrates their use.
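Stated formally, with $D_i$ the treatment indicator, $Y_i(1)$ and $Y_i(0)$ unit $i$'s potential outcomes with and without treatment, and $X_i$ the observed covariates, the assumption is:

$$\{Y_i(0),\, Y_i(1)\} \perp\!\!\!\perp D_i \mid X_i$$

A hidden confounder is an unobserved variable $u_i$ that affects both $D_i$ and the potential outcomes, so that independence may hold given $(X_i, u_i)$ but fail given $X_i$ alone.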
2 Rosenbaum Bounds
2.1 Setup
Suppose we have a matched observational study: $n$ matched pairs $(i_T, i_C)$, where $i_T$ received treatment and $i_C$ is a matched control. Let $u_i$ be an unmeasured binary covariate. If treatment assignment were fully random within pairs, each unit in a pair would have a 50 per cent probability of being the treated one. Rosenbaum's sensitivity analysis asks: how much would the treatment odds of units in the same pair have to differ for the study's conclusions to be invalidated? He parameterises hidden confounding through a single sensitivity parameter $\Gamma \ge 1$. When $\Gamma = 1$, treatment assignment is random within matched pairs. When $\Gamma > 1$, a hidden confounder could give one unit up to $\Gamma$ times the odds of treatment of the other, even after matching. Formally, for units $i$ and $j$ in the same matched pair, Rosenbaum's model allows:

$$\frac{1}{\Gamma} \le \frac{\pi_i / (1 - \pi_i)}{\pi_j / (1 - \pi_j)} \le \Gamma$$

where $\pi_i = \Pr(D_i = 1 \mid X_i, u_i)$ is the treatment probability for unit $i$.
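Given that exactly one unit in a pair is treated, Rosenbaum's bound on the odds ratio translates into a bound on the probability that a particular unit is the treated one. The sketch below computes that probability from two treatment probabilities and the interval implied by a given $\Gamma$; the helper names `pair_prob` and `pair_prob_bounds` are hypothetical, introduced only for illustration:

```r
# Probability that unit i (not j) is the treated one, given that exactly
# one of the two is treated, from their treatment probabilities pi_i, pi_j.
pair_prob <- function(pi_i, pi_j) {
  pi_i * (1 - pi_j) / (pi_i * (1 - pi_j) + pi_j * (1 - pi_i))
}

# Under Rosenbaum's model the odds ratio lies in [1/gamma, gamma], so the
# within-pair probability lies in [1 / (1 + gamma), gamma / (1 + gamma)].
pair_prob_bounds <- function(gamma) {
  stopifnot(gamma >= 1)
  c(lower = 1 / (1 + gamma), upper = gamma / (1 + gamma))
}

pair_prob_bounds(1)      # lower = 0.5, upper = 0.5: random within pairs
pair_prob_bounds(2)      # lower = 1/3, upper = 2/3
pair_prob(2 / 3, 1 / 2)  # odds 2 vs 1 (ratio 2): equals the upper bound at gamma = 2
```

At $\Gamma = 2$, for instance, a hidden confounder could shift the chance that a given unit is the treated one from 50 per cent to as much as two-thirds.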
2.2 Computing the Bounds
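For each value of $\Gamma$, the researcher computes the range of p-values consistent with the data under that level of hidden confounding. For a Wilcoxon signed-rank test on matched-pair differences, the null distribution of the test statistic $T$ is no longer known exactly once $\Gamma > 1$, because it depends on the unobserved $u_i$. Under the null hypothesis of no treatment effect, however, each pair's difference is positive with probability between $1/(1+\Gamma)$ and $\Gamma/(1+\Gamma)$, which bounds the distribution of $T$. The sensitivity bound at level $\Gamma$ is the worst-case p-value over all values of the unobserved covariates $\mathbf{u}$:

$$p(\Gamma) = \max_{\mathbf{u}} \Pr_{\Gamma}(T \ge t \mid \mathbf{u})$$

where $t$ is the observed value of $T$.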
The researcher reports $p(\Gamma)$ for increasing values of $\Gamma$ and identifies the break-even value $\Gamma^*$ at which $p(\Gamma^*) = 0.05$: if a hidden confounder could make the odds of treatment up to $\Gamma^*$ times higher for one unit in a pair than for the other, the study's conclusion would no longer hold at the 5 per cent level. A study with $\Gamma^* = 1.5$ is fragile: a relatively modest hidden confounder could explain the results. A study with $\Gamma^* = 5$ is robust: only a very powerful hidden confounder, one that multiplies the odds of treatment by five, could overturn the finding.
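For the signed-rank statistic, the worst case has a simple form: each pair's rank $q_j$ enters $T$ independently with probability $\Gamma/(1+\Gamma)$, giving mean $\frac{\Gamma}{1+\Gamma}\sum_j q_j$ and variance $\frac{\Gamma}{(1+\Gamma)^2}\sum_j q_j^2$ [Rosenbaum, 2002]. A minimal sketch of $p(\Gamma)$ under the large-sample normal approximation, plus a root search for $\Gamma^*$, assuming a vector `d` of treated-minus-control pair differences; the function names and the simulated data are hypothetical:

```r
# Upper bound on the one-sided Wilcoxon signed-rank p-value at sensitivity
# parameter gamma, using the normal approximation to the worst-case distribution.
rosenbaum_wilcoxon_p <- function(d, gamma) {
  d <- d[d != 0]                  # pairs with zero difference carry no sign
  q <- rank(abs(d))               # ranks of |d| (average ranks for ties)
  t_obs <- sum(q[d > 0])          # observed signed-rank statistic T
  p_plus <- gamma / (1 + gamma)   # worst-case probability of a positive sign
  mu <- p_plus * sum(q)
  sigma <- sqrt(p_plus * (1 - p_plus) * sum(q^2))
  pnorm((t_obs - mu) / sigma, lower.tail = FALSE)
}

# Break-even gamma: where the upper-bound p-value reaches alpha.
# The bound is increasing in gamma, so a single root search suffices.
break_even_gamma <- function(d, alpha = 0.05, upper = 100) {
  f <- function(g) rosenbaum_wilcoxon_p(d, g) - alpha
  if (f(1) >= 0) return(NA_real_)  # not significant even with no hidden bias
  if (f(upper) < 0) return(Inf)    # robust beyond the search range
  uniroot(f, c(1, upper), tol = 1e-10)$root
}

# Simulated pair differences, for illustration only
set.seed(1)
d <- rnorm(200, mean = 0.4)
gammas <- c(1, 1.5, 2, 3)
round(sapply(gammas, function(g) rosenbaum_wilcoxon_p(d, g)), 4)
break_even_gamma(d)
```

At $\Gamma = 1$ the function reproduces the usual large-sample signed-rank p-value without continuity correction; `rbounds::psens()` reports the same kind of bounds over a grid of $\Gamma$ values.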
2.3 R Implementation
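The sensitivitymw and rbounds packages in R implement Rosenbaum bounds: rbounds::psens() for the Wilcoxon signed-rank test, and sensitivitymw::senmw() for M-statistics such as the permutational t-test. A simple workflow with senmw():

```r
library(sensitivitymw)

# matched_pairs: matrix with treated outcomes in column 1 and
# control outcomes in column 2; each row is a matched pair.
# Simulated here for illustration only.
set.seed(1)
control <- rnorm(100)
matched_pairs <- cbind(treated = control + 0.5 + rnorm(100), control = control)

# Upper bound on the one-sided p-value for Gamma = 1 to 5.
# method = "t" requests the permutational t-test (see ?senmw), not the
# Wilcoxon statistic of section 2.2.
gammas <- seq(1, 5, by = 0.5)
pvals <- sapply(gammas, function(g) senmw(matched_pairs, gamma = g, method = "t")$pval)
data.frame(Gamma = gammas, pvalue = round(pvals, 4))
```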
3 Oster's Delta
3.1 Motivation
Rosenbaum bounds apply to matched studies. For regression-based analyses, which account for much of the observational work in economics, a different tool is needed. The insight behind Oster [2019] is that movements in the coefficient estimate and in $R^2$ as controls are added to a regression carry information about the likely bias from unobserved confounders. Consider the structural equation:

$$Y_i = \beta D_i + X_i^\top \gamma + W_i^\top \lambda + \varepsilon_i$$

where $X_i$ are observed controls and $W_i$ are unobserved controls. Oster derives conditions under which the movement of the estimated coefficient on $D_i$ from a regression without $X_i$ (the "short regression") to a regression with $X_i$ (the "long regression") can bound the bias from omitting $W_i$.
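A small simulation makes the setup concrete. The data-generating process below is entirely hypothetical: an observed control `x` and an unobserved control `w` both raise treatment and the outcome, the true $\beta$ is 1, and the long regression adds `x` but must omit `w`:

```r
set.seed(42)
n <- 5000
x <- rnorm(n)                                # observed control X
w <- rnorm(n)                                # unobserved control W
d <- 0.5 * x + 0.5 * w + rnorm(n)            # treatment D depends on both
y <- 1.0 * d + 0.8 * x + 0.8 * w + rnorm(n)  # outcome; true beta = 1

short_fit <- lm(y ~ d)      # short regression: no controls
long_fit  <- lm(y ~ d + x)  # long regression: observed X only, W omitted

beta_short <- unname(coef(short_fit)["d"])
beta_long  <- unname(coef(long_fit)["d"])
r2_short   <- summary(short_fit)$r.squared
r2_long    <- summary(long_fit)$r.squared
round(c(beta_short = beta_short, beta_long = beta_long,
        r2_short = r2_short, r2_long = r2_long), 3)
```

Adding `x` moves the coefficient toward the truth and raises $R^2$, but the long regression remains biased upward because `w` is still omitted (in this design its probability limit is 1.32, against 1.53 for the short regression). Oster's method uses exactly this movement in the coefficient and in $R^2$ to reason about the remaining bias.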
3.2 The Key Formula
Define:
- $\tilde{\beta}$: coefficient on $D$ in the short regression (no controls)
- $\dot{\beta}$: coefficient on $D$ in the long regression (with $X$)
- $\tilde{R}^2$: $R^2$ of the short regression
- $\dot{R}^2$: $R^2$ of the long regression
- $R^2_{\max}$: the hypothetical $R^2$ of the fully specified model including $W$ (often set to 1, or to $\min(1.3\,\dot{R}^2, 1)$, Oster's suggested default)
- $\delta$: the degree of selection on unobservables relative to selection on observables
Under the assumption that unobservables are $\delta$ times as important as observables in explaining $D$, the bias-adjusted coefficient is approximately:

$$\beta^* \approx \dot{\beta} - \delta \, (\tilde{\beta} - \dot{\beta}) \, \frac{R^2_{\max} - \dot{R}^2}{\dot{R}^2 - \tilde{R}^2}$$

Treating $\delta$ as an upper bound, the identified set for $\beta$ runs from $\dot{\beta}$ (the case $\delta = 0$) to $\beta^*$. Oster's primary use of this formula is backward: rather than imposing $\delta$, she asks what value of $\delta$ would make $\beta^* = 0$. This breakdown value, $\delta^*$, answers the question: "How important would the unobserved confounders need to be, relative to the observed confounders, to entirely explain away the estimated effect?" A value $\delta^* > 1$ means that unobservables would need to be more important than observables, which, Oster argues, is often implausible if the researcher has included rich observable controls.
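The forward version of the formula is easy to sketch. Below, `oster_beta_star` is a hypothetical helper implementing the approximation $\beta^* \approx \dot\beta - \delta(\tilde\beta - \dot\beta)(R^2_{\max} - \dot R^2)/(\dot R^2 - \tilde R^2)$; it is evaluated at the illustrative inputs used in section 3.4, and $\delta^*$ is then found numerically as the root of $\beta^*(\delta) = 0$ rather than from a closed form:

```r
# Approximate bias-adjusted coefficient from Oster (2019), in this article's
# notation: beta_star = beta_long - delta * (beta_short - beta_long) *
#                       (r2_max - r2_long) / (r2_long - r2_short)
oster_beta_star <- function(delta, beta_short, beta_long,
                            r2_short, r2_long, r2_max) {
  beta_long - delta * (beta_short - beta_long) *
    (r2_max - r2_long) / (r2_long - r2_short)
}

# Illustrative inputs (the same numbers as the section 3.4 example)
inputs <- list(beta_short = 0.45, beta_long = 0.28,
               r2_short = 0.12, r2_long = 0.35, r2_max = 1.3 * 0.35)
beta_star_at <- function(delta) do.call(oster_beta_star, c(list(delta = delta), inputs))

# Identified set under delta <= 1: from beta_long (delta = 0) to beta_star(1)
c(beta_star_at(0), beta_star_at(1))  # 0.28 and about 0.202

# Breakdown value: the delta at which beta_star(delta) crosses zero
delta_star <- uniroot(beta_star_at, c(0, 100), tol = 1e-10)$root
delta_star                           # about 3.608
```

Because this approximation is linear in $\delta$, the numerical root agrees with the closed-form breakdown value; the root-finding version carries over unchanged if a different expression for $\beta^*$ is substituted.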
3.3 Assumptions
The delta formula relies on a key proportionality assumption: the unobservables are related to treatment in the same way as the observables, scaled by $\delta$. This assumption is not testable. However, it provides a useful parameterisation that converts an untestable "no hidden confounding" assumption into a concrete question that can be argued on substantive grounds: is $\delta^* > 1$?
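Write the parts of the outcome equation explained by the observed and unobserved controls as the indices $X_i^\top\gamma$ and $W_i^\top\lambda$ respectively. Oster's proportional selection assumption can then be stated as:

$$\delta \, \frac{\operatorname{Cov}(X_i^\top\gamma,\, D_i)}{\operatorname{Var}(X_i^\top\gamma)} = \frac{\operatorname{Cov}(W_i^\top\lambda,\, D_i)}{\operatorname{Var}(W_i^\top\lambda)}$$

With $\delta = 1$, the two indices are equally related to treatment per unit of variance; with $\delta = 0$, the unobservables are unrelated to treatment and no adjustment to the long-regression coefficient is needed.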
3.4 R Implementation
The psacalc command for Stata (with a port to R) implements the Oster delta calculation. The breakdown value can also be computed by hand from the approximate formula:

```r
# Manually in R, using the approximate Oster (2019) formula
beta_short <- 0.45  # coefficient without controls
beta_long  <- 0.28  # coefficient with controls
R2_short   <- 0.12
R2_long    <- 0.35
R2_max     <- min(1.3 * R2_long, 1)  # Oster's suggested default, capped at 1

# Breakdown value: solve beta_star = 0 for delta
delta_star <- (beta_long / (beta_short - beta_long)) *
  ((R2_long - R2_short) / (R2_max - R2_long))
cat("Breakdown delta:", round(delta_star, 3), "\n")  # prints: Breakdown delta: 3.608
```
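Because $R^2_{\max}$ is chosen by the researcher, it is worth reporting how $\delta^*$ changes with it. A minimal sketch re-using the illustrative numbers above (the helper `delta_star_at` and the intermediate value 0.7 are hypothetical) compares Oster's default with the conservative choice $R^2_{\max} = 1$:

```r
# Breakdown delta as a function of the assumed R2_max (approximate formula)
delta_star_at <- function(R2_max, beta_short = 0.45, beta_long = 0.28,
                          R2_short = 0.12, R2_long = 0.35) {
  (beta_long / (beta_short - beta_long)) *
    ((R2_long - R2_short) / (R2_max - R2_long))
}

R2_grid <- c(default = min(1.3 * 0.35, 1), mid = 0.7, full = 1)
round(sapply(R2_grid, delta_star_at), 3)  # default 3.608, mid 1.082, full 0.583
```

With these inputs the breakdown value falls from about 3.6 at Oster's default to below 1 at $R^2_{\max} = 1$, so whether the estimate looks robust depends heavily on that choice.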
4 When to Use Which Method
Rosenbaum bounds suit matched designs; Oster's delta suits regression-based estimates with controls. Neither method guarantees robustness; both parameterise how large hidden confounding would need to be to overturn a result. When the break-even value is implausibly large (for example, $\Gamma^* = 4$ for Rosenbaum or $\delta^* = 3$ for Oster), the study's conclusions can be defended as robust to hidden confounding. When the break-even value is small, the researcher should report the finding with appropriate caution.
5 A Worked Example
Suppose a researcher estimates the effect of a job training programme on wages using propensity-score matching. The matched estimate is $\hat{\beta} = 0.15$ log points, with a Wilcoxon test p-value of 0.02. A Rosenbaum sensitivity analysis reveals that the p-value remains below 0.05 up to $\Gamma = 2.1$: a hidden confounder would need to more than double the odds of programme participation to explain away the wage effect. Since the programme was targeted at unemployed individuals with observable characteristics quite similar to those of the matched controls, a hidden confounder of that magnitude seems implausible, and the researcher can report the result as relatively robust.

Separately, a regression of wages on programme participation, controlling for age, education, and prior earnings, yields $\dot{\beta} = 0.12$ (compared with $\tilde{\beta} = 0.25$ without controls), with $\dot{R}^2 = 0.48$ and $\tilde{R}^2 = 0.15$. Setting $R^2_{\max} = 1.3 \times 0.48 = 0.624$, the approximate Oster formula gives $\delta^* = (0.12 / 0.13) \times (0.33 / 0.144) \approx 2.12$. This exceeds 1: unobservables would need to be roughly twice as important as age, education, and prior earnings in explaining participation to drive the coefficient to zero. The regression estimate therefore also passes Oster's heuristic, although the halving of the coefficient when controls are added shows that selection on observables is substantial in this setting.
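The arithmetic of the worked example can be checked directly. A short sketch, using only the numbers stated in the example and the approximate breakdown formula from section 3.4:

```r
# Oster breakdown value for the regression in the worked example
beta_short <- 0.25; beta_long <- 0.12   # without / with controls
R2_short   <- 0.15; R2_long   <- 0.48
R2_max     <- min(1.3 * R2_long, 1)     # 0.624
delta_star <- (beta_long / (beta_short - beta_long)) *
  ((R2_long - R2_short) / (R2_max - R2_long))
round(delta_star, 2)                    # 2.12

# Rosenbaum break-even restated as a worst-case within-pair treatment probability
gamma_star <- 2.1
round(gamma_star / (1 + gamma_star), 3) # 0.677
```

The second calculation restates $\Gamma^* = 2.1$ on the probability scale: a hidden confounder would have to raise one unit's chance of being the treated member of its pair from 50 per cent to about 68 per cent.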
6 Conclusion
Rosenbaum bounds and Oster's delta are not competing methods; they are complementary tools for different research designs. Together they are widely used approaches to sensitivity analysis for observational causal inference, converting the untestable assumption of no hidden confounding into a quantitative question about how large confounding would need to be. Every observational paper should report the diagnostic appropriate to its design alongside its main estimates.
References
- Imbens, G. W. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review Papers and Proceedings, 93(2):126-132, 2003.
- Oster, E. Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics, 37(2):187-204, 2019.
- Rosenbaum, P. R. Observational Studies, 2nd ed. Springer, New York, 2002.
- Rosenbaum, P. R. Design of Observational Studies. Springer, New York, 2010.
- Rosenbaum, P. R. and Rubin, D. B. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41-55, 1983.