
The Generalised Propensity Score: Causal Inference with Continuous Treatments

1 Motivation: Why Binary Methods Fall Short

The propensity score of Rosenbaum and Rubin [1983] is one of the most influential tools in causal inference for observational studies. For binary treatments (T ∈ {0,1}), the propensity score e(X) = Pr(T = 1 | X) summarises the role of observed covariates X in treatment assignment, enabling dimension reduction and doubly robust estimation. Under unconfoundedness (T ⊥ Y(t) | X for t ∈ {0,1}) and overlap (0 < e(X) < 1), the propensity score suffices for identification: conditional on e(X), treated and control units have the same covariate distribution, so adjusting for the scalar e(X) removes the bias due to X just as adjusting for the full covariate vector would.

Many treatments, however, are not binary. Exposure to air pollution, years of education, doses of a medication, and trade exposure all vary continuously. For such treatments, the binary propensity score is inapplicable, and researchers have often resorted to ad hoc discretisation—dividing units into quintiles or above/below-median groups—losing information and introducing arbitrary choices.

The generalised propensity score (GPS) of Hirano and Imbens [2004], which builds on Imbens [2000] and was developed in parallel by Imai and van Dyk [2004], extends the propensity score logic to continuous treatments. This article introduces the GPS framework, states the key identifying assumptions, describes the average dose-response function (ADRF), and discusses estimation and implementation.

2 Setup and Notation

Let Tᵢ ∈ T ⊆ ℝ be a continuously distributed treatment for unit i, Xᵢ ∈ ℝᵏ a vector of pre-treatment covariates, and Yᵢ(t) the potential outcome under treatment t. Assume the treatment has a conditional density fₜ|ₓ(t|x) that is positive for all (t, x) in the relevant support.

The object of interest is the average dose-response function (ADRF):

$$\mu(t) = \mathbb{E}[Y_i(t)], \qquad t \in \mathcal{T} \tag{1}$$

This traces out the expected potential outcome as a function of the treatment dose, rather than comparing only two treatment levels. The average causal effect at dose t relative to dose t₀ is then:

$$\tau(t, t_0) = \mu(t) - \mu(t_0) \tag{2}$$
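A small simulation makes the difference between μ(t) and the observed regression E[Yᵢ | Tᵢ = t] concrete. The data-generating process below is hypothetical, chosen so that the true ADRF is known (μ(t) = 1 + t); because X raises both the dose and the outcome, a naive regression of Y on T overstates the slope. The function names are illustrative only.

```python
import numpy as np

# Hypothetical data-generating process (an assumption for illustration):
# X ~ N(0, 1), T = X + N(0, 1), and Y_i(t) = 1 + t + 2 X_i + noise_i.
# Since E[X] = 0, the true ADRF is mu(t) = 1 + t.
rng = np.random.default_rng(0)
n = 200_000
X = rng.normal(size=n)
T = X + rng.normal(size=n)
noise = rng.normal(size=n)


def potential_outcome(t, X, noise):
    """Y_i(t) for every unit; t may be a scalar dose or the vector of observed doses."""
    return 1.0 + t + 2.0 * X + noise


Y = potential_outcome(T, X, noise)  # observed outcome Y_i = Y_i(T_i)


def mu(t):
    """Equation (1) by simulation: average Y_i(t) over all units at a common dose t."""
    return potential_outcome(t, X, noise).mean()


def tau(t, t0):
    """Equation (2): the average effect of dose t relative to dose t0."""
    return mu(t) - mu(t0)


# Naive slope of E[Y | T = t], from an OLS fit of Y on T.
naive_slope = np.polyfit(T, Y, 1)[0]

print(f"tau(1, 0)   = {tau(1.0, 0.0):.3f}")  # the causal effect of one extra unit of dose
print(f"naive slope = {naive_slope:.3f}")    # about 2 in this design: X confounds T and Y
```

Because every potential outcome is simulated, μ(t) can be computed directly from equation (1). In real data only Yᵢ(Tᵢ) is observed, which is why the identification results below are needed.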

3 The Generalised Propensity Score

Definition (GPS). The generalised propensity score is the conditional density of the treatment given covariates:

$$R_i = f_{T \mid X}(T_i \mid X_i) \tag{3}$$

The GPS is a scalar function of (Tᵢ, Xᵢ) that plays the same role for continuous treatments as the propensity score plays for binary ones. Its key property is a balancing theorem analogous to Theorem 1 of Rosenbaum and Rubin [1983]:

Theorem (GPS Balancing). If fₜ|ₓ(t|Xᵢ) > 0 for all t ∈ T and all i, then for each t ∈ T

$$f_{T \mid X, R}(t \mid X_i, r) = f_{T \mid R}(t \mid r) \quad \text{at } r = f_{T \mid X}(t \mid X_i).$$

That is, within strata defined by the GPS evaluated at t, the treatment indicator 1[Tᵢ = t] is independent of Xᵢ. Combined with weak unconfoundedness (Section 4), this implies that conditioning on the GPS removes the confounding effect of X on the relationship between T and Y, just as conditioning on the binary propensity score removes confounding in the binary case.
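The balancing property can be checked in two lines. Write r(t, x) = fₜ|ₓ(t | x) for the GPS as a function of a generic dose (notation introduced here for this sketch). Conditioning on X already fixes r(t, X), while averaging over X within a stratum of r(t, X) returns the same value:

$$
\begin{aligned}
f_{T \mid X,\, r(t,X)}\big(t \mid X,\, r(t,X)\big) &= f_{T \mid X}(t \mid X) = r(t, X), \\
f_{T \mid r(t,X)}\big(t \mid r(t,X)\big) &= \mathbb{E}\big[\, f_{T \mid X}(t \mid X) \,\big|\, r(t,X) \big] = r(t, X).
\end{aligned}
$$

Both densities equal r(t, X), so once r(t, X) is known, X carries no further information about whether Tᵢ = t.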

4 Identifying Assumptions

4.1 Weak Unconfoundedness for Continuous Treatments

Hirano and Imbens [2004] propose the following identifying assumption:

Assumption WU (Weak Unconfoundedness). For each t ∈ T, Yᵢ(t) ⊥ 1[Tᵢ = t] | Xᵢ.

This is weaker than "strong unconfoundedness" ({Yᵢ(t) : t ∈ T} ⊥ Tᵢ | Xᵢ), which requires the entire set of potential outcomes to be jointly independent of the treatment conditional on covariates. Weak unconfoundedness requires only that, for each value t, the potential outcome Yᵢ(t) is independent of whether Tᵢ = t, given Xᵢ.

In practice, weak unconfoundedness is often as hard to defend as strong unconfoundedness: both require that all confounders are observed in Xᵢ. The advantage of the weaker assumption is formal. It is exactly what the GPS identification result requires, so results stated under it apply wherever the strong version holds, and in some settings where it does not.

4.2 Overlap

Assumption OV (Overlap). For all t ∈ T and all x in the support of X: fₜ|ₓ(t|x) > 0.

Overlap requires that every unit has positive density of receiving any dose in the support of the treatment. In practice, this can be violated at the tails of the treatment distribution, where very high or very low doses are received only by a narrow subset of units with specific covariate values. Violations appear as GPS values fₜ|ₓ(t|Xᵢ) near zero, that is, units for which the dose t being evaluated is very unlikely given their covariates, and are analogous to lack of common support in binary matching.
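A minimal overlap diagnostic, assuming the normal linear GPS model of Section 6.1 (Tᵢ | Xᵢ ~ N(Xᵢᵀγ, σ²)): for each dose on a grid, compute the GPS fₜ|ₓ(t | Xᵢ) for every unit and report the share falling below a threshold ε. The function names, the simulated data and the threshold ε = 0.01 are illustrative choices, not part of the method.

```python
import numpy as np


def normal_gps(t, X, gamma, sigma):
    """GPS f_{T|X}(t | X_i) under the assumed model T | X ~ N(X @ gamma, sigma^2)."""
    z = (t - X @ gamma) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def overlap_table(T, X, grid, eps):
    """For each dose t in grid, the share of units whose estimated GPS at t is below eps."""
    gamma, *_ = np.linalg.lstsq(X, T, rcond=None)           # OLS fit of the treatment model
    resid = T - X @ gamma
    sigma = np.sqrt(resid @ resid / (len(T) - X.shape[1]))  # residual standard deviation
    return {float(t): float(np.mean(normal_gps(t, X, gamma, sigma) < eps)) for t in grid}


# Hypothetical data: the dose is driven strongly by a single covariate.
rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # intercept + covariate
T = 2.0 * x + rng.normal(scale=0.5, size=n)

table = overlap_table(T, X, grid=[-4.0, 0.0, 4.0], eps=0.01)
for t, share in table.items():
    print(f"dose {t:+.1f}: {share:.1%} of units have GPS below 0.01")
```

Shares that rise sharply at the ends of the grid flag doses at which μ(t) rests on few comparable units.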

5 Identification of the ADRF

Under weak unconfoundedness and overlap, the ADRF is identified:

$$\mu(t) = \mathbb{E}\Big[\, \mathbb{E}\big[Y_i \mid T_i = t,\ R_i = f_{T \mid X}(t \mid X_i)\big] \Big] \tag{6}$$

where the outer expectation is over the distribution of Xᵢ. The ADRF at dose t can be recovered by:

  1. Estimating the conditional mean E[Yᵢ | Tᵢ, Rᵢ] as a function of the treatment and the GPS.
  2. Evaluating this function at Tᵢ = t and at the GPS for that dose, Rᵢ = fₜ|ₓ(t | Xᵢ), for each unit.
  3. Averaging over units.
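The recipe follows from a short argument, sketched here under the assumptions above. Write r(t, x) = fₜ|ₓ(t | x) and β(t, r) = E[Yᵢ | Tᵢ = t, Rᵢ = r] (notation introduced for this sketch). Weak unconfoundedness given Xᵢ, combined with the balancing property, gives weak unconfoundedness given the GPS, Yᵢ(t) ⊥ 1[Tᵢ = t] | r(t, Xᵢ) (Hirano and Imbens [2004]). Then:

$$
\begin{aligned}
\beta(t, r) &= \mathbb{E}[Y_i \mid T_i = t,\ r(t, X_i) = r] && \text{(on } T_i = t\text{, } R_i = r(t, X_i)\text{)} \\
&= \mathbb{E}[Y_i(t) \mid T_i = t,\ r(t, X_i) = r] && \text{(on } T_i = t\text{, } Y_i = Y_i(t)\text{)} \\
&= \mathbb{E}[Y_i(t) \mid r(t, X_i) = r] && \text{(unconfoundedness given the GPS)} \\
\mu(t) &= \mathbb{E}\big[\,\mathbb{E}[Y_i(t) \mid r(t, X_i)]\,\big] = \mathbb{E}\big[\beta\big(t, r(t, X_i)\big)\big].
\end{aligned}
$$

The last line is equation (6): step 1 estimates β, step 2 evaluates it at (t, r(t, Xᵢ)), and step 3 takes the outer average.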

6 Estimation Strategies

6.1 GPS-Based Regression

Hirano and Imbens [2004] propose a parametric estimator in three steps:

  • Step 1: Estimate the GPS. Fit a model for fₜ|ₓ(t|x). A common choice is a normal linear model Tᵢ | Xᵢ ~ N(Xᵢᵀγ, σ²), estimated by OLS. The GPS is then:
    $$\hat R_i = \frac{1}{\hat\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(T_i - X_i^{\top}\hat\gamma)^2}{2\hat\sigma^2}\right) \tag{7}$$
  • Step 2: Estimate the conditional expectation. Regress Yᵢ on a flexible function of (Tᵢ, R̂ᵢ)—for example a quadratic:
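    $$\mathbb{E}[Y_i \mid T_i, \hat R_i] \approx \alpha_0 + \alpha_1 T_i + \alpha_2 T_i^2 + \alpha_3 \hat R_i + \alpha_4 \hat R_i^2 + \alpha_5 T_i \hat R_i \tag{8}$$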
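  • Step 3: Average predictions. For each t on a fine grid, recompute each unit's GPS at that dose, R̂ᵢ(t) = f̂ₜ|ₓ(t | Xᵢ) (not the GPS at the observed dose Tᵢ), and average the fitted values: μ̂(t) = n⁻¹Σᵢ Ê[Yᵢ | Tᵢ = t, R̂ᵢ = R̂ᵢ(t)].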
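The three steps can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: it assumes the normal linear GPS model in (7), the quadratic outcome model in (8), and a hypothetical data-generating process in which E[Y | T, R] falls exactly inside that quadratic family, so the true ADRF (1 + t + 5·φ₂(t), where φ₂ is the N(0, 2) density) is known. The final step re-evaluates the GPS at each grid dose t rather than reusing the GPS at the observed dose. Function names are hypothetical.

```python
import numpy as np


def normal_gps(t, X, gamma, sigma):
    """Estimated GPS (7): normal density of t given fitted mean X @ gamma and s.d. sigma."""
    z = (t - X @ gamma) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def quad_design(T, R):
    """Regressors of the quadratic approximation (8): 1, T, T^2, R, R^2, T*R."""
    return np.column_stack([np.ones_like(T), T, T**2, R, R**2, T * R])


def hirano_imbens_adrf(Y, T, X, grid):
    # Step 1: fit T | X ~ N(X @ gamma, sigma^2) by OLS and evaluate the GPS at observed doses.
    gamma, *_ = np.linalg.lstsq(X, T, rcond=None)
    resid = T - X @ gamma
    sigma = np.sqrt(resid @ resid / (len(T) - X.shape[1]))
    R_obs = normal_gps(T, X, gamma, sigma)

    # Step 2: OLS of Y on the quadratic in (T, R_hat).
    alpha, *_ = np.linalg.lstsq(quad_design(T, R_obs), Y, rcond=None)

    # Step 3: at each grid dose t, recompute every unit's GPS at t and average the fits.
    mu_hat = []
    for t in grid:
        R_t = normal_gps(t, X, gamma, sigma)
        mu_hat.append(np.mean(quad_design(np.full_like(T, t), R_t) @ alpha))
    return np.array(mu_hat)


# Hypothetical DGP: X ~ N(0, 1), T = X + N(0, 1), Y(t) = 1 + t + 5 f_{T|X}(t | X) + noise.
rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
T = x + rng.normal(size=n)
true_gps = np.exp(-0.5 * (T - x) ** 2) / np.sqrt(2.0 * np.pi)
Y = 1.0 + T + 5.0 * true_gps + rng.normal(scale=0.5, size=n)

grid = np.array([-1.0, 0.0, 1.0])
mu_hat = hirano_imbens_adrf(Y, T, X, grid)
# True ADRF: 1 + t + 5 * f_T(t), where the marginal of T is N(0, 2).
mu_true = 1.0 + grid + 5.0 * np.exp(-grid**2 / 4.0) / np.sqrt(4.0 * np.pi)
print(np.round(mu_hat, 2), np.round(mu_true, 2))
```

In real applications the quadratic in (8) is only an approximation, so the sensitivity checks in Section 7 apply.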

6.2 Inverse Probability Weighting

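An alternative to regression adjustment is GPS-based inverse probability weighting (IPW). Because a continuous dose almost never equals t exactly, the IPW estimator of μ(t) localises around t with a kernel K_h(u) = K(u/h)/h of bandwidth h:

$$\hat\mu_{\mathrm{IPW}}(t) = \frac{\sum_i K_h(T_i - t)\, w_i(t)\, Y_i}{\sum_i K_h(T_i - t)\, w_i(t)}, \qquad w_i(t) = \frac{f_T(t)}{f_{T \mid X}(t \mid X_i)} \tag{9}$$

where fₜ(t) is the marginal density of T evaluated at t. The kernel restricts attention to units whose dose is near t, and among those the weight is inversely proportional to the GPS, creating a pseudo-population in which T and X are independent. (At a fixed t the factor fₜ(t) is common to all units and cancels in the ratio.) In practice fₜ(t) and fₜ|ₓ(t|Xᵢ) are estimated, and the weights can be numerically unstable at the tails, where the estimated GPS in the denominator is close to zero.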
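A sketch of a kernel-localised IPW estimator. Because a continuous dose almost never equals t exactly, it multiplies the inverse-GPS weight by a Gaussian kernel K_h(Tᵢ − t). Assumptions, all illustrative: a normal linear GPS model fitted by OLS, a hand-picked bandwidth h = 0.1, and a hypothetical confounded data-generating process whose true ADRF is μ(t) = 1 + t. The naive OLS fit of Y on T is computed for comparison.

```python
import numpy as np


def gaussian_kernel(u, h):
    """K_h(u) = K(u / h) / h with a standard normal K (the kernel choice is an assumption)."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def normal_gps(t, X, gamma, sigma):
    """f_{T|X}(t | X_i) under an assumed model T | X ~ N(X @ gamma, sigma^2)."""
    z = (t - X @ gamma) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def ipw_adrf(Y, T, X, grid, h):
    """Kernel-localised, self-normalised IPW estimate of mu(t) at each t in grid."""
    gamma, *_ = np.linalg.lstsq(X, T, rcond=None)
    resid = T - X @ gamma
    sigma = np.sqrt(resid @ resid / (len(T) - X.shape[1]))
    mu_hat = []
    for t in grid:
        # f_T(t) is the same for every unit at this t, so it cancels in the ratio.
        w = gaussian_kernel(T - t, h) / normal_gps(t, X, gamma, sigma)
        mu_hat.append(np.sum(w * Y) / np.sum(w))
    return np.array(mu_hat)


# Hypothetical confounded DGP: X ~ N(0, 1), T = 0.5 X + N(0, 1), Y(t) = 1 + t + X + noise.
rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
T = 0.5 * x + rng.normal(size=n)
Y = 1.0 + T + x + rng.normal(scale=0.5, size=n)

grid = np.array([-1.0, 0.0, 1.0])
mu_ipw = ipw_adrf(Y, T, X, grid, h=0.1)
naive = np.polyval(np.polyfit(T, Y, 1), grid)  # E[Y | T = t] fitted by OLS, ignoring X
print(np.round(mu_ipw, 2), np.round(naive, 2))
```

Because the weights divide by the estimated GPS, a handful of units with very small fₜ|ₓ(t | Xᵢ) can dominate the estimate; a smaller bandwidth reduces smoothing bias but makes this instability worse.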

6.3 Doubly Robust Estimation

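Kennedy et al. [2017] develop a doubly robust (DR) estimator for the ADRF that is consistent if either the GPS model or the outcome model is correctly specified. The DR estimator combines the regression adjustment and IPW components into a pseudo-outcome, which is then smoothed over the dose:

$$\hat\xi_i = w_i(T_i)\,\big(Y_i - \hat m(T_i, X_i)\big) + \frac{1}{n}\sum_{j=1}^{n} \hat m(T_i, X_j), \qquad \hat\mu_{\mathrm{DR}}(t) = \text{local linear regression of } \hat\xi_i \text{ on } T_i \text{, evaluated at } t \tag{10}$$

where m̂(t, x) is a flexible or nonparametric estimate of the outcome regression E[Yᵢ | Tᵢ = t, Xᵢ = x], and wᵢ(Tᵢ) is the weight in (9) evaluated at the observed dose, with fₜ(Tᵢ) estimated by n⁻¹Σⱼ f̂ₜ|ₓ(Tᵢ | Xⱼ). Because the ADRF at a single dose cannot be estimated at the √n rate, the DR estimator converges at the rate of a one-dimensional nonparametric regression (n⁻²ᐟ⁵ for a twice-differentiable μ with bandwidth of order n⁻¹ᐟ⁵, under standard smoothness conditions), provided the product of the GPS and outcome-model estimation errors shrinks at least that fast; this holds, for instance, when each nuisance model is estimated at rate n⁻¹ᐟ⁵ or faster.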
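A sketch of a doubly robust ADRF estimator in pseudo-outcome form, written for this article rather than taken from any package: each unit's outcome residual is weighted by fₜ(Tᵢ)/fₜ|ₓ(Tᵢ | Xᵢ), added to the outcome model averaged over the sample covariates, and the pseudo-outcomes are smoothed over dose by local linear regression. Assumptions: a normal linear GPS model, OLS outcome models on hypothetical designs, a Gaussian kernel with hand-picked bandwidth h = 0.5, and simulated data with μ(t) = 1 + t. Running it once with an outcome model that includes the confounder and once with one that omits it illustrates double robustness: because the GPS model is correct here, both fits should land near the truth.

```python
import numpy as np


def normal_gps(t, mean, sigma):
    """Normal density of dose t given fitted treatment-model mean(s) and s.d. sigma."""
    z = (t - mean) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def local_linear(t, T, xi, h):
    """Local linear regression of xi on T, evaluated at t (Gaussian kernel, bandwidth h)."""
    d = T - t
    k = np.exp(-0.5 * (d / h) ** 2)
    s0, s1, s2 = k.sum(), (k * d).sum(), (k * d**2).sum()
    return (s2 * (k * xi).sum() - s1 * (k * d * xi).sum()) / (s0 * s2 - s1**2)


def dr_adrf(Y, T, X, outcome_design, grid, h):
    n = len(T)
    # GPS model: T | X ~ N(X @ gamma, sigma^2), fitted by OLS.
    gamma, *_ = np.linalg.lstsq(X, T, rcond=None)
    mean_T = X @ gamma
    sigma = np.sqrt(np.sum((T - mean_T) ** 2) / (n - X.shape[1]))

    # Outcome model m(t, x): OLS of Y on a user-chosen design (possibly misspecified).
    beta, *_ = np.linalg.lstsq(outcome_design(T, X), Y, rcond=None)

    def m_hat(t, X_):
        return outcome_design(t, X_) @ beta

    f_cond = normal_gps(T, mean_T, sigma)                                   # f(T_i | X_i)
    f_marg = np.array([normal_gps(Ti, mean_T, sigma).mean() for Ti in T])  # n^-1 sum_j f(T_i | X_j)
    m_avg = np.array([m_hat(np.full(n, Ti), X).mean() for Ti in T])        # n^-1 sum_j m(T_i, X_j)

    # Pseudo-outcome: weighted residual plus the outcome model averaged over covariates.
    xi = f_marg / f_cond * (Y - m_hat(T, X)) + m_avg
    return np.array([local_linear(t, T, xi, h) for t in grid])


def design_with_x(t, X):
    return np.column_stack([np.ones_like(t), t, X[:, 1]])  # includes the confounder


def design_without_x(t, X):
    return np.column_stack([np.ones_like(t), t])  # omits the confounder: misspecified


# Hypothetical DGP: X ~ N(0, 1), T = 0.5 X + N(0, 1), Y(t) = 1 + t + X + noise, so mu(t) = 1 + t.
rng = np.random.default_rng(5)
n = 6_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
T = 0.5 * x + rng.normal(size=n)
Y = 1.0 + T + x + rng.normal(scale=0.5, size=n)

grid = np.array([-1.0, 0.0, 1.0])
mu_dr = dr_adrf(Y, T, X, design_with_x, grid, h=0.5)
mu_dr_bad_outcome = dr_adrf(Y, T, X, design_without_x, grid, h=0.5)
print(np.round(mu_dr, 2), np.round(mu_dr_bad_outcome, 2))
```

The two n-by-n averages are computed by looping over units, which is fine at this sample size; larger samples would call for chunking or a cheaper approximation of the marginal density.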

7 Practical Considerations

Figure 1: Estimated average dose-response function (ADRF) μ̂(t) plotted against dose t, with pointwise 95% confidence bands. Uncertainty grows at the tails of the dose distribution, where overlap is thin and data are sparse.

  • Overlap diagnostics. Before estimating the ADRF, plot the GPS (R̂ᵢ) by dose level. Units with very small GPS values (near zero) have low likelihood of receiving their observed dose given their covariates, signalling overlap violations. Trim units with R̂ᵢ < ε for some threshold ε.
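  • Balancing tests. After GPS adjustment, check covariate balance: within strata defined by (R̂ᵢ, Tᵢ), the distribution of Xᵢ should be approximately the same across dose values. Weighted correlations between Tᵢ and each covariate (which should be close to zero after weighting) or weighted standardised mean differences within dose blocks provide a formal check.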
  • Model for the GPS. The normal linear model is convenient but may be inappropriate if the treatment is bounded (e.g., percentage variables) or skewed. Beta regression (for proportions), log-normal models (for non-negative, skewed doses), or nonparametric density estimation are alternatives.
  • Bandwidth/smoothness. When estimating μ̂(t) on a grid, results depend on the smoothness assumptions embedded in the regression or density estimation. Report sensitivity to bandwidth or polynomial degree.
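One common balance check for a continuous treatment is the correlation between the dose and each covariate, before and after weighting by stabilised GPS weights wᵢ = fₜ(Tᵢ)/fₜ|ₓ(Tᵢ | Xᵢ); a well-specified GPS model should push the weighted correlations towards zero. The sketch below assumes normal models for both the marginal and the conditional density of T, and uses simulated, hypothetical data; the helper names are illustrative.

```python
import numpy as np


def weighted_corr(a, b, w):
    """Pearson correlation between a and b in the sample reweighted by w."""
    a_c = a - np.average(a, weights=w)
    b_c = b - np.average(b, weights=w)
    cov = np.average(a_c * b_c, weights=w)
    return cov / np.sqrt(np.average(a_c**2, weights=w) * np.average(b_c**2, weights=w))


def stabilised_weights(T, X):
    """w_i = f_T(T_i) / f_{T|X}(T_i | X_i), with both densities modelled as normal (assumption)."""
    gamma, *_ = np.linalg.lstsq(X, T, rcond=None)
    resid = T - X @ gamma
    sigma_cond = np.sqrt(resid @ resid / (len(T) - X.shape[1]))
    sigma_marg = T.std(ddof=1)
    # The common 1/sqrt(2*pi) factor cancels in the ratio of the two normal densities.
    f_marg = np.exp(-0.5 * ((T - T.mean()) / sigma_marg) ** 2) / sigma_marg
    f_cond = np.exp(-0.5 * (resid / sigma_cond) ** 2) / sigma_cond
    return f_marg / f_cond


# Hypothetical data: one covariate that shifts the dose.
rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
T = 0.3 * x + rng.normal(size=n)

w = stabilised_weights(T, X)
before = weighted_corr(T, x, np.ones(n))
after = weighted_corr(T, x, w)
print(f"corr(T, x): unweighted {before:.3f}, GPS-weighted {after:.3f}")
```

With several covariates the same check is repeated column by column; a weighted correlation that stays far from zero suggests the GPS model omits or mis-specifies that covariate.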

8 An Application: Air Pollution and Mortality

A canonical application of GPS methods is estimating the dose-response relationship between fine particulate matter (PM₂.₅) concentration and mortality rates. Counties vary continuously in pollution levels; confounders include income, urbanisation, and industrial composition. A GPS estimated from these covariates allows nonparametric estimation of how mortality rates change across the pollution distribution, revealing whether the dose-response is linear (as regulatory models often assume) or concave/convex.

Wu et al. [2020] apply GPS methods to Medicare mortality data and find evidence of a concave dose-response: the mortality effect of a unit increase in PM₂.₅ is larger at low pollution levels than at high levels. This finding has important policy implications—it suggests that further reductions in already-clean areas may be more valuable than linear models imply.

9 Relationship to the DiD Framework

The GPS framework and the Callaway et al. [2024] DiD framework for continuous treatments are conceptually distinct but complementary.

The GPS approach:
  • Relies on cross-sectional unconfoundedness (conditional on X, treatment is as-good-as-random)
  • Is appropriate when panel data are unavailable or when the treatment varies cross-sectionally
  • Requires correct specification of the GPS model

The DiD approach for continuous treatments relies instead on parallel trends across dose values over time, is appropriate for panel settings with staggered treatment adoption, and is more robust to time-invariant unobserved heterogeneity. In settings with panel data, the DiD approach is generally preferred; in cross-sectional settings, the GPS is the main alternative.

10 Software

The GPS can be estimated and applied using several R packages:

  1. CBPS (Covariate Balancing Propensity Score): fits GPS models by directly optimising covariate balance.
  2. CausalGPS: implements GPS regression, IPW, and DR estimators with entropy balancing weights.
  3. npcausal: implements the Kennedy et al. [2017] doubly robust ADRF estimator with uniform confidence bands.

11 Conclusion

The generalised propensity score provides a principled framework for estimating causal dose-response functions from observational data. By extending the balancing theorem to continuous treatments, Hirano and Imbens [2004] show that the ADRF is identified under weak unconfoundedness and overlap assumptions that, while strong, are no stronger than their binary treatment analogues. Doubly robust extensions improve robustness to model mis-specification. Researchers working with continuously varying treatments in cross-sectional settings should consider the GPS as a complement to or, in the absence of panel data, a substitute for DiD-based dose-response estimation.

References

  1. Callaway, B., Goodman-Bacon, A., and Sant'Anna, P. H. C. (2024). Difference-in-differences with a continuous treatment. arXiv:2107.02637.
  2. Hirano, K. and Imbens, G. W. (2004). The propensity score with continuous treatments. In Gelman, A. and Meng, X.-L. (eds.), Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, pp. 73-84. Wiley.
  3. Imai, K. and van Dyk, D. A. (2004). Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association, 99(467):854-866.
  4. Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, 87(3):706-710.
  5. Kennedy, E. H., Ma, Z., McHugh, M. D., and Small, D. S. (2017). Non-parametric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society: Series B, 79(4):1229-1245.
  6. Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41-55.
  7. Wu, X., Mealli, F., Kioumourtzoglou, M.-A., Dominici, F., and Braun, D. (2020). Evaluating the association of long-term average air pollution with mortality using the generalised propensity score. Journal of the American Statistical Association, 117(540):1528-1539.
