1 Motivation
The regression discontinuity design (RDD) is one of the most credible quasi-experimental identification strategies in economics (Imbens and Lemieux, 2008). Under the continuity assumption, namely that the conditional expectations of the potential outcomes given the running variable are continuous at the cutoff, the RDD estimand is:

$$\tau_{RD} = \lim_{x \downarrow c} \mathbb{E}[Y \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x] \tag{1}$$
where c is the cutoff and X the running variable. Estimation of (1) requires nonparametric approximation of two one-sided regression functions. The dominant frequentist approach—local polynomial regression with a mean-squared-error optimal bandwidth (Calonico et al., 2014)—relies on asymptotic distributional approximations that can be unreliable in small samples or when the regression functions are highly nonlinear.
A Bayesian approach replaces asymptotic arguments with finite-sample posterior distributions that are exact conditional on the assumed model. By placing a prior on the regression functions and conditioning on the data, the posterior distribution of τRD is available in closed form when a Gaussian process (GP) prior is combined with Gaussian noise and fixed hyperparameters. This article describes the GP-RDD framework, its advantages and limitations, and its practical implementation.
2 Frequentist RDD: A Brief Recap
Standard RDD estimation proceeds by fitting local polynomial regressions on each side of the cutoff. With a bandwidth h and a polynomial of degree p, the estimator is:

$$\hat{\tau}_{LP} = \hat{\mu}_+(c) - \hat{\mu}_-(c) \tag{2}$$
where μ̂+(c) and μ̂−(c) are the fitted values of the local polynomial regressions at the cutoff from the right and the left, respectively. The procedure of Calonico et al. (2014) selects h to minimise the asymptotic MSE of τ̂LP and constructs bias-corrected robust confidence intervals. While theoretically well founded, the approach has two practical limitations:
- Small samples: asymptotic normal approximations may be poor when few observations fall within the bandwidth.
- Hyperparameter sensitivity: results can be sensitive to the bandwidth h, the polynomial order p, and the kernel, and the frequentist framework offers no natural analogue of a Bayesian prior sensitivity analysis.
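To make (2) concrete, the following is a minimal NumPy sketch of a local linear (p = 1) estimator with a triangular kernel and a user-supplied bandwidth h. It is an illustration under stated assumptions, not the Calonico et al. (2014) procedure: there is no data-driven bandwidth, no bias correction, and no robust standard error. The function name local_linear_rd is hypothetical, and the usage block runs on simulated data.

```python
import numpy as np


def local_linear_rd(x, y, c, h):
    """Local linear RDD estimate tau_LP = mu_plus(c) - mu_minus(c), as in (2).

    Illustrative assumptions: sharp design, degree p = 1, triangular
    kernel K(u) = max(0, 1 - |u|), fixed bandwidth h, no bias correction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def fit_at_cutoff(side):
        xs = x[side] - c                                  # centre the running variable at c
        w = np.maximum(0.0, 1.0 - np.abs(xs) / h)         # triangular kernel weights
        keep = w > 0                                      # only points inside the bandwidth
        design = np.column_stack([np.ones(keep.sum()), xs[keep]])
        sw = np.sqrt(w[keep])
        # Weighted least squares by rescaling each row by sqrt(weight).
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y[side][keep] * sw, rcond=None)
        return beta[0]                                    # intercept = fitted value at x = c

    mu_plus = fit_at_cutoff(x >= c)                       # right limit
    mu_minus = fit_at_cutoff(x < c)                       # left limit
    return mu_plus - mu_minus


if __name__ == "__main__":
    # Simulated sharp RDD with a true jump of 0.2 at c = 65 (made-up data).
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 100, 2000)
    y = 0.01 * x + 0.2 * (x >= 65) + rng.normal(0, 0.1, x.size)
    print(local_linear_rd(x, y, c=65.0, h=10.0))
```

Because the simulated regression is linear on each side, the local linear fit is unbiased here and the printed estimate should lie close to the true jump of 0.2; with curved regression functions the estimate would also carry smoothing bias that depends on h.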
3 The Gaussian Process Prior
A Gaussian process is a distribution over functions: any finite collection of function values follows a multivariate normal distribution. Formally, f ∼ 𝒢𝒫(μ, k) means that for any finite set of points x1, ..., xn, the vector (f(x1), ..., f(xn))′ is multivariate normal with mean (μ(x1), ..., μ(xn))′ and covariance matrix Kij = k(xi, xj).
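The covariance kernel k(x, x′) controls the smoothness of sample paths from the GP. A popular choice is the Matérn-5/2 kernel:

$$k(x, x') = \sigma^2 \left(1 + \frac{\sqrt{5}\,r}{\ell} + \frac{5r^2}{3\ell^2}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\ell}\right), \qquad r = |x - x'| \tag{3}$$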
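where σ² > 0 is the marginal variance and ℓ > 0 is the length-scale. A larger ℓ produces sample paths that vary more slowly; the Matérn-5/2 sample paths are twice mean-square differentiable for any ℓ. The length-scale plays a role analogous to the bandwidth in local polynomial regression, but it is part of the probabilistic model, so it can be estimated from the data or given its own prior (Section 5).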
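A short NumPy sketch of the Matérn-5/2 kernel in (3), together with draws from a zero-mean GP prior on a grid, may help build intuition for the role of ℓ. The function names matern52 and sample_gp_prior are hypothetical, and the small diagonal jitter added before the Cholesky factorisation is a standard numerical device rather than part of the model.

```python
import numpy as np


def matern52(x1, x2, sigma2=1.0, ell=1.0):
    """Matern-5/2 covariance matrix between 1-D input arrays x1 and x2, eq. (3)."""
    r = np.abs(np.subtract.outer(np.asarray(x1, float), np.asarray(x2, float)))
    s = np.sqrt(5.0) * r / ell                 # sqrt(5) * r / ell
    return sigma2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def sample_gp_prior(x, n_draws=3, sigma2=1.0, ell=1.0, seed=0, jitter=1e-6):
    """Draw n_draws sample paths of f ~ GP(0, k) on the grid x (one per column)."""
    K = matern52(x, x, sigma2, ell)
    # Small diagonal jitter keeps the Cholesky factorisation numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    z = np.random.default_rng(seed).standard_normal((len(x), n_draws))
    return L @ z                               # columns have covariance K + jitter * I


if __name__ == "__main__":
    grid = np.linspace(0.0, 10.0, 200)
    wiggly = sample_gp_prior(grid, ell=1.0)    # short length-scale: faster variation
    smooth = sample_gp_prior(grid, ell=5.0)    # long length-scale: slower variation
    print(wiggly.shape, smooth.shape)
```

Plotting the columns of wiggly against those of smooth shows the effect of ℓ directly: both sets of paths are equally differentiable, but the ℓ = 5 draws change direction far less often over the same interval.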
4 GP-RDD: The Model
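Following Branson et al. (2019), the Bayesian RDD model places separate GP priors on the regression functions on each side of the cutoff:

$$f_+ \sim \mathcal{GP}(m_+, k_+) \quad \text{for } x \ge c, \tag{4}$$

$$f_- \sim \mathcal{GP}(m_-, k_-) \quad \text{for } x < c, \tag{5}$$

where m± are prior mean functions and k± are covariance kernels such as (3). Observations are generated by Yᵢ = f(Xᵢ) + εᵢ with εᵢ ∼ 𝒩(0, σε²), where f = f+ at or to the right of the cutoff and f = f− to the left. The two GP priors are independent, so no continuity is imposed across the cutoff and the posterior is free to place a jump at c; the continuity assumption behind (1) is what justifies interpreting that jump as the treatment effect.

Posterior distribution. Let X+ denote the observed running variable values at or to the right of the cutoff, with outcomes Y+. The posterior of f+(c), the right limit at the cutoff, is:

$$f_+(c) \mid \mathbf{Y}_+ \sim \mathcal{N}\left(\bar{f}_+,\; s_+^2\right) \tag{6}$$

where

$$\bar{f}_+ = m_+(c) + \mathbf{k}_+^\top \left(K_{++} + \sigma_\varepsilon^2 I\right)^{-1} \left(\mathbf{Y}_+ - m_+(\mathbf{X}_+)\right) \tag{7}$$

$$s_+^2 = k_+(c, c) - \mathbf{k}_+^\top \left(K_{++} + \sigma_\varepsilon^2 I\right)^{-1} \mathbf{k}_+ \tag{8}$$

K++ is the kernel matrix evaluated at X+, and 𝐤+ is the vector of covariances k+(c, Xᵢ) between the cutoff and each point in X+. An analogous posterior, with mean f̄− and variance s−², follows for f−(c).

Since the two GP priors are independent and each side's outcomes depend only on that side's regression function, f+(c) and f−(c) are also independent a posteriori. The posterior of the RDD estimand τRD = f+(c) − f−(c) is therefore:

$$\tau_{RD} \mid \mathbf{Y} \sim \mathcal{N}\left(\bar{f}_+ - \bar{f}_-,\; s_+^2 + s_-^2\right) \tag{9}$$

Equation (9) is exact for fixed hyperparameters: no large-sample approximation is needed, although the result is conditional on the GP prior and the Gaussian noise model.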
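Equations (6) to (9) translate into a few lines of linear algebra. The sketch below assumes the Matérn-5/2 kernel (3), a constant prior mean (zero by default), and the same fixed hyperparameters (σ², ℓ, σε²) on both sides; these are simplifying assumptions for illustration and not necessarily the specification used by Branson et al. (2019). It uses a Cholesky factorisation instead of forming the inverse in (7) and (8) explicitly. The names posterior_at_cutoff and gp_rdd are hypothetical.

```python
import numpy as np


def matern52(x1, x2, sigma2, ell):
    """Matern-5/2 kernel matrix, eq. (3)."""
    r = np.abs(np.subtract.outer(x1, x2))
    s = np.sqrt(5.0) * r / ell
    return sigma2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def posterior_at_cutoff(x, y, c, sigma2, ell, noise_var, prior_mean=0.0):
    """Posterior mean and variance of f(c) given one side's data, eqs. (6)-(8).

    Assumes a constant prior mean and the Matern-5/2 kernel (illustrative choices).
    """
    A = matern52(x, x, sigma2, ell) + noise_var * np.eye(len(x))    # K + sigma_eps^2 I
    k_c = matern52(x, np.array([c]), sigma2, ell)[:, 0]             # k(X_i, c)
    L = np.linalg.cholesky(A)                                        # A = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - prior_mean))  # A^{-1}(y - m)
    v = np.linalg.solve(L, k_c)                                      # L^{-1} k_c
    mean = prior_mean + k_c @ alpha                                  # eq. (7)
    var = sigma2 - v @ v              # eq. (8); k(c, c) = sigma2 for this kernel
    return mean, var


def gp_rdd(x, y, c, sigma2, ell, noise_var, prior_mean=0.0):
    """Gaussian posterior (mean, variance) of tau_RD = f_+(c) - f_-(c), eq. (9).

    Uses the same fixed hyperparameters on both sides (a simplifying assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    right = x >= c
    m_plus, v_plus = posterior_at_cutoff(
        x[right], y[right], c, sigma2, ell, noise_var, prior_mean)
    m_minus, v_minus = posterior_at_cutoff(
        x[~right], y[~right], c, sigma2, ell, noise_var, prior_mean)
    return m_plus - m_minus, v_plus + v_minus   # sides independent a posteriori
```

Each side costs one Cholesky factorisation, which is the O(n³) step; scipy.linalg.cho_factor and cho_solve would exploit the triangular structure more efficiently than the generic np.linalg.solve used here. Side-specific hyperparameters only require passing different (σ², ℓ) values to the two posterior_at_cutoff calls.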
5 Hyperparameter Treatment
The GP prior has hyperparameters θ = (σ², ℓ, σε²). Three approaches are available:
- Fixed: set ℓ via cross-validation on control observations (analogous to bandwidth selection). The posterior (9) then conditions on fixed hyperparameters.
- Empirical Bayes: maximise the marginal likelihood P(Y | θ) over θ. This integrates out f analytically and yields a point estimate of θ.
- Full Bayes: place hyperpriors on θ and sample from the full joint posterior using MCMC (e.g., HMC via Stan). This propagates hyperparameter uncertainty into the posterior of τRD, which becomes a mixture of the Gaussians in (9) over θ and typically has wider credible intervals when the appropriate smoothness is uncertain.
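As a rough illustration of the empirical Bayes option, the sketch below evaluates the Gaussian log marginal likelihood, log P(Y | θ) = −½ Yᵀ(K + σε²I)⁻¹Y − ½ log|K + σε²I| − (n/2) log 2π, for a zero-mean GP with the Matérn-5/2 kernel (3), and maximises it over a small user-supplied grid. The grid search is a deliberate simplification chosen for transparency; in practice a gradient-based optimiser over log-hyperparameters is more common. The function names and grids are hypothetical.

```python
import itertools

import numpy as np


def matern52(x1, x2, sigma2, ell):
    """Matern-5/2 kernel matrix, eq. (3)."""
    r = np.abs(np.subtract.outer(x1, x2))
    s = np.sqrt(5.0) * r / ell
    return sigma2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def log_marginal_likelihood(x, y, sigma2, ell, noise_var):
    """log P(Y | theta) for Y = f(X) + eps, f ~ GP(0, k), eps ~ N(0, noise_var)."""
    n = len(x)
    A = matern52(x, x, sigma2, ell) + noise_var * np.eye(n)
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # A^{-1} y
    return (-0.5 * y @ alpha                              # data-fit term
            - np.sum(np.log(np.diag(L)))                  # -0.5 * log det A
            - 0.5 * n * np.log(2.0 * np.pi))              # normalising constant


def empirical_bayes_grid(x, y, sigma2_grid, ell_grid, noise_grid):
    """Grid point theta = (sigma2, ell, noise_var) maximising log P(Y | theta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_theta, best_lml = None, -np.inf
    for theta in itertools.product(sigma2_grid, ell_grid, noise_grid):
        lml = log_marginal_likelihood(x, y, *theta)
        if lml > best_lml:
            best_theta, best_lml = theta, lml
    return best_theta, best_lml
```

Because the two sides are independent a priori, a single shared θ can be fitted by maximising the sum of the two sides' log marginal likelihoods, while side-specific hyperparameters come from calling empirical_bayes_grid on each side separately. Either way, plugging the selected θ into (9) ignores the uncertainty in θ itself.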
6 Advantages and Limitations
6.1 Advantages
- Exact finite-sample inference. For fixed hyperparameters, the posterior (9) is exact regardless of sample size. When there are few observations near the cutoff, frequentist confidence intervals rely on asymptotic approximations that may be poor, whereas Bayesian credible intervals remain exact posterior probability statements, conditional on the prior and likelihood.
- Automatic regularisation. The length-scale ℓ controls the effective complexity of the regression function. Rather than relying on a bandwidth chosen by a separate algorithm, the GP prior regularises the fit within the same model that produces the posterior.
- Principled uncertainty propagation. Under a full Bayesian treatment, uncertainty about ℓ, σ², and σε² propagates into the posterior of τRD. This gives more conservative credible intervals than plugging in point estimates when the appropriate smoothness is unknown.
6.2 Limitations
- Kernel misspecification. The GP prior embodies specific smoothness assumptions through the kernel. If the true regression function is not smooth away from the cutoff (e.g., it has kinks or jumps), the posterior may be poorly calibrated. Sensitivity to kernel choice should be reported.
- Computational cost. Exact GP inference costs O(n³) because it requires factorising the n × n matrix K + σε²I. For n much greater than a few thousand, sparse GP approximations (inducing points, Nyström approximation) are needed.
- Interpretation. A 95% Bayesian credible interval is not a 95% frequentist confidence interval. Coverage guarantees differ; applied researchers should be careful about which interpretation is relevant for their application.
7 Application Example
Consider a scholarship programme that awards grants to students scoring above 65 on a test. The running variable has bounded support from 0 to 100. With only 90 observations within 5 points of the cutoff, the Calonico et al. (2014) local polynomial estimator yields a wide confidence interval, and its asymptotic approximation may be unreliable at this sample size.
A GP-RDD with Matérn-5/2 kernel and empirical Bayes hyperparameter estimation produces: τ̂RD = 0.21 (95% credible interval: [0.07, 0.34]).
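The posterior credible interval is narrower than the asymptotic confidence interval from the local linear estimator ([0.03, 0.39]) because the GP prior borrows smoothness information from observations farther from the cutoff in a regularised way. Note also that empirical Bayes conditions on a point estimate of θ, so this credible interval does not reflect hyperparameter uncertainty.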
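Because (9) is Gaussian for fixed hyperparameters, a 95% equal-tailed credible interval is the posterior mean plus or minus about 1.96 posterior standard deviations, and the posterior probability of a positive effect is a single normal CDF evaluation. The sketch below uses only the Python standard library (statistics.NormalDist); the function names are hypothetical. It also backs out the posterior standard deviation implied by the reported interval [0.07, 0.34], assuming that interval is exactly Gaussian and equal-tailed; this is arithmetic on the reported numbers, not an additional result.

```python
from statistics import NormalDist


def gaussian_credible_interval(mean, var, level=0.95):
    """Equal-tailed credible interval for a N(mean, var) posterior such as (9)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 when level = 0.95
    half_width = z * var ** 0.5
    return mean - half_width, mean + half_width


def prob_positive(mean, var):
    """Posterior probability P(tau_RD > 0 | data) under a N(mean, var) posterior."""
    return 1.0 - NormalDist(mean, var ** 0.5).cdf(0.0)


# Back out the posterior sd implied by the reported interval [0.07, 0.34],
# assuming it is an exact Gaussian, equal-tailed 95% interval (an assumption).
lo, hi = 0.07, 0.34
implied_sd = (hi - lo) / (2.0 * NormalDist().inv_cdf(0.975))   # about 0.069
midpoint = (lo + hi) / 2.0   # 0.205, matching the reported 0.21 up to rounding
```

A posterior probability such as prob_positive(mean, var) has a direct Bayesian reading, which is one practical reason to report it alongside the interval; it has no frequentist counterpart in the local polynomial analysis.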
8 Software
Bayesian RDD can be implemented using:
- R: the rdbayes and rdbayesian packages implement GP-RDD; rstan and brms can be used for custom specifications.
- Python: the GPy and GPflow libraries provide general GP regression that can be adapted for RDD.
- Stan: full Bayesian models with hyperpriors can be written directly in Stan for maximum flexibility.
9 Conclusion
Bayesian regression discontinuity via Gaussian process priors offers a principled alternative to frequentist local polynomial estimation, particularly in small samples. For fixed hyperparameters the posterior of the RDD treatment effect is available in closed form; under empirical or full Bayes no separate bandwidth selection step is required; and under a full Bayesian treatment hyperparameter uncertainty propagates naturally into inference. As GP software matures, GP-RDD is likely to become an important tool in the causal inference toolkit.
References
- Branson, Z., Rischard, M., Bornn, L., and Miratrix, L. W. (2019). A nonparametric Bayesian methodology for regression discontinuity designs. Journal of Statistical Planning and Inference, 202:14-30.
- Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295-2326.
- Hahn, J., Todd, P., and van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Review of Economic Studies, 68(1):201-209.
- Imbens, G. W. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615-635.
- Lee, D. S. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142(2):675-697.
- Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.