New Methods & Techniques

Regression Discontinuity Design: Sharp, Fuzzy, and the CCT Bandwidth

Motivation: Identification at a Threshold

Many policies operate through sharp cutoffs. Students who score above a threshold gain admission to selective programmes; firms above a size threshold face stricter environmental regulations; individuals whose income falls below a poverty line receive benefits; candidates who win a plurality of votes take office. Near these thresholds, whether an individual falls above or below is close to random — determined by a few points on a test, a fraction of a vote, or a small income fluctuation. Regression discontinuity (RD) designs exploit this near-randomisation to identify causal effects without a formal experiment.

First described by Thistlethwaite and Campbell (1960) in the context of scholarship awards, the RD design was formalised by Hahn et al. (2001) and popularised in applied economics by Imbens and Lemieux (2008) and Lee and Lemieux (2010). The landmark contribution of Calonico et al. (2014) provided the theoretical foundations and practical tools for bias-corrected, robust inference and data-driven bandwidth selection that are now standard in applied work.

The Sharp RD Design

Setup and Identification

Let \(X_i\) be a continuous running variable (also called the forcing variable or score) and \(c\) be a known cutoff. In a sharp RD, treatment is a deterministic function of the running variable: \[\begin{equation} D_i = \mathbf{1}(X_i \geq c). \label{eq:sharprd} \end{equation}\] Treatment switches from 0 to 1 at exactly the cutoff. The identifying assumption is continuity of the conditional expectation function (CEF) of potential outcomes at the cutoff: \[\begin{equation} \lim_{x \downarrow c} \mathbb{E}[Y_i(0) \mid X_i = x] = \lim_{x \uparrow c} \mathbb{E}[Y_i(0) \mid X_i = x]. \label{eq:continuity} \end{equation}\] Under this condition, the jump in the observed outcome at \(c\) identifies the average treatment effect at the cutoff: \[\begin{equation} \tau_{\text{SRD}} = \lim_{x \downarrow c} \mathbb{E}[Y_i \mid X_i = x] - \lim_{x \uparrow c} \mathbb{E}[Y_i \mid X_i = x]. \label{eq:srdestimand} \end{equation}\] The continuity assumption means that in the absence of treatment, the outcome CEF would have been smooth through \(c\) — any jump is therefore caused by treatment.
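The logic of the sharp RD estimand can be made concrete with a small simulation. The sketch below (all names and numbers are illustrative) generates a running variable with a known jump of \(\tau = 2\) at the cutoff and approximates the two one-sided limits with sample means inside a narrow window:

```python
# Illustrative sketch of the sharp RD estimand: simulate a running
# variable with a jump of tau = 2 at cutoff c = 0, then approximate
# the two one-sided limits with means inside a narrow window.
import numpy as np

rng = np.random.default_rng(0)
n, c, tau = 100_000, 0.0, 2.0
x = rng.uniform(-1, 1, n)                  # running variable
d = (x >= c).astype(float)                 # sharp treatment rule D = 1(X >= c)
y = 1.0 + 0.5 * x + tau * d + rng.normal(0, 1, n)

w = 0.05                                   # narrow window around c
right = y[(x >= c) & (x < c + w)].mean()   # limit from above
left = y[(x < c) & (x >= c - w)].mean()    # limit from below
print(round(right - left, 2))              # close to tau = 2
```

Even this crude window estimator recovers the jump; the local polynomial methods below refine it by removing the slope-induced bias inside the window.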

Local Polynomial Estimation

Estimating \(\tau_{\text{SRD}}\) requires estimating two limits: the CEF just above and just below the cutoff. The standard approach is local polynomial regression of order \(p\) in a bandwidth \(h\) around the cutoff. For the right limit, fit \[\begin{equation} \min_{\beta} \sum_{i: X_i \geq c} \left(Y_i - \sum_{j=0}^{p} \beta_j (X_i - c)^j\right)^2 K\!\left(\frac{X_i - c}{h}\right), \label{eq:localpoly} \end{equation}\] and symmetrically for the left limit, where \(K(\cdot)\) is a kernel function (typically triangular). The intercepts \(\hat{\beta}_0^+\) and \(\hat{\beta}_0^-\) estimate the right and left limits, and \(\hat{\tau} = \hat{\beta}_0^+ - \hat{\beta}_0^-\).

Local linear (\(p=1\)) estimation is the most common choice. It avoids the boundary bias problems of kernel regression while remaining interpretable. Higher-order polynomials reduce bias further at the cost of increased variance.
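A minimal sketch of local linear estimation — an assumed from-scratch implementation, not the internals of any package — fits a triangular-kernel weighted regression on each side of the cutoff and takes the gap between the two intercepts:

```python
# Local linear sharp RD: weighted least squares with a triangular kernel
# on each side of the cutoff; the two intercepts estimate the CEF limits.
import numpy as np

def local_linear_limit(x, y, c, h):
    """Intercept of a triangular-kernel weighted linear fit at the cutoff."""
    u = x - c
    w = np.clip(1 - np.abs(u) / h, 0, None)    # triangular kernel weights
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), u[keep]])
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return beta[0]                             # estimated CEF limit at c

def sharp_rd(x, y, c, h):
    above, below = x >= c, x < c
    return (local_linear_limit(x[above], y[above], c, h)
            - local_linear_limit(x[below], y[below], c, h))

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 50_000)
y = np.sin(x) + 1.5 * (x >= 0) + rng.normal(0, 0.5, x.size)  # true jump 1.5
est = sharp_rd(x, y, c=0.0, h=0.2)
print(est)                                     # close to 1.5
```

Multiplying the design matrix and outcome by the square root of the kernel weights turns the weighted problem into an ordinary least-squares fit, which is a standard way to implement WLS.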

The Fuzzy RD Design

In many settings, treatment does not switch sharply at the cutoff. A student above the admission threshold may still choose not to attend the selective programme; a firm just below the regulatory threshold may comply voluntarily. In these cases, the probability of treatment jumps at \(c\) but does not go from 0 to 1.

This is the fuzzy RD design. Rather than following a deterministic rule, the treatment probability jumps at \(c\): \[\begin{equation} \lim_{x \downarrow c} \Pr(D_i = 1 \mid X_i = x) \neq \lim_{x \uparrow c} \Pr(D_i = 1 \mid X_i = x). \label{eq:fuzzy} \end{equation}\] The fuzzy RD estimand is obtained by scaling the jump in the outcome by the jump in treatment probability — exactly the ratio structure of an IV estimator: \[\begin{equation} \tau_{\text{FRD}} = \frac{\lim_{x \downarrow c} \mathbb{E}[Y_i \mid X_i = x] - \lim_{x \uparrow c} \mathbb{E}[Y_i \mid X_i = x]}{\lim_{x \downarrow c} \Pr(D_i = 1 \mid X_i = x) - \lim_{x \uparrow c} \Pr(D_i = 1 \mid X_i = x)}. \label{eq:frdestimand} \end{equation}\] Indeed, the fuzzy RD is equivalent to an IV design in a local neighbourhood of the cutoff, using the indicator \(\mathbf{1}(X_i \geq c)\) as an instrument for \(D_i\). Under monotonicity, \(\tau_{\text{FRD}}\) identifies the local average treatment effect (LATE) for compliers — individuals whose treatment status is determined by whether they fall above or below the threshold (Hahn et al., 2001).
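The Wald-ratio structure of the fuzzy estimand can be sketched directly. In the hypothetical simulation below, crossing the cutoff raises the treatment probability from 0.2 to 0.8 rather than from 0 to 1, and the effect is recovered by dividing the outcome jump by the first-stage jump:

```python
# Fuzzy RD as a Wald ratio: jump in the outcome divided by the jump in
# treatment probability, each approximated by a difference of means in a
# narrow window around the cutoff (all numbers are illustrative).
import numpy as np

rng = np.random.default_rng(2)
n, c, tau = 200_000, 0.0, 3.0
x = rng.uniform(-1, 1, n)
p = np.where(x >= c, 0.8, 0.2)          # imperfect compliance at the cutoff
d = rng.uniform(size=n) < p             # realised treatment status
y = 1.0 + x + tau * d + rng.normal(0, 1, n)

w = 0.05
hi, lo = (x >= c) & (x < c + w), (x < c) & (x >= c - w)
jump_y = y[hi].mean() - y[lo].mean()    # outcome discontinuity
jump_d = d[hi].mean() - d[lo].mean()    # first-stage discontinuity (~0.6)
frd = jump_y / jump_d
print(frd)                              # close to tau = 3
```

Note that the unscaled outcome jump (about 1.8 here) would badly understate the effect on the treated compliers; the rescaling by the first stage is what turns it into a treatment effect.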

The Bandwidth Selection Problem

The central challenge in RD estimation is choosing the bandwidth \(h\). This involves a classical bias-variance trade-off:

  • A wider bandwidth uses more data, reducing variance but introducing bias if the CEF is not linear over that range.
  • A narrower bandwidth reduces bias but inflates variance, potentially yielding uninformative estimates.
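The trade-off above is easy to see in a simulation. The sketch below deliberately uses a crude difference-of-means estimator (local constant rather than local linear, so the slope of the CEF generates bias) to make the effect of the bandwidth visible; the numbers are specific to this simulated design:

```python
# Bias-variance trade-off across bandwidths: with a sloped CEF, a wide
# difference-of-means window is precise but biased, a narrow one is
# unbiased but noisy. True jump is 1.
import numpy as np

rng = np.random.default_rng(3)

def rd_diff_means(h, n=5_000, tau=1.0):
    """Naive RD estimate: difference of means within a window of width h."""
    x = rng.uniform(-1, 1, n)
    y = x + tau * (x >= 0) + rng.normal(0, 1, n)   # linear CEF, jump tau
    hi, lo = (x >= 0) & (x < h), (x < 0) & (x >= -h)
    return y[hi].mean() - y[lo].mean()

# repeat the simulation to measure bias and spread at each bandwidth
results = {h: [rd_diff_means(h) for _ in range(200)] for h in (0.02, 0.5)}
for h, draws in results.items():
    print(h, round(np.mean(draws), 3), round(np.std(draws), 3))
```

The wide bandwidth yields a tight but badly biased estimate, the narrow one an unbiased but noisy estimate — exactly the tension the MSE-optimal bandwidth resolves.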

Early practice relied on ad hoc choices (e.g., \(h = \) one standard deviation of the running variable) or visual inspection. Imbens and Lemieux (2008) proposed a cross-validation approach. Calonico et al. (2014) developed an asymptotically optimal, data-driven bandwidth selector based on mean-squared error (MSE) minimisation, together with inference procedures that remain valid at that bandwidth.

The CCT Bandwidth

Calonico et al. (2014) derive the MSE-optimal bandwidth for a local polynomial estimator of order \(p\). For local linear estimation (\(p = 1\)) with a triangular kernel, the optimal bandwidth balances squared bias against variance: \[\begin{equation} h^* = C_{\text{MSE}} \cdot n^{-1/5}, \label{eq:optbw} \end{equation}\] where \(C_{\text{MSE}}\) is a constant that depends on the second derivative of the CEF at \(c\), the conditional variance of the outcome, and the density of \(X_i\) at \(c\). These quantities are estimated from the data using a pilot bandwidth, making the procedure fully data-driven.

A key insight is that standard confidence intervals based on the MSE-optimal bandwidth do not have correct coverage: the bias at \(h^*\) is of the same order as the standard error, so confidence intervals that ignore bias are systematically too narrow. Calonico et al. (2014) propose a bias-corrected robust (BCR) confidence interval that explicitly estimates and removes the bias term, then corrects the standard error for the uncertainty introduced by bias estimation.
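The undercoverage problem can be demonstrated in a Monte Carlo that is much simpler than the paper's setting but shares its essential feature: an estimator whose bias is of the same order as its standard error. All design choices below are illustrative.

```python
# Why ignoring bias hurts coverage: with a sloped CEF, a difference-of-
# means estimator at a moderate bandwidth has bias comparable to its
# standard error, so the naive 95% interval covers far less than 95%.
import numpy as np

rng = np.random.default_rng(5)
tau, h, n, reps = 1.0, 0.1, 10_000, 300
covered = 0
for _ in range(reps):
    x = rng.uniform(-1, 1, n)
    y = x + tau * (x >= 0) + rng.normal(0, 1, n)   # slope 1 creates bias ~ h
    hi, lo = (x >= 0) & (x < h), (x < 0) & (x >= -h)
    est = y[hi].mean() - y[lo].mean()
    se = np.sqrt(y[hi].var() / hi.sum() + y[lo].var() / lo.sum())
    covered += (est - 1.96 * se <= tau <= est + 1.96 * se)
coverage = covered / reps
print(coverage)                                    # well below 0.95
```

The BCR interval addresses exactly this failure: it removes the estimated bias from the point estimate and widens the interval to account for the extra noise that the bias estimate introduces.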

The CCT approach is implemented in the rdrobust package for R and Stata (Calonico et al., 2017), which we discuss in this week's Toolbox article.

Identification Checks

Unlike IV, where the exclusion restriction is untestable, RD designs admit several falsification tests:

  1. Continuity of density (McCrary test). If individuals can manipulate the running variable to sort above the threshold, the density of \(X_i\) will show bunching just above \(c\) and a corresponding dip just below. McCrary (2008) proposes a test for a discontinuity in the density at \(c\). Significant manipulation invalidates the continuity assumption.
  2. Covariate balance. Pre-determined covariates should not jump at the cutoff. Running the RD specification on baseline characteristics provides a powerful specification check.
  3. Placebo cutoffs. Estimating "effects" at other values of the running variable (away from the true cutoff) should yield estimates indistinguishable from zero.
  4. Sensitivity to bandwidth. The point estimate should be robust to modest changes in bandwidth. Large sensitivity suggests that the CEF is not well-approximated by a linear function in the chosen bandwidth.
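Two of these checks — placebo cutoffs and a crude density comparison in the spirit of McCrary's test — can be sketched on simulated data. The window-based estimator and the count ratio below are deliberately simplified illustrations, not the formal tests:

```python
# Falsification checks on simulated data: a placebo cutoff away from the
# true threshold should show no jump, and counts just above vs just below
# the cutoff should be similar when there is no manipulation.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 100_000)
y = 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 1, x.size)  # true jump at 0

def jump(cut, w=0.05):
    """Difference of means in a narrow window around a candidate cutoff."""
    hi, lo = (x >= cut) & (x < cut + w), (x < cut) & (x >= cut - w)
    return y[hi].mean() - y[lo].mean()

true_est = jump(0.0)          # large jump at the real cutoff
placebo = jump(0.5)           # should be indistinguishable from zero

# crude density check: with no manipulation the count ratio should be ~1
n_above = ((x >= 0) & (x < 0.05)).sum()
n_below = ((x < 0) & (x >= -0.05)).sum()
print(true_est, placebo, n_above / n_below)
```

In applied work the density check should use the formal McCrary or local polynomial density test rather than raw counts, and placebo estimates should be reported with confidence intervals rather than eyeballed.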

Classic Applications

Lee (2008) exploited the 50% vote-share threshold in US House elections: candidates who barely won were compared to those who barely lost. The RD estimate of incumbency advantage is large and robust.

Angrist and Lavy (1999) used Maimonides' rule in Israeli schools — which requires class splitting whenever enrolment crosses multiples of 40 — as a fuzzy discontinuity: enrolment crossing a threshold raises the probability of being in a small class but does not perfectly determine class size. The fuzzy RD estimates found that smaller classes significantly improved test scores.

Dell (2010) used the geographic boundary of the Peruvian mita (forced mining labour system, 1573–1812) as a sharp discontinuity. Districts that fell just inside the mita boundary are significantly poorer today, identifying a large, persistent causal effect of colonial extractive institutions.

Available Software

The rdrobust package (R and Stata) implements CCT-optimal bandwidth selection, local polynomial estimation, bias-corrected robust confidence intervals, and the density manipulation test. The rddensity package provides the McCrary-style density test with modern bias corrections (Cattaneo et al., 2020).

Conclusion

Regression discontinuity designs provide some of the most transparent and credible causal estimates available in observational economics. Their validity rests on continuity of the potential-outcome CEF at the cutoff — an assumption whose implications, unlike IV exclusion restrictions, can be probed with falsification tests, and one that is plausible in many institutional settings. The CCT bandwidth and BCR confidence intervals provide a principled, data-driven implementation that is now standard practice. When a threshold exists and individuals cannot manipulate their score, the RD design should be a researcher's first choice.

References

  1. Angrist, J. D. and Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114(2):533--575.
  2. Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295--2326.
  3. Calonico, S., Cattaneo, M. D., Farrell, M. H., and Titiunik, R. (2017). rdrobust: Software for regression-discontinuity designs. Stata Journal, 17(2):372--404.
  4. Cattaneo, M. D., Idrobo, N., and Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press, Cambridge.
  5. Dell, M. (2010). The persistent effects of Peru's mining mita. Econometrica, 78(6):1863--1903.
  6. Hahn, J., Todd, P., and Van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1):201--209.
  7. Imbens, G. W. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615--635.
  8. Lee, D. S. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142(2):675--697.
  9. Lee, D. S. and Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48(2):281--355.
  10. McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):698--714.
  11. Thistlethwaite, D. L. and Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6):309--317.
