Toolbox

rdrobust in R: Optimal Bandwidth and Robust Inference for Regression Discontinuity Designs

1 What Problem Does rdrobust Solve?

Regression discontinuity (RD) designs identify causal effects by comparing outcomes just above and just below a threshold in a running variable. The central implementation challenge is bandwidth selection: how wide a window around the cutoff should we use? Too wide, and we introduce bias (the true relationship may be non-linear over the wider range); too narrow, and variance explodes. Early practice in applied economics was ad hoc: researchers chose bandwidths that "looked right" in plots, sometimes reporting only results that were significant. The rdrobust package implements the theoretically grounded, data-driven bandwidth selector of Calonico et al. [2014] (hereafter CCT) along with bias-corrected, robust confidence intervals. It has become the standard toolkit for RD analysis in R and Stata.

2 Installation and Setup

install.packages("rdrobust")
library(rdrobust)

Also useful for plots and density tests:

install.packages("rddensity")
library(rddensity)

The package includes a built-in dataset, rdrobust_RDsenate, containing US Senate election data used in Calonico et al. [2015]. We will use this and a simulated dataset to illustrate the key functions.

3 A Minimal Working Example

3.1 Simulated Data

We simulate a sharp RD with a known effect of 2 units:

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)                       # Running variable (centered at cutoff 0)
D <- as.integer(X >= 0)                    # Sharp treatment indicator
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)    # True effect = 2

Naive mean comparison (biased because treated units have systematically higher X):

mean(Y[D == 1]) - mean(Y[D == 0])
# Gives approx 3: the true effect of 2 plus about 1 from the positive slope in X
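The size of the naive bias can be checked directly. The sketch below assumes the running variable is drawn with runif(n, -1, 1) and splits the naive difference into the true effect, the difference in mean X between the two groups, and a noise term. The names eps, slope_part and noise_part are introduced only for this illustration.

```r
# Decompose the naive difference in means for the simulated sharp RD
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
eps <- rnorm(n, sd = 0.5)
Y <- 1 + 2 * D + X + eps

naive <- mean(Y[D == 1]) - mean(Y[D == 0])

# Part of the gap that comes from treated units having larger X
slope_part <- mean(X[D == 1]) - mean(X[D == 0])
# Part that comes from sampling noise
noise_part <- mean(eps[D == 1]) - mean(eps[D == 0])

print(c(naive = naive, slope_part = slope_part, noise_part = noise_part))
# The identity naive = 2 + slope_part + noise_part holds exactly
print(all.equal(naive, 2 + slope_part + noise_part))
```

With X uniform on [-1, 1], the slope term is close to 1, which is the amount by which the naive comparison overstates the effect.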

3.2 Sharp RD Estimation with rdrobust

# Main RD estimate with CCT bandwidth, local linear regression
rdr <- rdrobust(y = Y, x = X, c = 0)
summary(rdr)

The output includes:

  • Conventional: point estimate and confidence interval using the MSE-optimal bandwidth h but ignoring bias.
  • Bias-corrected: point estimate after subtracting estimated bias (uses a slightly larger bandwidth b for bias estimation).
  • Robust: bias-corrected estimate with an inflated standard error that accounts for uncertainty in bias estimation. This is the recommended inferential object.
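To make the conventional estimate concrete, here is a minimal base-R sketch of a local linear RD fit at a fixed bandwidth with triangular kernel weights. It is not the rdrobust implementation: it skips bandwidth selection, bias correction and robust standard errors, and the helper local_linear_rd and the bandwidth h = 0.5 are assumptions made for illustration.

```r
# Simulated sharp RD (assumes X is drawn with runif(n, -1, 1))
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

# Hypothetical helper: local linear jump at cutoff c, fixed bandwidth h,
# triangular kernel, separate slopes on each side of the cutoff.
local_linear_rd <- function(y, x, c = 0, h) {
  xc <- x - c
  w <- pmax(0, 1 - abs(xc) / h)              # triangular kernel weights
  dat <- data.frame(y = y, xc = xc, d = as.integer(xc >= 0), w = w)
  dat <- dat[dat$w > 0, ]                    # keep observations inside the window
  fit <- lm(y ~ d * xc, data = dat, weights = w)
  unname(coef(fit)["d"])                     # jump in the intercept at the cutoff
}

tau_hat <- local_linear_rd(Y, X, c = 0, h = 0.5)
print(tau_hat)
```

The coefficient on d is the estimated jump because both fitted lines are evaluated at xc = 0. The bias-corrected and robust rows then adjust this quantity using a second, higher-order fit.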

Access key output:

rdr$coef    # Conventional, bias-corrected, and robust estimates
rdr$bws     # Selected bandwidths (h for estimation, b for bias)
rdr$ci      # Confidence intervals
rdr$pv      # p-values

3.3 Interpreting the Output

A typical summary(rdr) output looks like:
====================================================
            Conventional  Bias-corrected  Robust
----------------------------------------------------
Estimate      2.03            2.01        2.01
Std. Error    0.12            0.12        0.15
z-statistic  16.9            16.8        13.4
P-value       0.000           0.000       0.000
95% CI       [1.80, 2.26]    [1.78, 2.24] [1.72, 2.30]
----------------------------------------------------
Bandwidth h:  0.47    b:  0.73

Note that the robust confidence interval is wider than the conventional one; this is by design. The CCT argument is that conventional confidence intervals based on the MSE-optimal h have incorrect coverage because, at that bandwidth, the bias is of the same order as the standard error. The robust CI restores correct coverage.
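As a quick check on how the three intervals relate, the sketch below rebuilds normal-approximation 95% intervals from the estimates and standard errors shown in the table. The numbers are copied from that table, so the last digit may differ slightly because the inputs are already rounded.

```r
# Estimates and standard errors as reported in the summary table
est <- c(conventional = 2.03, bias_corrected = 2.01, robust = 2.01)
se  <- c(conventional = 0.12, bias_corrected = 0.12, robust = 0.15)

z <- qnorm(0.975)                       # two-sided 95% critical value
ci <- cbind(lower = est - z * se, upper = est + z * se)
width <- ci[, "upper"] - ci[, "lower"]

print(round(ci, 2))
print(round(width, 2))
```

The robust row is centred on the bias-corrected estimate but uses the larger standard error, which is why its interval is the widest of the three.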

4 Key Options and Their Meaning

4.1 Kernel

By default, rdrobust uses a triangular kernel, which down-weights observations farther from the cutoff. Other kernels are selected with the kernel argument:

rdr_epan <- rdrobust(y = Y, x = X, c = 0, kernel = "epanechnikov")
rdr_unif <- rdrobust(y = Y, x = X, c = 0, kernel = "uniform")

The triangular kernel is preferred because, combined with an MSE-optimal bandwidth, it gives a point estimator with optimal MSE properties at a boundary point, which is where the RD estimate is evaluated.
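The three kernels differ only in how they weight an observation at scaled distance u = (x - c) / h from the cutoff. The sketch below writes out the standard textbook forms; the helper kernel_weights is hypothetical and is not a function exported by rdrobust.

```r
# Hypothetical helper: kernel weight for scaled distance u = (x - c) / h
kernel_weights <- function(u, kernel = c("triangular", "epanechnikov", "uniform")) {
  kernel <- match.arg(kernel)
  w <- switch(kernel,
    triangular   = 1 - abs(u),            # linear decay to zero at |u| = 1
    epanechnikov = 0.75 * (1 - u^2),      # quadratic decay
    uniform      = rep(0.5, length(u))    # equal weight inside the window
  )
  ifelse(abs(u) <= 1, w, 0)               # zero weight outside the bandwidth
}

u <- c(0, 0.5, 0.9, 1.2)
weights_by_kernel <- sapply(c("triangular", "epanechnikov", "uniform"),
                            function(k) kernel_weights(u, k))
rownames(weights_by_kernel) <- paste0("u=", u)
print(weights_by_kernel)
```

Only relative weights matter in a weighted regression, so the normalising constants (0.75 and 0.5) do not affect the point estimate.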

4.2 Polynomial order

The default is local linear (p = 1). Higher-order polynomials reduce bias at the cost of variance:

rdr_p2 <- rdrobust(y = Y, x = X, c = 0, p = 2)  # Local quadratic

For the bias correction, the order is automatically set to q = p + 1.
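The bias-variance trade-off in the polynomial order can be seen in a small base-R sketch that fits weighted polynomials of order p on each side of the cutoff at a fixed bandwidth. The helper local_poly_rd and the bandwidth h = 0.5 are assumptions for illustration, and the standard errors are plain weighted least squares ones, not the robust ones rdrobust reports.

```r
# Simulated sharp RD (assumes X is drawn with runif(n, -1, 1))
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

# Hypothetical helper: local polynomial jump of order p at cutoff 0,
# triangular kernel, fixed bandwidth h.
local_poly_rd <- function(y, x, h, p = 1) {
  w <- pmax(0, 1 - abs(x) / h)
  dat <- data.frame(y = y, x = x, d = as.integer(x >= 0), w = w)
  dat <- dat[dat$w > 0, ]
  # Raw polynomial terms vanish at x = 0, so the coefficient on d is the jump
  fit <- lm(y ~ d * poly(x, p, raw = TRUE), data = dat, weights = w)
  c(estimate = unname(coef(fit)["d"]),
    se = unname(summary(fit)$coefficients["d", "Std. Error"]))
}

fit_p1 <- local_poly_rd(Y, X, h = 0.5, p = 1)
fit_p2 <- local_poly_rd(Y, X, h = 0.5, p = 2)
print(rbind(p1 = fit_p1, p2 = fit_p2))
```

At the same bandwidth the quadratic fit has the larger standard error, which is the variance cost mentioned above; in practice the selected bandwidth also changes with p.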

4.3 Bandwidth selection method

The default uses MSE-optimal bandwidth. Alternatives:

# Common bandwidth on both sides (the default)
rdr_sym <- rdrobust(y = Y, x = X, c = 0, bwselect = "mserd")

# Different bandwidths on each side
rdr_asym <- rdrobust(y = Y, x = X, c = 0, bwselect = "msetwo")

# Coverage error rate (CER) optimal, better for inference
rdr_cer <- rdrobust(y = Y, x = X, c = 0, bwselect = "cerrd")
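The link between the MSE-optimal and CER-optimal choices can be sketched with a rate argument: the MSE-optimal bandwidth shrinks like n^(-1/(2p+3)) and the CER-optimal one like n^(-1/(p+3)), so their ratio is n^(-p/((3+p)(3+2p))). The code below applies that rule-of-thumb factor to the illustrative h = 0.47 from Section 3.3. This is an assumption-laden approximation; the value rdrobust reports with bwselect = "cerrd" may differ in detail.

```r
# Rule-of-thumb shrinkage factor from an MSE-optimal to a CER-optimal bandwidth
# (rate-based approximation, not taken from the rdrobust source code)
cer_factor <- function(n, p = 1) {
  n^(-p / ((3 + p) * (3 + 2 * p)))
}

n <- 1000
h_mse <- 0.47                        # illustrative MSE-optimal bandwidth
h_cer <- cer_factor(n, p = 1) * h_mse

print(round(c(factor = cer_factor(n, p = 1), h_cer = h_cer), 3))
```

The CER-optimal bandwidth is always the smaller of the two, which trades some precision in the point estimate for better confidence interval coverage.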

4.4 Covariates

Adding pre-treatment covariates can improve precision without affecting consistency:

covs <- cbind(rnorm(n), rbinom(n, 1, 0.5))  # some baseline controls
rdr_cov <- rdrobust(y = Y, x = X, c = 0, covs = covs)

4.5 Fuzzy RD

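For a fuzzy design where treatment is endogenous but jumps at the threshold:

D_fuzzy <- D * rbinom(n, 1, 0.8)  # 20% non-compliance
rdr_fuzzy <- rdrobust(y = Y, x = X, c = 0, fuzzy = D_fuzzy)

This implements the fuzzy RD estimator: the ratio of the jump in Y to the jump in D. Note that Y above was generated from D rather than D_fuzzy, so this call illustrates the syntax; the ratio it targets is about 2 / 0.8 = 2.5, not 2.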
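The ratio-of-jumps idea can be sketched in base R with two local linear fits at a fixed bandwidth. This variation lets the outcome respond to the treatment actually received, so the ratio targets the effect of 2. The helper jump_at_zero, the variable Y_fuzzy and the bandwidth h = 0.5 are assumptions for illustration, and no standard errors are computed.

```r
# Fuzzy design: crossing the cutoff raises take-up from 0 to about 0.8
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
D_fuzzy <- D * rbinom(n, 1, 0.8)
Y_fuzzy <- 1 + 2 * D_fuzzy + X + rnorm(n, sd = 0.5)  # outcome depends on actual treatment

# Hypothetical helper: local linear jump in v at cutoff 0 (triangular kernel)
jump_at_zero <- function(v, x, h) {
  w <- pmax(0, 1 - abs(x) / h)
  dat <- data.frame(v = v, x = x, above = as.integer(x >= 0), w = w)
  dat <- dat[dat$w > 0, ]
  unname(coef(lm(v ~ above * x, data = dat, weights = w))["above"])
}

h <- 0.5
jump_y <- jump_at_zero(Y_fuzzy, X, h)   # reduced form: jump in the outcome
jump_d <- jump_at_zero(D_fuzzy, X, h)   # first stage: jump in take-up
tau_fuzzy <- jump_y / jump_d

print(c(jump_y = jump_y, jump_d = jump_d, tau_fuzzy = tau_fuzzy))
```

The reduced-form jump is roughly 0.8 times the treatment effect, and dividing by the first-stage jump rescales it back.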

5 Visualisation with rdplot

rdplot(y = Y, x = X, c = 0,
       title = "RD Plot: Simulated Sharp Design",
       x.label = "Running Variable",
       y.label = "Outcome",
       nbins = c(20, 20))  # 20 bins on each side

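By default, rdplot() chooses the number of bins with a data-driven selector and overlays a global polynomial fit (fourth order by default) on each side of the cutoff; the nbins argument above overrides the automatic choice. It is an excellent first step for any RD analysis.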
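The dots in an RD plot are simply binned means of the outcome. A minimal base-R sketch with 20 evenly spaced bins on each side is shown below; the helper bin_means is hypothetical, and the sketch does not reproduce rdplot's data-driven bin selection or its fitted curves.

```r
# Simulated sharp RD (assumes X is drawn with runif(n, -1, 1))
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
Y <- 1 + 2 * as.integer(X >= 0) + X + rnorm(n, sd = 0.5)

# Hypothetical helper: mean of y within evenly spaced bins of x on [lo, hi]
bin_means <- function(y, x, lo, hi, nbins) {
  breaks <- seq(lo, hi, length.out = nbins + 1)
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  data.frame(mid = (breaks[-length(breaks)] + breaks[-1]) / 2,   # bin midpoints
             mean_y = as.numeric(tapply(y, bin, mean)))
}

left  <- bin_means(Y[X < 0],  X[X < 0],  -1, 0, 20)
right <- bin_means(Y[X >= 0], X[X >= 0],  0, 1, 20)

# Gap between the two bins adjacent to the cutoff
gap <- right$mean_y[1] - left$mean_y[20]
print(gap)
```

Plotting mid against mean_y for both data frames gives the familiar scatter with a visible jump at zero.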

6 Density Manipulation Test with rddensity

Before trusting RD results, check that individuals cannot sort around the cutoff:

library(rddensity)
rdd <- rddensity(X = X, c = 0)
summary(rdd)
rdplotdensity(rdd, X)

The null hypothesis is that the density of the running variable is continuous at c. A significant p-value suggests manipulation. For the simulated data, we expect no manipulation.
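A much cruder check of the same idea is to count observations in a small window on each side of the cutoff and test whether the split is compatible with 50/50. The sketch below uses base R's binom.test; it is not the local polynomial density estimator that rddensity implements, and the window half-width of 0.1 is an arbitrary assumption.

```r
# Simulated running variable (assumes X is drawn with runif(n, -1, 1))
set.seed(42)
X <- runif(1000, -1, 1)

w <- 0.1                                 # hypothetical window half-width
n_left  <- sum(X >= -w & X < 0)          # observations just below the cutoff
n_right <- sum(X >= 0 & X <= w)          # observations just above the cutoff

# Under no sorting, an observation in the window is equally likely on either side
bt <- binom.test(n_right, n_left + n_right, p = 0.5)
print(c(n_left = n_left, n_right = n_right, p_value = bt$p.value))
```

A very small p-value would point to bunching on one side, but the formal test in rddensity should be preferred for reporting.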

7 Bandwidth Sensitivity: A Best Practice

Always show robustness to bandwidth choice:

h_grid <- seq(0.1, 1.0, by = 0.1)
ests <- sapply(h_grid, function(h) {
  rdrobust(y = Y, x = X, c = 0, h = h)$coef[3]  # point estimate from the robust row
})
plot(h_grid, ests, type = "b", pch = 16,
     xlab = "Bandwidth h", ylab = "RD Estimate",
     main = "Sensitivity to Bandwidth")
abline(h = 2, col = "red", lty = 2)  # True effect

Stable estimates across a range of bandwidths strengthen the credibility of the RD design.

Table 1: RD packages: a brief comparison

Package     Language    Key feature
rdrobust    R / Stata   CCT bandwidth, bias-corrected CI, fuzzy RD
rddensity   R / Stata   Modern manipulation test
rdd         R           Earlier implementation; less maintained
RDHonest    R           Worst-case honest CIs (Armstrong & Kolesár 2020)
rdlocrand   R / Stata   Local randomisation perspective

8 Comparison to Alternatives

For most applied work, rdrobust is the right default. For settings where one wants honest worst-case confidence intervals rather than intervals built around an MSE-optimal bandwidth, RDHonest implements the approach of Armstrong and Kolesár [2020].

9 Pitfalls

  1. Multiple testing at many cutoffs. If you test RD at many candidate cutoffs, use p-value corrections.
  2. Reporting only significant bandwidths. Always present a sensitivity plot.
  3. Ignoring heaping. Some running variables take only integer values; standard density tests may flag this as manipulation. Cattaneo et al. [2020] discuss solutions.
  4. Using high-order global polynomials. Use local polynomials via rdrobust, not global polynomial regressions with lm().
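For the first pitfall, base R's p.adjust() applies standard multiple-testing corrections. The sketch below uses three made-up p-values standing in for RD tests at three candidate cutoffs; they are hypothetical and not produced by any analysis in this article.

```r
# Hypothetical p-values from RD tests at three candidate cutoffs
p_raw <- c(cutoff_a = 0.01, cutoff_b = 0.04, cutoff_c = 0.03)

p_bonf <- p.adjust(p_raw, method = "bonferroni")  # multiply by the number of tests
p_holm <- p.adjust(p_raw, method = "holm")        # step-down version, less conservative

print(rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm))
```

Here only the first cutoff stays below 0.05 after either correction, even though all three raw p-values were below that level.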

10 Conclusion

rdrobust provides the state-of-the-art implementation of regression discontinuity estimation, combining data-driven bandwidth selection, local polynomial regression, and bias-corrected robust inference in a user-friendly package. The combination of rdrobust, rdplot, and rddensity covers the full workflow of a credible RD analysis: visualisation, density testing, estimation, and bandwidth sensitivity.

References

  1. Armstrong, T. B. and Kolesár, M. (2020). Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1):1-39.
  2. Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295-2326.
  3. Calonico, S., Cattaneo, M. D., and Titiunik, R. (2015). Optimal data-driven regression discontinuity plots. Journal of the American Statistical Association, 110(512):1753-1769.
  4. Calonico, S., Cattaneo, M. D., Farrell, M. H., and Titiunik, R. (2017). rdrobust: Software for regression-discontinuity designs. Stata Journal, 17(2):372-404.
  5. Cattaneo, M. D., Idrobo, N., and Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press, Cambridge.
