What Problem Does rdrobust Solve?
Regression discontinuity (RD) designs identify causal effects by comparing outcomes just above and just below a threshold in a running variable. The central implementation challenge is bandwidth selection: how wide a window around the cutoff should we use? Too wide, and we introduce bias (the true relationship may be non-linear over the wider range); too narrow, and variance explodes.
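In symbols, and as a standard textbook formulation rather than anything specific to the package, the sharp RD estimand at a cutoff \(c\) and the leading-order trade-off for a local polynomial of order \(p\) with bandwidth \(h\) can be sketched as follows. The constants \(\mathcal{B}\) and \(\mathcal{V}\) are placeholders, assumed here, for design-dependent bias and variance terms.

\[
\tau = \lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x],
\qquad
\mathrm{MSE}(h) \approx h^{2(p+1)} \mathcal{B}^2 + \frac{\mathcal{V}}{n h}.
\]

Widening \(h\) inflates the first (squared-bias) term, while shrinking \(h\) inflates the second (variance) term.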
Early practice in applied economics was ad hoc: researchers chose bandwidths that "looked right" in plots, sometimes reporting only results that were significant. The rdrobust package implements the theoretically grounded, data-driven bandwidth selector of Calonico et al. (2014) (hereafter CCT), along with bias-corrected, robust confidence intervals. It has become the standard toolkit for RD analysis in R and Stata.
Installation and Setup
install.packages("rdrobust")
library(rdrobust)
# Also useful for plots and density tests:
install.packages("rddensity")
library(rddensity)
The package includes a built-in dataset, rdrobust_RDsenate, containing the US Senate election data used in Calonico et al. (2015). The examples below use simulated data, so that the true effect is known.
A Minimal Working Example
Simulated Data
We simulate a sharp RD with a known effect of 2 units:
set.seed(42)
n <- 1000
X <- runif(n, -1, 1) # Running variable (centered at cutoff 0)
D <- as.integer(X >= 0) # Sharp treatment indicator
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5) # True effect = 2
# Naive mean comparison (biased because Y also depends on X):
mean(Y[D == 1]) - mean(Y[D == 0]) # Approx. 3, not 2: the slope in X adds E[X | X >= 0] - E[X | X < 0] = 1
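To see where the bias in the naive comparison comes from, the following base-R sketch re-creates the simulated data and splits the difference in means into the part driven by the slope in X and the remainder. The names naive and slope_part are introduced here for illustration only.

```r
# Re-create the simulated sharp design (true effect = 2, slope in X = 1)
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

# Naive difference in means between units above and below the cutoff
naive <- mean(Y[D == 1]) - mean(Y[D == 0])

# Part of that difference driven purely by the slope in X
# (the slope is 1, so this is the gap in mean X across the cutoff)
slope_part <- mean(X[D == 1]) - mean(X[D == 0])

cat("naive difference:      ", round(naive, 2), "\n")
cat("due to slope in X:     ", round(slope_part, 2), "\n")
cat("naive minus slope part:", round(naive - slope_part, 2), "\n")
```

Removing the slope component leaves a value close to the true effect of 2, which is what a local comparison at the cutoff aims to recover.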
Sharp RD Estimation with rdrobust
# Main RD estimate with CCT bandwidth, local linear regression
rdr <- rdrobust(y = Y, x = X, c = 0)
summary(rdr)
The output includes:
- Conventional: point estimate and confidence interval using the MSE-optimal bandwidth \(h\) but ignoring bias.
- Bias-corrected: point estimate after subtracting estimated bias (uses a slightly larger bandwidth \(b\) for bias estimation).
- Robust: bias-corrected estimate with an inflated standard error that accounts for uncertainty in bias estimation. This is the recommended inferential object.
# Access key output:
rdr$coef # Conventional, bias-corrected, and robust estimates
rdr$bws # Selected bandwidths (h for estimation, b for bias)
rdr$ci # Confidence intervals
rdr$pv # p-values
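For intuition about the conventional point estimate, here is a minimal base-R sketch of a local linear RD estimate with triangular weights at a fixed bandwidth. It is not how rdrobust is implemented: local_linear_rd is a hypothetical helper, the bandwidth h = 0.5 is picked by hand rather than by the CCT selector, and there is no bias correction or robust standard error.

```r
# Hypothetical helper: local linear RD estimate at a fixed bandwidth h,
# using triangular kernel weights. No bandwidth selection, no bias correction.
local_linear_rd <- function(y, x, c = 0, h) {
  dat <- data.frame(y = y, xc = x - c)
  dat$d <- as.integer(dat$xc >= 0)        # above-cutoff indicator
  dat$w <- pmax(0, 1 - abs(dat$xc) / h)   # triangular weights, zero outside the window
  dat <- dat[dat$w > 0, ]
  # Separate intercept and slope on each side; the coefficient on d is the jump at c
  fit <- lm(y ~ d + xc + d:xc, data = dat, weights = w)
  unname(coef(fit)["d"])
}

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

est <- local_linear_rd(Y, X, c = 0, h = 0.5)  # h = 0.5 is an arbitrary choice
print(est)
```

Because the simulated relationship is exactly linear, this hand-rolled estimate should land near the true effect of 2 for most bandwidths; the value of rdrobust lies in choosing the bandwidth and in the inference.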
Interpreting the Output
A stylised version of the summary(rdr) output (values are illustrative) looks like:
==========================================================
              Conventional   Bias-corrected   Robust
----------------------------------------------------------
Estimate      2.03           2.01             2.01
Std. Error    0.12           0.12             0.15
z-statistic   16.9           16.8             13.4
P-value       0.000          0.000            0.000
95% CI        [1.80, 2.26]   [1.78, 2.24]     [1.72, 2.30]
----------------------------------------------------------
Bandwidth     h: 0.47        b: 0.73
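Note that the robust confidence interval is wider than the conventional one; this is by design. The CCT argument is that conventional confidence intervals built with the MSE-optimal bandwidth \(h\) have incorrect coverage, because at that bandwidth the bias is of the same order as the standard error. The robust CI is centred on the bias-corrected estimate and uses a standard error that reflects the extra variability from estimating the bias, which restores correct asymptotic coverage.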
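Schematically, the two intervals differ in both centre and width. The notation below is shorthand assumed for this sketch: \(\hat{\tau}\) is the conventional estimate, \(\hat{B}\) the estimated bias, \(\hat{V}\) the conventional variance estimate, and \(\hat{V}_{\mathrm{bc}}\) the variance of the bias-corrected estimator, which includes the variability of \(\hat{B}\).

\[
\mathrm{CI}_{\mathrm{conv}} = \left[\, \hat{\tau} \pm 1.96 \sqrt{\hat{V}} \,\right],
\qquad
\mathrm{CI}_{\mathrm{rbc}} = \left[\, (\hat{\tau} - \hat{B}) \pm 1.96 \sqrt{\hat{V}_{\mathrm{bc}}} \,\right].
\]

With the illustrative numbers above, \(2.01 \pm 1.96 \times 0.15\) gives roughly \([1.72, 2.30]\).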
Key Options and Their Meaning
Kernel
By default, rdrobust uses a triangular kernel, which down-weights observations farther from the cutoff. Other kernels can be requested through the kernel argument:
rdr_epan <- rdrobust(y = Y, x = X, c = 0, kernel = "epanechnikov")
rdr_unif <- rdrobust(y = Y, x = X, c = 0, kernel = "uniform")
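The triangular kernel is the usual choice because, combined with an MSE-optimal bandwidth, it is MSE-optimal for local linear point estimation at a boundary. The convergence rate is the same for all three kernels; only the constants differ.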
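To make the weighting concrete, the sketch below writes the three kernel shapes as functions of the scaled distance \(u = (x - c)/h\). These are the textbook forms, assumed here for illustration; the normalisation used inside rdrobust may differ, which does not matter for weighted least squares.

```r
# Kernel shapes as functions of the scaled distance u = (x - c) / h.
# Textbook forms, assumed here for illustration only.
k_triangular   <- function(u) ifelse(abs(u) <= 1, 1 - abs(u), 0)
k_epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
k_uniform      <- function(u) ifelse(abs(u) <= 1, 0.5, 0)

u <- c(0, 0.25, 0.5, 0.75, 1, 1.5)
kernel_weights <- data.frame(
  u            = u,
  triangular   = k_triangular(u),
  epanechnikov = k_epanechnikov(u),
  uniform      = k_uniform(u)
)
print(kernel_weights)
```

The triangular weights fall linearly to zero at the edge of the window, whereas the uniform kernel treats every observation inside the window equally.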
Polynomial order
The default is local linear (\(p = 1\)). Higher-order polynomials reduce bias at the cost of variance:
rdr_p2 <- rdrobust(y = Y, x = X, c = 0, p = 2) # Local quadratic
For the bias correction, the order is automatically set to \(q = p + 1\).
Bandwidth selection method
The default is a single MSE-optimal bandwidth used on both sides of the cutoff (bwselect = "mserd"). The main options are:
# Common MSE-optimal bandwidth on both sides (the default)
rdr_sym <- rdrobust(y = Y, x = X, c = 0, bwselect = "mserd")
# Different MSE-optimal bandwidths on each side
rdr_asym <- rdrobust(y = Y, x = X, c = 0, bwselect = "msetwo")
# Coverage error rate (CER) optimal: better suited to inference
rdr_cer <- rdrobust(y = Y, x = X, c = 0, bwselect = "cerrd")
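As a rough guide to how the CER-optimal bandwidth relates to the MSE-optimal one, the RD literature reports a rule-of-thumb rescaling, \(h_{\mathrm{CER}} = n^{-p/((3+p)(3+2p))} \, h_{\mathrm{MSE}}\). The sketch below only evaluates that factor. The helper cer_shrink is hypothetical, the bandwidth 0.47 is the illustrative value used earlier, and what rdrobust computes internally for "cerrd" may differ in detail.

```r
# Hypothetical helper: rule-of-thumb factor linking CER- and MSE-optimal bandwidths,
# h_CER = n^(-p / ((3 + p) * (3 + 2 * p))) * h_MSE   (assumed form, see the text above)
cer_shrink <- function(n, p = 1) n^(-p / ((3 + p) * (3 + 2 * p)))

n <- 1000
h_mse <- 0.47                    # illustrative MSE-optimal bandwidth
shrink <- cer_shrink(n, p = 1)   # n^(-1/20) for local linear regression
h_cer <- shrink * h_mse
print(round(c(shrink = shrink, h_cer = h_cer), 3))
```

The CER-optimal bandwidth is smaller than the MSE-optimal one, trading some precision in the point estimate for better confidence interval coverage.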
Covariates
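Adding pre-treatment covariates can improve precision without affecting consistency, provided the covariates themselves do not jump at the cutoff:
covs <- cbind(rnorm(n), rbinom(n, 1, 0.5)) # Two simulated baseline controls (pure noise here)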
rdr_cov <- rdrobust(y = Y, x = X, c = 0, covs = covs)
Fuzzy RD
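For a fuzzy design, where treatment take-up is imperfect but its probability jumps at the threshold:
D_fuzzy <- D * rbinom(n, 1, 0.8) # 20% of units above the cutoff go untreated
Y_fuzzy <- 1 + 2 * D_fuzzy + X + rnorm(n, sd = 0.5) # Outcome depends on treatment actually received
rdr_fuzzy <- rdrobust(y = Y_fuzzy, x = X, c = 0, fuzzy = D_fuzzy)
This implements the fuzzy RD estimator: the ratio of the jump in \(Y\) to the jump in \(D\) at the cutoff.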
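The ratio can be sketched by hand in base R. This is an illustration under stated assumptions, not the rdrobust implementation: jump_at_cutoff is a hypothetical helper, the bandwidth is fixed at an arbitrary 0.5, and the block simulates its own data in which the outcome depends on treatment actually received.

```r
# Hypothetical helper: local linear jump at the cutoff with triangular weights
jump_at_cutoff <- function(v, x, c = 0, h) {
  dat <- data.frame(v = v, xc = x - c)
  dat$above <- as.integer(dat$xc >= 0)
  dat$w <- pmax(0, 1 - abs(dat$xc) / h)
  dat <- dat[dat$w > 0, ]
  fit <- lm(v ~ above + xc + above:xc, data = dat, weights = w)
  unname(coef(fit)["above"])
}

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
Z <- as.integer(X >= 0)                             # eligibility: above the cutoff
D_recv <- Z * rbinom(n, 1, 0.8)                     # treatment actually received
Y_recv <- 1 + 2 * D_recv + X + rnorm(n, sd = 0.5)   # outcome driven by received treatment

h <- 0.5                                            # arbitrary fixed bandwidth
reduced_form <- jump_at_cutoff(Y_recv, X, h = h)    # jump in the outcome
first_stage  <- jump_at_cutoff(D_recv, X, h = h)    # jump in treatment probability
fuzzy_est <- reduced_form / first_stage
print(c(reduced_form = reduced_form, first_stage = first_stage, fuzzy = fuzzy_est))
```

The jump in the outcome is diluted by non-compliance, and dividing by the first-stage jump rescales it back towards the effect of treatment itself.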
Visualisation with rdplot
rdplot(y = Y, x = X, c = 0,
title = "RD Plot: Simulated Sharp Design",
x.label = "Running Variable",
y.label = "Outcome",
nbins = c(20, 20)) # 20 bins on each side
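By default, rdplot() chooses the number of bins with a data-driven rule; the call above overrides that choice with nbins. It overlays a global polynomial fit estimated separately on each side of the cutoff, and it is an excellent first step for any RD analysis.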
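The binned means that such a plot displays can be approximated in base R. The sketch below uses 20 evenly spaced bins on each side, matching the nbins choice above; it makes no attempt to reproduce the data-driven bin selection or the polynomial overlay of rdplot().

```r
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

# 20 evenly spaced bins on each side of the cutoff at 0
breaks <- seq(-1, 1, length.out = 41)
bin <- cut(X, breaks = breaks, include.lowest = TRUE, right = FALSE)
bin_mid <- (head(breaks, -1) + tail(breaks, -1)) / 2   # bin midpoints
bin_mean <- as.numeric(tapply(Y, bin, mean))           # mean outcome per bin

plot(bin_mid, bin_mean, pch = 16,
     xlab = "Running Variable", ylab = "Binned mean of outcome")
abline(v = 0, lty = 2)   # cutoff
```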
Density Manipulation Test with rddensity
Before trusting RD results, check that individuals cannot sort around the cutoff:
library(rddensity)
rdd <- rddensity(X = X, c = 0)
summary(rdd)
rdplotdensity(rdd, X)
The null hypothesis is that the density of the running variable is continuous at \(c\). A small \(p\)-value is evidence of a discontinuity in the density, which suggests manipulation. For the simulated data, we expect no manipulation.
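As a quick sanity check, a much cruder idea is to compare the counts just below and just above the cutoff with a binomial test. This is a different and simpler technique than the local polynomial density estimator in rddensity, and the window half-width of 0.2 is an arbitrary assumption.

```r
set.seed(42)
X <- runif(1000, -1, 1)

w <- 0.2                            # arbitrary window half-width
n_left  <- sum(X >= -w & X < 0)     # observations just below the cutoff
n_right <- sum(X >= 0 & X <= w)     # observations just above the cutoff

# Under no sorting, a unit in the window is equally likely to fall on either side
bt <- binom.test(n_right, n_left + n_right, p = 0.5)
print(c(left = n_left, right = n_right, p_value = bt$p.value))
```

A strongly lopsided split in a narrow window would be a warning sign worth following up with the formal density test.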
Bandwidth Sensitivity: A Best Practice
Always show robustness to bandwidth choice:
h_grid <- seq(0.1, 1.0, by = 0.1)
ests <- sapply(h_grid, function(h)
  rdrobust(y = Y, x = X, c = 0, h = h)$coef[3] # robust estimate
)
plot(h_grid, ests, type = "b", pch = 16,
     xlab = "Bandwidth h", ylab = "RD Estimate",
     main = "Sensitivity to Bandwidth")
abline(h = 2, col = "red", lty = 2) # True effect
Stable estimates across a range of bandwidths strengthen the credibility of the RD design.
Comparison to Alternatives
| Package | Language | Key feature |
|---|---|---|
| rdrobust | R / Stata | CCT bandwidth, bias-corrected CI, fuzzy RD |
| rddensity | R / Stata | Modern manipulation test |
| rdd | R | Earlier implementation; less maintained |
| RDHonest | R | Worst-case honest CIs (Armstrong & Kolesár 2020) |
| rdlocrand | R / Stata | Local randomisation perspective |
For most applied work, rdrobust is the right default. For settings where one wants honest worst-case confidence intervals rather than the robust bias-corrected intervals of CCT, RDHonest implements the approach of Armstrong and Kolesár (2020).
Pitfalls
- Multiple testing at many cutoffs. If you test RD at many candidate cutoffs, use p-value corrections.
- Reporting only significant bandwidths. Always present a sensitivity plot.
- Ignoring heaping. Some running variables take only integer values; standard density tests may flag this as manipulation. Cattaneo et al. (2020) discuss solutions.
- Using high-order global polynomials. Use local polynomials via rdrobust, not global polynomial regressions with lm().
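For the multiple-testing pitfall, base R's p.adjust() covers the common corrections. The p-values below are hypothetical placeholders standing in for robust p-values from RD estimates at three candidate cutoffs.

```r
# Hypothetical robust p-values from RD estimates at three candidate cutoffs
p_raw <- c(cutoff_a = 0.01, cutoff_b = 0.04, cutoff_c = 0.03)

p_holm <- p.adjust(p_raw, method = "holm")   # controls the family-wise error rate
p_bh   <- p.adjust(p_raw, method = "BH")     # controls the false discovery rate
print(rbind(raw = p_raw, holm = p_holm, BH = p_bh))
```

Holm is the more conservative of the two; which one is appropriate depends on whether any false rejection or only the share of false rejections is the concern.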
Conclusion
rdrobust provides the state-of-the-art implementation of regression discontinuity estimation, combining data-driven bandwidth selection, local polynomial regression, and bias-corrected robust inference in a user-friendly package. The combination of rdrobust, rdplot, and rddensity covers the full workflow of a credible RD analysis: visualisation, density testing, estimation, and bandwidth sensitivity.
References
- Armstrong, T. B. and Kolesár, M. (2020). Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1):1–39.
- Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295–2326.
- Calonico, S., Cattaneo, M. D., and Titiunik, R. (2015). Optimal data-driven regression discontinuity plots. Journal of the American Statistical Association, 110(512):1753–1769.
- Calonico, S., Cattaneo, M. D., Farrell, M. H., and Titiunik, R. (2017). rdrobust: Software for regression-discontinuity designs. Stata Journal, 17(2):372–404.
- Cattaneo, M. D., Idrobo, N., and Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press, Cambridge.