The Causal Review

What Problem Does `rdrobust` Solve?

Regression discontinuity (RD) designs identify causal effects by comparing outcomes just above and just below a threshold in a running variable. The central implementation challenge is bandwidth selection: how wide a window around the cutoff should we use? Too wide, and we introduce bias (the true relationship may be non-linear over the wider range); too narrow, and variance explodes.

Early practice in applied economics was ad hoc: researchers chose bandwidths that "looked right" in plots, sometimes reporting only results that were significant. The rdrobust package implements the theoretically grounded, data-driven bandwidth selector of Calonico et al. (2014) (hereafter CCT), along with bias-corrected point estimates and robust confidence intervals. It has become the standard toolkit for RD analysis in R and Stata.

Installation and Setup

    install.packages("rdrobust")
    library(rdrobust)

    # Also useful for plots and density tests:
    install.packages("rddensity")
    library(rddensity)

The package includes a built-in dataset, `rdrobust_RDsenate`, containing the US Senate election data used in Calonico et al. (2015). The walkthrough below relies mainly on a simulated dataset, where the true effect is known, to illustrate the key functions.
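For readers who want to start from real data, here is a minimal sketch of loading the bundled Senate dataset and running a default sharp RD on it. The column names `margin` (running variable) and `vote` (outcome) are taken from the package documentation as I understand it; check them with `str()` before relying on them. The object name `senate_fit` is only illustrative.

```r
library(rdrobust)

# Load the Senate data shipped with rdrobust
data(rdrobust_RDsenate)

# Inspect the columns before use; this sketch assumes:
#   margin: Democratic margin of victory (running variable, cutoff at 0)
#   vote:   Democratic vote share in the following election (outcome)
str(rdrobust_RDsenate)

# Default sharp RD: local linear fit, triangular kernel, MSE-optimal bandwidth
senate_fit <- rdrobust(y = rdrobust_RDsenate$vote,
                       x = rdrobust_RDsenate$margin,
                       c = 0)
summary(senate_fit)
```

The same call pattern is used for the simulated data in the rest of the article.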

A Minimal Working Example

Simulated Data

We simulate a sharp RD with a known effect of 2 units:

    set.seed(42)
    n <- 1000
    X <- runif(n, -1, 1)                     # Running variable (centered at cutoff 0)
    D <- as.integer(X >= 0)                  # Sharp treatment indicator
    Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)  # True effect = 2

    # Naive mean comparison (biased because Y also depends on X,
    # and X differs systematically on the two sides of the cutoff):
    mean(Y[D == 1]) - mean(Y[D == 0])
    # Gives roughly 3, not 2: the slope of 1 in X adds
    # E[X | X >= 0] - E[X | X < 0] = 0.5 - (-0.5) = 1 to the true effect

Sharp RD Estimation with `rdrobust`

    # Main RD estimate with CCT bandwidth, local linear regression
    rdr <- rdrobust(y = Y, x = X, c = 0)
    summary(rdr)

The output includes:

Conventional: point estimate and confidence interval using the MSE-optimal bandwidth h but ignoring bias.
Bias-corrected: point estimate after subtracting estimated bias (uses a slightly larger bandwidth b for bias estimation).
Robust: bias-corrected estimate with an inflated standard error that accounts for uncertainty in bias estimation. This is the recommended inferential object.

    # Access key output:
    rdr$coef  # Conventional, bias-corrected, and robust estimates
    rdr$bws   # Selected bandwidths (h for estimation, b for bias)
    rdr$ci    # Confidence intervals
    rdr$pv    # p-values
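To make the "conventional" estimate less of a black box, the sketch below computes a local linear RD estimate by hand in base R: a weighted regression with triangular kernel weights inside a window around the cutoff. The bandwidth `h = 0.5` is a hypothetical fixed value chosen only for illustration (it is not the data-driven CCT bandwidth), and the names `w`, `in_window`, and `tau_hat` are mine.

```r
# Base R only: local linear RD estimate with a triangular kernel
set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

h <- 0.5  # hypothetical fixed bandwidth, for illustration only

# Triangular kernel weights: 1 at the cutoff, falling linearly to 0 at |X| = h
w <- pmax(0, 1 - abs(X) / h)
in_window <- w > 0

# D * X gives a separate intercept and slope on each side of the cutoff;
# the coefficient on D is the jump between the two fitted lines at X = 0
fit <- lm(Y ~ D * X, weights = w, subset = in_window)
tau_hat <- unname(coef(fit)["D"])
tau_hat
```

Calling `rdrobust(y = Y, x = X, c = 0, h = h)` should return essentially the same conventional point estimate; what the package adds is the data-driven choice of h, the bias correction, and the robust standard errors.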

Interpreting the Output

A typical summary(rdr) output looks like:

    ======================================================================
                   Conventional    Bias-corrected    Robust
    ----------------------------------------------------------------------
    Estimate       2.03            2.01              2.01
    Std. Error     0.12            0.12              0.15
    z-statistic    16.9            16.8              13.4
    P-value        0.000           0.000             0.000
    95% CI         [1.80, 2.26]    [1.78, 2.24]      [1.72, 2.30]
    ----------------------------------------------------------------------
    Bandwidth      h: 0.47         b: 0.73

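Note that the robust confidence interval is wider than the conventional one; this is by design. The CCT argument is that conventional confidence intervals computed at the MSE-optimal h have incorrect coverage because, at that bandwidth, the smoothing bias is of the same order as the standard error and cannot be ignored. The robust CI is centred on the bias-corrected estimate and uses a standard error that accounts for the extra variability from estimating the bias, which restores correct coverage.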

Key Options and Their Meaning

Kernel

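By default, `rdrobust` uses a triangular kernel, which down-weights observations farther from the cutoff. Epanechnikov and uniform kernels are also available through the kernel argument.

The triangular kernel is preferred because it is MSE-optimal for local polynomial estimation at a boundary point, which is exactly the RD setting. The kernel affects the constants rather than the convergence rate, so in practice it matters much less than the bandwidth.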
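A short sketch of how the kernel choice can be compared on the simulated data. It uses the `kernel` argument of `rdrobust()`; the helper names `kernels`, `fits`, and `kernel_tab` are illustrative only.

```r
library(rdrobust)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

# Fit the same sharp RD with each available kernel
kernels <- c("triangular", "epanechnikov", "uniform")
fits <- lapply(kernels, function(k) rdrobust(y = Y, x = X, c = 0, kernel = k))

kernel_tab <- data.frame(
  kernel   = kernels,
  estimate = sapply(fits, function(f) f$coef[1]),   # conventional point estimate
  h        = sapply(fits, function(f) f$bws[1, 1])  # bandwidth h (left side)
)
kernel_tab
```

The selected bandwidth changes with the kernel, so the rows are not fits on identical windows; the point estimates are usually close to one another.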

Polynomial order

The default is local linear (p = 1). Higher-order polynomials reduce bias at the cost of variance, which is why p = 1 (and occasionally p = 2) is the usual choice.

For the bias correction, the order is automatically set to q = p + 1.
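As an illustration, the sketch below fits local linear and local quadratic specifications on the simulated data through the `p` argument and puts the estimate, standard error, and selected bandwidth side by side. The object names are illustrative.

```r
library(rdrobust)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

fit_p1 <- rdrobust(y = Y, x = X, c = 0, p = 1)  # local linear (default)
fit_p2 <- rdrobust(y = Y, x = X, c = 0, p = 2)  # local quadratic

# Conventional estimate, its standard error, and the selected bandwidth h
order_tab <- rbind(
  linear    = c(estimate = fit_p1$coef[1], se = fit_p1$se[1], h = fit_p1$bws[1, 1]),
  quadratic = c(estimate = fit_p2$coef[1], se = fit_p2$se[1], h = fit_p2$bws[1, 1])
)
order_tab
```

Because the bandwidth is re-selected for each order, a higher p typically comes with a wider window, so the comparison reflects both changes at once.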

Bandwidth selection method

The default, bwselect = "mserd", selects a single MSE-optimal bandwidth that is used on both sides of the cutoff. Alternatives:

    # Different MSE-optimal bandwidths on each side
    rdr_asym <- rdrobust(y = Y, x = X, c = 0, bwselect = "msetwo")

    # Coverage error rate (CER) optimal: better for inference
    rdr_cer <- rdrobust(y = Y, x = X, c = 0, bwselect = "cerrd")
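To see how much the selectors disagree before committing to one, the companion function `rdbwselect()` can report them together. A minimal sketch, assuming the `all = TRUE` option behaves as documented (one row per selector); `bw_all` is an illustrative name.

```r
library(rdrobust)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

# Compute every available bandwidth selector in one call
bw_all <- rdbwselect(y = Y, x = X, c = 0, all = TRUE)
summary(bw_all)

# Matrix of bandwidths: rows are selectors, columns hold h and b on each side
bw_all$bws
```

CER-optimal bandwidths are expected to be smaller than their MSE-optimal counterparts, which is what trades a little point-estimation accuracy for better confidence interval coverage.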

Covariates

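Adding predetermined (pre-treatment) covariates through the covs argument can improve precision without affecting consistency, provided the covariates themselves do not jump at the cutoff.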
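A minimal sketch of covariate adjustment on simulated data. The covariate `Z` and its coefficient of 0.8 are hypothetical additions to the data-generating process used elsewhere in this article; `Z` is drawn independently of the running variable, so it is balanced at the cutoff by construction. It is passed as a one-column matrix to keep the call unambiguous.

```r
library(rdrobust)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Z <- rnorm(n)  # hypothetical predetermined covariate, unrelated to X and D
Y <- 1 + 2 * D + X + 0.8 * Z + rnorm(n, sd = 0.5)  # true effect still 2

fit_nocov <- rdrobust(y = Y, x = X, c = 0)
fit_cov   <- rdrobust(y = Y, x = X, c = 0, covs = cbind(Z = Z))

# Conventional estimate and standard error, without and with the covariate
cov_tab <- rbind(
  no_covariates   = c(estimate = fit_nocov$coef[1], se = fit_nocov$se[1]),
  with_covariates = c(estimate = fit_cov$coef[1],   se = fit_cov$se[1])
)
cov_tab
```

Since `Z` explains part of the outcome variance, the covariate-adjusted fit would be expected to have a smaller standard error, with a similar point estimate.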

Fuzzy RD

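For a fuzzy design, where treatment take-up is endogenous but its probability jumps at the threshold, supply the observed treatment variable through the fuzzy argument of rdrobust.

This implements the fuzzy RD estimator: the ratio of the jump in Y to the jump in D at the cutoff.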
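A minimal sketch of a fuzzy design on simulated data. The compliance rates (take-up probability of 0.2 below and 0.8 above the cutoff) and the names `T_take` and `Y_f` are hypothetical; take-up here is random given eligibility, which keeps the example simple but means it does not reproduce the endogeneity of a real application.

```r
library(rdrobust)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
eligible <- as.integer(X >= 0)  # crossing the cutoff only shifts take-up

# Hypothetical imperfect compliance: P(treated) jumps from 0.2 to 0.8 at 0
T_take <- rbinom(n, 1, ifelse(eligible == 1, 0.8, 0.2))

# Outcome depends on treatment actually received (effect = 2)
Y_f <- 1 + 2 * T_take + X + rnorm(n, sd = 0.5)

# Fuzzy RD: jump in Y_f divided by the jump in T_take at the cutoff
fit_fuzzy <- rdrobust(y = Y_f, x = X, c = 0, fuzzy = T_take)
summary(fit_fuzzy)
fit_fuzzy$coef
```

Because the estimate is a ratio, it is noisier than the sharp estimate on the same sample size; the smaller the first-stage jump, the wider the interval.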

Visualisation with `rdplot`

    rdplot(y = Y, x = X, c = 0,
           title = "RD Plot: Simulated Sharp Design",
           x.label = "Running Variable",
           y.label = "Outcome",
           nbins = c(20, 20))  # 20 bins on each side

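By default, `rdplot()` picks the number of bins on each side with a data-driven rule from Calonico et al. (2015) and overlays a global polynomial fit (fourth order by default) estimated separately on each side of the cutoff. Setting nbins, as in the call above, overrides the automatic choice. It is an excellent first step for any RD analysis.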

Density Manipulation Test with `rddensity`

Before trusting RD results, check that individuals cannot sort around the cutoff:

    library(rddensity)
    rdd <- rddensity(X = X, c = 0)
    summary(rdd)
    rdplotdensity(rdd, X)

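The null hypothesis is that the density of the running variable is continuous at c. A small p-value indicates a jump in the density at the cutoff, which is consistent with manipulation (sorting). For the simulated data, where X is drawn from a uniform distribution, we expect no rejection.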
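To get a feel for what the test reacts to, the sketch below compares the p-value on the clean running variable with the p-value after a hypothetical form of sorting, in which most units just below the cutoff are moved just above it. The sorting rule (window of 0.2, probability 0.7) is invented for illustration, and the p-value is read from `$test$p_jk`, which is my understanding of where `rddensity` stores the robust test result.

```r
library(rddensity)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)

# Hypothetical sorting: units within 0.2 below the cutoff move to the
# mirror-image position above it with probability 0.7
X_manip <- X
movers <- which(X > -0.2 & X < 0 & runif(n) < 0.7)
X_manip[movers] <- -X[movers]

p_clean <- rddensity(X = X, c = 0)$test$p_jk
p_manip <- rddensity(X = X_manip, c = 0)$test$p_jk
c(clean = p_clean, manipulated = p_manip)
```

How sharply the two p-values differ depends on the sample size and on how local the sorting is relative to the bandwidth the test selects.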

Bandwidth Sensitivity: A Best Practice

Always show robustness to bandwidth choice:

    h_grid <- seq(0.1, 1.0, by = 0.1)
    ests <- sapply(h_grid, function(h)
      rdrobust(y = Y, x = X, c = 0, h = h)$coef[3]  # robust estimate
    )

    plot(h_grid, ests, type = "b", pch = 16,
         xlab = "Bandwidth h", ylab = "RD Estimate",
         main = "Sensitivity to Bandwidth")
    abline(h = 2, col = "red", lty = 2)  # True effect

Stable estimates across a range of bandwidths strengthen the credibility of the RD design.
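Point estimates alone can hide how much uncertainty changes with the window, so a sensitivity table that also carries the robust confidence interval is often more informative. A minimal sketch on the simulated data, with an illustrative grid of four bandwidths and illustrative names (`sens`, `covers_truth`):

```r
library(rdrobust)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

h_grid <- c(0.2, 0.4, 0.6, 0.8)

# For each bandwidth, keep the robust row: estimate and confidence limits
sens <- t(sapply(h_grid, function(h) {
  fit <- rdrobust(y = Y, x = X, c = 0, h = h)
  c(h        = h,
    estimate = unname(fit$coef[3]),
    ci_lower = unname(fit$ci[3, 1]),
    ci_upper = unname(fit$ci[3, 2]))
}))
sens <- as.data.frame(sens)

# In a simulation the truth is known (effect = 2), so coverage can be checked
sens$covers_truth <- sens$ci_lower <= 2 & sens$ci_upper >= 2
sens
```

With real data the last column is unavailable, but the pattern to look for is the same: intervals that narrow as h grows while the estimate stays put.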

Comparison to Alternatives

For most applied work, rdrobust is the right default. For settings where one wants honest, worst-case confidence intervals rather than intervals built around an MSE-optimal bandwidth, RDHonest implements the approach of Armstrong and Kolesár (2020).

Pitfalls

Multiple testing at many cutoffs. If you test RD at many candidate cutoffs, use p-value corrections.
Reporting only significant bandwidths. Always present a sensitivity plot.
Ignoring heaping. Some running variables take only integer values; standard density tests may flag this as manipulation. Cattaneo et al. (2020) discuss solutions.
Using high-order global polynomials. Use local polynomials via rdrobust, not global polynomial regressions with lm().
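The first pitfall can be handled with base R's `p.adjust()`. The sketch below runs RD at several placebo cutoffs on the simulated data and applies a Holm correction to the robust p-values. The placebo cutoffs are hypothetical, and restricting each test to one side of the true cutoff is a common convention I have assumed here so that the real jump at 0 does not contaminate the placebo fits.

```r
library(rdrobust)

set.seed(42)
n <- 1000
X <- runif(n, -1, 1)
D <- as.integer(X >= 0)
Y <- 1 + 2 * D + X + rnorm(n, sd = 0.5)

# Hypothetical placebo cutoffs, where no effect exists by construction
placebo_cutoffs <- c(-0.5, -0.25, 0.25, 0.5)

p_raw <- sapply(placebo_cutoffs, function(cc) {
  # Use only observations on the same side of the true cutoff as the placebo
  keep <- if (cc < 0) X < 0 else X >= 0
  unname(rdrobust(y = Y[keep], x = X[keep], c = cc)$pv[3])  # robust p-value
})

# Holm adjustment controls the family-wise error rate across the four tests
p_holm <- p.adjust(p_raw, method = "holm")
data.frame(cutoff = placebo_cutoffs, p_raw = p_raw, p_holm = p_holm)
```

Holm is uniformly at least as powerful as Bonferroni while controlling the same error rate, which makes it a reasonable default when the number of cutoffs is small.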

Conclusion

rdrobust provides the state-of-the-art implementation of regression discontinuity estimation, combining data-driven bandwidth selection, local polynomial regression, and bias-corrected robust inference in a user-friendly package. The combination of rdrobust, rdplot, and rddensity covers the full workflow of a credible RD analysis: visualisation, density testing, estimation, and bandwidth sensitivity.

References

Armstrong, T. B. and Kolesár, M. (2020). Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1):1-39.
Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295-2326.
Calonico, S., Cattaneo, M. D., and Titiunik, R. (2015). Optimal data-driven regression discontinuity plots. Journal of the American Statistical Association, 110(512):1753-1769.
Calonico, S., Cattaneo, M. D., Farrell, M. H., and Titiunik, R. (2017). rdrobust: Software for regression-discontinuity designs. Stata Journal, 17(2):372-404.
Cattaneo, M. D., Idrobo, N., and Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press, Cambridge.

rdrobust in R: Optimal Bandwidth and Robust Inference for RDD
