Causal Inference with Administrative Data: Opportunities and Pitfalls

1 Introduction

For most of the twentieth century, empirical economics relied on surveys: the Current Population Survey, the Panel Study of Income Dynamics, the General Social Survey. Carefully drawn probability samples, painstakingly collected, were the gold standard for measuring economic outcomes. Today, a quiet revolution is underway. Administrative data (records created by governments and institutions in the course of administering programmes) now underpins some of the most influential causal studies in the social sciences.

The scale is transformative. Chetty et al. [2014] merged Internal Revenue Service tax returns for virtually the entire U.S. population across decades to estimate intergenerational income mobility by commuting zone, college, and birth cohort. Autor et al. [2013] drew on Longitudinal Employer-Household Dynamics (LEHD) data linking workers to plants to study the China trade shock. Medical researchers routinely link hospital discharge records, prescription databases, and mortality registries to evaluate treatments in populations orders of magnitude larger than any randomised trial could enrol.

But administrative data brings its own methodological hazards (measurement error, selective availability, survivorship bias, and privacy constraints) that differ fundamentally from classical survey error. This article surveys the opportunities administrative data offers for causal inference, the pitfalls unique to this data environment, and the practices that help researchers navigate both.

2 What Administrative Data Offers

2.1 Scale and Completeness

The most obvious advantage of administrative data is size. Population-scale datasets eliminate the sampling variance that limits the precision of survey estimates. For researchers studying heterogeneous treatment effects, a cornerstone of modern causal inference, large datasets are not merely convenient; they are essential. Chetty et al. [2018] use over 27 million college-earnings records to estimate the earnings premium of attending an elite university for students from each income quintile, a level of granularity that no survey could support.

Completeness matters for identification, not just precision. Many natural experiments exploit sharp rules (eligibility cutoffs, programme phase-in schedules, policy discontinuities) that generate identifying variation only for small slices of the distribution. Administrative data on the full population allows researchers to detect effects precisely at these thresholds. The regression discontinuity designs surveyed by Imbens and Lemieux [2008] become far more powerful when the running variable is measured for everyone rather than a sample.

2.2 Long Panel Structures

Administrative records are generated continuously as long as individuals interact with the state or an institution. This creates panel structures with far more time periods than surveys can typically achieve. Dobbie et al. [2018] follow individuals in bankruptcy records for up to a decade; Bleakley [2007] constructs cohort-level panels spanning the entire twentieth century using census microdata linked to public health records. Long panels allow researchers to distinguish immediate treatment effects from long-run trajectories, study anticipation effects, and test for pre-treatment parallel trends over extended windows.

2.3 Linkage Across Datasets

Perhaps the most powerful feature of administrative data is the ability to link records across sources. A person appearing in unemployment insurance records, income tax filings, health insurance claims, and criminal justice records is the same person, and the causal connections between these domains are exactly what policy-makers care about. Record linkage using name, date of birth, address, and national identification numbers (where available) creates multivariate panels that no single survey instrument can replicate.

The probabilistic record linkage methodology of Fellegi and Sunter [1969] remains foundational. Modern implementations use machine learning to train classifiers that determine whether two records refer to the same individual, achieving match rates above 95% in high-quality administrative systems.
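
To make the Fellegi and Sunter [1969] framework concrete, the sketch below scores a candidate record pair by summing log-likelihood-ratio weights across comparison fields. The field names, the m- and u-probabilities (the chance that a field agrees among true matches and among non-matches, respectively), and the two decision thresholds are all hypothetical values chosen for illustration; in applied work they are estimated from the data.

```python
import math

# Hypothetical (m, u) probabilities per comparison field:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "postcode": (0.80, 0.05),
}


def match_weight(agreements):
    """Sum the Fellegi-Sunter log2 likelihood-ratio weights over fields.

    agreements maps each field name to True (agrees) or False (disagrees).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision rule; the thresholds here are illustrative only."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # typically sent to clerical review


all_agree = match_weight({"surname": True, "birth_date": True, "postcode": True})
mixed = match_weight({"surname": True, "birth_date": False, "postcode": True})
none_agree = match_weight({"surname": False, "birth_date": False, "postcode": False})

for label, w in [("all agree", all_agree), ("mixed", mixed), ("none agree", none_agree)]:
    print(f"{label}: weight={w:.2f} -> {classify(w)}")
```

The middle category matters for causal work: pairs that fall between the thresholds are exactly the ones where false matches and missed matches are most likely, so how they are resolved feeds directly into linkage error.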

3 Measurement Error in Administrative Records

Survey researchers have long studied classical measurement error. Administrative data introduces a different and in some ways more treacherous form: error that is non-classical, systematic, and correlated with the outcomes of interest.

3.1 Administrative Definitions Differ from Economic Concepts

Tax records measure reported income, not true earnings. Hours worked are rarely recorded. Employer-provided benefits (health insurance, pensions) appear in some administrative systems but not others. Occupational classification codes change over time and across jurisdictions. Programme participation records measure programme enrolment, not actual service receipt.

These definitional gaps create measurement error in the treatment variable, which can bias IV estimates even when the instrument is valid for the true treatment. If the endogenous variable X is measured with error η, so that the recorded value is X* = X + η, the 2SLS estimator using instrument Z identifies:

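$$\hat{\beta}_{2SLS} = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X^*)} = \beta \cdot \frac{\operatorname{Cov}(Z, X)}{\operatorname{Cov}(Z, X^*)} \qquad (1)$$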

which equals β only if Cov(Z, η) = 0, because Cov(Z, X*) = Cov(Z, X) + Cov(Z, η). Measurement error in the treatment therefore biases the estimate whenever the error is correlated with the instrument, a form of violation that surveys, with their direct questioning, might detect but administrative systems cannot.
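
A small simulation can make the condition Cov(Z, η) = 0 concrete. The sketch below is illustrative only: the data-generating process, the parameter `gamma` (which sets how strongly the measurement error loads on the instrument), and the true effect β = 2 are assumptions, not taken from any study cited here. With a single instrument, 2SLS reduces to the ratio of covariances in Equation (1), which is what the code computes.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_iv(gamma, beta=2.0, n=50_000, seed=0):
    """Return the IV estimate Cov(Z, Y) / Cov(Z, X*) on simulated data.

    gamma controls Cov(Z, eta): the measurement error is eta = gamma * Z + noise.
    """
    rng = random.Random(seed)
    z, x_star, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                 # instrument
        ui = rng.gauss(0, 1)                 # unobserved confounder (makes X endogenous)
        xi = zi + ui + rng.gauss(0, 1)       # true treatment, so Cov(Z, X) = 1
        eta = gamma * zi + rng.gauss(0, 1)   # measurement error in the recorded treatment
        yi = beta * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x_star.append(xi + eta)              # what the administrative record contains
        y.append(yi)
    return cov(z, y) / cov(z, x_star)


estimate_uncorrelated = simulate_iv(gamma=0.0)  # Cov(Z, eta) = 0
estimate_correlated = simulate_iv(gamma=1.0)    # Cov(Z, eta) = 1

print(f"error uncorrelated with Z: {estimate_uncorrelated:.2f}")
print(f"error correlated with Z:   {estimate_correlated:.2f}")
```

Under these assumed parameters, Equation (1) implies a probability limit of β = 2 in the first case and β · 1/(1 + 1) = 1 in the second, so the simulated estimates should land close to those values. Note that purely classical error (gamma = 0) leaves the IV estimate intact; it is the correlation with the instrument that does the damage.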

3.2 Selective Availability and Survivorship Bias

Administrative records are available only for individuals who interact with the institution that generates them. IRS tax records exist only for tax filers; hospital discharge records exist only for patients who sought hospital care; unemployment insurance records exist only for claimants in the formal labour market. This selectivity can destroy causal identification when the selection is correlated with treatment.

A particularly pernicious form is survivorship bias: the records of individuals who drop out of a programme, emigrate, or die are missing from the dataset. Doyle [2007] studies the effect of foster care on long-run outcomes using a quasi-random assignment of children to caseworkers as an instrument, a design that requires careful attention to whether dropout from the foster care system is random with respect to the instrument.
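
One simple diagnostic for this concern is to compare the share of missing outcomes across the range of the instrument. The sketch below is a minimal illustration on made-up records, not a reconstruction of any cited study's procedure; the function name, the record layout (instrument value, outcome or `None` when the person has left the data), and the choice of two bins are assumptions.

```python
from statistics import mean


def attrition_by_instrument(records, n_bins=2):
    """Share of records with a missing outcome, by instrument bin.

    records is a list of (instrument_value, outcome) pairs, where outcome is
    None when the individual no longer appears in the administrative data.
    Bins are equal-sized groups after sorting by the instrument.
    """
    ordered = sorted(records, key=lambda r: r[0])
    size = len(ordered) // n_bins
    rates = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(ordered)
        chunk = ordered[start:end]
        rates.append(mean(1.0 if outcome is None else 0.0 for _, outcome in chunk))
    return rates


# Hypothetical records: (instrument value, observed outcome or None).
records = [
    (0.1, 5.0), (0.2, 4.0), (0.3, None), (0.4, 6.0),
    (0.6, None), (0.7, None), (0.8, 3.0), (0.9, None),
]

rates = attrition_by_instrument(records)
print(rates)  # attrition share in the low-instrument and high-instrument halves
```

In this toy example the missing share is 0.25 in the low-instrument half and 0.75 in the high-instrument half. A gap of that kind would suggest that the surviving sample is selected on the instrument, so estimates on survivors alone need bounding or an explicit selection model.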

3.3 Coding Changes and Data Quality Discontinuities

Administrative systems are periodically revised: tax codes change, eligibility criteria are modified, reporting requirements evolve. These changes can introduce artificial discontinuities in the time series that mimic the treatment variation researchers seek to exploit. A researcher using a before-after design must verify that observed changes in outcomes are not artefacts of administrative coding revisions.
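
As a rough first pass at this kind of verification, one can ask whether the change in a recorded series at a known revision date is unusually large relative to ordinary period-to-period movements. The sketch below assumes a hypothetical monthly series and a hypothetical revision date; the function name and the threshold of three standard deviations are illustrative choices, and a flagged jump is a prompt to read the system documentation, not proof of an artefact.

```python
from statistics import mean, stdev


def flag_break(series, revision_index, z_threshold=3.0):
    """Compare the jump at revision_index with the other first differences.

    Returns (flagged, z), where z is the jump expressed in standard deviations
    of the remaining period-to-period changes.
    """
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    jump = diffs[revision_index - 1]  # change from period revision_index - 1 to revision_index
    others = diffs[:revision_index - 1] + diffs[revision_index:]
    z = (jump - mean(others)) / stdev(others)
    return abs(z) > z_threshold, z


# Hypothetical monthly counts; a reporting rule is assumed to change at index 6.
with_revision = [100, 101, 103, 102, 104, 105, 125, 126, 128, 127, 129]
without_revision = [100, 101, 103, 102, 104, 105, 106, 107, 109, 108, 110]

flagged_a, z_a = flag_break(with_revision, revision_index=6)
flagged_b, z_b = flag_break(without_revision, revision_index=6)
print(flagged_a, round(z_a, 1))
print(flagged_b, round(z_b, 1))
```

If the revision date coincides with the policy change under study, a check like this cannot separate the two; it only shows that the series alone is not enough, and that an outcome measure unaffected by the coding change is needed.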

4 Privacy, Access, and the Secure Data Enclave Model

High-quality administrative data is almost always confidential. The institutional response has been the secure data enclave: a computing environment, typically on government or university servers, in which approved researchers can analyse data that never leaves the secure perimeter. The U.S. Census Bureau's Federal Statistical Research Data Centers (FSRDCs), Statistics Canada's Research Data Centres, and national statistical agencies throughout Europe operate similar systems.

The enclave model creates frictions that shape research design. Researchers cannot freely iterate: each output table must be approved for release, a process that can take days or weeks. Exploratory data analysis is constrained. Replication by external researchers requires independent data access applications, limiting reproducibility. Despite these costs, the enclave model has proved productive: the FSRDC network has produced thousands of publications using Census microdata.

The European GDPR framework and analogous data protection legislation in other jurisdictions have simultaneously made administrative data more abundant (as institutions must document the data they hold) and more restricted (as data flows for research require an explicit legal basis). Research teams increasingly work through data use agreements that specify permissible analyses in advance, a constraint that parallels the pre-analysis plan movement in experimental economics.

5 Identification Strategies That Exploit Administrative Data

5.1 Regression Discontinuity at Administrative Cutoffs

Policy rules are often written in terms of thresholds: income cutoffs for benefit eligibility, age cutoffs for programme access, score cutoffs for selective schools. These thresholds are natural regression discontinuity designs. Administrative data enables the researcher to observe the running variable at high resolution around the cutoff, increasing the efficiency of the local polynomial estimate and allowing precise manipulation tests.
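
The local polynomial idea can be sketched in a few lines: fit a separate straight line on each side of the cutoff, using only observations within a bandwidth, and take the difference of the two fitted values at the cutoff. The example below uses simulated data with an assumed true jump of 1.5, a uniform kernel, and a hand-picked bandwidth; all of these are illustrative assumptions. Applied work would normally use a data-driven bandwidth and robust confidence intervals rather than this bare point estimate.

```python
import random


def intercept_at_zero(xs, ys):
    """OLS fit of y on x; return the fitted value at x = 0."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RD estimate: local linear fit on each side, uniform kernel."""
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    left_x, left_y = zip(*left)
    right_x, right_y = zip(*right)
    # Difference between the two regression lines evaluated at the cutoff.
    return intercept_at_zero(right_x, right_y) - intercept_at_zero(left_x, left_y)


# Simulated population: smooth trend in the running variable plus a jump at 0.
rng = random.Random(1)
running = [rng.uniform(-1, 1) for _ in range(20_000)]
outcome = [0.5 * r + (1.5 if r >= 0 else 0.0) + rng.gauss(0, 0.5) for r in running]

jump = rd_estimate(running, outcome, cutoff=0.0, bandwidth=0.25)
print(f"estimated discontinuity: {jump:.2f}")
```

The role of population coverage shows up in the bandwidth: with every unit observed, a narrow window around the cutoff still contains thousands of observations, so the estimate leans less on the functional form away from the threshold.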

5.2 Difference-in-Differences with Policy Roll-outs

Government programmes rarely roll out uniformly. Staggered adoption (states implementing Medicaid expansion in different years, plants subject to environmental regulation at different times) creates the variation underlying most modern DiD designs. Administrative data at the universe level ensures that variation in rollout timing is not confounded by survey sampling variation, and makes pre-trend testing over many pre-treatment periods feasible.
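
The basic building block of these designs is a two-group, two-period comparison, which can also be rerun on pre-treatment periods alone as a placebo test of parallel trends. The sketch below uses a small, noise-free, hypothetical panel in which units A and B adopt at period 3 and units C and D serve as the comparison group; the unit names, effect size, and common trend are assumptions made for illustration. It does not address the weighting problems that arise when many adoption cohorts are pooled in one regression.

```python
from statistics import mean


def did_2x2(panel, treated_units, pre_periods, post_periods):
    """Difference-in-differences from group-by-period means.

    panel maps (unit, period) to an outcome; units not listed in
    treated_units form the comparison group.
    """
    control_units = sorted({unit for unit, _ in panel} - set(treated_units))

    def avg(units, periods):
        return mean(panel[(u, t)] for u in units for t in periods)

    change_treated = avg(treated_units, post_periods) - avg(treated_units, pre_periods)
    change_control = avg(control_units, post_periods) - avg(control_units, pre_periods)
    return change_treated - change_control


# Hypothetical panel: unit fixed effects, a common trend, and a +2.0 effect
# for early adopters from period 3 onward.
unit_effects = {"A": 10.0, "B": 12.0, "C": 8.0, "D": 11.0}
early_adopters = ["A", "B"]
panel = {}
for unit, effect in unit_effects.items():
    for t in range(6):
        treated_now = unit in early_adopters and t >= 3
        panel[(unit, t)] = effect + 1.0 * t + (2.0 if treated_now else 0.0)

effect_estimate = did_2x2(panel, early_adopters, pre_periods=[0, 1, 2], post_periods=[3, 4, 5])
placebo_estimate = did_2x2(panel, early_adopters, pre_periods=[0, 1], post_periods=[2])
print(effect_estimate, placebo_estimate)
```

Because the simulated groups share a common trend, the main estimate recovers the assumed effect of 2.0 and the placebo on pre-treatment periods is zero. In real data, a non-zero placebo estimate is the warning sign that long administrative panels make it possible to detect.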

5.3 Matched Records as Panel Instruments

Record linkage creates matched employer-employee panels in which researchers can exploit firm-level shocks as instruments for worker-level outcomes. Card et al. [2013] use linked employer-employee data to decompose the variance of earnings into worker fixed effects, firm fixed effects, and sorting. IV strategies that use firm-level shocks (product demand shifts, credit supply contractions) as instruments for worker earnings require exactly the kind of longitudinal matched structure that administrative data provides.

6 Best Practices

Several practices have emerged that help researchers extract valid causal inference from administrative data:

  1. Validate administrative measures against surveys. Where possible, cross-validate the key variables against survey-reported measures. Systematic discrepancies signal measurement error that may threaten identification.
  2. Test for coding discontinuities. Before exploiting a policy threshold or date as a discontinuity, plot the underlying administrative variable over time to check for artefactual breaks.
  3. Document sample restrictions explicitly. Every restriction imposed on the raw administrative data (age ranges, employment status, geography) may induce selectivity. Report the size and observable characteristics of the excluded population.
  4. Account for record linkage error in inference. Probabilistic matches introduce false positives and false negatives. Sensitivity analysis that bounds treatment effects under plausible match error rates is increasingly expected in top journals.
  5. Pre-specify analyses where possible. Given the temptation to capitalise on the richness of administrative data, pre-analysis plans (even informal ones submitted to the data enclave before analysis begins) strengthen credibility.

7 Conclusion

Administrative data has expanded the frontier of causal inference in economics and the social sciences. Population-scale records, linked across institutions and time, support identification strategies that were simply impossible with conventional surveys. The pathbreaking work of Chetty et al. [2014] on intergenerational mobility, Autor et al. [2013] on the China trade shock, and dozens of other landmark studies rests on this infrastructure.

Yet the opportunities come with obligations. Administrative data is not a passive substitute for well-designed surveys. It brings its own measurement problems, selection mechanisms, and institutional constraints that must be understood and carefully addressed. Researchers who treat administrative data as a convenient shortcut, ignoring selective availability, measurement error in key variables, or coding changes over time, risk publishing causal claims that are artefacts of administrative processes rather than genuine policy effects.

The methodological frontier is moving fast. As record linkage algorithms improve, as secure computing infrastructure expands, and as data protection law evolves, administrative data will become an even more central part of the causal inference toolkit.

References

  1. Abramitzky, R., Boustan, L., Eriksson, K., Feigenbaum, J., and Pérez, S. (2021). Automated linking of historical data. Journal of Economic Literature, 59(3):865-918.
  2. Autor, D. H., Dorn, D., and Hanson, G. H. (2013). The China syndrome: Local labor market effects of import competition in the United States. American Economic Review, 103(6):2121-2168.
  3. Bleakley, H. (2007). Disease and development: Evidence from hookworm eradication in the American South. Quarterly Journal of Economics, 122(1):73-117.
  4. Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295-2326.
  5. Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2):200-230.
  6. Card, D., Heining, J., and Kline, P. (2013). Workplace heterogeneity and the rise of West German wage inequality. Quarterly Journal of Economics, 128(3):967-1015.
  7. Casey, K., Glennerster, R., and Miguel, E. (2012). Reshaping institutions: Evidence on aid impacts using a preanalysis plan. Quarterly Journal of Economics, 127(4):1755-1812.
  8. Chetty, R., Hendren, N., Kline, P., and Saez, E. (2014). Where is the land of opportunity? The geography of intergenerational mobility in the United States. Quarterly Journal of Economics, 129(4):1553-1623.
  9. Chetty, R., Friedman, J. N., Saez, E., Turner, N., and Yagan, D. (2018). Mobility report cards: The role of colleges in intergenerational mobility. NBER Working Paper No. 23618.
  10. Dobbie, W., Goldsmith-Pinkham, P., Mahoney, N., and Song, J. (2018). Bad credit, no problem? Credit and labor market consequences of bad credit reports. Journal of Finance, 75(5):2377-2419.
  11. Doyle, J. J. (2007). Child protection and child outcomes: Measuring the effects of foster care. American Economic Review, 97(5):1583-1610.
  12. Fellegi, I. P. and Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64(328):1183-1210.
  13. Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2):254-277.
  14. Imbens, G. W. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615-635.
  15. McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):698-714.
