The Causal Review

1 The Causal Question

Do transportation networks raise real incomes, and if so, by how much? This question is central to development economics and public finance: governments worldwide spend hundreds of billions of dollars annually on roads, railways, and ports on the premise that infrastructure investment generates broad economic gains. Yet identifying the causal effect is hard: governments build infrastructure in places expected to grow, creating classic reverse causality. Donaldson [2018] exploits a remarkable historical setting, the expansion of the Indian railway network under British colonial rule, to answer this question with rigorous causal identification.

2 The Setting

Between 1853 and 1930, the British colonial administration built a railway network spanning roughly 67,000 kilometres across India, making it one of the largest colonial infrastructure projects in history. The expansion was primarily motivated by strategic military considerations and British commercial interests rather than by the economic potential of specific Indian regions. This provides the key identifying variation: the timing and routing of railway lines were not driven by local economic conditions, as they would have been had a profit-maximising private investor chosen where to build.

Donaldson [2018] assembles a rich dataset spanning Indian districts from 1870 to 1930, covering agricultural output, prices, rainfall (as a measure of productivity shocks), and railway access. The unit of observation is the district-year, and the primary outcome of interest is the real income of agricultural producers in each district.

3 Identification Strategy

The identification strategy combines two elements:

Within-district variation in railway access. The primary specification uses district-level panel data with district and year fixed effects. Identifying variation comes from timing: districts receive their railway connections in different years over the sample period. The coefficient of interest is on an indicator for whether district d had railway access in year t:

$$\ln Y_{dt} = \beta \cdot \text{Rail}_{dt} + \mu_d + \lambda_t + \varepsilon_{dt}, \tag{1}$$

where μ_d are district fixed effects and λ_t are year effects. The key identifying assumption is that the timing of railway arrival is uncorrelated with district-specific productivity shocks after conditioning on district and time effects.
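To make equation (1) concrete, here is a minimal sketch of the two-way fixed effects estimator on a synthetic balanced panel. Everything in it is an assumption for illustration: the function name `twfe_beta`, the panel dimensions, the arrival years, and the noise level are invented, and none of the data come from Donaldson [2018]. For a balanced panel, subtracting district means and year means (and adding back the grand mean) is equivalent to including μ_d and λ_t as dummies.

```python
import random


def twfe_beta(y, x):
    """Slope from a two-way fixed effects regression on a balanced panel.

    y and x are lists of rows (one row per district), each row a list over
    years. Double demeaning removes district and year effects.
    """
    n, t = len(y), len(y[0])

    def demean(m):
        row = [sum(r) / t for r in m]                                # district means
        col = [sum(m[i][j] for i in range(n)) / n for j in range(t)]  # year means
        grand = sum(row) / n
        return [[m[i][j] - row[i] - col[j] + grand for j in range(t)]
                for i in range(n)]

    yd, xd = demean(y), demean(x)
    num = sum(xd[i][j] * yd[i][j] for i in range(n) for j in range(t))
    den = sum(xd[i][j] ** 2 for i in range(n) for j in range(t))
    return num / den


# Synthetic panel (hypothetical values, chosen only to exercise the estimator).
rng = random.Random(0)
N_DISTRICTS, N_YEARS, TRUE_BETA = 50, 20, 0.16

# Arrival years beyond the sample mean the district is never connected.
arrival = [rng.randint(1, N_YEARS + 5) for _ in range(N_DISTRICTS)]
mu = [rng.gauss(0, 1) for _ in range(N_DISTRICTS)]   # district effects
lam = [0.02 * j for j in range(N_YEARS)]             # common year effects

rail = [[1.0 if j >= arrival[i] else 0.0 for j in range(N_YEARS)]
        for i in range(N_DISTRICTS)]
log_y = [[TRUE_BETA * rail[i][j] + mu[i] + lam[j] + rng.gauss(0, 0.05)
          for j in range(N_YEARS)]
         for i in range(N_DISTRICTS)]

beta_hat = twfe_beta(log_y, rail)
print(f"estimated beta: {beta_hat:.3f} (true value {TRUE_BETA})")
```

The simulated treatment effect is constant across districts and over time, which is the case in which this estimator is easiest to interpret; the sketch says nothing about settings with heterogeneous effects.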

Instrumental variable: hypothetical rail network. To address residual concerns that railway routing was correlated with anticipated growth, Donaldson [2018] constructs an instrumental variable based on a hypothetical rail network: the least-cost path network connecting only major cities, using topographic cost data on terrain and river crossings. Districts that fall along these hypothetical routes received actual railway access for geographic rather than economic reasons. This instrument exploits the "accidental" placement of railways relative to the theoretically optimal commercial routing.
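The idea of a least-cost hypothetical route can be sketched with Dijkstra's algorithm on a small terrain grid. This is only an illustration of the routing logic described above, not the construction used in the paper: the grid, the cost values, and the function name `least_cost_path` are all hypothetical.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]


# Hypothetical construction costs: 1 = plain, 5 = river crossing, 9 = hills.
terrain = [
    [1, 1, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 5, 1],
    [9, 1, 1, 1],
]
city_a, city_b = (0, 0), (0, 3)
total_cost, route = least_cost_path(terrain, city_a, city_b)

# Cells on the route play the role of districts with predicted rail access.
predicted_access = set(route)
print(total_cost, route)
```

In this toy grid the cheapest route detours around the hills and the river crossing, so the cells it passes through are selected by geography alone, which is the property the instrument relies on.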

4 The Trade Cost Channel

Donaldson embeds the empirical analysis in a structural model of interregional trade to recover the mechanism. The model is a multi-region Ricardian trade model where agricultural producers in district d face iceberg trade costs τ_dd' to ship to district d'. A key implication of the model is that real income in district d can be written as a function of market access, defined as the trade-cost-weighted sum of productivity across trading partners:

$$\ln Y_d = \text{const} + \frac{1}{\theta} \ln \left( \sum_{d'} \tau_{dd'}^{-\theta} T_{d'} \right), \tag{2}$$

where θ is the trade elasticity and T_d' is productivity in district d'. Railways reduce trade costs τ_dd' dramatically: Donaldson [2018] estimates that railway access reduces trade costs by approximately 80% compared to bullock-cart transport, using observed price data to recover trade cost estimates.
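A small numerical sketch shows how equation (2) translates lower trade costs into higher real income. The trade elasticity, productivities, and iceberg costs below are hypothetical numbers chosen for illustration, and `log_real_income` is an invented helper that evaluates equation (2) up to its constant.

```python
import math


def log_real_income(tau_row, productivity, theta):
    """Equation (2) without the constant: (1/theta) * ln(sum_d' tau^-theta * T_d')."""
    market_access = sum(tau ** (-theta) * t_prod
                        for tau, t_prod in zip(tau_row, productivity))
    return math.log(market_access) / theta


THETA = 4.0                      # assumed trade elasticity
T = [1.0, 2.0, 1.5]              # assumed productivities of districts 0, 1, 2

# Iceberg costs faced by district 0 (tau = 1 with itself).
tau_before = [1.0, 2.0, 2.5]     # all overland transport
tau_after = [1.0, 1.2, 2.5]      # a rail link to district 1 lowers that cost

gain = (log_real_income(tau_after, T, THETA)
        - log_real_income(tau_before, T, THETA))
print(f"log real income gain for district 0: {gain:.3f}")
```

Because the constant in equation (2) cancels in the difference, the gain depends only on the change in market access, which is why the model can attribute income changes to trade cost reductions.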

Using Price Data to Measure Trade Costs

A distinctive feature of Donaldson's empirical strategy is the use of agricultural commodity prices across districts to measure trade costs and the integration of markets. The standard result from spatial arbitrage is that price differences across connected markets cannot exceed trade costs. With the iceberg (proportional) trade costs of the model, the bound is naturally stated in logs:

$$|\ln p_{dt} - \ln p_{d't}| \le \ln \tau_{dd'}. \tag{3}$$

When railways reduce trade costs, price dispersion falls and real incomes rise through access to better terms of trade. Donaldson [2018] uses panel variation in price gaps across district pairs to directly measure the fall in trade costs attributable to railway connections.
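The comparison of price gaps can be sketched as a simple difference-in-differences on district pairs. The prices below are invented for illustration, and the gaps are measured in logs, which is the natural form when trade costs are proportional (iceberg) as in Section 4. The helper names are hypothetical.

```python
import math


def log_price_gap(p_a, p_b):
    """Absolute log price difference between two markets."""
    return abs(math.log(p_a) - math.log(p_b))


def mean_gap_change(pairs):
    """Average change in the log price gap between the two periods."""
    changes = [log_price_gap(*p["after"]) - log_price_gap(*p["before"])
               for p in pairs]
    return sum(changes) / len(changes)


# Hypothetical prices (origin, destination) for one commodity.
connected = [      # pairs that gain a rail link between the two periods
    {"before": (10.0, 16.0), "after": (10.0, 11.5)},
    {"before": (12.0, 20.0), "after": (12.0, 14.0)},
]
unconnected = [    # pairs that never gain a rail link
    {"before": (10.0, 15.0), "after": (10.0, 14.5)},
    {"before": (11.0, 18.0), "after": (11.0, 17.0)},
]

# Change in gaps for connected pairs, net of the change for unconnected pairs.
rail_effect_on_gap = mean_gap_change(connected) - mean_gap_change(unconnected)
print(f"rail effect on log price gap: {rail_effect_on_gap:.3f}")
```

A negative value indicates that price gaps narrowed more for pairs that gained a rail link; a panel regression with pair and time effects generalises the same comparison.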

5 Key Findings

Effect on real incomes. The main estimate is that railway access increases real agricultural income by approximately 16%. This is a large effect, comparable in magnitude to annual economic growth in a rapidly developing economy. The effect is robust across specifications, including the IV estimates using the hypothetical network instrument.

Market integration. Railway access substantially reduces price dispersion across districts: the coefficient of variation of prices for a given commodity falls significantly in connected districts. This is consistent with the trade cost channel and with standard arbitrage: railways integrate markets by allowing trade to arbitrage away price differences.
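The dispersion measure mentioned above, the coefficient of variation, is the standard deviation of prices divided by their mean. A minimal sketch with invented prices for a single commodity across four districts:

```python
from statistics import mean, pstdev


def coefficient_of_variation(prices):
    """Population standard deviation of prices divided by their mean."""
    return pstdev(prices) / mean(prices)


# Hypothetical prices of one commodity in four districts.
prices_before_rail = [8.0, 10.0, 14.0, 16.0]
prices_after_rail = [11.0, 12.0, 12.5, 12.5]

cv_before = coefficient_of_variation(prices_before_rail)
cv_after = coefficient_of_variation(prices_after_rail)
print(f"CV before: {cv_before:.3f}, CV after: {cv_after:.3f}")
```

Because the measure is scaled by the mean, it is comparable across commodities with different price levels, which is why it suits a multi-commodity comparison.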

Placebo: planned but unbuilt routes. An important robustness check exploits routes that were planned but never built due to financial crises or administrative changes. Districts along planned but unbuilt routes show no differential income gains, which argues against the hypothesis that railways simply connected economically promising areas. This placebo plays a role loosely analogous to falsification checks such as the McCrary density test in RDD: it tests whether the estimated effect reflects the railways themselves rather than the economic fundamentals of the places selected for them.

Counterfactual: India without railways. Using the structural model, Donaldson [2018] simulates the counterfactual welfare loss of removing the Indian railway network. The estimated welfare loss is approximately 16% of aggregate agricultural income, a substantial portion of colonial India's limited growth. This structural counterfactual, grounded in the identified trade elasticity, illustrates how reduced-form causal estimates can inform welfare analysis when embedded in a model.

6 Limitations

External validity. Colonial India in 1870-1930 differs markedly from modern developing economies. The type of infrastructure (railways vs. roads), institutional context, and economic structure affect how infrastructure generates income gains. Related studies in other contexts, Faber [2014] for Chinese expressways and Morten and Oliveira [2018] for Brazilian roads, suggest the original results are broadly consistent with modern evidence, though with heterogeneous effect sizes.

Distributional effects. The main results focus on average district-level income. Donaldson [2018] does not provide disaggregated analysis by social group, caste, or land ownership status. Infrastructure may have raised average income while concentrating gains among landowners or politically connected traders.

Selection into markets. The model assumes districts that receive railways participate in inter-district trade. Districts with very low agricultural productivity may remain subsistence economies even with railway access, muting the effect. The analysis conditions on observed crop-growing districts, which may not fully resolve this selection.

Measurement of income. Measuring real income in colonial India required constructing price indices from commodity price data for salt, wheat, rice, and other staples. Measurement error in historical price data could attenuate the estimates or introduce correlated errors across districts.

7 What We Learn

Donaldson [2018] is a landmark in the empirics of infrastructure and development for several reasons:

Credible identification in a historical setting. The combination of panel fixed effects and an instrument based on hypothetical routing addresses both reverse causality and omitted variable bias in a setting where RCTs are impossible.

Structural estimates from reduced-form variation. By embedding the IV estimate within a Ricardian trade model, Donaldson recovers a trade elasticity and performs welfare counterfactuals that go beyond the local average treatment effect.

The importance of trade costs. The evidence that infrastructure works primarily through trade cost reduction rather than through agglomeration, investment, or productivity spillovers has guided subsequent infrastructure evaluation.

References

Donaldson, D. (2018). Railroads of the Raj: estimating the impact of transportation infrastructure. American Economic Review, 108(4-5):899-934.
Faber, B. (2014). Trade integration, market size, and industrialization: evidence from China's National Trunk Highway System. Review of Economic Studies, 81(3):1046-1070.
Morten, M. and Oliveira, J. (2018). The effects of roads on trade and migration: evidence from a planned capital city. NBER Working Paper No. 22158.

Railroads and Real Income: Donaldson (2018) on Infrastructure and Development in Colonial India
