Interference and Spillovers: When SUTVA Fails and What To Do About It

1 Introduction

The Stable Unit Treatment Value Assumption (SUTVA) is so foundational to the potential outcomes framework that it is easy to forget it is an assumption at all. Most introductory treatments state it in passing and move on. Yet for a growing share of the settings researchers actually care about (social programmes with peer effects, platform experiments with network externalities, public health interventions with herd immunity, place-based policies with agglomeration effects), SUTVA is violated by design. The units interact. What happens to your neighbour shapes what happens to you, regardless of your own treatment status.

This article examines what SUTVA violation means for identification, surveys the frameworks that have been developed to handle interference, and reviews the practical strategies researchers use when they cannot assume units are isolated from one another. The econometrics of interference is no longer a speciality corner; it is increasingly central to credible causal inference.

2 SUTVA and Why It Can Fail

Rubin [1980] formalised SUTVA as requiring two things: (1) no multiple versions of treatment (the treatment received by unit i is well-defined and does not vary with who administers it), and (2) no interference between units (the potential outcome of unit i depends only on i's own treatment status, not on the treatment status of any other unit j ≠ i). The second condition, the no-interference clause, is the focus of this article.

Under no interference, we write unit i's potential outcomes as Yi(0) and Yi(1), indexed solely by i's own treatment Di ∈ {0, 1}. The observed outcome is Yi = Yi(Di). This is clean and tractable.

When units interact, the potential outcomes must instead be indexed by the entire treatment vector D = (D1, ..., Dn). Unit i's outcome is Yi(d) for any assignment vector d ∈ {0, 1}^n. With n units, there are 2^n possible assignment vectors, a combinatorial explosion that makes non-parametric identification essentially impossible at any useful sample size [Hudgens and Halloran, 2008].

The practical question is therefore not whether SUTVA holds in an absolute sense, but whether the interference is structured enough that we can still learn something useful. Several frameworks impose tractable structure.
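To make the scale of the problem concrete, the short sketch below counts how many distinct potential outcomes a single unit can have under different restrictions. The function name and the "stratified" option (only own treatment and the number of treated peers matter) are illustrative assumptions introduced here, not notation from the cited papers.

```python
def potential_outcomes_per_unit(n, cluster_size=None, stratified=False):
    """Number of distinct potential outcomes one unit can have.

    n            : total number of units in the population
    cluster_size : if given, interference is confined to a cluster of this size
    stratified   : if True, only own treatment and the count of treated
                   peers in the cluster matter (an extra assumption)
    """
    if cluster_size is None:
        # Unrestricted interference: one outcome per full assignment vector.
        return 2 ** n
    if stratified:
        # Own treatment (2 values) times 0..cluster_size-1 treated peers.
        return 2 * cluster_size
    # Interference within the cluster only: one outcome per cluster vector.
    return 2 ** cluster_size


for n in (10, 20, 50):
    print(n, potential_outcomes_per_unit(n))

print(potential_outcomes_per_unit(1000, cluster_size=5))
print(potential_outcomes_per_unit(1000, cluster_size=5, stratified=True))
```

With 50 units and unrestricted interference, each unit already has more than 10^15 potential outcomes. Confining interference to a cluster of five reduces this to 32, and summarising peers by a count reduces it to 10.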

3 The Partial Interference Framework

Hudgens and Halloran [2008] proposed the partial interference assumption: units are partitioned into clusters (households, villages, classrooms) such that interference occurs freely within clusters but not across clusters. Within cluster g, unit i's potential outcome is Yig(dg), depending on the full treatment vector of cluster g but not on the treatment of units in other clusters.

This is enormously useful for vaccine trials, deworming programmes, and conditional cash transfer evaluations, where a household is affected by what happens to its neighbours but not by what happens in a distant village. Under partial interference, Hudgens and Halloran [2008] define four estimands:

  1. Direct effect: the effect of treating unit i conditional on the cluster treatment allocation α (proportion treated in the cluster).
  2. Indirect (spillover) effect: the effect on unit i of changing the cluster allocation from α to α', holding i's own treatment fixed.
  3. Total effect: the combined direct and indirect effect.
  4. Overall effect: the population-average effect of the policy α relative to no treatment.

A two-level randomised experiment (first randomise clusters into treatment arms, then randomise individuals within treated clusters) separately identifies all four estimands.
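As a rough illustration of the two-level design, the sketch below simulates clusters randomised to one of two allocation strategies and then randomises individuals within each cluster. Everything here is hypothetical: the cluster counts, the allocations 0.2 and 0.8, and the linear outcome model are assumptions chosen so the true contrasts are known. The simple group means stand in for the cluster-averaged estimators in Hudgens and Halloran [2008], and the "overall" contrast compares two allocations rather than an allocation against no treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
G, m = 200, 10                      # clusters and units per cluster (hypothetical)
alphas = (0.2, 0.8)                 # two allocation strategies, alpha and alpha'

# Stage 1: randomise clusters to an allocation strategy (half to each).
cluster_alpha = rng.permutation(np.repeat(alphas, G // 2))

# Stage 2: within each cluster, treat exactly alpha * m units at random.
D = np.zeros((G, m), dtype=int)
for g in range(G):
    k = int(round(cluster_alpha[g] * m))
    D[g, rng.choice(m, size=k, replace=False)] = 1

# Hypothetical outcome model: own effect 2, spillover 3 times the cluster share.
share = D.mean(axis=1, keepdims=True)
Y = 1.0 + 2.0 * D + 3.0 * share + rng.normal(size=(G, m))


def mean_outcome(d, alpha):
    """Average outcome of units with own treatment d in clusters assigned alpha."""
    rows = np.isclose(cluster_alpha, alpha)
    return Y[rows][D[rows] == d].mean()


a0, a1 = alphas
direct = mean_outcome(1, a0) - mean_outcome(0, a0)      # treated vs untreated at alpha
indirect = mean_outcome(0, a1) - mean_outcome(0, a0)    # untreated at alpha' vs alpha
total = mean_outcome(1, a1) - mean_outcome(0, a0)       # treated at alpha' vs untreated at alpha
overall = (Y[np.isclose(cluster_alpha, a1)].mean()
           - Y[np.isclose(cluster_alpha, a0)].mean())   # everyone at alpha' vs alpha

print(f"direct={direct:.2f} indirect={indirect:.2f} "
      f"total={total:.2f} overall={overall:.2f}")
```

Under the simulated model the population contrasts are 2.0 (direct), 1.8 (indirect), 3.8 (total) and 3.0 (overall), so the printed estimates should land near those values up to sampling noise.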

A Worked Example: Deworming in Kenya

A canonical application is the Miguel and Kremer [2004] deworming study in Kenya, later reanalysed extensively for spillover effects. Deworming reduces worm burden in treated children but also, by reducing transmission, in untreated children in the same school. A naive intention-to-treat estimate that ignores spillovers underestimates the total social benefit of the programme. Bjorkman and Svensson [2009] later used the partial interference framework to decompose total and indirect effects in a similar setting.

4 Exposure Mappings

Aronow and Samii [2017] generalised partial interference by introducing exposure mappings. Rather than requiring an arbitrary cluster partition, an exposure mapping fi(D) summarises the features of the treatment assignment that are relevant for unit i's outcome. For instance, in a social network experiment, fi(D) might be the fraction of i's friends who are treated:

$$f_i(\mathbf{D}) = \frac{1}{|N_i|} \sum_{j \in N_i} D_j \tag{1}$$

where Ni is the set of i's network neighbours. Once an exposure mapping is specified, the potential outcomes are written Yi(d, f) for own treatment d and exposure level f. Identification proceeds via inverse probability weighting under a known treatment assignment mechanism [Aronow and Samii, 2017].
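Equation (1) can be computed directly from an adjacency matrix. The following is a minimal sketch, assuming the network is stored as a symmetric 0/1 NumPy array with a zero diagonal; the function name is hypothetical, and returning NaN for units without neighbours is one possible convention rather than anything prescribed by Aronow and Samii [2017].

```python
import numpy as np


def treated_neighbour_share(adjacency, D):
    """Fraction of each unit's neighbours that are treated, as in equation (1)."""
    A = np.asarray(adjacency, dtype=float)
    D = np.asarray(D, dtype=float)
    degree = A.sum(axis=1)            # |N_i|
    treated = A @ D                   # number of treated neighbours
    # Units with no neighbours have an undefined share; return NaN for them.
    return np.divide(treated, degree,
                     out=np.full_like(treated, np.nan),
                     where=degree > 0)


# Path graph 0 - 1 - 2 - 3 with the two end units treated.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
D = np.array([1, 0, 0, 1])
f = treated_neighbour_share(A, D)
print(f)
```

In this toy network the end units have a single untreated neighbour (share 0), while the two middle units each have one treated neighbour out of two (share 0.5).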

The power of this approach is its flexibility: the researcher specifies, and hence makes transparent, exactly which channel of interference is being modelled. The limitation is equally clear: if the exposure mapping is misspecified, the estimand is not well defined.

5 Approximate Neighbourhood Interference

Leung [2022] proposed a different approach suited to large networks. Rather than specifying an exposure mapping, it assumes that interference decays with network distance: units far apart in the network have negligible effect on one another. Formally, unit i's potential outcome Yi(D) depends non-trivially on the treatment of units within K steps of i in the network, but the influence of units at distance greater than K is approximately zero. As K grows, more interference is accommodated; as the network becomes sparse, the approximation becomes sharper.

Under approximate neighbourhood interference, central limit theorems hold for appropriately constructed estimators, enabling inference even without a clean cluster structure [Leung, 2022]. This is particularly relevant for observational social network data where the analyst has no control over the assignment mechanism.
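The notion of "units within K steps" can be made concrete with a small helper that marks, for every unit, which other units lie in its K-step neighbourhood. This is only a sketch of the neighbourhood construction, assuming a symmetric 0/1 adjacency matrix; it is not Leung's estimator or variance procedure, and the function name is hypothetical.

```python
import numpy as np


def within_k_steps(adjacency, K):
    """Boolean matrix: entry (i, j) is True if j is at most K steps from i."""
    A = (np.asarray(adjacency) != 0).astype(int)
    n = A.shape[0]
    reach = np.eye(n, dtype=bool)          # distance 0: each unit reaches itself
    frontier = reach.copy()
    for _ in range(K):
        # Extend every walk by one edge, then record newly reachable units.
        frontier = (frontier.astype(int) @ A) > 0
        reach |= frontier
    return reach


# Path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(within_k_steps(A, 1)[0])   # units within one step of unit 0
print(within_k_steps(A, 2)[0])   # units within two steps of unit 0
```

For unit 0 on the path, the one-step neighbourhood contains units 0 and 1, and the two-step neighbourhood adds unit 2. Unit 3 is three steps away and is excluded in both cases.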

6 Experimental Designs Under Interference

When the researcher has control over randomisation, several designs are better suited to settings with interference than a simple Bernoulli trial.

Clustered Randomisation

The most common approach is to randomise treatment at a higher level of aggregation than the unit of analysis: randomise villages, not households. If interference is confined within villages (partial interference), this eliminates cross-cluster contamination. The cost is a reduction in effective sample size and hence statistical power.
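One common way to quantify the loss of effective sample size is the standard design-effect approximation for equal-sized clusters, 1 + (m - 1)ρ, where m is the cluster size and ρ is the intra-cluster correlation of the outcome. The article does not state this formula, so treat the sketch below as a supplementary rule of thumb; the function name and the example numbers are hypothetical.

```python
def effective_sample_size(n_units, cluster_size, icc):
    """Approximate effective sample size under cluster randomisation.

    Uses the design effect 1 + (cluster_size - 1) * icc, which assumes
    equal-sized clusters and a common intra-cluster correlation (icc).
    """
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n_units / design_effect


# 2,000 households in villages of 20 with a modest within-village correlation.
print(effective_sample_size(2000, cluster_size=20, icc=0.05))
```

With these illustrative inputs the design effect is 1.95, so 2,000 households carry roughly the information of about 1,026 independent ones.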

Bipartite Experiments

In platform settings (e.g., two-sided markets), interference runs between buyers and sellers rather than among buyers. Johari et al. [2022] formalised bipartite experiments: randomise on one side of the market and measure outcomes on the other, using a design that limits interference across the experimental boundary. Large technology companies, including LinkedIn, have reported using variants of this approach.

The Bernoulli Design and Horvitz-Thompson Estimation

When the full treatment vector D is randomised as independent Bernoullis with known probability p, the Horvitz-Thompson estimator:

$$\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{D_i Y_i}{p} - \frac{(1 - D_i) Y_i}{1 - p} \right] \tag{2}$$

remains unbiased for the average treatment effect if no interference is present. Under exposure mapping interference, a modified Horvitz-Thompson estimator:

$$\hat{\tau}(d, f) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbf{1}[D_i = d, f_i(\mathbf{D}) = f]}{\Pr(D_i = d, f_i(\mathbf{D}) = f)} Y_i \tag{3}$$

identifies the mean potential outcome at exposure (d, f) [Aronow and Samii, 2017]. Variance estimation is, however, challenging because outcomes of units that share network neighbours are correlated.
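The sketch below illustrates the modified Horvitz-Thompson estimator on simulated data. To keep the exposure probabilities available in closed form, it swaps in a coarser exposure mapping than the neighbour share: a binary indicator for having at least one treated neighbour. Under an independent Bernoulli(p) design with no self-links, the joint probability factorises into the own-treatment probability times the probability of the neighbour event. The ring network, the outcome model and all names are assumptions made for illustration.

```python
import numpy as np


def ht_mean(Y, D, A, p, d, e):
    """Horvitz-Thompson estimate of the mean potential outcome at (d, e).

    Exposure e is 1 if at least one neighbour is treated, else 0.
    Assumes independent Bernoulli(p) assignment and a zero-diagonal A.
    """
    degree = A.sum(axis=1)
    exposed = (A @ D > 0).astype(int)
    p_own = p if d == 1 else 1 - p
    p_none = (1 - p) ** degree              # Pr(no treated neighbour)
    p_exp = 1 - p_none if e == 1 else p_none
    pi = p_own * p_exp                      # Pr(D_i = d, exposure_i = e)
    if np.any(pi <= 0):
        raise ValueError("some units have zero probability of this exposure")
    hit = (D == d) & (exposed == e)
    return np.mean(hit * Y / pi)


rng = np.random.default_rng(2)
n, p = 2000, 0.5

# Ring network: each unit is linked to the units immediately before and after it.
idx = np.arange(n)
A = np.zeros((n, n), dtype=int)
A[idx, (idx + 1) % n] = 1
A[(idx + 1) % n, idx] = 1

D = rng.binomial(1, p, size=n)
E = (A @ D > 0).astype(int)

# Hypothetical outcome model: own effect 2.0, exposure effect 1.5.
Y = 1.0 + 2.0 * D + 1.5 * E + rng.normal(size=n)

mu_11 = ht_mean(Y, D, A, p, d=1, e=1)
mu_00 = ht_mean(Y, D, A, p, d=0, e=0)
print(f"mu(1,1)={mu_11:.2f}  mu(0,0)={mu_00:.2f}  contrast={mu_11 - mu_00:.2f}")
```

In this simulation the true mean potential outcomes are 4.5 at (1, 1) and 1.0 at (0, 0). The estimates are noisy because the weights are large and neighbouring units share exposure, which is the variance problem noted above.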

7 Difference-in-Differences with Spillovers

Interference creates particular difficulties for DiD designs, which typically assume that the control group provides a valid counterfactual for the treated group. If a policy in region A generates spillovers to neighbouring region B, using B as a control group will produce a biased estimate of the direct policy effect. The spillover contaminates the control.

This is not a merely theoretical concern. Autor [2003] studied the employment effects of the Americans with Disabilities Act and noted that firms near the size threshold of legal applicability showed different pre-trends than firms far from the threshold. Goldsmith-Pinkham et al. [2020] discuss how spillovers across geographic units can invalidate shift-share instruments that treat geographic variation as independent.

A practical remedy is to exclude "border" control units (those most likely to be affected by spillovers from treated units) from the control group. Alternatively, researchers can construct a spillover treatment variable (e.g., fraction of neighbouring counties that received treatment) and include it explicitly in the regression, estimating a joint model of direct and spillover effects.
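The second remedy can be sketched with a simulated two-period panel. The example assumes units arranged on a ring, a policy assigned to contiguous regions, and a spillover variable defined as the share of the four nearest units that are treated. The data-generating values (trend 0.5, direct effect 2.0, spillover effect 1.0) are invented so that the estimates can be checked against a known truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n_regions, block = 400, 5                 # hypothetical: 400 regions of 5 units
n = n_regions * block

# Policy is assigned at the region level, so adjacent units usually share status.
D = np.repeat(rng.integers(0, 2, size=n_regions), block)

# Spillover exposure: share of the four nearest units (two each side) treated.
S = sum(np.roll(D, s) for s in (-2, -1, 1, 2)) / 4.0

# Simulated panel: unit effects, common trend 0.5, direct effect 2.0,
# spillover effect 1.0 (all values are made up for illustration).
unit_effect = rng.normal(size=n)
y_pre = unit_effect + rng.normal(scale=0.3, size=n)
y_post = unit_effect + 0.5 + 2.0 * D + 1.0 * S + rng.normal(scale=0.3, size=n)
dy = y_post - y_pre                       # first difference removes unit effects

# Naive DiD: compare average changes, ignoring spillovers.
naive = dy[D == 1].mean() - dy[D == 0].mean()

# Joint model: regress the change on own treatment and neighbour exposure.
X = np.column_stack([np.ones(n), D, S])
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
direct_hat, spillover_hat = beta[1], beta[2]

print(f"naive DiD={naive:.2f}  direct={direct_hat:.2f}  spillover={spillover_hat:.2f}")
```

Because treated units also have more treated neighbours, the naive comparison mixes the direct and spillover channels and overstates the direct effect in this simulation, while the joint regression separates them. Identification of the spillover coefficient comes only from units near region boundaries, so precision depends heavily on how many such units exist.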

8 A Diagram of Interference Structures

Figure 1 illustrates three structures of interference: no interference (SUTVA), partial interference (cluster-contained), and general network interference.

[Figure 1 panels: (A) SUTVA holds; (B) Partial interference; (C) Network interference. Each panel shows one treated unit (T) and two control units (C).]

Figure 1: Three interference structures. Dashed arrows indicate spillover pathways. Panel (A): SUTVA, no cross-unit influence. Panel (B): partial interference, within clusters only. Panel (C): network interference, arbitrary spillover patterns.

9 Current Debates and Frontiers

What is the right estimand under interference?

When units interact, the "average treatment effect" ceases to be well defined without specifying the counterfactual treatment allocation. Manski [2013] showed that with social interactions, the relevant policy comparison is between full-population treatment policies, not individual-level treatment assignments. This raises deep questions about what applied researchers should report.

Equilibrium effects.

Large-scale policies that change prices or alter labour market equilibria produce spillovers that no finite sample of clusters can circumvent. The general equilibrium critique, that evaluation in a small pilot cannot speak to the effects of full-scale rollout, applies whenever treatment operates through market-clearing mechanisms [Heckman, 1998]. Structural modelling is one response; Acemoglu et al. [2022] provide a recent treatment of how to identify equilibrium effects in production networks.

Estimating spillovers in observational data.

Most of the identification results above assume a known, researcher-controlled assignment mechanism. In observational settings, identifying spillover effects requires an instrument that generates variation in neighbourhood treatment independent of own treatment, a demanding requirement that few natural experiments satisfy cleanly.

10 Implications for Practice

Applied researchers should take the following steps when spillovers are plausible:

  1. State the interference assumption explicitly. SUTVA, partial interference, and exposure mapping are all defensible in different settings; make clear which one you are invoking.
  2. Design around interference. When randomising, use clusters large enough that cross-cluster spillovers are negligible, and pre-specify the cluster level.
  3. Report both direct and indirect effects. A direct effect estimate alone misses social multipliers that may be the dominant channel of policy impact.
  4. Conduct spillover robustness checks. Estimate the effect on "near-control" units separately from "far-control" units; the difference is informative about contamination.
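The near-control versus far-control comparison in step 4 can be sketched as a small helper. It assumes the analyst already has, for each unit, a distance to the nearest treated unit and has chosen a cutoff separating "near" from "far"; both the function name and the toy numbers are hypothetical.

```python
import numpy as np


def near_far_control_gap(outcome, treated, distance_to_treated, cutoff):
    """Mean outcome of near-control units minus that of far-control units."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    distance = np.asarray(distance_to_treated, dtype=float)
    near = ~treated & (distance <= cutoff)
    far = ~treated & (distance > cutoff)
    if not near.any() or not far.any():
        raise ValueError("need at least one near-control and one far-control unit")
    return outcome[near].mean() - outcome[far].mean()


# Toy data: one treated unit, two controls close to it, two far away.
outcome = [5.0, 3.0, 3.0, 1.0, 1.0]
treated = [True, False, False, False, False]
distance = [0.0, 1.0, 2.0, 5.0, 6.0]
print(near_far_control_gap(outcome, treated, distance, cutoff=2.0))
```

In the toy data the near controls average 3 and the far controls average 1, giving a gap of 2. A gap of this kind is suggestive of contamination only if near and far controls would otherwise be comparable, which is itself an assumption worth defending.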

11 Conclusion

SUTVA is an assumption, not a law of nature. For a large class of economically interesting interventions (public health, information diffusion, labour market programmes, platform experiments), interference is the norm rather than the exception. The frameworks developed by Hudgens and Halloran [2008], Aronow and Samii [2017], and Leung [2022] show that credible causal inference is still possible when units interact, but it requires more careful experimental design, more transparent specification of the interference structure, and more thoughtful definition of the estimand. As datasets increasingly capture network structure and as policy interventions operate at scale, the econometrics of interference will only grow in importance.

References

  1. Acemoglu, D., Akcigit, U., and Kerr, W. R. (2022). Networks and the macroeconomy: an empirical exploration. NBER Macroeconomics Annual, 30:276-335.
  2. Aronow, P. M. and Samii, C. (2017). Estimating average causal effects under general interference, with application to a social network experiment. Annals of Applied Statistics, 11(4):1912-1947.
  3. Autor, D. H. (2003). Outsourcing at will: the contribution of unjust dismissal doctrine to the growth of employment outsourcing. Journal of Labor Economics, 21(1):1-42.
  4. Bjorkman, M. and Svensson, J. (2009). Power to the people: evidence from a randomized field experiment on community-based monitoring in Uganda. Quarterly Journal of Economics, 124(2):735-769.
  5. Goldsmith-Pinkham, P., Sorkin, I., and Swift, H. (2020). Bartik instruments: what, when, why, and how. American Economic Review, 110(8):2586-2624.
  6. Heckman, J. J. (1998). Detecting discrimination. Journal of Economic Perspectives, 12(2):101-116.
  7. Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832-842.
  8. Johari, R., Li, H., Liskovich, I., and Weintraub, G. Y. (2022). Experimental design in two-sided platforms: an analysis of bias. Management Science, 68(10):7069-7089.
  9. Leung, M. P. (2022). Causal inference under approximate neighborhood interference. Econometrica, 90(1):267-293.
  10. Manski, C. F. (2013). Identification of treatment response with social interactions. Econometrics Journal, 16(1):S1-S23.
  11. Miguel, E. and Kremer, M. (2004). Worms: identifying impacts on education and health in the presence of treatment externalities. Econometrica, 72(1):159-217.
  12. Rubin, D. B. (1980). Comment on "Randomization analysis of experimental data: the Fisher randomization test" by D. Basu. Journal of the American Statistical Association, 75(371):591-593.
