Introduction
Does studying alongside high-achieving classmates improve your own grades? Do adolescents take up smoking because their friends smoke, or do smokers merely select friends who share their habits? Do neighbours influence each other's investment decisions, political views, or consumption choices? Questions about how individuals affect one another (peer effects, social learning, neighbourhood effects) sit at the heart of social science. They also sit at the heart of one of the deepest identification challenges in econometrics.
In a landmark paper, Manski [1993] showed that observational data from a homogeneous social group are generically uninformative about the structural parameters of social interactions. This result, known as the reflection problem, explains why naive regressions of individual outcomes on group averages do not recover causal peer effects. It also explains why three decades of subsequent research have worked so hard to find design-based solutions.
This article surveys the reflection problem, its formal structure, the conditions under which peer effects can be identified, and the most important empirical strategies researchers have used to make progress.
1 The Linear-in-Means Model
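The workhorse model of social interactions is the linear-in-means specification. Suppose individual $i$ belongs to group $g$. Let $y_{ig}$ denote the outcome of individual $i$ and $x_{ig}$ a vector of individual characteristics. Let $\bar{y}_g^{-i}$ and $\bar{x}_g^{-i}$ denote the average outcome and average characteristics of $i$'s peers (excluding $i$ themselves). The structural equation is:
$$y_{ig} = \alpha + \beta\,\bar{y}_g^{-i} + \gamma' x_{ig} + \delta' \bar{x}_g^{-i} + \varepsilon_{ig} \tag{1}$$
where $\alpha$ is an intercept, $\gamma$ captures the effect of one's own characteristics, $\beta$ captures endogenous effects (how peers' behaviour affects one's own behaviour), $\delta$ captures contextual effects (how peers' characteristics affect one's behaviour), and $\varepsilon_{ig}$ is an idiosyncratic error.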
Manski [1993] distinguishes three types of social influences:
- Endogenous effects: individual behaviour varies with the behaviour of the reference group (β ≠ 0).
- Exogenous (contextual) effects: individual behaviour varies with the exogenous characteristics of the reference group (δ ≠ 0).
- Correlated effects: individuals in the same group behave similarly because they share similar environments or characteristics, not because they interact at all.
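The reflection problem arises when the group is in equilibrium. Each individual's outcome depends on the group average, but the group average is itself the average of the individuals' outcomes, which creates a simultaneity.

To see the consequence, average (1) over the members of group $g$. The average of the leave-one-out peer means is exactly the group mean $\bar{y}_g$, and likewise for $\bar{x}_g$. Taking expectations with $\mathbb{E}[\varepsilon_{ig} \mid g] = 0$ and solving for the group average yields:
$$\bar{y}_g = \frac{\alpha + (\gamma + \delta)'\bar{x}_g}{1 - \beta} \tag{2}$$
Substitute (2) back into (1), using $\bar{y}_g^{-i} \approx \bar{y}_g$ and $\bar{x}_g^{-i} \approx \bar{x}_g$ in large groups. The result shows that $y_{ig}$ is a linear function of $x_{ig}$ and $\bar{x}_g$ alone:
$$y_{ig} = \frac{\alpha}{1 - \beta} + \gamma' x_{ig} + \frac{(\beta\gamma + \delta)'\bar{x}_g}{1 - \beta} + \varepsilon_{ig} \tag{3}$$
The endogenous effect $\beta$ and the contextual effect $\delta$ enter only through the single reduced-form coefficient $(\beta\gamma + \delta)/(1 - \beta)$ on $\bar{x}_g$. The coefficient on $x_{ig}$ pins down $\gamma$. With $K$ characteristics, (3) therefore supplies $K$ equations in the $K + 1$ unknowns $(\beta, \delta)$. As a result, $\beta$ and $\delta$ cannot be separately identified from cross-sectional data, however many groups are observed.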
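Intuitively, the group average is built from the outcomes of the group's own members, so it mirrors them. Manski likened this to watching a person and their reflection in a mirror move together and asking whether the image causes the person's movements or merely reflects them. Selection of similar individuals into the same group makes inference harder still, but selection is not the source of the reflection problem. The reduced-form collinearity arises even when groups are formed at random.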
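The observational equivalence can be checked numerically. The Python sketch below assumes a scalar characteristic x, an intercept `alpha` and an own-characteristic coefficient `gamma` (names chosen for illustration). It uses the large-group reduced-form coefficient on the group mean, (βγ + δ)/(1 − β). Under that formula, a hypothetical model with a pure endogenous effect (β = 0.5, δ = 0) and one with a pure contextual effect (β = 0, δ = 1) imply exactly the same reduced form. The simulation solves for equilibrium outcomes group by group with leave-one-out peer means; it is an assumed data-generating process, not real data.

```python
import numpy as np


def reduced_form(alpha, beta, gamma, delta):
    """Large-group reduced form of the linear-in-means model with scalar x.

    Returns (intercept, coefficient on own x_ig, coefficient on group mean xbar_g).
    """
    return alpha / (1 - beta), gamma, (beta * gamma + delta) / (1 - beta)


def simulate_and_regress(alpha, beta, gamma, delta, n_groups=300, group_size=200, seed=0):
    """Simulate equilibrium outcomes with leave-one-out peer means, then run OLS
    of y_ig on a constant, x_ig and the group mean xbar_g."""
    rng = np.random.default_rng(seed)
    n = group_size
    # Hypothetical data: a group-level shift plus individual noise, so xbar_g varies
    x = rng.normal(size=(n_groups, 1)) + rng.normal(size=(n_groups, n))
    eps = rng.normal(size=(n_groups, n))
    x_loo = (x.sum(axis=1, keepdims=True) - x) / (n - 1)  # peers' mean of x, excluding i
    c = alpha + gamma * x + delta * x_loo + eps            # every term except beta * ybar_loo
    # Averaging over members gives ybar_g = cbar_g / (1 - beta) ...
    ybar = c.mean(axis=1, keepdims=True) / (1 - beta)
    # ... and y_i = c_i + beta * (n * ybar_g - y_i) / (n - 1) then solves for each y_i
    y = (c + beta * n * ybar / (n - 1)) / (1 + beta / (n - 1))
    xbar = np.repeat(x.mean(axis=1, keepdims=True), n, axis=1)
    X = np.column_stack([np.ones(x.size), x.ravel(), xbar.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef


# Pure endogenous effect (beta=0.5, delta=0) vs pure contextual effect (beta=0, delta=1)
print(reduced_form(0.0, 0.5, 1.0, 0.0))  # (0.0, 1.0, 1.0)
print(reduced_form(0.0, 0.0, 1.0, 1.0))  # (0.0, 1.0, 1.0)
print(simulate_and_regress(0.0, 0.5, 1.0, 0.0))
print(simulate_and_regress(0.0, 0.0, 1.0, 1.0))
```

Both simulated coefficient vectors should come out close to [0, 1, 1]. Small differences remain because groups are finite and the peer mean excludes i. At these group sizes, however, sampling noise swamps those differences, so the regression gives no practical guide to which structural model generated the data.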
2 Why the Problem Is Deep
The non-identification result does not require strong functional form assumptions. Even if we observe the outcome, individual characteristics, and group membership for a large population of groups, we cannot recover β and δ separately unless we have additional restrictions. The problem is structural, not statistical: more data does not help.
Several additional complications compound the problem. First, selection into groups: students with high ability may choose to attend the same schools, meaning the peer group variable is endogenous. Second, common shocks: students in the same school share teachers, resources, and local labour market conditions, which produce correlated effects that mimic peer effects. Third, simultaneous causation: if everyone affects everyone, the direction of causation is ambiguous.
3 Identification Strategies
3.1 Random Assignment of Peers
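The cleanest response to the reflection problem is random assignment of individuals to peer groups. If group composition is orthogonal to individual characteristics, peers' pre-determined characteristics are uncorrelated with the unobserved determinants of one's own outcome. This rules out selection and correlated effects. Peer ability (or another contextual variable) is then a valid instrument for peer outcomes, provided it has no direct contextual effect of its own. Without that exclusion restriction, random assignment identifies the combined reduced-form effect of peer characteristics rather than β and δ separately.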
The most celebrated application of this strategy is Sacerdote [2001], who exploited the random assignment of first-year students to dormitory rooms at Dartmouth College. Because Dartmouth assigns roommates randomly (conditional on a few broad housing preferences), variation in roommate academic ability is plausibly orthogonal to the student's own ability. Sacerdote found that a roommate's high-school GPA had a positive, significant effect on one's own GPA. Assignment to a higher-GPA roommate increased own GPA by about 0.1 grade points, a modest but statistically robust peer effect.
Random assignment strategies have since been applied to military units [Carrell et al., 2013], Section 8 housing voucher recipients [Kling et al., 2007], and class assignments within schools [Hoxby, 2000]. The key assumption in all cases is that the randomisation is credible: conditional on observable controls, peer assignment is as good as random.
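A small simulation can illustrate why credible randomisation matters in a roommate design. It is a hypothetical sketch, not a replication of Sacerdote [2001]. True ability is unobserved, high-school GPA is a noisy proxy, and the assumed outcome includes a reduced-form roommate effect `theta`, set to 0.1 purely for illustration. Pairing students at random should recover `theta`. Pairing them by ability, used here as a stand-in for self-selection, loads part of each student's own unobserved ability onto the roommate's GPA.

```python
import numpy as np


def roommate_regression(sort_on_ability, theta=0.1, n_students=20_000, seed=1):
    """Coefficient on roommate's high-school GPA in a simulated roommate design.

    theta is a hypothetical reduced-form roommate effect; n_students must be even.
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n_students)                # unobserved true ability
    hs_gpa = ability + rng.normal(size=n_students)       # observed, noisy proxy
    # Random pairing vs pairing neighbours in ability (a stand-in for self-selection)
    order = np.argsort(ability) if sort_on_ability else rng.permutation(n_students)
    roommate = np.empty(n_students, dtype=int)
    roommate[order[0::2]] = order[1::2]
    roommate[order[1::2]] = order[0::2]
    # Hypothetical outcome: own ability plus theta times roommate's high-school GPA
    college_gpa = ability + theta * hs_gpa[roommate] + rng.normal(size=n_students)
    X = np.column_stack([np.ones(n_students), hs_gpa, hs_gpa[roommate]])
    coef, *_ = np.linalg.lstsq(X, college_gpa, rcond=None)
    return coef[2]


print(roommate_regression(sort_on_ability=False))  # should be near theta = 0.1
print(roommate_regression(sort_on_ability=True))   # biased upward by own ability
```

Controlling for own high-school GPA does not fix the sorted case, because GPA is only a noisy proxy for the ability on which students were matched.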
3.2 Natural Experiments and Instrumental Variables
When randomisation is not feasible, researchers have sought instruments that shift the composition of peer groups exogenously. One influential strategy exploits exogenous changes in group membership over time or space.
Consider Angrist [2014], who provides a critical review of peer effects studies using class-size variation, class reshuffling, and college roommate designs. He argues that many IV strategies in the peer effects literature identify local average treatment effects (LATEs). These LATEs apply to compliers whose group composition changed because of the instrument, populations that may not represent the broader policy-relevant group.
Hoxby [2000] uses idiosyncratic year-to-year variation in the gender and racial composition of a grade within a school to identify the effect of classroom composition on academic outcomes. She finds that the shares of female and minority students affect the performance of their classmates. The key identification assumption is that, once a school's typical composition is accounted for, these cohort-to-cohort fluctuations are unrelated to other determinants of achievement.
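The logic of exploiting within-school, cohort-to-cohort variation can be sketched with a small simulation. Everything in it is hypothetical: the true composition effect `theta`, the assumed correlation between school quality and a school's typical female share, and the noise levels are invented for illustration. None of them are estimates from Hoxby [2000]. A pooled regression across schools mixes the composition effect with school quality. Demeaning within school, which is equivalent to adding school fixed effects, uses only the idiosyncratic fluctuations.

```python
import numpy as np


def cohort_composition_estimates(theta=2.0, n_schools=500, n_cohorts=10, seed=3):
    """Pooled vs within-school slope of cohort scores on cohort female share.

    All parameters are hypothetical; theta is the assumed true composition effect.
    """
    rng = np.random.default_rng(seed)
    base_share = rng.uniform(0.3, 0.7, size=(n_schools, 1))    # school's typical female share
    # School quality is correlated with the typical share (a correlated effect)
    quality = 20.0 * (base_share - 0.5) + rng.normal(size=(n_schools, 1))
    # Idiosyncratic cohort-to-cohort fluctuation around the school's typical share
    share = base_share + 0.05 * rng.normal(size=(n_schools, n_cohorts))
    score = quality + theta * share + 0.1 * rng.normal(size=(n_schools, n_cohorts))

    def slope(u, v):
        u, v = u.ravel() - u.mean(), v.ravel() - v.mean()
        return float(u @ v / (u @ u))

    pooled = slope(share, score)
    # Demeaning within school is equivalent to adding school fixed effects
    within = slope(share - share.mean(axis=1, keepdims=True),
                   score - score.mean(axis=1, keepdims=True))
    return pooled, within


print(cohort_composition_estimates())  # pooled far above 2.0, within close to 2.0
```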
3.3 Network Structure and Exclusion Restrictions
A third approach to identification uses the structure of the social network itself to generate exclusion restrictions. Bramoullé et al. [2009] showed that in a network model, where individuals interact with specific peers rather than with a homogeneous group, the social network adjacency matrix G provides natural instruments.
The intuition is elegant. Individual i is directly connected to her friends, who are in turn connected to their own friends (i's friends-of-friends). A friend-of-a-friend who is not also i's friend has no direct effect on i. Her characteristics affect i's outcome only through their effect on the outcomes of i's direct friends. The characteristics of such friends-of-friends therefore satisfy an exclusion restriction.
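Formally, let G be the row-normalised adjacency matrix. Writing $d_i$ for $i$'s number of friends, $G_{ij} = 1/d_i$ if $j$ is a friend of $i$ and $G_{ij} = 0$ otherwise. The structural model becomes:
$$y = \alpha\iota + \beta G y + \gamma x + \delta G x + \varepsilon$$
where $\iota$ is a vector of ones, Gy is the vector of average peer outcomes and Gx is the vector of average peer characteristics. Bramoullé et al. [2009] show that, provided βγ + δ ≠ 0, the social effects are identified if the matrices I, G, and G² are linearly independent. When no individual is isolated, this condition is also necessary. It holds, for example, whenever the network contains an intransitive triad, in which i is linked to j and j to k, but i is not linked to k. G²x is then an excluded instrument for Gy.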
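The friends-of-friends argument can be illustrated on a simulated network. This is a sketch under stated assumptions. The random graph, the parameter values and the helper names `simulate_network` and `network_2sls` are hypothetical. The estimator is plain two-stage least squares with G²x as the single excluded instrument for Gy, one simple way to use the exclusion restriction rather than the exact estimator of Bramoullé et al. [2009]. The simulated model is y = α + βGy + γx + δGx + ε with row-normalised G. A sparse random graph contains many intransitive triads, so I, G and G² should be linearly independent.

```python
import numpy as np


def simulate_network(n=2000, avg_degree=5.0, alpha=1.0, beta=0.4, gamma=1.0,
                     delta=0.5, noise_sd=0.1, seed=2):
    """Simulate y = alpha + beta*G y + gamma*x + delta*G x + eps on a random graph.

    All parameter values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < avg_degree / n, k=1)
    A = (upper | upper.T).astype(float)                 # undirected friendship links
    G = A / np.maximum(A.sum(axis=1), 1.0)[:, None]      # row-normalised; isolated rows stay 0
    x = rng.normal(size=n)
    eps = noise_sd * rng.normal(size=n)
    # Equilibrium outcomes: solve (I - beta*G) y = alpha + gamma*x + delta*G x + eps
    y = np.linalg.solve(np.eye(n) - beta * G, alpha + gamma * x + delta * (G @ x) + eps)
    return G, x, y


def network_2sls(G, x, y):
    """2SLS of y on [1, Gy, x, Gx] using G^2 x as the excluded instrument for Gy."""
    n = len(y)
    Gx, Gy = G @ x, G @ y
    G2x = G @ Gx                                         # friends-of-friends' average x
    X = np.column_stack([np.ones(n), Gy, x, Gx])
    Z = np.column_stack([np.ones(n), x, Gx, G2x])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # first stage for every column
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]      # second stage
    return dict(zip(["alpha", "beta", "gamma", "delta"], coef))


G, x, y = simulate_network()
print(network_2sls(G, x, y))  # estimates should be near alpha=1, beta=0.4, gamma=1, delta=0.5
```

Running OLS of y on the same regressors would instead be inconsistent for β, because Gy depends on the errors of i's friends, which in turn depend on y_i.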
3.4 Goldsmith-Pinkham and Imbens
Goldsmith-Pinkham and Imbens [2013] extend the network IV approach by deriving efficient GMM estimators using multiple network-based instruments (Gx, G²x, ...). They also propose testing the overidentifying restrictions implied by the network structure. Their framework accommodates heterogeneous networks (varying degree distributions) and provides a unified treatment of the identification conditions in Bramoullé et al. [2009].
Their empirical application, using data on high-school friendships from the National Longitudinal Study of Adolescent Health (Add Health), finds modest endogenous peer effects on GPA and larger contextual effects from peers' family backgrounds.
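Overidentifying restrictions of this kind are commonly checked with a Sargan test. The sketch below is a generic illustration, not the Goldsmith-Pinkham and Imbens [2013] estimator. On a hypothetical simulated network it uses G²x and G³x as two excluded instruments for Gy, so there is one overidentifying restriction. A hypothetical parameter `eta` adds a direct effect of G²x on y, which violates the exclusion restriction when nonzero. With valid instruments and homoskedastic errors, the statistic is approximately χ²(1) in large samples.

```python
import numpy as np


def sargan_network_test(eta, n=2000, avg_degree=5.0, beta=0.4, gamma=1.0,
                        delta=0.5, noise_sd=0.1, seed=4):
    """Sargan statistic for network 2SLS with G^2 x and G^3 x as excluded instruments.

    eta is a hypothetical direct effect of G^2 x on y: eta = 0 keeps the
    exclusion restriction valid, eta != 0 violates it. Other values are invented.
    """
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < avg_degree / n, k=1)
    A = (upper | upper.T).astype(float)
    G = A / np.maximum(A.sum(axis=1), 1.0)[:, None]      # row-normalised adjacency
    x = rng.normal(size=n)
    Gx = G @ x
    G2x = G @ Gx
    G3x = G @ G2x
    rhs = 1.0 + gamma * x + delta * Gx + eta * G2x + noise_sd * rng.normal(size=n)
    y = np.linalg.solve(np.eye(n) - beta * G, rhs)

    X = np.column_stack([np.ones(n), G @ y, x, Gx])
    Z = np.column_stack([np.ones(n), x, Gx, G2x, G3x])   # 2 excluded instruments, 1 endogenous
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    u = y - X @ coef                                      # 2SLS structural residuals
    u_fit = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]      # projection of u onto instruments
    return n * (u_fit @ u_fit) / (u @ u)                  # approx chi2(1) under a valid model


print(sargan_network_test(eta=0.0))  # typically small
print(sargan_network_test(eta=1.0))  # large: exclusion restriction violated
```

The test only has power when the instruments disagree. If every excluded instrument were invalid in the same proportion, the statistic could stay small even though the estimates are biased.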
4 Correlated Effects and Selection Bias
A persistent concern is that apparent peer effects reflect selection into groups rather than causal influence. High-ability students may select into schools with other high-ability students, and if school quality is correlated with student quality, then peer ability and individual outcomes will be correlated even absent any causal effect.
The standard response is to include group (school, neighbourhood, dormitory) fixed effects, which absorb time-invariant unobservables common to all members of the group. But this strategy requires within-group variation in peer assignment, which brings us back to the random assignment and natural experiment approaches discussed above.
Heckman [1998] warns that even with random assignment, the linear-in-means model imposes strong functional form restrictions. Non-linear social interaction models (e.g., threshold models where only very high-achieving peers have positive effects) are not identified by the same instruments that identify the linear model. Researchers should be cautious about structural interpretations of reduced-form peer effect estimates.
5 The Causal Claims We Can and Cannot Make
Angrist [2014] draws a sharp distinction between design-based peer effect estimates and structural estimates. Even the cleanest random assignment design identifies a LATE: the causal effect of the specific peer composition shock induced by the randomisation, for the specific population whose peer composition was affected. Scaling this up to general policy (for example, asking what would happen if we systematically improved the quality of peers throughout a school system) requires either a model or strong assumptions about how the LATE extrapolates to the ATE.
Another limitation is that most peer effects studies measure average effects, masking potentially important heterogeneity. A student of medium ability may benefit greatly from having high-achieving peers (who provide positive role models). A student of very low ability, by contrast, may be ignored by high-achieving peers who form their own study groups. Evidence on such heterogeneity is limited but growing, with causal forests [Wager and Athey, 2018] offering a natural tool for estimating conditional average treatment effects in peer effect designs.
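The point about averages can be illustrated with a simple subgroup analysis in a randomly assigned roommate design. This is a hypothetical sketch. The tercile-specific effects (0, 0.3 and 0.15) are invented, and own ability is assumed observed. Splitting by ability tercile is a deliberately simple stand-in for the causal forests of Wager and Athey [2018], not an implementation of them.

```python
import numpy as np


def effects_by_ability_tercile(n_students=30_000, seed=5):
    """Roommate-GPA coefficient overall and by own-ability tercile (simulated).

    The tercile-specific effects below are hypothetical; n_students must be even.
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n_students)                # own ability, assumed observed
    hs_gpa = ability + rng.normal(size=n_students)       # pre-assignment GPA
    order = rng.permutation(n_students)                  # random roommate pairing
    roommate = np.empty(n_students, dtype=int)
    roommate[order[0::2]] = order[1::2]
    roommate[order[1::2]] = order[0::2]
    cuts = np.quantile(ability, [1 / 3, 2 / 3])
    tercile = np.digitize(ability, cuts)                 # 0 = bottom, 1 = middle, 2 = top
    theta = np.array([0.0, 0.3, 0.15])[tercile]          # hypothetical heterogeneous effects
    peer_gpa = hs_gpa[roommate]
    outcome = ability + theta * peer_gpa + rng.normal(size=n_students)

    def peer_coef(mask):
        X = np.column_stack([np.ones(mask.sum()), ability[mask], peer_gpa[mask]])
        return np.linalg.lstsq(X, outcome[mask], rcond=None)[0][2]

    pooled = peer_coef(np.ones(n_students, dtype=bool))
    by_tercile = [peer_coef(tercile == k) for k in range(3)]
    return pooled, by_tercile


pooled, by_tercile = effects_by_ability_tercile()
print(pooled, by_tercile)  # pooled near 0.15 hides effects near 0, 0.3 and 0.15
```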
Figure 1: Causal pathways in the linear-in-means model. Endogenous effects (β) run from peer outcomes to own outcome; contextual effects (δ) run from peer characteristics to own outcome; correlated effects arise from shared unobservables (U). Dashed arrows indicate paths that generate the reflection problem.
6 Policy Implications
The distinction between endogenous and contextual peer effects has direct policy implications. If β > 0 (endogenous effects), there is a social multiplier: exogenous improvements in average group outcomes will generate further improvement through peer interactions, amplifying the direct policy effect. If only contextual effects exist (δ > 0, β = 0), there is no multiplier. Improving a peer's characteristics then raises individual outcomes directly, but not through behavioural contagion.
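The multiplier follows from averaging the linear-in-means equation over a group. As a worked sketch, assume large groups, so that peer means can be replaced by the group means $\bar{y}_g$ and $\bar{x}_g$, with intercept α and own-characteristic coefficient γ. Suppose a policy adds Δ to every member's outcome equation while leaving all characteristics unchanged. Writing $\bar{y}_g'$ for the new group mean and subtracting the old equilibrium from the new one:

$$\bar{y}_g' = \alpha + \Delta + \beta\,\bar{y}_g' + (\gamma + \delta)'\bar{x}_g, \qquad \bar{y}_g = \alpha + \beta\,\bar{y}_g + (\gamma + \delta)'\bar{x}_g \quad\Longrightarrow\quad \bar{y}_g' - \bar{y}_g = \frac{\Delta}{1 - \beta}$$

With β = 0.5 the group-level response is 2Δ, twice the direct effect. With β = 0 it is exactly Δ, because a shift in outcomes that leaves characteristics unchanged does not feed back through contextual effects.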
Manski [1993] showed that this distinction is not merely academic. Policies designed to exploit the social multiplier (e.g., mixing high- and low-ability students in classrooms, or creating racially integrated housing) will fail to achieve the expected gains if what appear to be endogenous peer effects are actually contextual effects, driven by peer characteristics rather than peer behaviour.
7 Conclusion
The reflection problem is one of the most elegant and consequential non-identification results in econometrics. It tells us that observational data from homogeneous groups cannot separate the three sources of social interactions. Progress has required randomised experiments, natural experiments, and creative use of network structure to generate exclusion restrictions.
What remains genuinely hard is extrapolation. Even when we cleanly identify a peer effect in a specific context, translating that estimate into policy requires either a structural model or strong assumptions that local effects generalise globally. The reflection problem, in this sense, never entirely goes away; it merely retreats from the identification step to the interpretation step.
References
- Angrist, J. D. (2014). The perils of peer effects. Labour Economics, 30, 98-108.
- Bramoullé, Y., Djebbari, H., and Fortin, B. (2009). Identification of peer effects through social networks. Journal of Econometrics, 150(1), 41-55.
- Carrell, S. E., Sacerdote, B. I., and West, J. E. (2013). From natural variation to optimal policy? The importance of endogenous peer group formation. Econometrica, 81(3), 855-882.
- Goldsmith-Pinkham, P. and Imbens, G. W. (2013). Social networks and the identification of peer effects. Journal of Business & Economic Statistics, 31(3), 253-264.
- Heckman, J. J., Ichimura, H., Smith, J., and Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica, 66(5), 1017-1098.
- Hoxby, C. M. (2000). Peer effects in the classroom: Learning from gender and race variation. NBER Working Paper No. 7867.
- Kling, J. R., Liebman, J. B., and Katz, L. F. (2007). Experimental analysis of neighborhood effects. Econometrica, 75(1), 83-119.
- Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60(3), 531-542.
- Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. Quarterly Journal of Economics, 116(2), 681-704.
- Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.