Beginner's Corner

What Is a Counterfactual?

The Central Question

Suppose you want to know whether a job training programme increases participants' earnings. You observe that programme participants earn more, on average, than non-participants. Can you conclude that the programme caused the difference?

Not necessarily. People who choose to enrol in a job training programme may already be more motivated, more educated, or in better economic circumstances than those who do not enrol. Their higher earnings might reflect those pre-existing advantages, not the effect of the programme. To know the programme's causal effect, you would need to compare each participant's actual earnings to what they would have earned had they not participated. That hypothetical — what would have happened in a world that did not occur — is a counterfactual.

The Potential Outcomes Framework

The potential outcomes framework, introduced by Rubin (1974) and drawing on earlier work by Neyman (1923), provides a formal language for counterfactuals. For each individual i and each possible treatment value, we define a potential outcome: the value of the outcome that would be observed if the individual received that treatment.

In the simplest binary treatment case:

  • Yi(1): the earnings of person i if they participate in the training programme.
  • Yi(0): the earnings of person i if they do not participate in the training programme.

The causal effect of the programme for person i is:

$$ \tau_i = Y_i(1) - Y_i(0) $$

This is the difference between what would happen in the treated world and what would happen in the untreated world, for the same person. It is a comparison of two potential outcomes.

The Fundamental Problem

Here is the central difficulty: at any given point in time, each person either participates in the programme or does not. We observe Yi(1) for participants and Yi(0) for non-participants, but never both for the same person at the same time. The individual treatment effect \(\tau_i\) is therefore unobservable.

This is the fundamental problem of causal inference (Holland, 1986): we can never directly observe a counterfactual, because it refers to a world that did not happen.

The observed outcome for person i with treatment status Di ∈ {0, 1} is:

$$ Y_i = D_i \cdot Y_i(1) + (1 - D_i) \cdot Y_i(0) $$

We observe either Yi(1) (if Di = 1) or Yi(0) (if Di = 0), but not both.
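The switching equation can be read as a one-line function. The sketch below is illustrative only: the function and variable names are assumptions, and the two rows are Alice and Frank from the hypothetical figures in Table 1 of the numerical example.

```python
# Hypothetical potential outcomes: name -> (D, Y(0), Y(1)).
# Figures are Alice's and Frank's rows from Table 1.
people = {
    "Alice": (1, 28_000, 33_000),  # participant
    "Frank": (0, 20_000, 24_000),  # non-participant
}


def observed_outcome(d, y0, y1):
    """Switching equation: Y = D * Y(1) + (1 - D) * Y(0)."""
    return d * y1 + (1 - d) * y0


observed = {
    name: observed_outcome(d, y0, y1) for name, (d, y0, y1) in people.items()
}

for name, y in observed.items():
    print(f"{name}: observed earnings = {y:,}")
```

For Alice the function returns her Yi(1); for Frank it returns his Yi(0). The other potential outcome never enters the observed data.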

A Numerical Example

Consider a small job training programme with five participants and five non-participants. Table 1 shows both potential outcomes for each person (in reality, only one column is observed per person).

Table 1: Potential Outcomes for Ten Individuals (Hypothetical)

Person   Di   Yi(0)    Yi(1)    τi = Yi(1) − Yi(0)
Alice    1    28,000   33,000   +5,000
Bob      1    32,000   35,000   +3,000
Carol    1    25,000   31,000   +6,000
Dave     1    30,000   34,000   +4,000
Eve      1    26,000   30,000   +4,000
Frank    0    20,000   24,000   +4,000
Grace    0    18,000   23,000   +5,000
Henry    0    22,000   26,000   +4,000
Iris     0    19,000   24,000   +5,000
Jack     0    21,000   25,000   +4,000

The average treatment effect (ATE), the mean of τi across all ten individuals, is:

$$ \text{ATE} = \frac{1}{10} \sum_{i=1}^{10} \tau_i = \frac{5000 + 3000 + 6000 + 4000 + 4000 + 4000 + 5000 + 4000 + 5000 + 4000}{10} = \frac{44,000}{10} = 4,400 $$

The average treatment effect on the treated (ATT), the average effect for participants only, is:

$$ \text{ATT} = \frac{1}{5} \sum_{i:D_i=1} \tau_i = \frac{5000 + 3000 + 6000 + 4000 + 4000}{5} = \frac{22,000}{5} = 4,400 $$

In this example, ATE = ATT = $4,400. The programme raises earnings by $4,400 on average.
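These averages can be checked with a few lines of Python. This is a minimal sketch: the list `table` retypes the hypothetical figures from Table 1, and the variable names are illustrative. It relies on seeing both potential outcomes for every person, which is possible only in a made-up example.

```python
from statistics import mean

# Rows of Table 1 (hypothetical): (name, D, Y(0), Y(1)).
table = [
    ("Alice", 1, 28_000, 33_000),
    ("Bob",   1, 32_000, 35_000),
    ("Carol", 1, 25_000, 31_000),
    ("Dave",  1, 30_000, 34_000),
    ("Eve",   1, 26_000, 30_000),
    ("Frank", 0, 20_000, 24_000),
    ("Grace", 0, 18_000, 23_000),
    ("Henry", 0, 22_000, 26_000),
    ("Iris",  0, 19_000, 24_000),
    ("Jack",  0, 21_000, 25_000),
]

# Individual effects tau_i = Y_i(1) - Y_i(0), averaged over everyone (ATE)
# and over participants only (ATT).
ate = mean(y1 - y0 for _, _, y0, y1 in table)
att = mean(y1 - y0 for _, d, y0, y1 in table if d == 1)

print(f"ATE = {ate:,}")
print(f"ATT = {att:,}")
```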

What We Actually Observe

In practice, we observe Yi(1) for participants (Alice through Eve) and Yi(0) for non-participants (Frank through Jack). The naive comparison of means is:

$$ \bar{Y}_{\text{treated}} = \frac{33000 + 35000 + 31000 + 34000 + 30000}{5} = 32,600 $$
$$ \bar{Y}_{\text{control}} = \frac{20000 + 18000 + 22000 + 19000 + 21000}{5} = 20,000 $$
$$ \hat{\tau}_{\text{naive}} = 32,600 - 20,000 = 12,600 $$

The naive estimate is $12,600, nearly three times the true effect! The bias arises because participants already had higher earnings potential than non-participants even before the programme: their Yi(0) values (25–32k, averaging 28,200) are much higher than non-participants' Yi(0) values (18–22k, averaging 20,000). This is selection bias.

Formally, the naive comparison estimates:

$$ \hat{\tau}_{\text{naive}} = \underbrace{\bar{Y}_{\text{treated}} - \bar{Y}_{\text{control}}}_{\text{observed difference}} = \underbrace{\text{ATT}}_{\text{true effect}} + \underbrace{(\bar{Y}_{\text{treated}}(0) - \bar{Y}_{\text{control}}(0))}_{\text{selection bias}} $$

In this example the selection bias is 28,200 − 20,000 = 8,200, and 4,400 + 8,200 = 12,600, which matches the naive estimate.
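The decomposition can be verified numerically on the hypothetical Table 1 data. The sketch below uses illustrative variable names. The selection-bias term uses participants' Yi(0), which is available here only because the example is made up; with real data that term cannot be computed directly.

```python
from statistics import mean

# Rows of Table 1 (hypothetical): (D, Y(0), Y(1)).
table = [
    (1, 28_000, 33_000), (1, 32_000, 35_000), (1, 25_000, 31_000),
    (1, 30_000, 34_000), (1, 26_000, 30_000),
    (0, 20_000, 24_000), (0, 18_000, 23_000), (0, 22_000, 26_000),
    (0, 19_000, 24_000), (0, 21_000, 25_000),
]

treated = [(y0, y1) for d, y0, y1 in table if d == 1]
control = [(y0, y1) for d, y0, y1 in table if d == 0]

# Naive comparison: observed Y(1) of participants minus observed Y(0)
# of non-participants.
naive = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in control)

# True effect on the treated.
att = mean(y1 - y0 for y0, y1 in treated)

# Selection bias: gap in untreated earnings between the two groups.
# Participants' Y(0) is counterfactual, so this is unobservable in practice.
selection_bias = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in control)

print(f"naive          = {naive:,}")
print(f"ATT            = {att:,}")
print(f"selection bias = {selection_bias:,}")
print("naive == ATT + selection bias:", naive == att + selection_bias)
```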

How Do We Solve the Problem?

The fundamental problem of causal inference means that τi can never be directly observed. But population-level summaries like the ATE or ATT can be identified (that is, expressed as functions of the observable data distribution) under additional assumptions. Three main approaches exist:

Randomisation. If treatment is randomly assigned, then Di is independent of (Yi(0), Yi(1)). This means the control group provides a valid counterfactual for the treatment group: E[Yi(0) | Di = 1] = E[Yi(0) | Di = 0], so selection bias is zero. Randomised controlled trials (RCTs) are the “gold standard” for this reason.
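To see why randomisation removes selection bias, one can re-run the naive comparison under every possible random assignment of the ten hypothetical individuals from Table 1. The sketch below assumes a design in which exactly five of the ten are treated, with every such assignment equally likely. That design, and the variable names, are assumptions made for illustration. It uses exact fractions so the average is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Potential outcomes from Table 1 (hypothetical), Alice through Jack.
y0 = [28_000, 32_000, 25_000, 30_000, 26_000,
      20_000, 18_000, 22_000, 19_000, 21_000]
y1 = [33_000, 35_000, 31_000, 34_000, 30_000,
      24_000, 23_000, 26_000, 24_000, 25_000]
n = len(y0)

ate = Fraction(sum(b - a for a, b in zip(y0, y1)), n)

# Enumerate every way of choosing 5 treated units out of 10 and record
# the difference in observed group means for each assignment.
estimates = []
for treated in combinations(range(n), 5):
    treated_set = set(treated)
    control = [i for i in range(n) if i not in treated_set]
    diff = (Fraction(sum(y1[i] for i in treated), 5)
            - Fraction(sum(y0[i] for i in control), 5))
    estimates.append(diff)

average_estimate = sum(estimates, Fraction(0)) / len(estimates)

print(f"assignments considered: {len(estimates)}")
print(f"average naive estimate: {float(average_estimate):,.0f}")
print(f"true ATE:               {float(ate):,.0f}")
```

Any single random assignment can still land above or below the ATE. What randomisation guarantees is that the comparison is right on average, with no systematic selection bias.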

Conditional independence (selection on observables). If treatment assignment is “as good as random” conditional on observed covariates Xi, that is, (Yi(0), Yi(1)) ⊥ Di | Xi, then within cells defined by Xi there is no selection bias. This is the “unconfoundedness” assumption (Rosenbaum and Rubin, 1983). The propensity score p(Xi) = Pr(Di = 1 | Xi) can be used to reweight the sample and recover the ATE, provided 0 < p(Xi) < 1 so that every cell contains both treated and untreated units (Rosenbaum and Rubin, 1983).
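A small sketch of propensity-score reweighting (inverse probability weighting) is given below. Table 1 has no covariate, so the data here are invented for illustration: a binary covariate x, with outcomes held constant within each (x, d) cell so that the true ATE is 4,000 by construction (effects of 5,000 and 3,000 in two equally sized cells). The propensity score is estimated as the treated share in each cell. All names are hypothetical.

```python
from fractions import Fraction

# Invented records (x, d, y): covariate, treatment status, observed outcome.
# x = 1 cell: untreated earnings 30,000, effect 5,000, 6 of 8 treated.
# x = 0 cell: untreated earnings 20,000, effect 3,000, 2 of 8 treated.
records = (
    [(1, 1, 35_000)] * 6 + [(1, 0, 30_000)] * 2
    + [(0, 1, 23_000)] * 2 + [(0, 0, 20_000)] * 6
)
n = len(records)


def propensity(x):
    """Estimated Pr(D = 1 | X = x): the treated share within the cell."""
    cell = [d for xi, d, _ in records if xi == x]
    return Fraction(sum(cell), len(cell))


# Naive difference in observed means, ignoring x.
treated_y = [y for _, d, y in records if d == 1]
control_y = [y for _, d, y in records if d == 0]
naive = Fraction(sum(treated_y), len(treated_y)) - Fraction(sum(control_y), len(control_y))

# IPW estimate of the ATE: weight treated units by 1/p(x) and
# untreated units by 1/(1 - p(x)).
ipw_ate = sum(
    Fraction(d * y) / propensity(x) - Fraction((1 - d) * y) / (1 - propensity(x))
    for x, d, y in records
) / n

print(f"naive estimate: {float(naive):,.0f}")
print(f"IPW estimate:   {float(ipw_ate):,.0f}")
```

The naive comparison is inflated because the high-earning x = 1 cell is over-represented among the treated. Reweighting restores each cell to its share of the full sample. The result is only as good as the assumption that x captures everything driving both enrolment and earnings.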

Natural experiments. When an external factor (an instrument, a policy discontinuity, a lottery) creates quasi-random variation in treatment, we can exploit this variation to estimate causal effects without directly observing the counterfactual. This is the logic of instrumental variables, regression discontinuity, and difference-in-differences.
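As one illustration of this logic, the sketch below computes a difference-in-differences estimate from four group means. The numbers are invented for illustration and do not come from Table 1. The estimate equals the ATT only under the parallel-trends assumption (that the treated group's earnings would have changed by the same amount as the control group's in the absence of treatment), which the code cannot check.

```python
# Invented mean earnings by group and period (not from Table 1).
means = {
    ("treated", "pre"): 20_000,
    ("treated", "post"): 26_000,
    ("control", "pre"): 18_000,
    ("control", "post"): 20_000,
}

# Change over time within each group.
change_treated = means[("treated", "post")] - means[("treated", "pre")]
change_control = means[("control", "post")] - means[("control", "pre")]

# The control group's change stands in for the treated group's
# counterfactual change; the difference is attributed to treatment.
did = change_treated - change_control

print(f"change in treated group: {change_treated:,}")
print(f"change in control group: {change_control:,}")
print(f"difference-in-differences estimate: {did:,}")
```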

Why This Matters

The potential outcomes framework is not just a mathematical formalism — it is a way of thinking about causality that clarifies what questions are answerable and what assumptions are required. Before asking "does X cause Y?" you should ask: what is the counterfactual? For whom? Over what time horizon? Under what conditions?

These questions are often glossed over in casual empirical reasoning, leading to confused claims about causation. The framework forces precision: a causal claim is a statement about a comparison between two potential outcomes, and any evidence for that claim must somehow address the fundamental problem that only one of those outcomes is observed.

Conclusion

The counterfactual is the central object in causal inference. We want to know what would have happened in a world that did not occur. The fundamental problem is that this world is unobservable, so we must identify the causal effect from what we can observe by making assumptions. The most important assumption is that we have a valid comparison group — a group whose observed outcomes, after suitable adjustment, tell us what the treated group's outcomes would have been without treatment. All methods of causal inference — experiments, matching, IV, DiD, regression discontinuity — are strategies for finding or constructing such a comparison group.

References

  1. Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701.
  2. Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960.
  3. Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55.
  4. Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge.
  5. Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, Princeton, NJ.
  6. Pearl, J. (2009). Causality: Models, Reasoning, and Inference. 2nd edition. Cambridge University Press, Cambridge.
