Beginner's Corner

Why Randomise? The Logic and Power of Randomised Controlled Trials

1 A Motivating Question

Suppose a government wants to know whether a new job training programme increases employment. The programme has been running for three years. The data show that 72% of participants are employed one year later, compared to 54% among non-participants. Can we conclude that the programme raised employment by 18 percentage points?

Almost certainly not. People who choose to enrol in job training differ systematically from those who do not: they may be more motivated, more educated, or from regions with better labour markets. The 18 percentage point gap reflects both the programme's causal effect and these pre-existing differences. Without more careful design, we cannot separate them.

This is the selection problem. It is the central challenge in empirical social science, and the randomised controlled trial (RCT) is the most direct solution to it.

2 The Fundamental Problem of Causal Inference

The causal effect of the training programme for individual i is τᵢ = Yᵢ(1) - Yᵢ(0): their employment status if they participate, Yᵢ(1), minus their employment status if they do not, Yᵢ(0). The problem is that we observe only one of these potential outcomes: whichever one corresponds to their actual choice. We never observe the counterfactual.

This is the fundamental problem of causal inference [Holland, 1986]. Because we cannot observe both Yᵢ(0) and Yᵢ(1) for the same individual, we cannot compute τᵢ directly. Instead, we seek to estimate an average:

$$\tau_{\text{ATE}} = \mathbb{E}[Y_i(1) - Y_i(0)] = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)]. \tag{1}$$

3 Why Random Assignment Solves the Problem

If we randomly assign individuals to treatment (Dᵢ = 1, enrol in training) and control (Dᵢ = 0, do not enrol), something powerful happens: the assigned group is independent of the potential outcomes. In notation:

$$(Y_i(0), Y_i(1)) \perp\!\!\!\perp D_i. \tag{2}$$

Why? Because a coin flip is unrelated to motivation, education, or anything else about the individual. This independence is the key to causal identification, because it lets us replace each unobservable population mean with a group mean we can observe:

\begin{align} \mathbb{E}[Y_i(1)] &= \mathbb{E}[Y_i(1)|D_i = 1] = \mathbb{E}[Y_i|D_i = 1], \tag{3} \\ \mathbb{E}[Y_i(0)] &= \mathbb{E}[Y_i(0)|D_i = 0] = \mathbb{E}[Y_i|D_i = 0]. \tag{4} \end{align}

The first equality in (3) holds because random assignment ensures the treated group is a representative sample of the full population in terms of potential outcomes. The second equality uses the fact that for the treated group, observed Yᵢ equals Yᵢ(1). The same argument applied to the control group gives (4).

Subtracting (4) from (3):

$$\tau_{\text{ATE}} = \mathbb{E}[Y_i | D_i = 1] - \mathbb{E}[Y_i | D_i = 0]. \tag{5}$$

The average treatment effect equals the simple difference in means between the treated and control groups. This is a quantity we can compute directly from the data. This is why randomisation works.
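The argument in equations (2) to (5) can be checked with a small simulation. The sketch below is illustrative only: the data-generating process (a latent "motivation" variable, the employment probabilities, and the true effect of 0.14) is an assumption made up for the example, not something estimated from real data. It compares the difference in means when people select themselves into training with the difference in means when a coin flip decides.

```python
import random
import statistics


def simulate(n=100_000, true_effect=0.14, seed=0):
    """Compare a self-selected and a randomised difference in means.

    Hypothetical data-generating process: each person has a latent
    motivation u in [0, 1] that raises employment with or without training.
    """
    rng = random.Random(seed)
    selected = {0: [], 1: []}    # observed outcomes when people choose
    randomised = {0: [], 1: []}  # observed outcomes when a coin flip decides

    for _ in range(n):
        u = rng.random()                 # latent motivation (unobserved)
        p0 = 0.3 + 0.4 * u               # Pr(employed) without training
        p1 = p0 + true_effect            # Pr(employed) with training
        draw = rng.random()
        y0 = 1 if draw < p0 else 0       # potential outcome Y_i(0)
        y1 = 1 if draw < p1 else 0       # potential outcome Y_i(1)

        d_self = 1 if rng.random() < u else 0    # motivated people enrol more
        d_rand = 1 if rng.random() < 0.5 else 0  # random assignment

        # Only the potential outcome matching the realised D_i is observed.
        selected[d_self].append(y1 if d_self else y0)
        randomised[d_rand].append(y1 if d_rand else y0)

    naive = statistics.fmean(selected[1]) - statistics.fmean(selected[0])
    rct = statistics.fmean(randomised[1]) - statistics.fmean(randomised[0])
    return naive, rct


naive, rct = simulate()
print(f"self-selected difference in means: {naive:.3f}")
print(f"randomised difference in means:    {rct:.3f}")
```

Under these assumptions the randomised comparison should land close to the true effect of 0.14, while the self-selected comparison is noticeably larger because motivated people both enrol more often and find work more often.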

4 A Concrete Example

Return to the job training programme. In a randomised trial, 200 applicants are randomly assigned: 100 to training, 100 to a waitlist (control). After one year:

| Group | Enrolled | Employed after 1 year |
| --- | --- | --- |
| Treatment | 100 | 68 (68%) |
| Control | 100 | 54 (54%) |
| Difference | | 14 pp |

The 14 percentage point gap is now a causal estimate. Because assignment was random, the two groups were similar in expectation before the programme, so the post-programme difference reflects the programme's causal effect, up to sampling variation.
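With only 100 people per arm, the estimate carries real sampling uncertainty. As a rough sketch, the snippet below attaches a standard error and a normal-approximation confidence interval to the difference in employment rates. The function name `diff_in_proportions` is hypothetical, and the unpooled standard error is one common choice, not the only one.

```python
import math
from statistics import NormalDist


def diff_in_proportions(k_treat, n_treat, k_ctrl, n_ctrl, level=0.95):
    """Difference in employment rates with a normal-approximation interval."""
    p_t = k_treat / n_treat
    p_c = k_ctrl / n_ctrl
    diff = p_t - p_c
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p_t * (1 - p_t) / n_treat + p_c * (1 - p_c) / n_ctrl)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff, se, (diff - z * se, diff + z * se)


# Counts from the example: 68 of 100 treated and 54 of 100 controls employed.
diff, se, (lo, hi) = diff_in_proportions(68, 100, 54, 100)
print(f"estimate = {diff:.3f}, SE = {se:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

For these counts the interval runs from just above zero to roughly 27 percentage points, so the point estimate of 14 pp is best read together with its uncertainty.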

Note that the 14 pp estimate from the RCT differs from the 18 pp from the observational comparison. The gap of 4 pp was selection bias: self-selected participants were more employable to begin with. The RCT removes this bias.
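The selection-bias reading can be made explicit with a standard decomposition of the observational gap, obtained by adding and subtracting the unobserved mean E[Yᵢ(0) | Dᵢ = 1], where Dᵢ here denotes self-chosen participation:

$$\underbrace{\mathbb{E}[Y_i | D_i = 1] - \mathbb{E}[Y_i | D_i = 0]}_{\text{observed gap}} = \underbrace{\mathbb{E}[Y_i(1) - Y_i(0) | D_i = 1]}_{\text{effect on participants}} + \underbrace{\mathbb{E}[Y_i(0) | D_i = 1] - \mathbb{E}[Y_i(0) | D_i = 0]}_{\text{selection bias}}.$$

Under random assignment the selection-bias term is zero by (2), and the effect on participants coincides with the average treatment effect. Reading the full 4 pp in the example as selection bias additionally assumes that the effect on self-selected participants equals the 14 pp average effect and that sampling error is negligible.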

5 Covariate Balance: Checking Randomisation

Even when assignment is truly random, random samples can be imbalanced by chance—especially in small samples. It is standard practice to report a balance table: a comparison of pre-determined characteristics (age, education, gender, pre-programme earnings) across treatment and control groups. Large and systematic imbalances suggest a randomisation failure.

Formally, for each pre-determined covariate Xᵢ:

$$\mathbb{E}[X_i | D_i = 1] = \mathbb{E}[X_i | D_i = 0], \tag{6}$$

in expectation. Statistical tests (t-tests or F-tests for joint significance of all covariates) help detect chance imbalance. If imbalance is detected, it can be corrected by regression adjustment [Lin, 2013].
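A balance table row can be computed with a few lines of code. The sketch below is a minimal illustration using simulated baseline data. The covariates (`age`, `educ`), their distributions, and the helper name `balance_row` are assumptions for the example. It reports group means, the standardised mean difference, and a Welch t-statistic for each covariate.

```python
import math
import random
import statistics


def balance_row(x_treat, x_ctrl):
    """Means, standardised difference and Welch t-statistic for one covariate."""
    m_t, m_c = statistics.fmean(x_treat), statistics.fmean(x_ctrl)
    v_t, v_c = statistics.variance(x_treat), statistics.variance(x_ctrl)
    # Standardised mean difference: gap in units of the pooled standard deviation.
    smd = (m_t - m_c) / math.sqrt((v_t + v_c) / 2)
    # Welch t-statistic: gap divided by its unpooled standard error.
    t = (m_t - m_c) / math.sqrt(v_t / len(x_treat) + v_c / len(x_ctrl))
    return {"mean_treat": m_t, "mean_ctrl": m_c, "smd": smd, "t": t}


# Hypothetical baseline data: 200 applicants with age and years of education.
rng = random.Random(1)
people = [{"age": rng.gauss(35, 8), "educ": rng.gauss(12, 2)} for _ in range(200)]
rng.shuffle(people)                      # random assignment: first half treated
treat, ctrl = people[:100], people[100:]

for name in ("age", "educ"):
    row = balance_row([p[name] for p in treat], [p[name] for p in ctrl])
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Standardised differences are often reported alongside test statistics because, unlike t-statistics, they do not grow mechanically with the sample size.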

6 Intention to Treat versus Local Average Treatment Effect

A practical complication: not everyone assigned to treatment actually takes it (non-compliance). In the training example, some assigned to the treatment group might not attend. In this case, the simple difference in means estimates the Intention to Treat (ITT) effect: the causal effect of the offer of training, not of training itself. The ITT is:

$$\tau_{\text{ITT}} = \mathbb{E}[Y_i | D_i^{\text{assigned}} = 1] - \mathbb{E}[Y_i | D_i^{\text{assigned}} = 0]. \tag{7}$$

If we want the effect of taking the training (the Local Average Treatment Effect, or LATE), we use the randomised assignment as an instrument for actual participation and compute a two-stage least squares estimate [Imbens and Angrist, 1994]. The LATE applies to compliers: those who take training if and only if they are assigned to it.

The relationship is:

$$\tau_{\text{LATE}} = \frac{\tau_{\text{ITT}}}{\text{Pr}(D_i^{\text{actual}} = 1 | D_i^{\text{assigned}} = 1) - \text{Pr}(D_i^{\text{actual}} = 1 | D_i^{\text{assigned}} = 0)}. \tag{8}$$

The denominator is the first-stage "take-up" difference induced by assignment. If 80% of those assigned to treatment actually enrol (and 0% of those assigned to control), the LATE is 14 / 0.80 = 17.5 percentage points.
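Equation (8) is a ratio of two differences in means, which the following sketch computes directly. The individual-level data are constructed by hand to match the example's aggregates (68% and 54% employment, 80% take-up among those assigned to training, none among controls). The function name `wald_late` and the arrangement of individuals are assumptions for illustration.

```python
import statistics


def wald_late(y, assigned, took):
    """ITT, first-stage take-up difference, and their ratio as in equation (8)."""
    y1 = [yi for yi, z in zip(y, assigned) if z == 1]
    y0 = [yi for yi, z in zip(y, assigned) if z == 0]
    d1 = [di for di, z in zip(took, assigned) if z == 1]
    d0 = [di for di, z in zip(took, assigned) if z == 0]
    itt = statistics.fmean(y1) - statistics.fmean(y0)          # equation (7)
    first_stage = statistics.fmean(d1) - statistics.fmean(d0)  # take-up difference
    return itt, first_stage, itt / first_stage


# 100 assigned to training (80 attend), 100 assigned to control (none attend).
assigned = [1] * 100 + [0] * 100
took = [1] * 80 + [0] * 20 + [0] * 100
# 68 employed in the assigned-to-training arm, 54 in the control arm.
# Which individuals are employed within an arm does not affect the ratio.
y = [1] * 68 + [0] * 32 + [1] * 54 + [0] * 46

itt, first_stage, late = wald_late(y, assigned, took)
print(f"ITT = {itt:.3f}, take-up difference = {first_stage:.2f}, LATE = {late:.3f}")
```

With a single binary instrument and no covariates, this ratio coincides with the two-stage least squares point estimate. Standard errors would still need to come from an IV routine or the delta method.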

7 Power and Sample Size

An RCT that is too small may fail to detect a real effect. Statistical power is the probability of rejecting the null of no effect when the true effect is τ ≠ 0. For a two-sided test at level α with equal group sizes n/2:

$$\text{Power} = \Phi \left( \frac{|\tau|}{2\sigma/\sqrt{n}} - z_{1-\alpha/2} \right), \tag{9}$$

where σ² is the outcome variance and $z_{1-\alpha/2}$ is the critical value (1.96 for α = 0.05). To achieve 80% power for a standardised effect size of τ/σ = 0.2 (a small effect by Cohen's conventions), equation (9) requires approximately n ≈ 785 total participants, or roughly 400 per arm. Power is increased by reducing residual outcome variance (through covariate adjustment) or increasing the sample size.
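Equation (9) is easy to evaluate and to invert for the required sample size. The sketch below does both using only the standard library. The function names `power` and `required_n` are hypothetical, and the calculation inherits the normal approximation and equal-allocation assumptions behind (9).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(effect, sigma, n_total, alpha=0.05):
    """Approximate power from equation (9), with n_total split equally."""
    se = 2 * sigma / math.sqrt(n_total)       # SE of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(effect) / se - z_crit)


def required_n(effect, sigma, target_power=0.80, alpha=0.05):
    """Smallest total sample size whose approximate power reaches the target."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(target_power)
    # Solve |effect| / (2 * sigma / sqrt(n)) - z_crit = z_power for n.
    return math.ceil(4 * (sigma * (z_crit + z_power) / effect) ** 2)


print(required_n(0.2, 1.0))            # total n for 80% power: 785
print(round(power(0.2, 1.0, 400), 2))  # power with 400 in total: 0.52
```

For a standardised effect of 0.2, the formula gives a total of 785 participants for 80% power. With 400 participants in total, power is only about 52%.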

8 Common Mistakes

  1. Confusing ITT with LATE. If compliance is imperfect, the ITT is diluted by non-compliers and is smaller in magnitude than the effect on those who actually take the treatment. Report both clearly.
  2. Post-randomisation conditioning. Analysing only participants who "completed" the programme conditions on a post-treatment variable and reintroduces selection bias.
  3. Underpowered trials. A null result from an underpowered study is uninformative. Conduct a power analysis before the trial.
  4. Ignoring clustering. When randomisation is at the group level (classrooms, villages) but outcomes are measured at the individual level, standard errors must account for within-group correlation [Bertrand et al., 2004].
  5. Multiple testing. Testing many outcomes or subgroups inflates the chance of at least one false positive. Pre-register the primary outcome and correct for multiple comparisons in secondary analyses.
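As an illustration of the last point, one common correction for multiple comparisons is Holm's step-down procedure, which controls the family-wise error rate. It is chosen here for illustration and is not prescribed by the text above. The p-values in the example are made up.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1); here k = rank + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for four secondary outcomes.
raw = [0.01, 0.04, 0.03, 0.20]
for p, p_adj in zip(raw, holm_adjust(raw)):
    print(f"raw p = {p:.2f}  ->  Holm-adjusted p = {p_adj:.2f}")
```

An outcome is declared significant at level α if its adjusted p-value is below α. In this made-up example only the first outcome survives at α = 0.05, although three of the four raw p-values are below 0.05.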

9 Where to Learn More

  • Gerber and Green [2012] provide a comprehensive introduction to field experiments in political science and public policy.
  • Imbens and Rubin [2015] develop the potential outcomes framework and randomisation inference in depth.
  • Angrist and Pischke [2009] cover RCTs, IV, and the LATE framework in an applied econometrics context.

10 Conclusion

Randomisation solves the selection problem by making treatment assignment independent of potential outcomes, so that the difference in means between treated and control groups equals the average treatment effect. This insight underpins the credibility revolution in econometrics and development economics. Understanding why randomisation works and what can go wrong when it does not is the foundation for evaluating any causal claim.

References

  1. Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
  2. Bertrand, M., Duflo, E., and Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? Quarterly Journal of Economics, 119(1):249-275.
  3. Gerber, A. S. and Green, D. P. (2012). Field Experiments: Design, Analysis, and Interpretation. W.W. Norton & Company.
  4. Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396):945-960.
  5. Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2):467-475.
  6. Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.
  7. Lin, W. (2013). Agnostic notes on regression adjustments to experimental data: reexamining Freedman's critique. Annals of Applied Statistics, 7(1):295-318.
