The Causal Review

1. Online Advertising Effectiveness: A Large-Scale RCT at eBay

Citation: Blake et al. [2015]. Econometrica 83(1):155-174, 2015.

Research question: Does paid search advertising on Google causally increase sales for eBay? The question matters because observational correlations between ad spend and sales are severely confounded—advertisers spend more when sales are high, and users who click on ads may have bought anyway.

Identification strategy: A large-scale randomised field experiment. eBay turned off paid search advertising for a random subset of US geographic markets (designated market areas) for several weeks, creating clean treated (no ads) and control (ads running) markets. With over 200 markets and millions of transactions, the study has sufficient power to detect small effects.
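A market-level experiment of this kind can be analysed with a difference in means across markets plus randomisation inference. The sketch below is illustrative only: the data are simulated, and the function names, the 30% ads-off share, and the outcome scale (log sales per market) are assumptions rather than details taken from the paper.

```python
import random
import statistics

def simulate_markets(n_markets=200, share_ads_off=0.3, true_effect=0.0, seed=0):
    """Simulate one outcome per market (hypothetical log sales).

    A random subset of markets has ads switched off; `true_effect` is added
    to the outcome of those markets. All numbers are made up for illustration.
    """
    rng = random.Random(seed)
    ids = list(range(n_markets))
    ads_off_ids = set(rng.sample(ids, int(n_markets * share_ads_off)))
    markets = []
    for m in ids:
        ads_off = m in ads_off_ids
        y = rng.gauss(10.0, 0.5) + (true_effect if ads_off else 0.0)
        markets.append((ads_off, y))
    return markets

def diff_in_means(markets):
    """Mean outcome in ads-off markets minus mean outcome in ads-on markets."""
    off = [y for ads_off, y in markets if ads_off]
    on = [y for ads_off, y in markets if not ads_off]
    return statistics.mean(off) - statistics.mean(on)

def permutation_p_value(markets, n_perm=2000, seed=1):
    """Two-sided randomisation-inference p-value at the market level."""
    rng = random.Random(seed)
    observed = diff_in_means(markets)
    labels = [ads_off for ads_off, _ in markets]
    outcomes = [y for _, y in markets]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # re-draw which markets are "ads off"
        placebo = diff_in_means(list(zip(labels, outcomes)))
        if abs(placebo) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

if __name__ == "__main__":
    estimate, p_value = permutation_p_value(simulate_markets(true_effect=0.0))
    print(f"estimated effect of switching ads off: {estimate:.3f} (p = {p_value:.3f})")
```

Because treatment is assigned by market, the market is the unit for both the estimate and the permutation test; shuffling labels across individual transactions would overstate precision.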

Key result: For eBay's existing customers and frequent users, paid search advertising has essentially zero causal effect on purchases. Users who were going to buy on eBay anyway reached the site through organic search results when paid ads were removed. The only group for whom advertising had a positive causal effect was new and infrequent users (those with low prior purchase rates). The implied return on investment for most of eBay's search advertising was negative.

Takeaway: Inframarginal customers, those who would have purchased regardless, constitute a large fraction of the ad-click population, which makes observational ad-effectiveness estimates systematically too optimistic. This paper is a landmark in the application of large-scale RCTs by technology companies and has spawned a literature on "incrementality testing" in digital advertising.

2. Generative AI at Work: Productivity Effects in Customer Service

Citation: Brynjolfsson et al. [2023]. NBER Working Paper 31161, 2023.

Research question: What is the causal effect of access to a generative AI assistant on the productivity of customer service workers?

Identification strategy: A staggered rollout of an AI assistant tool across agents at a large technology company's customer service centre. The staggered adoption creates a difference-in-differences (DiD) design: agents receive access to the tool at different times, and the outcome is the number of issues resolved per hour. Importantly, which agents received the tool first was quasi-random: it was determined by management decisions that were plausibly orthogonal to individual agent performance trends.
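The logic of a staggered-adoption DiD can be sketched with a single 2x2 comparison: the change in outcomes for a cohort that adopts in a given period, minus the change over the same window for agents who have not yet adopted. The code below is a minimal illustration on simulated data; the rollout waves, the effect size in log points, and all function names are assumptions, and this is not the estimator used in the paper.

```python
import random
import statistics

def simulate_panel(n_agents=300, n_periods=10, effect=0.14, seed=0):
    """Return rows of (agent, period, adopt_period, log_issues_per_hour).

    Hypothetical setup: agents adopt in period 4, 6 or 8, share a common
    time trend, and differ by a fixed skill level.
    """
    rng = random.Random(seed)
    rows = []
    for agent in range(n_agents):
        adopt = rng.choice([4, 6, 8])   # assumed rollout waves
        skill = rng.gauss(0.0, 0.3)     # agent fixed effect
        for t in range(n_periods):
            treated = t >= adopt
            y = 1.0 + skill + 0.01 * t + (effect if treated else 0.0) + rng.gauss(0.0, 0.05)
            rows.append((agent, t, adopt, y))
    return rows

def cohort_did(rows, cohort, pre, post):
    """2x2 DiD for one adoption cohort against not-yet-treated agents.

    `pre` must be before `cohort` and `post` at or after it; the comparison
    group is every agent whose adoption period is later than `post`.
    """
    def mean_y(in_group, period):
        return statistics.mean([y for _, t, g, y in rows if t == period and in_group(g)])

    def is_cohort(g):
        return g == cohort

    def not_yet_treated(g):
        return g > post

    change_cohort = mean_y(is_cohort, post) - mean_y(is_cohort, pre)
    change_control = mean_y(not_yet_treated, post) - mean_y(not_yet_treated, pre)
    return change_cohort - change_control

if __name__ == "__main__":
    panel = simulate_panel()
    print(f"DiD estimate for the period-4 cohort: {cohort_did(panel, cohort=4, pre=3, post=5):.3f}")
```

Restricting the comparison group to not-yet-treated agents avoids using already-treated agents as controls, which is one known source of bias when a single two-way fixed effects regression is applied to staggered rollouts with heterogeneous effects.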

Key result: Access to the AI assistant increased the number of customer issues resolved per hour by 14% on average. The effect was highly heterogeneous: new and low-skill workers experienced gains of about 34%, while high-skill workers saw minimal or no improvements. The AI tool appeared to function by delivering expertise and "best practices" to less experienced workers, compressing the skill distribution.

Takeaway: Generative AI appears to complement the skills of low-skill workers while being approximately neutral for high-skill workers, the opposite of the "robots replace workers" narrative. This heterogeneity has important implications for the distributional effects of AI diffusion in the labour market.

3. Consumer Surplus from the Digital Economy: Willingness-to-Accept Experiments

Citation: Brynjolfsson et al. [2019]. American Economic Review: Papers & Proceedings 109:212-216, 2019.

Research question: How much consumer surplus do free digital goods (Facebook, email, maps) provide? Standard GDP accounting assigns zero value to services provided for free, potentially underestimating the welfare gains from digital goods.

Identification strategy: Incentive-compatible willingness-to-accept (WTA) experiments. Participants are randomised into receiving payment to deactivate Facebook for a month (treatment) or a control condition. The minimum payment required to induce deactivation reveals the consumer surplus—the compensating variation for access to the platform.

Key result: The median WTA to deactivate Facebook for one month is approximately $40-$60. Extrapolating across the US user base implies annual consumer surplus from Facebook alone of tens of billions of dollars. Similar estimates for Google Search yield even larger values. These welfare gains are not captured in GDP statistics.
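One way to recover a median WTA from experimental data is a single-offer design: each participant sees one randomly drawn price and either accepts it (and gives up the service) or declines. The share accepting at each price traces out the WTA distribution, and the median is the price at which half accept. The sketch below assumes that design and uses simulated data; the price grid, the lognormal WTA distribution, and the function names are illustrative assumptions, not the paper's protocol.

```python
import math
import random

def simulate_offers(n=5000, true_median=48.0, sigma=1.0,
                    prices=(10, 20, 40, 60, 80, 120), seed=0):
    """Simulate (offered_price, accepted) pairs.

    Each hypothetical participant has a private WTA drawn from a lognormal
    distribution and accepts any offer at or above it.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        wta = math.exp(rng.gauss(math.log(true_median), sigma))
        price = rng.choice(prices)
        data.append((price, price >= wta))
    return data

def acceptance_curve(data):
    """Share of participants accepting at each offered price, sorted by price."""
    totals, accepts = {}, {}
    for price, accepted in data:
        totals[price] = totals.get(price, 0) + 1
        accepts[price] = accepts.get(price, 0) + int(accepted)
    return sorted((p, accepts[p] / totals[p]) for p in totals)

def interpolated_median_wta(curve):
    """Linearly interpolate the price at which 50% of participants accept."""
    for (p0, s0), (p1, s1) in zip(curve, curve[1:]):
        if s0 <= 0.5 <= s1 and s1 > s0:
            return p0 + (0.5 - s0) * (p1 - p0) / (s1 - s0)
    return None  # the offered prices do not bracket 50% acceptance

if __name__ == "__main__":
    curve = acceptance_curve(simulate_offers())
    print("acceptance shares by price:", curve)
    print("interpolated median WTA:", interpolated_median_wta(curve))
```

Note that scaling a median up to an aggregate surplus figure requires further assumptions about the shape of the WTA distribution, since total surplus depends on the mean rather than the median.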

Takeaway: GDP mismeasurement from the free digital economy is substantial and methodologically tractable through WTA experiments. The paper opens a causal welfare measurement agenda that complements the quality-adjustment agenda in price index research.

4. Large Language Models as Simulated Economic Agents

Citation: Horton [2023]. NBER Working Paper 31122, 2023.

Research question: Can large language models (LLMs) serve as "simulated economic agents" that replicate human responses in economic experiments—providing low-cost pretesting of experimental designs?

Identification strategy: The paper is methodological rather than identifying a specific causal effect. Horton [2023] runs a set of well-known economic experiments (dictator games, ultimatum games, labour supply) with GPT-3 acting as a human subject. He compares LLM responses to established human experimental findings.

Key result: LLM responses to economic experiments are broadly consistent with human behaviour in many classic paradigms: GPT-3 exhibits inequality aversion in dictator games, backward induction in ultimatum games, and labour supply responses consistent with income and substitution effects. When given explicit socioeconomic background information ("you are a low-income single parent"), LLM responses shift in the expected direction.

Takeaway: LLMs as "silicon subjects" can serve as a rapid, low-cost tool for pretesting experimental designs and generating directional predictions before committing to expensive human subject recruitment. The paper does not claim that LLMs replicate humans perfectly, only that the correlation between LLM and human responses is high enough to make them useful for experimental design. An important caveat: LLMs may have seen published experimental findings in their training data, which would make apparent replication tautological.

5. Measuring the Welfare Effects of Platform Algorithms: Evidence from a Field Experiment

Citation: Allcott et al. [2020]. American Economic Review 110(3):629-676, 2020.

Research question: What are the welfare effects of social media (Facebook) use? Specifically, does deactivating Facebook improve users' subjective well-being, and do users underestimate how much time they spend on it?

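Identification strategy: A large randomised controlled trial. In the four weeks before the 2018 US midterm elections, 2,743 Facebook users were recruited and their willingness-to-accept (WTA) to deactivate was elicited; those with a WTA below $102 were randomised into deactivation (treatment) or continued use (control). Treated participants received $102 conditional on keeping their accounts deactivated, which the researchers verified by checking participants' public profile pages.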

Key result: Deactivating Facebook for four weeks: (1) reduced online activity and political polarisation; (2) increased subjective well-being (happiness and life satisfaction) by about 0.09 standard deviations, a meaningful effect; (3) increased traditional TV consumption and socialising with friends and family; (4) reduced factual news knowledge but also reduced exposure to misinformation.
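An effect reported in standard deviations is typically the treatment-control difference in means divided by the control group's standard deviation. The sketch below shows that calculation on simulated survey scores; the score scale, sample size, and function names are assumptions for illustration, and the paper's own index construction may differ.

```python
import random
import statistics

def standardised_effect(treated, control):
    """Difference in means, expressed in control-group standard deviations."""
    return (statistics.mean(treated) - statistics.mean(control)) / statistics.stdev(control)

def simulate_scores(n=20000, control_mean=5.0, sd=2.0, effect_in_sd=0.09, seed=0):
    """Simulate hypothetical well-being scores for treated and control groups."""
    rng = random.Random(seed)
    control = [rng.gauss(control_mean, sd) for _ in range(n)]
    treated = [rng.gauss(control_mean + effect_in_sd * sd, sd) for _ in range(n)]
    return treated, control

if __name__ == "__main__":
    treated, control = simulate_scores()
    print(f"effect in control-group SDs: {standardised_effect(treated, control):.3f}")
```

On the assumed scale, where the control group has a standard deviation of 2 points, a 0.09 SD effect corresponds to a shift of 0.18 points in the raw score.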

Takeaway: Facebook use appears to reduce subjective well-being for the average user, and users substantially underestimate their own usage (by about 30%).

The papers reviewed above span advertising effectiveness (eBay RCT), AI productivity (generative AI in customer service), digital welfare measurement (WTA experiments), computational social science (LLM agents), and platform welfare (Facebook deactivation RCT). Together, they illustrate the breadth of causal methods being applied at the frontier of platform economics.

References

Allcott, H., Braghieri, L., Eichmeyer, S., and Gentzkow, M. (2020). The welfare effects of social media. American Economic Review, 110(3):629-676.
Blake, T., Nosko, C., and Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica, 83(1):155-174.
Brynjolfsson, E., Collis, A., Diewert, W. E., Eggers, F., and Fox, K. J. (2019). GDP-B: Accounting for the social value of free goods in the digital economy. American Economic Review: Papers & Proceedings, 109:212-216.
Brynjolfsson, E., Li, D., and Raymond, L. R. (2023). Generative AI at work. NBER Working Paper 31161.
Horton, J. J. (2023). Large language models as simulated economic agents: What can we learn from homo silicus? NBER Working Paper 31122.

Recent Results: Platform Economics, AI, and Causal Estimation (2023-2025)
