1. Online Advertising Effectiveness: A Large-Scale RCT at eBay
Citation: Blake et al. [2015]. Econometrica 83(1):155-174, 2015.
Research question: Does paid search advertising on Google causally increase sales for eBay? The question matters because observational correlations between ad spend and sales are severely confounded—advertisers spend more when sales are high, and users who click on ads may have bought anyway.
Identification strategy: A large-scale randomised field experiment. eBay turned off paid search advertising for a random subset of US geographic markets (designated market areas) for several weeks, creating clean treated (no ads) and control (ads running) markets. With over 200 markets and millions of transactions, the study has sufficient power to detect small effects.
Key result: For eBay's existing customers and frequent users, paid search advertising has essentially zero causal effect on purchases. Users who were going to buy on eBay anyway reached it through organic search results when paid ads were removed. The only groups for whom advertising had a positive causal effect were new users and infrequent purchasers. The implied return on investment for most of eBay's search advertising was negative.
Takeaway: Inframarginal customers—those who would have purchased regardless—constitute a large fraction of the ad-click population, making observational ad-effectiveness estimates systematically too optimistic. This paper is a landmark in the application of large-scale RCTs by technology companies and has spawned a literature on "incrementality testing" in digital advertising.
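The market-level comparison behind this design can be sketched in a few lines. The sketch below is illustrative only: the data are simulated, the sales index, the number of markets and the function names are assumptions rather than anything taken from the paper, and the estimator is a plain difference in means, not the authors' specification.

```python
import random
import statistics

def simulate_geo_experiment(n_markets=210, true_effect=0.0, seed=0):
    """Simulate market-level sales with ads randomly switched off in half the markets.

    All numbers are hypothetical; true_effect is the sales lost when ads are off.
    """
    rng = random.Random(seed)
    markets = list(range(n_markets))
    rng.shuffle(markets)
    ads_off = set(markets[: n_markets // 2])  # treated markets: paid search turned off
    sales = {}
    for m in range(n_markets):
        baseline = rng.gauss(100.0, 10.0)  # hypothetical sales index
        sales[m] = baseline - (true_effect if m in ads_off else 0.0)
    return ads_off, sales

def ad_effect_estimate(ads_off, sales):
    """Difference in mean sales (ads-on minus ads-off) and its standard error."""
    off = [s for m, s in sales.items() if m in ads_off]
    on = [s for m, s in sales.items() if m not in ads_off]
    diff = statistics.mean(on) - statistics.mean(off)
    se = (statistics.variance(on) / len(on) + statistics.variance(off) / len(off)) ** 0.5
    return diff, se

ads_off, sales = simulate_geo_experiment(true_effect=0.0)
diff, se = ad_effect_estimate(ads_off, sales)
print(f"estimated ad effect: {diff:.2f} (SE {se:.2f})")
```

Because assignment is random, the ads-on markets serve as the counterfactual for the ads-off markets, so no model of who clicks on ads is needed. An observational comparison of clickers and non-clickers has no such guarantee.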
2. Generative AI at Work: Productivity Effects in Customer Service
Citation: Brynjolfsson et al. [2023]. NBER Working Paper 31161, 2023.
Research question: What is the causal effect of access to a generative AI assistant on the productivity of customer service workers?
Identification strategy: A staggered rollout of an AI assistant tool across agents at a large technology company's customer service centre. The staggered adoption creates a DiD design: agents receive access to the tool at different times, and outcomes are measured in terms of issues resolved per hour. Importantly, the assignment of which agents received the tool first was quasi-random—determined by management decisions that were plausibly orthogonal to individual agent performance trends.
Key result: Access to the AI assistant increased the number of customer issues resolved per hour by 14% on average. The effect was highly heterogeneous: new and low-skill workers experienced gains of 35%, while high-skill workers saw minimal or no improvements. The AI tool appeared to function by delivering expertise and "best practices" to less experienced workers, compressing the skill distribution.
Takeaway: In this setting, generative AI raises the productivity of low-skill workers and is approximately neutral for high-skill workers, which runs against the "robots replace workers" narrative. This heterogeneity has important implications for the distributional effects of AI diffusion in the labour market.
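The logic of a staggered-rollout DiD can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's estimator: the panel is simulated, the outcome is taken to be log issues resolved per hour, the adoption months (3, 5, or never) are invented, and each adopting cohort is compared with agents who are not yet treated at that date.

```python
import random
import statistics

def simulate_panel(n_agents=300, n_months=8, effect=0.14, seed=1):
    """Simulated agent-by-month panel of log productivity (all values hypothetical)."""
    rng = random.Random(seed)
    panel = {}  # (agent, month) -> log issues resolved per hour
    adopt = {}  # agent -> adoption month, or None if never treated in the window
    for i in range(n_agents):
        adopt[i] = rng.choice([3, 5, None])
        skill = rng.gauss(0.0, 0.3)  # time-invariant agent effect
        for t in range(n_months):
            treated = adopt[i] is not None and t >= adopt[i]
            trend = 0.01 * t  # common time trend shared by all agents
            panel[(i, t)] = skill + trend + effect * treated + rng.gauss(0.0, 0.05)
    return panel, adopt

def did_not_yet_treated(panel, adopt, cohort_month):
    """Change from month g-1 to g for cohort g, minus the same change for not-yet-treated agents."""
    g = cohort_month

    def change(i):
        return panel[(i, g)] - panel[(i, g - 1)]

    cohort = [change(i) for i, a in adopt.items() if a == g]
    controls = [change(i) for i, a in adopt.items() if a is None or a > g]
    return statistics.mean(cohort) - statistics.mean(controls)

panel, adopt = simulate_panel()
for g in (3, 5):
    print(f"cohort adopting in month {g}: {did_not_yet_treated(panel, adopt, g):.3f}")
```

Differencing within agent removes fixed skill differences, and subtracting the control group's change removes the common trend. What remains identifies the effect only if adopters and not-yet-treated agents would otherwise have followed parallel trends, which is the assumption the quasi-random rollout is meant to support.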
3. Consumer Surplus from the Digital Economy: Willingness-to-Accept Experiments
Citation: Brynjolfsson et al. [2019]. American Economic Review: Papers & Proceedings 109:212-216, 2019.
Research question: How much consumer surplus do free digital goods (Facebook, email, maps) provide? Standard GDP accounting assigns zero value to services provided for free, potentially underestimating the welfare gains from digital goods.
Identification strategy: Incentive-compatible willingness-to-accept (WTA) experiments. Participants are randomised into receiving payment to deactivate Facebook for a month (treatment) or a control condition. The minimum payment a participant requires to deactivate reveals the consumer surplus they derive from the platform, that is, the compensating variation for losing access.
Key result: The median WTA to deactivate Facebook for one month is approximately $40-$60. Extrapolating across the US user base implies annual consumer surplus from Facebook alone of tens of billions of dollars. Similar estimates for Google Search yield even larger values. These welfare gains are not captured in GDP statistics.
Takeaway: GDP mismeasurement from the free digital economy is substantial and methodologically tractable through WTA experiments. The paper opens a causal welfare measurement agenda that complements the quality-adjustment agenda in price index research.
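One simple way to read a median WTA off experimental data is sketched below. It assumes each participant sees a single take-it-or-leave-it offer, so that the data reduce to acceptance shares by offer level. The offer levels, the acceptance shares and the user count are placeholders invented for illustration, not figures from the paper, and the linear interpolation is a stand-in for whatever demand-curve fit a real study would use.

```python
def median_wta(acceptance_by_offer):
    """Interpolate the offer at which half of participants accept deactivation.

    acceptance_by_offer maps an offer in dollars to the share of participants accepting it.
    """
    points = sorted(acceptance_by_offer.items())
    for (lo_offer, lo_share), (hi_offer, hi_share) in zip(points, points[1:]):
        if lo_share <= 0.5 <= hi_share and hi_share > lo_share:
            weight = (0.5 - lo_share) / (hi_share - lo_share)
            return lo_offer + weight * (hi_offer - lo_offer)
    raise ValueError("acceptance shares never cross 50%")

def annual_surplus(monthly_wta, n_users):
    """Back-of-the-envelope aggregate: monthly value x 12 months x number of users."""
    return monthly_wta * 12 * n_users

# Hypothetical acceptance shares at four offer levels (not data from the paper).
offers = {10: 0.20, 30: 0.40, 50: 0.55, 100: 0.80}
monthly = median_wta(offers)
total = annual_surplus(monthly, n_users=100_000_000)  # placeholder user count
print(f"median monthly WTA: ${monthly:.2f}; implied annual total: ${total:,.0f}")
```

Scaling a median by the number of users is only a rough aggregate. Total surplus strictly depends on the mean of the WTA distribution, which can differ substantially from the median when valuations are skewed.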
4. Large Language Models as Simulated Economic Agents
Citation: Horton [2023]. NBER Working Paper 31122, 2023.
Research question: Can large language models (LLMs) serve as "simulated economic agents" that replicate human responses in economic experiments—providing low-cost pretesting of experimental designs?
Identification strategy: The paper is methodological rather than identifying a specific causal effect. Horton [2023] runs a set of well-known economic experiments (dictator games, ultimatum games, labour supply) with GPT-3 standing in for a human subject. He compares LLM responses to established human experimental findings.
Key result: LLM responses to economic experiments are broadly consistent with human behaviour in many classic paradigms: GPT-3 exhibits inequality aversion in dictator games, backward induction in ultimatum games, and labour supply responses consistent with income and substitution effects. When given explicit socioeconomic background information ("you are a low-income single parent"), LLM responses shift in the expected direction.
Takeaway: LLMs as "silicon subjects" can serve as a rapid, low-cost tool for pre-testing experimental designs and generating directional predictions before committing to expensive human subject recruitment. The paper does not claim that LLMs replicate humans perfectly, but that the correlation between LLM and human responses is high enough to make them useful for experimental design. An important caveat: LLMs may reflect published experimental findings in their training data, making apparent replication tautological.
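The pre-testing workflow can be sketched as a loop over persona prompts. Everything in the sketch below is hypothetical: `build_prompt`, `run_silicon_subjects` and the stand-in `fake_ask` are illustrative names, no real model API is called, and the prompt wording is not taken from the paper. In practice `ask` would wrap whichever model client is available.

```python
from collections import Counter

def build_prompt(persona, scenario):
    """Prepend the persona description to the experimental scenario."""
    return f"{persona}\n\n{scenario}\nAnswer with a single option letter."

def run_silicon_subjects(ask, personas, scenario, n_draws=20):
    """Tally the option letter chosen under each persona.

    `ask` is any callable mapping a prompt string to a response string.
    """
    tallies = {}
    for persona in personas:
        prompt = build_prompt(persona, scenario)
        answers = [ask(prompt).strip().upper()[:1] for _ in range(n_draws)]
        tallies[persona] = Counter(answers)
    return tallies

def fake_ask(prompt):
    """Offline stand-in for a model call, so the sketch runs without any API."""
    return "A" if "low-income" in prompt else "B"

personas = ["You are a low-income single parent.", "You are a high-income executive."]
scenario = "Option A: keep $8, give $2. Option B: keep $5, give $5."
for persona, counts in run_silicon_subjects(fake_ask, personas, scenario).items():
    print(persona, dict(counts))
```

Keeping the model behind a plain callable makes it easy to swap in a deterministic stub for testing the experimental pipeline before spending money on real model calls or human subjects.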
5. Measuring the Welfare Effects of Platform Algorithms: Evidence from a Field Experiment
Citation: Allcott et al. [2020]. American Economic Review 110(3):629-676, 2020.
Research question: What are the welfare effects of social media (Facebook) use—specifically, does deactivating Facebook improve user subjective well-being, and do users underestimate how much time they spend on it?
Identification strategy: A large randomised controlled trial. In the four weeks before the 2018 US midterm elections, 2,844 Facebook users were randomised into deactivation (treatment) or continued use (control). The deactivation was enforced via a $102 payment conditional on survey-verified abstention.
Key result: Deactivating Facebook for four weeks: (1) reduced online activity and political polarisation; (2) increased subjective well-being (happiness and life satisfaction) by about 0.09 standard deviations, a small but statistically detectable improvement; (3) increased traditional TV consumption and socialising with friends and family; (4) reduced factual news knowledge but also reduced exposure to misinformation.
Takeaway: Facebook use appears to reduce subjective well-being for the average user, and users substantially underestimate their own usage (by about 30%).
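An effect expressed in standard deviations is a difference in group means divided by the spread of the outcome. The helper below is a generic sketch of that calculation, scaling by the control group's standard deviation. That scaling choice and the function name are assumptions for illustration, and the paper's own well-being index may be constructed differently.

```python
import statistics

def standardized_effect(treated, control):
    """Treatment-control difference in means, in units of the control-group SD.

    Returns (effect, standard_error), both expressed in control-SD units.
    """
    sd_control = statistics.stdev(control)
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff / sd_control, se / sd_control

# Toy well-being scores (hypothetical, not data from the experiment).
control = [1, 2, 3, 4, 5]
treated = [2, 3, 4, 5, 6]
effect, se = standardized_effect(treated, control)
print(f"effect: {effect:.3f} SD (SE {se:.3f})")
```

Expressing the effect in standard-deviation units allows comparison across outcomes measured on different scales, which is why field experiments of this kind typically report it that way.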
The papers reviewed above span advertising effectiveness (eBay RCT), AI productivity (generative AI in customer service), digital welfare measurement (WTA experiments), computational social science (LLM agents), and platform welfare (Facebook deactivation RCT). Together, they illustrate the breadth of causal methods being applied at the frontier of platform economics.
References
- Allcott, H., Braghieri, L., Eichmeyer, S., and Gentzkow, M. (2020). The welfare effects of social media. American Economic Review, 110(3):629-676.
- Blake, T., Nosko, C., and Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica, 83(1):155-174.
- Brynjolfsson, E., Collis, A., Diewert, W. E., Eggers, F., and Fox, K. J. (2019). GDP-B: Accounting for the social value of free goods in the digital economy. American Economic Review: Papers & Proceedings, 109:212-216.
- Brynjolfsson, E., Li, D., and Raymond, L. R. (2023). Generative AI at work. NBER Working Paper 31161.
- Horton, J. J. (2023). Large language models as simulated economic agents: What can we learn from homo silicus? NBER Working Paper 31122.