
New Findings

Comparative Advantage of Humans versus AI in the Long Tail

Peer-reviewed Publication

Nikhil Agarwal, Ray Huang, Alex Moehring, Pranaj Rajpurkar, Tobias Salz, and Feiyang Yu

May 2024

Supervised machine learning algorithms use large labeled datasets to perform predictive tasks (see LeCun, Bengio, and Hinton 2015 for an early review). These algorithms have outperformed human experts in several key areas (Lai et al. 2021; Mullainathan and Obermeyer 2019), and many observers anticipate significant job displacement, especially in diagnostic radiology. A counterargument holds that the short-term risk of job displacement is limited because most jobs require several different tasks, not all of which involve prediction (see, for example, Agrawal, Gans, and Goldfarb 2019; Langlotz 2019).

The long-tail hypothesis holds that humans may remain relevant even within prediction domains, at least in the medium run, because humans can learn from relatively few examples (see Malaviya et al. 2022; Casler and Kelemen 2005). In radiology, Langlotz argues that humans will remain relevant because “radiologists know the ‘long tail’” of diseases, each of which is uncommon but which together affect a large proportion of patients (2019, 2). Similar arguments apply to other applications in which artificial intelligence (AI) has made inroads. Autonomous cars, for instance, suffer from a “curse of rarity” because specific constellations of circumstances are seldom encountered (Liu and Feng 2022, 1). Humans can overcome this curse by drawing on knowledge from outside the driving domain. The long-tail hypothesis thus implies that job displacement may be limited if a job requires several complementary or essential tasks that are hard to automate because each of them is rarely encountered.

A challenge in studying the long-tail hypothesis is finding a class of similar problems that can be ordered by how commonly they are encountered. The authors argue that interpreting chest X-rays for various pathologies offers such a class. In this setting, disease prevalence can be used to parametrize the long tail, because rarer diseases have fewer training examples.
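The prevalence parametrization can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each chest X-ray study is represented as the set of disease labels marked present; the function names `disease_prevalence`, `rank_by_prevalence`, and `expected_positives` and the toy labels are illustrative only and are not taken from the paper or from the CheXpert or CheXzero code.

```python
from collections import Counter


def disease_prevalence(studies: list[set[str]], diseases: list[str]) -> dict[str, float]:
    """Share of studies in which each disease is labeled present.

    Hypothetical helper: `studies` holds one set of positive labels per study.
    Labels not listed in `diseases` are ignored.
    """
    if not studies:
        raise ValueError("need at least one study")
    counts = Counter(label for study in studies for label in study)
    return {d: counts[d] / len(studies) for d in diseases}


def rank_by_prevalence(prevalence: dict[str, float]) -> list[str]:
    """Order diseases from the head (most common) to the tail (rarest).

    Ties are broken alphabetically so the ordering is deterministic.
    """
    return sorted(prevalence, key=lambda d: (-prevalence[d], d))


def expected_positives(prevalence: dict[str, float], n_train: int) -> dict[str, float]:
    """Expected number of positive training examples per disease in a
    training set of `n_train` studies drawn at these prevalences."""
    return {d: p * n_train for d, p in prevalence.items()}


# Toy, made-up labels for illustration only (not data from the paper).
studies = [{"effusion"}, {"effusion", "cardiomegaly"}, {"pneumothorax"}, set(), {"effusion"}]
diseases = ["effusion", "cardiomegaly", "pneumothorax", "hernia"]
prev = disease_prevalence(studies, diseases)
print(rank_by_prevalence(prev))
```

On the toy labels this prints `['effusion', 'cardiomegaly', 'pneumothorax', 'hernia']`. The never-observed label sits at the extreme of the tail, and `expected_positives` makes the link to training data explicit: a supervised model trained on such a dataset would see proportionally fewer positive examples of each rarer disease.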

This paper examines whether self-supervised learning algorithms, which learn broadly because they do not require structured labels, have diminished the advantage of human radiologists in diagnosing the long tail of diseases. Specifically, the authors compare the performance of CheXzero (Tiu et al. 2022), a zero-shot algorithm for diagnosing chest pathologies, with that of human radiologists across 79 diseases. They also compare these results with predictions from the CheXpert algorithm (Irvin et al. 2019), a traditional supervised deep learning algorithm capable of diagnosing 12 chest pathologies. To test the hypothesis that humans will remain relevant in the long tail of diseases, the authors study how the performance of these classifiers varies with disease prevalence.
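The idea of relating classifier performance to prevalence can likewise be sketched. The snippet below is a hypothetical illustration, not the authors' estimation procedure: it assumes that both the algorithm and the radiologists produce a continuous score for each case, measures per-disease performance with the area under the ROC curve (AUROC, an assumed metric), and summarizes how the AI-minus-human gap varies with the logarithm of prevalence using an ordinary least squares slope. All function names are illustrative.

```python
import math
from statistics import mean


def auroc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney form).

    Each positive/negative pair counts 1 if the positive case is scored higher,
    0.5 for a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def gap_slope_on_log_prevalence(per_disease: list[tuple[float, float, float]]) -> float:
    """Regress (AI AUROC - human AUROC) on log prevalence across diseases.

    `per_disease` holds hypothetical (prevalence, ai_auroc, human_auroc) triples.
    A positive slope means the AI's edge shrinks (or the human edge grows)
    as diseases become rarer, the pattern the long-tail hypothesis predicts.
    """
    if any(p <= 0 for p, _, _ in per_disease):
        raise ValueError("prevalence must be positive to take logs")
    x = [math.log(p) for p, _, _ in per_disease]
    y = [ai - human for _, ai, human in per_disease]
    return ols_slope(x, y)
```

For instance, `auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. Regressing on log prevalence rather than prevalence itself is a design choice: it spreads out diseases whose prevalences differ by orders of magnitude, as they do in a long tail.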