Many scientists worldwide are engaged in predicting the course of the COVID-19 pandemic, but the exact nature of this disease and the “novel” virus that causes it remains largely mysterious.
The numbers of confirmed cases in media reports depend on the extent of testing, which has varied markedly from region to region in North America. The scientific community has cautioned policymakers not to rely entirely on “observable” data (i.e., testing-confirmed COVID-19 cases) because such measures are likely to underreport the extent of the problem. That’s one reason why orthopaedic surgeon Mohit Bhandari, MD, and his colleagues applied machine-learning tools to estimate the number of “unobserved” COVID-19 infections in North America.
The authors’ stated goal was to contribute to the ongoing debate on detection bias (one form of which can occur when outcomes—infections in this case—cannot be reliably counted) and to present statistical tools that could help improve the robustness of COVID-19 data. Their findings suggest that “we might be grossly underestimating COVID-19 infections in North America.”
The authors’ estimates relied on 2 sophisticated analyses: “dimensionality reduction” helped uncover hidden patterns, and a “hierarchical Bayesian estimator approach” inferred past infections from current fatalities. The dimensionality-reduction analysis presumed a 13-day lag time from infection to death, and it indicated that, as of April 22, 2020, the US probably had at least 1.3 million undetected infections, and the number of undetected infections in Canada could have ranged from 60,000 to 80,000. The Bayesian estimator approach yielded similar estimates: The US had up to 1.6 million undetected infections, and Canada had at least 60,000 to 86,000 undetected infections.
In contrast, the Johns Hopkins University Center for Systems Science and Engineering reported only 840,476 confirmed cases for the US and 41,650 for Canada on April 22, 2020. Based on these numbers, as of that date, the number of undetected infections may have been 1.5 to 2.02 times the number of reported infections in the US and 1.44 to 2.06 times the number of reported infections in Canada.
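For readers who want to check the arithmetic, the multiples can be approximated by dividing the undetected-infection estimates by the confirmed counts. The short Python sketch below does only that, using the figures quoted in this summary. The pairing of the estimates into low and high bounds and the names `confirmed`, `undetected`, and `ratio_range` are assumptions made for illustration; this is not the authors' analysis.

```python
# Confirmed cases on April 22, 2020, as quoted in this summary (Johns Hopkins CSSE).
confirmed = {"US": 840_476, "Canada": 41_650}

# Undetected-infection estimates quoted in this summary.
# Treating them as (low, high) bounds is an assumption of this sketch.
undetected = {"US": (1_300_000, 1_600_000), "Canada": (60_000, 86_000)}


def ratio_range(country):
    """Return (low, high) ratios of undetected infections to confirmed cases."""
    low, high = undetected[country]
    return low / confirmed[country], high / confirmed[country]


for country in confirmed:
    low, high = ratio_range(country)
    print(f"{country}: {low:.2f} to {high:.2f} times the confirmed count")
```

With these inputs, the Canadian figures come out at about 1.44 to 2.06, matching the reported range. The US figures come out at about 1.55 to 1.90, which is narrower than the reported 1.5 to 2.02, so the published US range presumably rests on bounds that are not quoted in this summary.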
The authors emphasize that the “real” number of asymptomatic carriers cannot be determined without widespread use of validated antibody tests, which are scarce. Bhandari et al. conclude that policymakers should “be aware of the extent to which unobservable data—infections that have still not been captured by the system—can damage efforts to ‘flatten’ the pandemic’s curve.”