When AI systems get less and less accurate with longer real-world use

Volume 8, Issue 18 | October 26, 2023


Oct 27, 2023

In This Issue

NBA Bubble Data: Second COVID infections shorter than first
COVID viral load peaking later, with implications for rapid tests
Testing Playbook for Biological Emergencies released
Pan-cancer screening may not be ready yet

New and Noteworthy

The NBA Says: Shot Clock Ticks Differently for 2nd COVID Infection

The 2023-24 NBA season just began - so it is a perfect time to look at the important role that the NBA’s testing bubble continues to play in furthering our understanding of COVID. Early in the pandemic, the National Basketball Association instituted a regular testing regime so that games could be played more safely. Only players and staff who tested PCR-negative immediately prior to games were allowed to play or be present at the events.

The 94,812 samples collected by the NBA diagnosed 3,346 infections, mostly early Omicron variants. This was the first high-quality data set to show how fit and healthy individuals experienced COVID from late 2020 through mid-2022.

A recent report in Nature Communications used this same data set to find out how an individual’s first infection compared with a second. (One caveat: Most first infections happened before vaccination was available, while most second infections occurred after players had been vaccinated, so it is impossible to separate their relative contributions to immunity.) In the end, there were few second infections - just 71 individuals had both a first and second infection within the rigorously managed NBA program (an additional 122 second infections were added from outside records).

While slightly lower than it was for first infections, the peak viral load for second infections was generally similar, implying that patients were a) just as sick, and b) equally infectious at peak. The big difference: Second infections were 3.5 days (25%) shorter overall, with most of that due to clearing the virus 2.7 days (29%) more quickly. Given that person-to-person transmission is more likely when viral load is higher than 10^4.5 copies/ml or so, that means the contagious period was cut from just over a week to about 5 days.
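For readers who want to see what those reductions imply, here is a rough back-calculation. It assumes each percentage is the days saved divided by the first-infection duration (our assumption, not a figure from the paper), and the function name is ours. Because the inputs are rounded, the results are approximate. Note that these are overall infection and clearance durations, not the contagious period above the viral-load threshold.

```python
# Back-calculate the first-infection durations implied by the reported reductions.
# Assumption: each percentage equals (days saved) / (first-infection duration).

def implied_baseline(days_saved: float, fraction_saved: float) -> float:
    """Return the first-infection duration implied by an absolute and a relative reduction."""
    return days_saved / fraction_saved

total_first = implied_baseline(3.5, 0.25)      # overall duration, first infection
clearance_first = implied_baseline(2.7, 0.29)  # clearance phase, first infection

total_second = total_first - 3.5
clearance_second = clearance_first - 2.7

print(f"Overall:   {total_first:.1f} -> {total_second:.1f} days")
print(f"Clearance: {clearance_first:.1f} -> {clearance_second:.1f} days")
```

Under that assumption, the overall duration drops from roughly 14 days to roughly 10.5, and the clearance phase from roughly 9.3 days to roughly 6.6.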

Commentary: This is good news, but it’s not all that previous infection / vaccination has changed about viral kinetics. Keep reading. . . .

Negative rapid test on Day 1 of symptoms doesn’t mean what it used to

One of the main reasons why COVID became a pandemic was the virus’s ability to be spread by people who were infected but didn’t yet have symptoms. By the time people did break with symptoms, they were at their peak viral load - and highly likely to test positive on a rapid antigen test. According to a recent article in Clinical Infectious Diseases, that’s no longer the case.

Between April 2022 and April 2023, the researchers looked at viral load relative to symptom onset in 348 COVID-positive, symptomatic patients. Median SARS-CoV-2 viral loads didn’t peak until the fourth or fifth day of symptoms. Based on PCR cycle threshold (Ct) values, the researchers estimated that “rapid antigen test sensitivity was 30.0% to 60.0% on the first day, 59.2% to 74.8% on the third, and 80.0% to 93.3% on the fourth.”
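Sensitivity is the share of infected people a test catches, so the share it misses is simply the complement. The short sketch below converts the quoted ranges into miss rates for a single test; the dictionary layout and function name are ours, and the ranges are taken as quoted rather than recomputed from the study.

```python
# Quoted rapid antigen sensitivity ranges (low, high) in percent, by day of symptoms.
sensitivity_by_day = {1: (30.0, 60.0), 3: (59.2, 74.8), 4: (80.0, 93.3)}

def miss_rate_range(low: float, high: float) -> tuple[float, float]:
    """Convert a sensitivity range (%) into the range of infections missed (%)."""
    # The best case for misses corresponds to the highest sensitivity, and vice versa.
    return (round(100.0 - high, 1), round(100.0 - low, 1))

for day, (low, high) in sensitivity_by_day.items():
    best, worst = miss_rate_range(low, high)
    print(f"Day {day}: a single test misses {best}% to {worst}% of infections")
```

On the first day of symptoms, that works out to a single rapid test missing somewhere between 40% and 70% of infections.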

Commentary: There’s good news and bad news here. On the one hand, this (admittedly small) study suggests that rapid antigen tests aren’t as useful as they once were for diagnosing an infection in its early stages. However, it also suggests that asymptomatic spread may not be the issue it once was. That said, people need to act responsibly when they first have symptoms - by masking in public places or (when possible) just staying home - until they’re sure they don’t have COVID. Otherwise, we’ve just traded one problem for another.

Testing Playbook for Biological Emergencies Published

Brown University’s School of Public Health (Pandemic Center and STAT Health Network) and the Association of Public Health Laboratories have released a new publication: Testing Playbook for Biological Emergencies. Rather than pretending to already know exactly how to respond to every possible kind of biological emergency (an impossible task), the playbook instead aims to provide the questions that government and other decision makers need to ask about testing at each stage of such an emergency. These stages are grouped into six sequential time periods, from first identification of a pathogen through the endemic phase once the emergency is under control.

The two core principles behind the document are:

Every part of the diagnostics landscape needs to be included in response, including public health labs, commercial labs both small and large, academic medical centers, hospital labs, and test manufacturers.
We need to act now, during disease “peacetime,” to be ready for a future crisis.

Other major recommendations:

Establish a permanent National Testing Lead within the White House now.
Establish a sustained (federal) Testing Readiness Commission now, as well as a network for regular testing operational discussions among state, local, tribal, and territorial (SLTT) governments and federal operational officials responsible for testing.
Prioritize testing readiness within pre-existing emergency funding mechanisms.
Make quality testing data accessible and useful to the American people, including by expanding wastewater surveillance.
Purchase standing federal testing capacity with designated commercial laboratories, academic medical centers, and test manufacturers.
Establish a permanent program for moving tests into communities quickly during health emergencies and seasonal outbreaks, to enhance awareness, choice, and equitable access.

Commentary: No one knows when it will occur or what it will look like - but another biological crisis will come. Such uncompromising focus on testing is a central part of the preparation and response to a biological emergency. Note: Mara was one of the co-authors of this playbook.

Food for Thought

Pan-cancer screening: A bridge still too far?

The threat of hidden cancer is a common worry, and most believe that the earlier it is caught the better. A number of pan-cancer tests have been launched to address these concerns - but so far, evidence of real benefit is limited. Cancer testing comes in different flavors. The initial molecular tests were all designed to characterize specific tissue/mutation subsets in depth. Most of these later extended their breadth (more sub-types: other tissues, other mutations) and depth (combining mutations with expression, proteomic, and metabolomic features prevalent in cancer). Now we have pan-cancer liquid biopsy tests, which are extending into multi-cancer early detection (MCED), and this is where the screening issues we have discussed before are most pronounced:

When you test mostly healthy people, even very, very good tests will generate more false than true positives, causing unnecessary anxiety and follow-up.
Clinical trials for screening tests must be huge (and expensive) to give useful results, but they still only yield very small numbers of positives that are often too small to draw reliable conclusions.
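The first point follows directly from Bayes' rule. Here is a minimal sketch using a hypothetical test (99% sensitive, 99% specific) at a few hypothetical prevalence levels; none of these numbers come from a real product or trial.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of positive results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics, for illustration only.
SENSITIVITY = 0.99
SPECIFICITY = 0.99

# Hypothetical prevalence levels: a mostly healthy population up to a high-risk group.
for prevalence in (0.005, 0.05, 0.20):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"Prevalence {prevalence:.1%}: {ppv:.1%} of positives are true positives")
```

At 0.5% prevalence, only about a third of positives from this hypothetical test are real, so false positives outnumber true positives roughly two to one. The same test performs far better when the people being tested are more likely to have the disease.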

The Lancet published the results of the PATHFINDER trial of just such a pan-cancer test. 6,621 volunteers were enrolled and tested, then 6,413 were successfully tracked over the next 12 months, during which 121 confirmed cancers were found - a disease prevalence of 1.9% in the sample. This test had very good specificity (99.1%) and negative predictive value (98.6%; i.e., almost all of those who tested negative were cancer-free - but then again, that’s true in life). However, 57 individuals were told they were positive when they were not (false positives), and, more worrisomely, 71% of cancer cases were missed (false negatives). Those numbers may seem wonky, but look at the graphic below and you’ll see why they’re right.
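The reported figures can be checked by rebuilding the two-by-two table from the numbers in the paragraph above. One assumption on our part: the count of true positives is not stated directly, so the sketch derives it by rounding the 29% of cancers that were not missed.

```python
# Figures stated in the text.
tracked = 6413           # participants followed for 12 months
cancers = 121            # confirmed cancers
false_positives = 57     # positive result, no cancer
missed_fraction = 0.71   # share of cancers the test missed

# Assumption: true positives are derived by rounding, not taken from the paper.
true_positives = round(cancers * (1 - missed_fraction))              # 35
false_negatives = cancers - true_positives                           # 86
true_negatives = tracked - cancers - false_positives                 # 6235

prevalence = cancers / tracked
sensitivity = true_positives / cancers
specificity = true_negatives / (true_negatives + false_positives)
npv = true_negatives / (true_negatives + false_negatives)
ppv = true_positives / (true_positives + false_positives)

print(f"Prevalence:  {prevalence:.1%}")   # 1.9%
print(f"Sensitivity: {sensitivity:.1%}")  # 28.9%
print(f"Specificity: {specificity:.1%}")  # 99.1%
print(f"NPV:         {npv:.1%}")          # 98.6%
print(f"PPV:         {ppv:.1%}")          # 38.0%
```

The specificity and negative predictive value come out as reported. The table also shows why a high NPV is less impressive than it sounds: with prevalence at 1.9%, a person was already about 98.1% likely to be cancer-free before taking the test.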

Commentary: The good news: if you get a negative result on this test, you can be about 99% sure you do not have any of these cancers. The bad news is that this test found only 20.4% of solid tumors, including only 10% of prostate cancers and 23% of breast cancers - both of which are reliably detected early from symptoms and existing techniques. Because the trial was conducted during COVID, reduced routine screening may have meant that 12 months was too short a follow-up time. Reliance on volunteers may also have tilted the sample towards patients with symptoms consistent with cancer, who considered this an easier first step in the diagnostic pipeline. Larger clinical trials are underway, but for the time being, the health benefits of pan-cancer screening of an otherwise asymptomatic healthy population remain elusive.

The more they are used the worse they get. How to keep clinical predictive AI systems relevant?

All AI models are created by looking in the rearview mirror: What combinations of features were probabilistically associated with outcomes in the past? Once AI systems are deployed, local practice patterns and patient characteristics (both of which are AI inputs) differ from site to site and change over time, requiring updated input/output relationships. Absent updating, AI systems become less and less effective.

A recent paper demonstrates that effective updating is almost impossible to do in practice. Each system interacts with a health record independently in ways that are a black box for the users, driving system-versus-system conflict. The consequence, researchers found, was a fast-growing number of false alarms.

Commentary: AI “data set drift” compounds the difficulty of a perennial diagnostic design issue: There is a natural wish to keep sensitivity high (to not miss any cases), but this inevitably increases false positives. When that happens in the ICU, urgent-care physicians get alarm fatigue, negating the value of the system altogether. AI promises to be a godsend in the clinic one day, but we are not there yet.

Quick Hits

Our focus is diagnostics - COVID and beyond. But this week we are going beyond diagnostics and had to share this. Makers of the 1 Virus Buster Invisible Mask (aka the 1 Virus Buster Card) claimed that wearing their product (a little card you clip to your shirt) gave users an invisible three-foot barrier against 99.99% of all viruses and bacteria. Not surprisingly, the FTC has laid the smack down, and is working to get these people banned from making any more health-related claims without evidence. One defendant has settled with the agency; the other is still fighting.

1 Comment

The Diagnostic Detective

Thanks Mara- an educational read as ever.

Just a point of clarification on the PATHFINDER study and the GRAIL technology. To be clear to patients, the probability of having cancer simply by entering the study was 1.9%. When the test is negative, that reduces the probability to 1.4%. If I was a patient in the study (I am over 50 with no symptoms so could be) this wouldn't really reassure me at all.

If I have a positive test my chances of having cancer rise from 1.9% to 38%. Personally, I would be happy with that but on a population level, a large number of people will be told 'their cancer test is positive' causing anxiety and extra tests, even though there is no cancer.

I don't see a role for the GRAIL test in cancer screening and would certainly not invest in the company. They see this as a 'game changer' in cancer screening but it certainly is not that. It may prove to have a role in patients with symptoms suggestive of cancer and studies are ongoing in that field.


Sensitive & Specific: The Testing Newsletter
