Thursday, April 2, 2020

"He (or She) Tested Negative (or Positive) for Coronavirus"

The news is full of stories of celebrities and public officials who tested positive or negative "for coronavirus" or "for COVID-19."
  • Prime Minister Boris Johnson has tested positive for coronavirus -- BBC News 3/27/20
  • Rapper Scarface revealed he has tested positive for coronavirus -- The Daily Beast 3/26/20
  • Rep. Mike Kelly (R-Pa.) announced Friday he has tested positive for the coronavirus -- The Hill, 3/27/20
  • An Arizona State University professor said he tested positive for COVID-19 -- KTAR 3/27/20
  • Trump tested negative for coronavirus -- CNN, 3/14/20
  • Charles Barkley announced he tested negative for the coronavirus -- USA Today, 3/23/20
  • Romney says he tested negative for coronavirus -- The Hill, 3/24/20
  • Lindsey Graham says he tested negative for coronavirus -- CNN, 3/15/20
  • Ayanna Pressley tests negative for COVID-19 -- CNN, 3/27/20
What can anyone really conclude from a negative or positive finding? How well do these findings answer the question of whether someone is infectious, or ill because of an infection? This posting seeks to explain why convincing estimates of test sensitivity and specificity are hard to come by. It also sketches the kind of additional reasoning that would be necessary to supply estimates of the probability a person is infected with the virus or ill from COVID-19 in light of the test results. (I am outside my comfort zone in parts of this posting -- corrections are welcome.)

Tests for What?

To begin with, we need to distinguish between the disease -- Coronavirus Disease 2019 (COVID-19) -- and the virus itself -- Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The virus hijacks the machinery of human cells to replicate itself. Initially, it tends to reside in the mucous membranes of the upper nose and throat, but in more serious cases, it moves from the upper respiratory tract to the lungs. The disease spreads primarily through respiratory droplets from an infected individual that end up in the mouth, nose, or eyes of another person.

In a sense, the tests in the news are not tests for the disease -- even though many of their creators call them tests for COVID-19. \1/ They are molecular diagnostic tests for the presence of certain sequences (of the nucleotide base-pairs) that are characteristic of SARS-CoV-2. \2/ If these sequences are detected in a swab from the person under investigation (a PUI), the test is said to be positive. If these sequences are not detected, the result is negative.

Operating Characteristics: Sensitivity and Specificity

In an ideal test for being infected with the virus, positives would only arise when SARS-CoV-2 is present in the PUI, and negatives would only occur when it is not. The probability of a positive result (+) given that the PUI harbors the specific SARS virus then would be 1. As we will see, this probability is not precisely known, but it surely is less than 1. If we let S2 stand for the event that the PUI has the virus SARS-CoV-2, we can write this conditional "true positive" probability, or test sensitivity, as Pr(+ | S2). The "|" in the expression is read "given" or "conditional on."

One other probability is needed to characterize the accuracy of the test. The specificity indicates how accurately the test indicates that a PUI does not harbor the virus. Ideally, the specificity, Pr(− | not-S2), also is 1. That is, whenever the PUI is not infected, the test is negative. But, once again, no real-world test for infection performs this well.

To see why, the following path diagram for test results may be helpful:
Figure 1. What might produce positive and negative test results

The diagram shows that a positive test result (TEST +) could be explained either by viruses from the PUI or by contamination on the swab. A negative test result (TEST −) could be explained either by the absence of any infection in the PUI, by an infection that has not generated enough viruses to signal a positive result, or by problems with the chemistry of the test. If any of these paths have a nonzero probability, the sensitivity and specificity are less than 1.

Even this list of explanations assumes that "virus" in the diagram refers strictly to the SARS-CoV-2 strain that the test is designed to detect. If another type of coronavirus, a rhinovirus, parainfluenza virus, adenovirus, etc., has sufficient sequence similarity in the few regions tested to be mistaken for SARS-CoV-2, then the similar viruses on the swab could produce a signal. That would make the test even less specific to the infectious agent for COVID-19. Conversely, if other strains of SARS-CoV-2 exist and have sufficiently different sequences in the regions of the virus's genome that the test covers, the test will miss them, reducing its sensitivity. The FDA calls this aspect of sensitivity "inclusivity." \3/

So what are the sensitivity and specificity of the tests that have been released under emergency use authorizations from the FDA? The laboratories that rushed to develop the tests based on the viral genome performed limited experiments to assess (1) how much virus on the swab would be detectable; (2) whether other types of viruses would be detected instead of the real target; and (3) the probabilities implicit in the pathways in the blue boxes in the diagram.

Reported Laboratory Validation Data

To distribute or perform tests, manufacturers or laboratories must file validity studies with the Food and Drug Administration, which insists that "[a]ll clinical tests should be validated prior to use. In the context of a public health emergency, it is especially important that tests are validated as false results can have broad public health impact beyond that to the individual patient." \4/

For example, the Laboratory Corporation of America's Accelerated Emergency Use Authorization (EUA) Summary for its COVID-19 RT-PCR Test explains that "[t]he COVID-19 RT-PCR test is a real-time reverse transcription polymerase chain reaction (rRT-PCR) test for the qualitative detection of nucleic acid from SARS-CoV-2 in upper and lower respiratory specimens (such as nasopharyngeal or oropharyngeal swabs, sputum, lower respiratory tract aspirates, bronchoalveolar lavage, and nasopharyngeal wash/aspirate or nasal aspirate) collected from individuals suspected of COVID-19 by their healthcare provider."

The summary asserts that "SARS-CoV-2 RNA is generally detectable in respiratory specimens during the acute phase of infection" -- in other words, the test is somewhat sensitive to the disease. But that covers a lot of territory. Without data on the probabilities in the path from PUI infected to viruses in the specimen-collection site to a detectable quantity on the swab (and other possible paths), we are left with vague statements in the summary, such as "[p]ositive results are indicative of the presence of SARS-CoV-2 RNA" on the swab or other sample, and "[n]egative results do not preclude SARS-CoV-2 infection."

Of course, the test only is designed to signal the presence of viruses in the sample (the paths in the blue boxes in Figure 1). As I noted at the outset, it does not purport to be a test for the disease itself. How well does this test accomplish its more limited task? Naturally, detection of the virus depends on the quantity of viruses that are actually present. LabCorp and other test developers follow the simplistic approach of defining a fixed "Limit of Detection (LoD)." What, then, is the probability of detection at and above the limit?

According to the summary, "[t]he LoD study established the lowest concentration of SARS-CoV-2 (genome copies (cp)/μL) that can be detected by the COVID-19 RT-PCR test at least 95% of the time." This limit came from creating mock specimens with known quantities of the virus ("spiking the quantified live SARS-CoV-2 into negative respiratory clinical matrices") and reducing the quantity to the point at which 19 out of 20 specimens tested positive. Using only 20 mock specimens for each concentration, \5/ "[t]he study results showed that the LoD of the COVID-19 RT-PCR test is 6.25 cp/μL (19/20 positive)."

Although 19/20 describes the sample data, one cannot be entirely confident that the test really has a 95% sensitivity at the selected concentration for the LoD. Even if the true sensitivity at the 6.25 concentration were, say, 18 out of 20, we would find exactly 19 out of 20 replicates to be positive (as occurred in the LoD study) more than a quarter of the time. \6/ Likewise, the 0.95 sensitivity criterion for the limit of detection could have led to twice the reported LoD in a study with the same sample size. If the detection probability (sensitivity) were 95% at the next level up (12.5 cp/μL in this study), it could well be that all 20 replicates would be positive. The probability of that datum is 36% (0.95^20 = 0.358).
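The two binomial probabilities in the preceding paragraph can be checked with a few lines of Python. This is only an illustrative sketch. It assumes that the 20 replicates are independent and share a single detection probability, and the function name binomial_pmf is introduced here for illustration; it does not come from the LabCorp summary.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k positives among n independent replicates,
    each of which tests positive with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical true sensitivity of 0.90 at the reported LoD:
# chance of observing exactly 19 positives out of 20.
p_19_of_20 = binomial_pmf(19, 20, 0.90)

# Hypothetical true sensitivity of 0.95 at the next concentration up:
# chance of observing 20 positives out of 20.
p_20_of_20 = binomial_pmf(20, 20, 0.95)

print(f"Pr(19 of 20 | p = 0.90) = {p_19_of_20:.3f}")  # 0.270
print(f"Pr(20 of 20 | p = 0.95) = {p_20_of_20:.3f}")  # 0.358
```

Both outcomes are quite probable under sensitivities other than the nominal one, which is why 20 replicates cannot pin down the concentration at which the detection probability is 95%.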

Having chosen 6.25 for the LoD concentration, LabCorp proceeded to a "Clinical Evaluation." More precisely, "[a] contrived clinical study was performed." For brevity, I will just describe the results for NP swabs. (The data on BALs were the same.)

No. samples   Concentration   Test −   Test +
50            0               50       0
10            1×LoD           0        10
10            2×LoD           0        10
10            4×LoD           0        10
10            8×LoD           0        10

For these outcomes, the summary derives the following statistics:
  • "Positive Percent Agreement 40/40 = 100% (95% CI: 91.24% - 100%)"
  • "Negative Percent Agreement 50/50 = 100% (95% CI: 92.87% -100%)"
The first confidence interval is an estimate for the sensitivity, based on the 40 positive test swabs pooled over the four geometrically spaced concentrations. This interval, and the second one, for the specificity, suggest that the test is good at distinguishing between swabs spiked with between 6.25 and 50 cp/μL of the virus, on the one hand, and swabs with no SARS-CoV-2 at all, on the other.

But it is not clear what this sample of 90 tests is representative of. The efficacy of the test in discriminating between a virus-free swab and a virusy one depends on how many viruses are on the swab. If we contrast the 10 swabs constructed to have the reported limit of detection (6.25) with the 50 with no viruses, the observed sensitivity in the experimental sample is still 1, but because 10 is a small sample size, the 95% Clopper-Pearson CI extends as low as 0.69. The lower end of the interval for the specificity is still 0.93. To discern some sort of average sensitivity and specificity for swabs from patients, one would need to know the distribution of viral concentrations in the patient population.
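The interval endpoints discussed above can be reproduced with the following sketch. It handles only the special case in which every specimen gives the expected result (x = n), where both the exact (Clopper-Pearson) and the Wilson score lower bounds have simple closed forms and the upper bound is 1. The function names are introduced here for illustration. The summary does not say which method produced its intervals; the observation below that its figures agree with the Wilson formula is an inference from the arithmetic alone.

```python
from statistics import NormalDist


def clopper_pearson_lower_all_agree(n: int, conf: float = 0.95) -> float:
    """Exact (Clopper-Pearson) lower bound when all n results agree (x = n).
    In this special case the bound reduces to (alpha/2)**(1/n)."""
    alpha = 1 - conf
    return (alpha / 2) ** (1 / n)


def wilson_lower_all_agree(n: int, conf: float = 0.95) -> float:
    """Wilson score lower bound when all n results agree (x = n).
    In this special case the bound reduces to n / (n + z**2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return n / (n + z**2)


cases = [
    ("positive agreement, 40/40", 40),
    ("negative agreement, 50/50", 50),
    ("1x LoD swabs only, 10/10", 10),
]
for label, n in cases:
    cp = clopper_pearson_lower_all_agree(n)
    wilson = wilson_lower_all_agree(n)
    print(f"{label}: Clopper-Pearson {cp:.4f}, Wilson {wilson:.4f}")
```

Under these assumptions the Wilson lower bounds for 40/40 and 50/50 come out at about 0.9124 and 0.9287, the same figures that the summary reports, while the Clopper-Pearson lower bound for 10/10 is about 0.69, as stated in the text.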

LabCorp's validity study contains further data on "Analytical Specificity." This is not the specificity for the classification for virus-present versus virus-absent seen in the "Clinical Evaluation." It concerns the possibility that a different virus could generate (false) positive results -- something that, as previously noted, would make the test even less specific. The summary lists bacteria and viruses that did not produce positive test results (in an unspecified number of tests). This is consistent with the fact that a number of them have "no homology with primers and probes of the COVID-19 RT-PCR test." In other words, their nucleic acid sequences are substantially different from the sequences of SARS-CoV-2 used in the test. As such, the SARS-CoV-2 amplification and detection process should not react to the sequences from at least this set of other organisms.

Reporting Test Results Without Quantitative Information

The advice from testing companies and laboratories does not even try to supply estimates of sensitivity and specificity -- either for the diagnosis of COVID-19 or the presence of SARS-CoV-2 on specimens. For example, the Fact Sheet for Healthcare Providers: LabCorp's COVID-19 RT-PCR Test - LabCorp (Mar. 16, 2020) contains the following questions and answers:
What does it mean if the specimen tests positive for the virus that causes COVID-19?
A positive test result for COVID-19 indicates that RNA from SARS-CoV-2 was detected, and the patient is infected with the virus and presumed to be contagious. Laboratory test results should always be considered in the context of clinical observations and epidemiological data ....
LabCorp's COVID-19 RT-PCR Test has been designed to minimize the likelihood of false positive test results. ...
What does it mean if the specimen tests negative for the virus that causes COVID-19?
A negative test result for this test means that SARS-CoV-2 RNA was not present in the specimen above the limit of detection. However, a negative result does not rule out COVID-19 and should not be used as the sole basis for treatment or patient management decisions. A negative result does not exclude the possibility of COVID-19.
When diagnostic testing is negative, the possibility of a false negative result should be considered in the context of a patient’s recent exposures and the presence of clinical signs and symptoms consistent with COVID-19. ...
This advice is oddly phrased and not terribly helpful. Among other things, \7/ does the statement that the test is designed to "minimize" the false-positive probability mean that the test maximizes the specificity to the point that its complement, the false-positive probability, is 0? That the test design makes the FPP higher than some alternative designs that were considered? Of course, the positive test "indicates" that the viral RNA is present, but how strong is the indication? \8/ And, how should the test result -- whether positive or negative -- be evaluated "in the context of clinical observations and epidemiological data"? The Fact Sheet leaves the healthcare providers for whom it is written at sea.

It is all but impossible to answer the last two questions without understanding Bayes' rule -- a formula for updating a previously established probability in the light of new information such as a symptom or a test result. Suffice it to say that the probability of COVID-19 in the patient is a function of (1) the prevalence of the disease among persons who are like the patient in their demographic and geographic characteristics and medical histories; (2) the sensitivity and specificity of the symptoms (things like a fever and a cough) in this population; and (3) the sensitivity and specificity of the test for SARS-CoV-2 in this population.
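For readers who want the formula, the odds form of Bayes' rule can be sketched as follows. The notation is an assumption made for this illustration: C stands for "has COVID-19," + and − are the test results, and the two likelihood ratios are labeled LR+ and LR−. The sketch covers a single test result and ignores the further updating on symptoms mentioned in item (2).

```latex
\[
\underbrace{\frac{\Pr(C \mid +)}{\Pr(\text{not-}C \mid +)}}_{\text{posterior odds}}
=
\underbrace{\frac{\Pr(C)}{\Pr(\text{not-}C)}}_{\text{prior odds}}
\times
\underbrace{\frac{\Pr(+ \mid C)}{\Pr(+ \mid \text{not-}C)}}_{LR_{+}},
\qquad
LR_{+} = \frac{\text{sensitivity}}{1 - \text{specificity}}
\]
\[
\frac{\Pr(C \mid -)}{\Pr(\text{not-}C \mid -)}
=
\frac{\Pr(C)}{\Pr(\text{not-}C)}
\times
\frac{\Pr(- \mid C)}{\Pr(- \mid \text{not-}C)},
\qquad
LR_{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}
\]
```

The prior odds carry the prevalence information in item (1); the likelihood ratios carry the sensitivity and specificity in item (3).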

Today's Bottom Line

On the basis of the kind of information collected here, it is safe to say that a positive test result raises the odds of COVID-19 and a negative result lowers them. But by how much? To make better use of the tests in diagnosing COVID-19, their operating characteristics should be measured by validating the tests against specimens from patients who are known to be suffering from COVID-19.

The Wall Street Journal has an alarming statistic for the false negative rate. Its answer to the question "Are tests accurate?" is
  • Health experts say they now believe nearly one in three patients who are infected are nevertheless getting a negative test result. They caution that only limited data are available, and their estimates are based on their own experience in the absence of hard science.
  • That picture is troubling, many doctors say, as it casts doubt on the reliability of a wave of new tests developed by manufacturers, lab companies and the CDC. Most of these are operating with minimal regulatory oversight and little time to do robust studies amid a desperate call for wider testing. \9/
A false-negative rate of 1/3 is the same as a sensitivity of 2/3. (Proof: Let C stand for "has COVID-19." Then Pr(–|C) + Pr(+|C) = false negative rate + sensitivity = 1/3 + 2/3 = 1. This notation makes it clear that now we are speaking of conditional probabilities for the disease rather than the presence of the virus in the specimen at or above the LoD.)

A discussion in the Internet Book of Critical Care \10/ refers to one or two studies along these lines. It suggests that in practice, the sensitivity and specificity are each below 80%:
There are several major limitations, which make it hard to precisely quantify how RT-PCR performs.
  1. RT-PCR performed on nasal swabs depends on obtaining a sufficiently deep specimen. Poor technique will cause the PCR assay to under-perform.
  2. COVID-19 isn't a binary disease, but rather there is a spectrum of illness. Sicker patients with higher viral burden may be more likely to have a positive assay. Likewise, sampling early in the disease course may reveal a lower sensitivity than sampling later on.
  3. Most current studies lack a “gold standard” for COVID-19 diagnosis. For example, in patients with positive CT scan and negative RT-PCR, it's murky whether these patients truly have COVID-19 (is this a false-positive CT scan, or a false-negative RT-PCR?). ...
Specificity seems to be high (although contamination can cause false-positive results), [but] sensitivity may not be terrific. ... In a case series diagnosed on the basis of clinical criteria and CT scans, the sensitivity of RT-PCR was only ~70% (Kanne 2/28). Sensitivity varies depending on assumptions made about patients with conflicting data (e.g. between 66-80%) (Ai et al.). ... Among patients with suspected COVID-19 and a negative initial PCR, repeat PCR was positive in 15/64 patients (23%). This suggests a PCR sensitivity of <80%. Conversion from negative to positive PCR seemed to take a period of days, with CT scan often showing evidence of disease well before PCR positivity (Ai et al.).

Bottom line?
PCR seems to have a sensitivity somewhere on the order of ~75%. A single negative RT-PCR doesn't exclude COVID-19 (especially if obtained from a nasopharyngeal source or if taken relatively early in the disease course). If the RT-PCR is negative but suspicion for COVID-19 remains, then ongoing isolation and re-sampling several days later should be considered.

An 80% sensitivity and specificity implies that the test changes the odds of the disease by a factor of only 80/20 = 4. For such a test, if the physician's prior odds (those formed before receiving the test result) were, say, 6:1 in favor of COVID-19, a positive test result would change them to 24:1. The posterior probability is thus 24/25 = 96%. A negative test result would shift the odds from 1:6 for not-COVID-19 to 4:6. These latter odds are equivalent to 6:4 on COVID-19 (a probability of disease of 6/10 = 60%). In short, the starting probability of 6/7 = 86% went up to 96% or down to 60%, depending on whether the test came back positive or negative. If the starting odds were reversed -- 1:6 on COVID-19 prior to the test -- the posterior probabilities of the disease would be lower -- 40% with the positive test result, and only 4% with a negative test result.
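The arithmetic in the preceding paragraph can be reproduced with a short sketch. It assumes, as the paragraph does, a hypothetical test with sensitivity and specificity both equal to 0.80 and a single test result; the function name posterior_probability is introduced here for illustration.

```python
def posterior_probability(prior_odds: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Update prior odds on disease with one test result and return the
    posterior probability of disease (odds form of Bayes' rule)."""
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        likelihood_ratio = (1 - sensitivity) / specificity
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


SENS = SPEC = 0.80  # hypothetical values taken from the example in the text

for label, prior_odds in [("6:1 on COVID-19", 6 / 1), ("1:6 on COVID-19", 1 / 6)]:
    prior_prob = prior_odds / (1 + prior_odds)
    after_pos = posterior_probability(prior_odds, SENS, SPEC, positive=True)
    after_neg = posterior_probability(prior_odds, SENS, SPEC, positive=False)
    print(f"prior {label} ({prior_prob:.0%}): "
          f"positive -> {after_pos:.0%}, negative -> {after_neg:.0%}")
```

With prior odds of 6:1 the probability moves from about 86% to 96% or 60%; with prior odds of 1:6 it moves from about 14% to 40% or 4%.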

By way of comparison, one study of the much maligned technique of microscopic hair comparisons for identity used mitochondrial DNA tests as the gold standard for accuracy. It gave rise to a likelihood ratio for a positive association between the crime-scene hair fibers and the suspects' head hairs of a little under 3. \11/ That is not impressive, but if the estimates proposed by the Critical Care doctors are correct about the tests for SARS-CoV-2, the probative value of a hair association is not all that different from the diagnostic value of a positive molecular diagnostics test.

UPDATE (8/28/20)
A clear discussion of test sensitivity, specificity, and positive predictive value can be found in the International Statistical Institute's blog posting by John Bailar, My COVID-19 Test Is Positive … Do I Really Have It?, Statisticians React to the News, Aug. 25, 2020. It focuses on interpreting rapid antigen screening test results in combination with confirmatory PCR tests of the kind discussed here but does not delve deeply into the estimated sensitivity and specificity of any of the tests. It proposes further reading on this topic in Lauren Kucirka & Justin Lessler, COVID-19 Story Tip: Beware of False Negatives in Diagnostic Testing of COVID-19, Johns Hopkins Medicine Newsroom, May 26, 2020, ("describing work suggesting false negative rates > 20% for RT-PCR tests and that test accuracy changes over time course of disease"), and Rob Stein, Study Raises Questions About False Negatives From Quick Covid-19 Test, NPR Morning Edition, Apr. 21, 2020 (reporting that "[r]esearchers at the Cleveland Clinic tested 239 specimens known to contain the coronavirus using five of the most commonly used coronavirus tests, including the Abbott ID NOW [which] only detected the virus in 85.2% of the samples, meaning it had a false-negative rate of 14.8 percent.").

NOTES

  1. The names (and other information) on the tests that have received Emergency Use Authorization (EUA) from the FDA are listed at https://www.fda.gov/emergency-preparedness-and-response/mcm-legal-regulatory-and-policy-framework/emergency-use-authorization#2019-ncov.
  2. There also are serological tests. These look for antibodies in the blood. For a description of various types of tests, see Cormac Sheridan, Fast, Portable Tests Come Online to Curb Coronavirus Pandemic, Nature Biotechnology, Mar. 23, 2020.
  3. The FDA's Policy for Diagnostic Tests for Coronavirus Disease-2019 during the Public Health Emergency, Mar. 16, 2020, contains the following "recommendations regarding the minimum testing that should be performed to ensure analytical and clinical validity" for "tests that detect SARS-CoV-2 nucleic acids from human specimens" (pp. 9-10):
    (1) Limit of Detection

    FDA recommends that laboratories document the limit of detection (LoD) of their SARS-CoV-2 assay. FDA generally does not have concerns with spiking RNA or inactivated virus into artificial or real clinical matrix (e.g., Bronchoalveolar lavage [BAL] fluid, sputum, etc.) for LoD determination. FDA recommends that laboratories test a dilution series of three replicates per concentration, and then confirm the final concentration with 20 replicates. For this guidance, FDA defines LoD as the lowest concentration at which 19/20 replicates are positive. If multiple clinical matrices are intended for clinical testing, FDA recommends that laboratories submit in their EUA requests the results from the most challenging clinical matrix to FDA. For example, if testing respiratory specimens (e.g., sputum, BAL, nasopharyngeal (NP) swabs, etc.), laboratories should include only results from sputum in their EUA request.

    (2) Clinical Evaluation

    In the absence of known positive samples for testing, FDA recommends that laboratories confirm performance of their assay with a series of contrived clinical specimens by testing a minimum of 30 contrived reactive specimens and 30 non-reactive specimens. Contrived reactive specimens can be created by spiking RNA or inactivated virus into leftover clinical specimens, of which the majority can be leftover upper respiratory specimens such as NP swabs, or lower respiratory tract specimens such as sputum, etc. We recommend that twenty of the contrived clinical specimens be spiked at a concentration of 1x-2x LoD, with the remainder of specimens spanning the assay testing range. For this guidance, FDA defines the acceptance criteria for the performance as 95% agreement at 1x-2x LoD, and 100% agreement at all other concentrations and for negative specimens.

    (3) Inclusivity

    Laboratories should document the results of an in silico analysis indicating the percent identity matches against publicly available SARS-CoV-2 sequences that can be detected by the proposed molecular assay. FDA anticipates that 100% of published SARS-CoV-2 sequences will be detectable with the selected primers and probes.

    (4) Cross-reactivity

    At a minimum, FDA believes an in silico analysis of the assay primer and probes compared to common respiratory flora and other viral pathogens is sufficient for initial clinical use. For this guidance, FDA defines in silico cross-reactivity as greater than 80% homology between one of the primers/probes and any sequence present in the targeted microorganism. In addition, FDA recommends that laboratories follow recognized laboratory procedures in the context of the sample types intended for testing for any additional cross-reactivity testing.
  4. Id. at 4.
  5. Id. at 9 ("FDA recommends that laboratories test a dilution series of three replicates per concentration, and then confirm the final concentration with 20 replicates.").
  6. In twenty independent tests of swabs with a probability of 18/20 = 0.9 of a detection on each test, the binomial probability of exactly 19 detections is 20 × 0.9^19 × 0.1 = 0.27.
  7. I think the first sentence is intended to say that a positive test result indicates (with some probability) that the specific RNA is present in the specimen, which implies (with some probability) that the patient is infected with SARS-CoV-2. And, I suspect that the first sentence in the second answer should read, a "negative test result indicates that SARS-CoV-2 RNA was not present in the specimen above the limit of detection" or a "negative test result means that SARS-CoV-2 RNA was not detected in the specimen above the limit of detection."
  8. Despite the warning in the first box about clinical and epidemiologic information, LabCorp apparently regards a positive result on its test as "definitive" in and of itself. A Q&A page on LabCorp's website does not even include a question about the meaning of a positive result. It only asks whether "a negative result from LabCorp’s testing for COVID-19 mean[s] that a patient is definitely not infected." Its answer is
    Not necessarily. LabCorp’s testing for COVID-19 detects the virus directly, within the established limits of detection for which it was validated. A positive result is considered definitive evidence of infection. However, a negative result does not definitively rule out infection. As with any test, the accuracy relies on many factors:
    • The test may not detect virus in an infected patient if the virus is not being actively shed at the time or site of sample collection.
    • The amount of time that an individual was exposed prior to the collection of the specimen can also influence whether the test will detect the virus.
    • Individual response to the virus can differ.
    • Whether the specimen we receive was collected properly, sent promptly, and packaged correctly. Test results are a critical part of any diagnosis, but must be used by the clinician along with other information to form a diagnosis.
    Q&A, LabCorp's Testing for COVID-19, Mar. 25, 2020, https://www.labcorp.com/assets-media/2330. At least the last bulleted item pertains to positive as well as negative results.
  9. WSJ Staff, Who Has Covid-19? What We Know About Tests for the New Coronavirus?, updated Apr. 2, 2020 7:46 pm ET, https://www.wsj.com/articles/who-has-covid-19-what-we-know-about-tests-for-the-new-coronavirus-11585868185
  10. Josh Farkas, COVID-19, Internet Book of Critical Care, Mar. 2, 2020 (updated Mar. 29, 2020), https://emcrit.org/ibcc/COVID19/.
  11. David H. Kaye, Ultracrepidarianism in Forensic Science: The Hair Evidence Debacle, 72 Wash. & Lee L. Rev. Online, 227–254 (2015) (discussing the study), available at ssrn.com/abstract=2647430.

2 comments:

  1. Thank you for this nice piece of work.
    May I ask where the "Ai et al." reference may be found? I did not see it in your references.

    Replies
    1. Thanks. The article is at https://pubs.rsna.org/doi/full/10.1148/radiol.2020200642. The full citation is Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L. Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology 2020:200642. doi: 10.1148/radiol.2020200642. For more references on this point, see the consensus statement at https://pubs.rsna.org/doi/10.1148/radiol.2020201365. This group notes that "Early detection and containment of infection caused by the novel coronavirus SARS-CoV2 has been hindered by the need to develop, mass produce, and widely disseminate the required molecular diagnostic test, a real-time reverse transcriptase-polymerase chain reaction (RT-PCR) assay. Early reports of test performance in the Wuhan outbreak showed variable sensitivities ranging from 37% to 71% (4, 5). While laboratory-based performance evaluations of RT-PCR test show high analytical sensitivity and near-perfect specificity with no misidentification of other coronaviruses or common respiratory pathogens, test sensitivity in clinical practice may be adversely affected by a number of variables including: adequacy of specimen, specimen type, specimen handling, and stage of infection when the specimen is acquired (CDC guidelines for in-vitro diagnostics) (6, 7). False negative RT-PCR tests have been reported in patients with CT findings of COVID-19 who were eventually tested positive with serial sampling (8)."
