Tuesday, May 5, 2020

How Do Forensic-science Tests Compare to Emergency COVID-19 Tests?

The Wall Street Journal recently reported that
At least 160 antibody tests for Covid-19 entered the U.S. market without previous FDA scrutiny on March 16, because the agency felt then that it was most important to get them to the public quickly. Accurate antibody testing is a potentially important tool for public-health officials assessing how extensively the coronavirus has swept through a region or state.
Now, the FDA will require test companies to submit an application for emergency-use authorization and require them to meet standards for accuracy. Tests will need to be found 90% “sensitive,” or able to detect coronavirus antibodies, and 95% “specific,” or able to avoid false positive results. \1/
How many test methods in forensic science have been shown to perform at or above these emergency levels? It is hard to say. For FDA-authorized tests, one can find the manufacturers' figures on the FDA's website, but for forensic-science tests, there is no such repository of information on the standards adopted by voluntary standards development organizations. The forensic-science test-method standards approved by consensus bodies such as the Academy Standards Board and ASTM Inc. rarely state the performance characteristics of these tests.

For the FDA's minimum operating characteristics of a yes-no test, the likelihood ratio for a positive result is Pr(+ | antibodies) / Pr(+ | no-antibodies) = 0.90/(1 − 0.95) = 18. The likelihood ratio for a negative result is Pr(− | no-antibodies) / Pr(− | antibodies) = 0.95/(1 − 0.90) = 9.5. In other words, a clean bill of health on a serological test with minimally acceptable performance would occur just under ten times as frequently for people without detectable levels of antibodies as for people with detectable levels.
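The arithmetic behind these two ratios is easy to check. Here is a minimal sketch in Python; the function name likelihood_ratios is made up for this illustration, and the inputs are simply the FDA minimums quoted above.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a positive result, LR for a negative result).

    LR+ = Pr(+ | antibodies) / Pr(+ | no antibodies)
    LR- = Pr(- | no antibodies) / Pr(- | antibodies)
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = specificity / (1 - sensitivity)
    return lr_positive, lr_negative


# FDA minimums: 90% sensitivity, 95% specificity
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.95)
print(round(lr_pos, 1), round(lr_neg, 1))
```

Plugging in a manufacturer's claimed figures instead of the minimums shows how quickly the ratios grow as specificity approaches 100%.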

According to an Ad Hoc Working Group of the forensic Scientific Working Group on DNA Analysis Methods (SWGDAM), such a likelihood ratio may be described as providing "limited support." This description is near the lower end of a scale for likelihood ratios. These "verbal qualifiers" go from "uninformative" (L=1), to "limited" (2 to 99), "moderate" (100 to 999), "strong" (1,000 to 999,999), and, finally, "very strong" (1,000,000 or more). \2/
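For readers who like to see a scale as a lookup, here is a rough sketch of the verbal qualifiers listed above. The function name is hypothetical, and two details are my own assumptions rather than anything in the SWGDAM document: ratios strictly between 1 and 2 are lumped in with "limited," and the gaps between the whole-number bands (for example, between 99 and 100) are closed by treating each band as running up to the start of the next.

```python
def swgdam_qualifier(lr):
    """Map a likelihood ratio of 1 or more to the verbal qualifier quoted in the text."""
    if lr < 1:
        raise ValueError("the scale quoted in the text covers ratios of 1 or more")
    if lr == 1:
        return "uninformative"
    if lr < 100:        # text: 2 to 99 (values between 1 and 2 included here by assumption)
        return "limited"
    if lr < 1_000:      # text: 100 to 999
        return "moderate"
    if lr < 1_000_000:  # text: 1,000 to 999,999
        return "strong"
    return "very strong"  # text: 1,000,000 or more


for ratio in (18, 9.5):
    print(ratio, swgdam_qualifier(ratio))
```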

A more finely graded table appears "for illustration purposes" in an ENFSI [European Network of Forensic Science Institutes] Guideline for Evaluative Reporting in Forensic Science. The table classifies L = 9.5 as "weak support." \3/

  1. Thomas M. Burton, FDA Sets Standards for Coronavirus Antibody Tests in Crackdown on Fraud, Wall Street J., Updated May 4, 2020 8:24 pm ET, https://www.wsj.com/articles/fda-sets-standards-for-coronavirus-antibody-tests-in-crackdown-on-fraud-11588605373
  2. Recommendations of the SWGDAM Ad Hoc Working Group on Genotyping Results Reported as Likelihood Ratios, 2018, available via https://www.swgdam.org/publications.
  3. ENFSI Guideline for Evaluative Reporting in Forensic Science, 2016, p. 17, http://enfsi.eu/wp-content/uploads/2016/09/m1_guideline.pdf.

Saturday, April 25, 2020

Estimating Prevalence from Serological Tests for COVID-19 Infections: What We Don't Know Can Hurt Us

A statistical debate has emerged over the proportion of the population that has been infected with SARS-CoV-2. It is a crucial number in arguments about "herd immunity" and public health measures to control the COVID-19 pandemic. A news article in yesterday's issue of Science reports that
[S]urvey results, from Germany, the Netherlands, and several locations in the United States, find that anywhere from 2% to 30% of certain populations have already been infected with the virus. The numbers imply that confirmed COVID-19 cases are an even smaller fraction of the true number of people infected than many had estimated and that the vast majority of infections are mild. But many scientists question the accuracy of the antibody tests ... .\1/
The first sentence reflects a common assumption -- that the reported proportion of test results that are positive directly indicates the prevalence of infections where the tested people live. The last sentence gives one reason this might not be the case. But the fact that tests for antibodies are inaccurate does not necessarily preclude good estimates of the prevalence. It may still be possible to adjust the proportion up or down to arrive at the percentage "already ... infected with the virus." There is a clever and simple procedure for doing that -- under certain conditions. Before describing it, let's look at another, more easily grasped threat to estimating prevalence -- "sampling bias."

Sampling Design: Who Gets Tested?

Inasmuch as the people tested in the recent studies were not random samples of any well-defined population, the test results may not be representative of what the outcome would be if the entire population of interest were tested. Several sources of bias in sampling have been noted.

A study of a German town "found antibodies to the virus in 14% of the 500 people tested. By comparing that number with the recorded deaths in the town, the study suggested the virus kills only 0.37% of the people infected. (The rate for seasonal influenza is about 0.1%.)" But the researchers "sampled entire households. That can lead to overestimating infections, because people living together often infect each other." \2/ Of course, one can count just one individual per household, so this clumping does not sound like a fundamental problem.

"A California serology study of 3300 people released last week in a preprint [found 50] antibody tests were positive—about 1.5%. [The number in the draft paper by Eran Bendavid, Bianca Mulaney, Neeraj Sood, et al. is 3330 \3/] But after adjusting the statistics to better reflect the county's demographics, the researchers concluded that between 2.49% and 4.16% of the county's residents had likely been infected." However, the Stanford researchers "recruit[ed] the residents of Santa Clara county through ads on Facebook," which could have "attracted people with COVID-19–like symptoms who wanted to be tested, boosting the apparent positive rate." \4/ This "unhealthy volunteer" bias is harder to correct with this study design.

"A small study in the Boston suburb of Chelsea has found the highest prevalence of antibodies so far. Prompted by the striking number of COVID-19 patients from Chelsea colleagues had seen, Massachusetts General Hospital pathologists ... collected blood samples from 200 passersby on a street corner. ... Sixty-three were positive—31.5%." As the pathologists acknowledged, pedestrians on a single corner "aren't a representative sample." \5/

Even efforts to find subjects at random will fall short of the mark because of self-selection on the part of subjects. "Unhealthy volunteer" bias is a threat even in studies like one planned for Miami-Dade County that will use random-digit dialing to utility customers to recruit subjects. \6/

In sum, sampling bias could be a significant problem in many of these studies. But it is something epidemiologists always face, and enough quick and dirty surveys (with different possible sources of sampling bias) could give a usable indication of what better designed studies would reveal.

Measurement Error: No Gold Standard

A second criticism holds that because the "specificity" of the serological tests could be low, the estimates of prevalence are exaggerated. "Specificity" refers to the extent to which the test (correctly) does not signal an infection when applied to an uninfected individual. If it (incorrectly) signals an infection for these individuals, it causes false positives. Low specificity means lots of false positives. Worries over specificity recur throughout the Science article's summary of the controversy:
  • "The result carries several large caveats. The team used a test whose maker, BioMedomics, says it has a specificity of only about 90%, though Iafrate says MGH's own validation tests found a specificity of higher than 99.5%."
  • "Because the absolute numbers of positive tests were so small, false positives may have been nearly as common as real infections."
  • "Streeck and his colleagues claimed the commercial antibody test they used has 'more than 99% specificity,' but a Danish group found the test produced three false positives in a sample of 82 controls, for a specificity of only 96%. That means that in the Heinsberg sample of 500, the test could have produced more than a dozen false positives out of roughly 70 the team found." \7/
Likewise, political scientist and statistician Andrew Gelman blogged that no screening test that lacks a very high specificity can produce a usable estimate of population prevalence -- at least when the proportion of tests that are positive is small. This limitation, he insisted, is "the big one." \8/ He presented the following as a devastating criticism of the Santa Clara study (with my emphasis added):
Bendavid et al. estimate that the sensitivity of the test is somewhere between 84% and 97% and that the specificity is somewhere between 90% and 100%. I can never remember which is sensitivity and which is specificity, so I looked it up on wikipedia ... OK, here are [sic] concern is actual negatives who are misclassified, so what’s relevant is the specificity. That’s the number between 90% and 100%.
If the specificity is 90%, we’re sunk.
With a 90% specificity, you’d expect to see 333 positive tests out of 3330, even if nobody had the antibodies at all. Indeed, they only saw 50 positives, that is, 1.5%, so we can be pretty sure that the specificity is at least 98.5%. If the specificity were 98.5%, the observed data would be consistent with zero ... . On the other hand, if the specificity were 100%, then we could take the result at face value.
So how do they get their estimates? Again, the key number here is the specificity. Here’s exactly what they say regarding specificity:
A sample of 30 pre-COVID samples from hip surgery patients were also tested, and all 30 were negative. . . . The manufacturer’s test characteristics relied on . . . pre-COVID sera for negative gold standard . . . Among 371 pre-COVID samples, 369 were negative.
This gives two estimates of specificity: 30/30 = 100% and 369/371 = 99.46%. Or you can combine them together to get 399/401 = 99.50%. If you really trust these numbers, you’re cool: with y=399 and n=401, we can do the standard Agresti-Coull 95% interval based on y+2 and n+4, which comes to [98.0%, 100%]. If you go to the lower bound of that interval, you start to get in trouble: remember that if the specificity is less than 98.5%, you’ll expect to see more than 1.5% positive tests in the data no matter what!
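The interval arithmetic in the passage just quoted can be checked with a short sketch. It follows the "y+2 and n+4" recipe as described there; the function name and the clipping of the limits to the range 0 to 1 are choices made here for illustration.

```python
import math


def plus_four_interval(y, n, z=1.96):
    """Approximate 95% interval for a proportion using y+2 successes in n+4 trials."""
    p_tilde = (y + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    lower = max(0.0, p_tilde - z * se)
    upper = min(1.0, p_tilde + z * se)
    return lower, upper


# Combined pre-COVID samples: 399 negatives out of 401
low, high = plus_four_interval(399, 401)
print(f"{low:.3f} to {high:.3f}")
```

The lower limit comes out close to 98.0% and the upper limit just under 100%, in line with the interval quoted above.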
To be sure, the fact that the serological tests are not perfectly accurate in detecting an immune response makes it dangerous to rely on the proportion of people tested who test positive as the measure of the proportion of the population who have been infected. Unless the test is perfectly sensitive (is certain to be positive for an infected person) and specific (certain to be negative for an uninfected person), the observed proportion will not be the true proportion of past infections -- even in the sample. As we will see shortly, however, there is a simple way to correct for imperfect sensitivity and specificity in estimating the population prevalence, and there is a voluminous literature on using imperfect screening tests to estimate population prevalence. \9/ Recognizing what one wants to estimate leads quickly to the conclusion that the raw proportion of positives among the tested group that the media usually report (even with a margin of error to account for sampling variability) is not generally the right statistic to focus on.

Moreover, the notion that because false positives inflate an estimate of the number who have been infected, only the specificity is relevant is misconceived. Sure, false positives (imperfect specificity) inflate the estimate. But false negatives (imperfect sensitivity) simultaneously deflate it. Both types of misclassifications should be considered.

How, then, do epidemiologists doing surveillance studies normally handle the fact that the tests for a disease are not perfectly accurate? Let's use p to denote the positive proportion in the sample of people tested -- for example, the 1.5% in the Santa Clara sample or the 21% figure for New York City that Governor Andrew Cuomo announced in a tweet. The performance of the serological test depends on its true sensitivity SEN and true specificity SPE. For the moment, let's assume that these are known parameters of the test. In reality, they are estimated from separate studies that themselves have sampling errors, but we'll just try out some values for them. First, let's derive a general result that contains ideas presented in 1954 in the legal context of serological tests for parentage. \10/

Let PRE designate the true prevalence in the population (such as everyone in Santa Clara county or New York City) from which a sample of people to be tested is drawn. We pick a person totally at random. That person either has harbored the virus (inf) or not (uninf). The former probability we abbreviate as Pr(inf); the latter is Pr(uninf). The probability that the individual tests positive is
  Pr(test+) = Pr[test+ & (inf or uninf)]
     = Pr[(test+ & inf) or (test+ & uninf)]
     = Pr(test+ & inf) + Pr(test+ & uninf)
     = Pr(test+ | inf)Pr(inf) + Pr(test+ | uninf)Pr(uninf)     (1)*
In words, the probability of the positive result is (a) the probability the test is positive if the person has been infected, weighted by the probability he or she has been infected, plus (b) the probability it is positive if the person has not been infected, weighted by the probability of no infection.

We can rewrite (1) in terms of the sensitivity and specificity. SEN is Pr(test+|inf) -- the probability of a positive result if the person has been infected. SPE is Pr(test–|uninf) -- the probability of a negative result if the person has not been infected. For the random person, the probability of infection is just the true prevalence in the population, PRE. So the first product in (1) is simply SEN × PRE.

To put SPE into the second term, we note that the probability that an event happens is 1 minus the probability that it does not happen. Consequently, we can write the second term as (1 – SPE) × (1 – PRE). Thus, we have
     Pr(test+) = SEN PRE + (1 – SPE)(1 – PRE)           (2)
Suppose, for example, that SEN = 70%, SPE = 80%, and PRE = 10%. Then Pr(test+) = 1/5 + PRE/2 = 0.25. The expected proportion of observed positives in a random sample would be 0.25 -- a substantial overestimate of the true prevalence PRE = 0.10.
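Equation (2) is simple enough to put into a few lines of Python. This is only a sketch (the function name is invented here), but it reproduces the 0.25 in the example and also the 333 expected positives out of 3330 that Gelman mentioned for a specificity of 90% and no infections at all.

```python
def expected_positive_rate(sen, spe, pre):
    """Equation (2): Pr(test+) = SEN*PRE + (1 - SPE)*(1 - PRE)."""
    return sen * pre + (1 - spe) * (1 - pre)


# The example in the text: SEN = 70%, SPE = 80%, PRE = 10%
rate = expected_positive_rate(sen=0.70, spe=0.80, pre=0.10)
print(round(rate, 4))

# Zero prevalence and 90% specificity in a sample of 3330
# (SEN is irrelevant when nobody is infected; 0.80 is an arbitrary value)
false_positives = 3330 * expected_positive_rate(sen=0.80, spe=0.90, pre=0.0)
print(round(false_positives))
```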

In this example, with rather poor sensitivity and specificity, using the observed proportion p of positives in a large random sample to estimate the prevalence PRE would be foolish. So we should not blithely substitute p for PRE. Indeed, doing so can give us a bad estimate even when the test has perfect specificity. When SPE = 1, Equation (2) reduces to Pr(test+) = SEN PRE. In this situation, the sample proportion does not estimate the prevalence -- it estimates only a fraction of it.

Clearly, good specificity is not a sufficient condition for using the sample proportion p to estimate the true prevalence PRE, even in huge samples. Imperfect SEN and imperfect SPE both cause misclassifications, and they work in opposite directions. Poor specificity leads to false positives, but poor sensitivity leads to true positives being counted as negatives. The net effect of these opposing forces is mediated by the prevalence.

To correct for the expected misclassifications in a large random sample, we can use the observed proportion of positives, not as an estimator of the prevalence, but as an estimator of Pr(test+). Setting p = Pr(test+), we solve for PRE to obtain an estimated prevalence of
      pre = (p + SPE – 1)/(SPE + SEN – 1)         (3) \11/
For the Santa Clara study, Bendavid et al. found p = 50/3330 = 1.5%, and suggested that SEN = 80.3% and SPE = 99.5%. \12/ For these values, the estimated prevalence is pre = 1.25%. If we change SPE to 98.5%, where Gelman wrote that "you get into trouble," the estimate is pre = 0, which is clearly too small. Instead, the researchers used equation (3) only after they transformed their stratified sample data to fit the demographics of the county. That adjustment produced an inferred proportion p' = 2.81%.  Using that adjusted value for p, Equation (3) becomes
      pre = (p' + SPE – 1)/(SEN + SPE – 1)         (4)
For the SPE of 98.5%, equation (4) gives an estimated prevalence of pre = 1.66%. For 99.5% it is 2.9%. Although some critics have complained about using Equation (3) with the demographically adjusted proportion p' shown in (4), if the adjustment provides a better picture of the full population, it seems like the right proportion to use for arriving at the point estimate pre.
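The point estimates in the last few paragraphs can be reproduced with a short sketch of Equation (3). The function name is mine, and the guard against SEN + SPE ≤ 1 is an added safeguard rather than part of the formula. The inputs are the rounded figures used in the text (p = 1.5%, p' = 2.81%, SEN = 80.3%).

```python
def rogan_gladen(p, sen, spe):
    """Equation (3): pre = (p + SPE - 1) / (SEN + SPE - 1)."""
    denominator = sen + spe - 1
    if denominator <= 0:
        raise ValueError("SEN + SPE must exceed 1 for the adjustment to make sense")
    return (p + spe - 1) / denominator


cases = [
    ("raw p, SPE 99.5%", 0.015, 0.995),
    ("raw p, SPE 98.5%", 0.015, 0.985),
    ("adjusted p', SPE 98.5%", 0.0281, 0.985),
    ("adjusted p', SPE 99.5%", 0.0281, 0.995),
]
for label, p, spe in cases:
    print(f"{label}: pre = {rogan_gladen(p, sen=0.803, spe=spe):.2%}")
```

The four lines should come out at roughly 1.25%, 0, 1.66%, and 2.9%, matching the figures above.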

Nevertheless, there remains a sense in which the specificity is key. Given SEN = 80.3% and the adjusted proportion p' = 2.81%, dropping SPE to 97.2% gives pre = 0. Ouch! When SPE drops below 97.2%, pre turns negative, which is ridiculous. In fact, this result holds for other values of SEN as well, because the numerator of Equation (4) does not involve SEN. So one does need a high specificity for Equation (3) to be plausible -- at least when the true prevalence (and hence p') is small. But as PRE (and thus p') grow larger, Equations (3) and (4) look better. For example, if p = 20%, then pre is 22% even with SPE = 97.2% and SEN = 80.3%. Indeed, with this large a p even with a specificity of only SPE = 90% we still get a substantial pre = 14.2%.
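The zero-crossing follows directly from the numerator of Equation (3): pre = 0 exactly when SPE = 1 − p, whatever the value of SEN (as long as SEN + SPE exceeds 1). A small sketch, with function names invented for the purpose, checks the 97.2% figure for p' = 2.81% and the two examples with p = 20%.

```python
def rogan_gladen(p, sen, spe):
    """Equation (3): pre = (p + SPE - 1) / (SEN + SPE - 1)."""
    return (p + spe - 1) / (sen + spe - 1)


def break_even_specificity(p):
    """Specificity at which the numerator of Equation (3) is zero, so that pre = 0."""
    return 1 - p


print(f"break-even SPE for p' = 2.81%: {break_even_specificity(0.0281):.1%}")

# At the break-even specificity the estimate is zero for any SEN
for sen in (0.60, 0.803, 0.95):
    print(sen, round(rogan_gladen(0.0281, sen, break_even_specificity(0.0281)), 6))

# With a larger observed proportion, modest specificity does less damage
print(f"{rogan_gladen(0.20, 0.803, 0.972):.1%}")
print(f"{rogan_gladen(0.20, 0.803, 0.90):.1%}")
```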

Random Sampling Error

I have pretended the sensitivity and specificity are known with certainty.  Equation (3) only gives a point estimate for true prevalence. It does not account for sampling variability -- either in p (and hence p') or in the estimates (sen and spe) of SEN and SPE, respectively, that have to be plugged into (3). To be clear that we are using estimates from the separate validity studies rather than the unknown true values for SEN and SPE, we can write the relevant equation as follows:
      pre = (p + spe – 1)/(sen + spe – 1)         (5)
Dealing with the variance of p (or p') with sample sizes like 3300 is not hard. Free programs on the web give confidence intervals based on various methods for arriving at the standard error for pre considering the size of the random sample that produced the estimate p. (Try it out.)
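As a rough illustration of what such a calculator does, here is a sketch that treats SEN and SPE as known constants, forms an ordinary Wald interval for p, and pushes its endpoints through Equation (3). This is one simple method among several, chosen here for transparency; it assumes a simple random sample and SEN + SPE > 1, and the function name is hypothetical.

```python
import math


def prevalence_interval_known_accuracy(positives, n, sen, spe, z=1.96):
    """Return (lower, point, upper) for pre, treating SEN and SPE as known."""
    p = positives / n
    half_width = z * math.sqrt(p * (1 - p) / n)  # Wald half-width for p

    def adjust(x):
        # Equation (3), clipped to the range of a proportion
        estimate = (x + spe - 1) / (sen + spe - 1)
        return min(1.0, max(0.0, estimate))

    return adjust(p - half_width), adjust(p), adjust(p + half_width)


low, point, high = prevalence_interval_known_accuracy(50, 3330, sen=0.803, spe=0.995)
print(f"pre = {point:.2%} (about {low:.2%} to {high:.2%})")
```

Because pre is a linear, increasing function of p when SEN and SPE are fixed, the endpoints of the interval for p map directly onto endpoints for pre. The interval ignores any uncertainty in SEN and SPE themselves.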

Our uncertainty about SEN and SPE is greater (at this point, because the tests rushed into use have not been well validated, as discussed in previous postings). Bendavid et al. report a confidence interval for PRE that is said to account for the variances in all three estimators -- p, sen, and spe. \13/ However, a savage report in Ars Technica \14/ collects tweets such as a series complaining that "[t]he confidence interval calculation in their preprint made demonstrable math errors." \15/ Nonetheless, it should be feasible to estimate how much sampling error in the validity studies for the serological tests contributes to the uncertainty in pre as an estimator of the population prevalence PRE. The researchers, at any rate, are convinced that "[t]he argument that the test is not specific enough to detect real positives is deeply flawed." \16/ Although they are working with a relatively low estimated prevalence, they could be right. \17/ If the specificity is in the range they claim, their estimates of prevalence should not be dismissed out of hand.
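One textbook way to gauge those contributions is a first-order (delta-method) approximation to the variance of pre in Equation (5), treating p, sen, and spe as independent binomial proportions. The sketch below is not the calculation in the preprint; it is a generic approximation offered for illustration. The specificity counts (399 of 401) come from the passage quoted earlier, but the size of the sensitivity validation sample (100) is a placeholder of mine, not a figure from the study.

```python
import math


def rogan_gladen_with_se(positives, n, sen, n_sen, spe, n_spe):
    """Point estimate and delta-method standard error for Equation (5)."""
    p = positives / n
    d = sen + spe - 1                      # denominator of Equation (5)
    pre = (p + spe - 1) / d
    # Binomial variances of the three estimated proportions
    var_p = p * (1 - p) / n
    var_sen = sen * (1 - sen) / n_sen
    var_spe = spe * (1 - spe) / n_spe
    # Each variance times the squared partial derivative of pre
    term_p = var_p / d**2
    term_sen = var_sen * (p + spe - 1) ** 2 / d**4
    term_spe = var_spe * (sen - p) ** 2 / d**4
    se = math.sqrt(term_p + term_sen + term_spe)
    return pre, se, (term_p, term_sen, term_spe)


pre, se, terms = rogan_gladen_with_se(
    positives=50, n=3330,
    sen=0.803, n_sen=100,      # n_sen = 100 is a placeholder, not a study figure
    spe=399 / 401, n_spe=401,  # pre-COVID samples quoted in the text
)
print(f"pre = {pre:.4f}, se = {se:.4f}")
print(f"rough 95% interval: {pre - 1.96 * se:.4f} to {pre + 1.96 * se:.4f}")
print("variance terms (p, sen, spe):", [f"{t:.2e}" for t in terms])
```

With these inputs the term for spe is the largest of the three, which is one way of seeing why the size of the specificity validation sample attracted so much attention. A normal approximation is crude when pre is this close to zero, so the interval is only a rough guide.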

* * *

The take-away message is that a gold standard serological test is not always necessary for effective disease surveillance. It is true that unless the test is highly accurate, the positive test proportion p (or a proportion p' adjusted for a stratified sample) is not a good estimator of the true prevalence PRE. That has been known for quite some time and is not in dispute. At the same time, pre sometimes can be a useful estimator of true prevalence. That too is not in dispute. Of course, as always, good data are better than post hoc corrections, but for larger prevalences, serological tests may not require 99.5% specificity to produce useful estimates of how many people have been infected by SARS-CoV-2.

UPDATE: 5/9/20: An Oregon State University team in Corvallis is going door to door in an effort to test a representative sample of the college town's population. \1/ A preliminary report released to the media reports a simple incidence of 2/1,000. Inasmuch as the sketchy accounts indicate that the samples collected are nasal swabs, the proportion cannot be directly compared to the proportion positive for serological tests mentioned above. The nasal swabbing is done by the respondents in the survey rather than by medical personnel, \2/ and the results pertain to the presence of the virus at the time of the swabbing rather than to an immune response that may be the result of exposure in the past.

  1. Gretchen Vogel, Antibody Surveys Suggesting Vast Undercount of Coronavirus Infections May Be Unreliable, Science, 368:350-351, Apr. 24, 2020, DOI:10.1126/science.368.6489.350, doi:10.1126/science.abc3831
  2. Id.
  3. Eran Bendavid, Bianca Mulaney, Neeraj Sood et al.,  COVID-19 Antibody Seroprevalence in Santa Clara County, California. medRxiv preprint dated Apr. 11, 2020,
  4. Id.
  5. Id.
  6. University of Miami Health System, Sylvester Researchers Collaborate with County to Provide Important COVID-19 Answers, Apr. 25, 2020, http://med.miami.edu/news/sylvester-researchers-collaborate-with-county-to-provide-important-covid-19
  7. Vogel, supra note 1.
  8. Andrew Gelman, Concerns with that Stanford Study of Coronavirus Prevalence, posted 19 April 2020, 9:14 am, on Statistical Modeling, Causal Inference, and Social Science, https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaws-in-stanford-study-of-coronavirus-prevalence/
  9. E.g., Joseph Gastwirth, The Statistical Precision of Medical Screening Procedures: Application to Polygraph and AIDS Antibodies Test Data, Stat. Sci. 1987, 2:213-222; D. J. Hand, Screening vs. Prevalence Estimation, Appl. Stat., 1987, 38:1-7; Fraser I. Lewis & Paul R. Torgerson, 2012, A Tutorial in Estimating the Prevalence of Disease in Humans and Animals in the Absence of a Gold Standard Diagnostic, Emerging Themes in Epidemiology, 9:9, https://ete-online.biomedcentral.com/articles/10.1186/1742-7622-9-9; Walter J. Rogan & Beth Gladen, Estimating Prevalence from Results of a Screening-test. Am J Epidemiol. 1978, 107: 71-76; Niko Speybroeck, Brecht Devleesschauwer, Lawrence Joseph & Dirk Berkvens, Misclassification Errors in Prevalence Estimation: Bayesian Handling with Care, Int J Public Health, 2012, DOI:10.1007/s00038-012-0439-9
  10. H. Steinhaus, 1954, The Establishment of Paternity, Pr. Wroclawskiego Tow. Naukowego ser. A, no. 32. (discussed in Michael O. Finkelstein and William B. Fairley, A Bayesian Approach to Identification Evidence. Harvard Law Rev., 1970, 83:490-517). For a related discussion, see David H. Kaye, The Prevalence of Paternity in "One-Man" Cases of Disputed Parentage, Am. J. Human Genetics, 1988, 42:898-900 (letter).
  11. This expression is known as "the Rogan–Gladen adjusted estimator of 'true' prevalence" (Speybroeck et al., supra note 9) or "the classic Rogan-Gladen estimator of true prevalence in the presence of an imperfect diagnostic test." Lewis & Torgerson, supra note 9. The reference is to Rogan & Gladen, supra note 9.
  12. They call the proportion p = 1.5% the "unadjusted" estimate of prevalence.
  13. Some older discussions of the standard error in this situation can be found in Gastwirth, supra note 9; Hand, supra note 9. See also J. Reiczigel, J. Földi, & L. Ózsvári, Exact Confidence Limits for Prevalence of a Disease with an Imperfect Diagnostic Test, Epidemiology and Infection, 2010, 138:1674-1678.
  14. Beth Mole, Bloody math — Experts Demolish Studies Suggesting COVID-19 Is No Worse than Flu: Authors of widely publicized antibody studies “owe us all an apology,” one expert says, Ars Technica, Apr. 24, 2020, 1:33 PM, https://arstechnica.com/science/2020/04/experts-demolish-studies-suggesting-covid-19-is-no-worse-than-flu/
  15. https://twitter.com/wfithian/status/1252692357788479488 
  16. Vogel, supra note 1.
  17. A Bayesian analysis might help. See, e.g., Speybroeck et al., supra note 9.
UPDATED Apr. 27, 2020, to correct a typo in line (2) of the derivation of Equation (1), as pointed out by Geoff Morrison.

NOTES to later updates
  1. OSU Newsroom, TRACE First Week’s Results Suggest Two People per 1,000 in Corvallis Were Infected with SARS-CoV-2, May 7, 2020, https://today.oregonstate.edu/news/trace-first-week%E2%80%99s-results-suggest-two-people-1000-corvallis-were-infected-sars-cov-2
  2. But "[t]he tests used in TRACE-COVID-19 collect material from the entrance of the nose and are more comfortable and less invasive than the tests that collect secretions from the throat and the back of the nose." Id.

Thursday, April 23, 2020

More on False Positive and False Negative Serological Tests for COVID-19

An earlier posting looked at sensitivity and specificity of the first FDA-allowed emergency serological test for antibodies to SARS-CoV-2. It then identified some implications for getting people back to work through what a recent article in Nature called an "immunity passport." \1/

The news article cautions that "[k]its have flooded the market, but most aren’t accurate enough to confirm whether an individual has been exposed to the virus." The kits use components of the virus that the antibodies latch onto (the antigens) to detect the antibodies in the blood. Blood samples can be sent to a qualified laboratory for testing. In addition, "[s]everal companies ... offer point-of-care kits, which are designed to be used by health professionals to check if an individual has had the virus." In fact, "some companies market them for people to use at home." But
most kits have not undergone rigorous testing to ensure they’re reliable, says [Michael Busch, director of the Vitalant Research Institute in San Francisco]. During a meeting at the UK Parliament’s House of Commons Science and Technology Select Committee on 8 April, Kathy Hall, the director of the testing strategy for COVID-19, said that no country appeared to have a validated antibody test that can accurately determine whether an individual has had COVID-19. ... [S]o far, most test assessments have involved only some tens of individuals because they have been developed quickly. ... [S]ome commercial antibody tests have recorded specificities as low as 40% early in the infection. In an analysis of 9 commercial tests available in Denmark, 3 lab-based tests had sensitivities ranging 67–93% and specificities of 93–100%. In the same study, five out of six point-of-care tests had sensitivities ranging 80–93%, and 80-100% specificity, but some kits were tested on fewer than 30 people. Testing was suspended for one kit.

Point-of-care tests are even less reliable than tests being used in labs, adds [David Smith, a clinical virologist at the University of Western Australia in Perth]. This is because they use a smaller sample of blood — typically from a finger prick — and are conducted in a less controlled environment than a lab .... The WHO recommends that point-of-care tests only be used for research.
False positives arise when a test uses an antigen that reacts with antibodies for pathogens other than SARS-CoV-2. In other words, the test is not 100% specific to the one type of virus. "An analysis of EUROIMMUN’s antibody test found that although it detected SARS-CoV-2 antibodies in three people with COVID-19, it returned a positive result for two people with another coronavirus." It is notable that "[i]t took several years to develop antibody tests for HIV with more than 99% specificity."

A further problem with issuing an "immunity passport" on the basis of a serological test is that the test may not detect the kind of antibodies that confer immunity to subsequent infection. It is not clear whether all people who have had COVID-19 develop the necessary "neutralizing" antibodies. An unpublished analysis of 175 people in China who had recovered from COVID-19 and had mild symptoms reported that 10 individuals produced no detectable neutralizing antibodies -- even though some had high levels of binding antibodies. These people may lack protective immunity. Moreover, one study showed that viral RNA load declines slowly after antibodies are detected in the blood. Consequently, there could be a period in which a recovered patient is still shedding infectious virus.

A news article in this week's Science magazine also contains information on using serologic test data to estimate the proportion of people who have been infected (prevalence). \2/ It described a German study in which "Streeck and his colleagues claimed the commercial antibody test they used has “more than 99% specificity,” but a Danish group found the test produced three false positives in a sample of 82 controls, for a specificity of only 96%."

The article also mentions a survey in which "Massachusetts General Hospital pathologists John Iafrate and Vivek Naranbhai ... collected blood samples from 200 passersby on a street corner [and] used a test whose maker, BioMedomics, says it has a specificity of only about 90%, though Iafrate says MGH’s own validation tests found a specificity of higher than 99.5%."

  1. Smriti Mallapaty, Will Antibody Tests for the Coronavirus Really Change Everything?, Nature, Apr. 18, 2020, doi:10.1038/d41586-020-01115-z
  2. Gretchen Vogel, Antibody Surveys Suggesting Vast Undercount of Coronavirus Infections May Be Unreliable, Science, 368:350-351, Apr. 24, 2020, DOI:10.1126/science.368.6489.350, doi:10.1126/science.abc3831

Wednesday, April 22, 2020

Forensic Magazine Branches Out

Forensic Magazine is "powered by Labcompare, the Buyer's Guide for Laboratory Professionals." Its slogan is "On the Scene and in the Lab." Today's newsletter includes the following item, sandwiched between an article on DNA cold cases in Florida and domestic abuse in Nicaragua:
Texas State Forensic Association Names Educator of the Year
Wednesday, April 22, 2020

Julie Welker, chair of Howard Payne University’s Department of Communication and coach of HPU’s speech and debate team, was recently named the Texas Intercollegiate Forensics Association (TIFA) Educator of the Year. ... Welker, in her twenty-second year on the faculty at HPU, has been coaching the speech and debate team since 2005. ... [read the full story]
As a former high school and college debater myself, I applaud Professor Welker's coaching, but the newsletter brings to mind a discussion of the terms "forensic evidence" and "forensics" at a meeting of the National Commission on Forensic Science. A commission member, herself a university chemist, urged the commission to eschew these terms because of the speech and debate connection. At the time, I thought she was being picky. Now I am not so sure. By the way, the adjective "forensic" comes from the Latin word forensis, meaning "of the forum" or "public."

Tuesday, April 21, 2020

Comparing Countries by Cases of COVID-19: America First?

The following map appeared in the Los Angeles Times newsletter, Coronavirus Today, on April 20, with the caption, Where Is the Coronavirus Spreading?

Confirmed COVID-19 cases by country as of 5:00 p.m. Monday, April 20, 2020.
It does not take a Ph.D. in statistics to see that the total number of cases since the outbreak of the disease is not a measure of where the virus known as SARS-CoV-2 is currently spreading. It is an indirect measure of where it has been up to a given time. To know where the virus is spreading, we should look at new cases of the disease (incidence rates).

Even as a measure of cumulative cases, shading entire countries is misleading. If we want to see where cases have clustered since the recordkeeping started, we could look at the heights of bars placed on top of small, similarly sized geographic regions, where the heights are proportional to the number of cases in each region. Alaska would not stand out in such a graph. Indeed, the Times newsletter has a link to far better infographics from Johns Hopkins University, one of which clearly shows this fact.

Media reporting on only the total number of cases by country promotes false impressions of the incidence and prevalence of the disease across countries. For example, the table below gives approximate numbers for the US and Spain, which have the greatest cumulative numbers of cases as reported today on the Johns Hopkins website. It also includes China, which is ranked 9th in reported cases, and Switzerland (15th).

Country          Cases      Population     Relative frequency
                                           (cases per 100,000)
United States    788,000    328,000,000    240

Spain has only about a quarter of the number of cases reported in the US, but it has almost twice the prevalence of the disease. On the basis of reported cases and population size, China's population has emerged relatively unscathed, and on this scale, the US is by no means the most ravaged -- so far.
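The relative frequencies are nothing more than cases divided by population, scaled to 100,000 people. A two-line sketch (with a function name of my own choosing) reproduces the US figure from the approximate numbers given above.

```python
def cases_per_100k(cases, population):
    """Cumulative reported cases per 100,000 residents."""
    return cases / population * 100_000


us_rate = cases_per_100k(788_000, 328_000_000)
print(round(us_rate))
```

The same function applied to any other country's reported cases and population puts it on the same scale.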