Monday, May 2, 2011

“A Copulation of Many Years of Testifying”: Misconstruing Statistical Significance in Forensic Toxicology

No matter how many times statistics books caution readers not to transpose p-values into probabilities of hypotheses, scientists do it anyway. My latest example comes from the Forensic Toxicology Expert Witness Handbook (2007), by James W. Jones.

The introduction explains that the handbook is meant "to help facilitate the training of FTDTL [Forensic Toxicology Drug Testing Laboratory] scientists in forensic toxicology expert testimony" and that "[t]he information presented is a copulation [sic] of many years of testifying as an expert witness reading and researching information and listening to many experts ... .” (Page 2).

What wisdom resides in this compilation? At page 11, we learn that
Statistically, is there a scientifically-accepted likelihood that an observed relationship is simply not due to chance? That is where the 95% confidence number comes from. A “p” value of 0.05, by convention the cutoff between statistically significant and not, is that 95% likelihood. But that percentage applies only to one of the quality criteria as to whether the science used to assess causality in a claim.

[I]f a study showed only a 51% likelihood of reflecting a true relationship rather than a chance relationship, then no scientist, no regulatory body, no one who reviews scientific data, would consider that study indicative of any causal relationship. A 51% outcome would not even merit a follow up or “validation” study by the scientific community. In the words of legalese: “The relevant scientific community would consider the use of such a study methodologically improper.”

This is gobbledygook. I think Dr. Jones is saying that p < 0.05 implies that the probability of the alternative hypothesis (given the data) exceeds 95%. But statistical significance at the 0.05 level means nothing of the kind. It only means that if the null hypothesis is true, then the probability of the data (or other data even farther from what would be expected under the null hypothesis) is less than 0.05. The p-value assumes that the null hypothesis is true: p = 0.05 means that if the null hypothesis is true, then the data are so far from what is expected that one would encounter such data only 5% of the time (or less). Because the probability 0.05 pertains to the data, and not to the hypothesis, it makes no sense to compare a p-value or its complement to the “likelihood of ... a true relationship.”
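
To see this in action, here is a minimal simulation sketch in Python (the test setup is invented for illustration: 25 standard-normal observations per experiment and a two-sided z-test). When the null hypothesis is true by construction, roughly 5% of experiments still produce p < 0.05. The 5% describes how often such data arise given the null, not how probable the null is.

    # A minimal sketch (hypothetical setup): repeatedly run an experiment in
    # which the null hypothesis -- population mean zero -- is TRUE, and count
    # how often a two-sided z-test nonetheless yields p < 0.05.
    import math
    import random
    import statistics

    random.seed(1)

    def two_sided_p(z):
        """Two-sided p-value for a standard-normal test statistic z."""
        return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

    n_experiments = 20_000
    n_per_sample = 25
    false_alarms = 0

    for _ in range(n_experiments):
        # Draw a sample from a population in which the null is true.
        sample = [random.gauss(0, 1) for _ in range(n_per_sample)]
        z = statistics.fmean(sample) * math.sqrt(n_per_sample)  # known sd = 1
        if two_sided_p(z) < 0.05:
            false_alarms += 1

    # Prints roughly 0.05: the probability pertains to the data given the
    # null hypothesis, not to the null hypothesis given the data.
    print(f"Share of true-null experiments with p < 0.05: "
          f"{false_alarms / n_experiments:.3f}")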

Testing for statistical significance is a way to decide whether the result merits further investigation or should be tentatively accepted as proving that the null hypothesis is false. But an alternative hypothesis could have a 51% probability of being true (if one is willing to assign probabilities to hypotheses rather than to random variables) when the p-value is, say, 0.001. According to many reviewers of scientific data, this would be a highly significant result, strongly “indicative of [a] causal relationship” (if the data came from a controlled experiment properly designed to investigate causation). In technical terms, Dr. Jones has confused a conditional probability of data given the hypothesis, P(extreme data | H0) -- the p-value -- with a posterior probability for a hypothesis, P(H0 | extreme data).
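
A back-of-the-envelope Bayesian calculation makes the divergence concrete. Every number below is invented for illustration, and tail probabilities are used loosely as stand-ins for likelihoods; even so, it shows how data that are "highly significant" can leave the alternative hypothesis with only about a 51% posterior probability.

    # A hedged numerical illustration (all numbers invented): a small p-value,
    # P(extreme data | H0), does not fix the posterior, P(H1 | extreme data).
    prior_H1 = 0.05          # hypothetical skeptical prior: real effects are rare
    prior_H0 = 1 - prior_H1

    # Probabilities of data at least this extreme under each hypothesis
    # (tail areas used as likelihoods, purely for illustration):
    p_data_given_H0 = 0.001  # the p-value
    p_data_given_H1 = 0.02   # such data are uncommon even under a real effect

    # Bayes' theorem:
    posterior_H1 = (p_data_given_H1 * prior_H1) / (
        p_data_given_H1 * prior_H1 + p_data_given_H0 * prior_H0
    )
    print(f"P(H1 | extreme data) = {posterior_H1:.2f}")  # about 0.51

With these inputs, the posterior probability of a true relationship is about 51% even though the p-value is 0.001; raising the prior, or making the data more diagnostic, would change the posterior without changing the p-value at all.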

These observations about p-values and significance testing are compact. For a gentler explanation, see David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore on Evidence: Expert Evidence (2d ed. 2011). See also David H. Kaye, Statistical Significance and the Burden of Persuasion, 46 Law & Contemp. Probs. 13 (1983); David H. Kaye, Is Proof of Statistical Significance Relevant?, 61 Wash. L. Rev. 1333 (1986); David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence (3d ed. 2011).
