Sunday, September 24, 2017

How Experts (Mis)Represent Likelihood Ratios for DNA Evidence

Earlier this month, I noted the tendency of journalists to misconstrue a likelihood ratio as odds or probabilities in favor of a source hypothesis. I mentioned expressions such as "the likelihood that a suspect’s DNA is present in a mixture of substances found at a crime scene" and "the probability, weighed against coincidence, that sample X is a match with sample Y." In place of such garbled descriptions, I proposed that
Putting aside all other explanations for the overlap between the mixture and the suspect's alleles -- explanations like relatives or some laboratory errors -- this likelihood ratio indicates how much the evidence changes the odds in favor of the suspect’s DNA being in the mixture. It quantifies the probative value of the evidence, not the probability that one or another explanation of the evidence is true.
The journalists' misstatements occurred in connection with likelihood ratios involving DNA mixtures, but even experts in forensic inference make the same mistake in simpler situations. The measure of probative value for single-source DNA is a more easily computed likelihood ratio (LR). Unfortunately, it is very easy to describe LRs in ways that invite misunderstanding. Below are two examples:
[I]n the simplest case of a complete, single-source evidence profile, the LR expression reverts to the reciprocal of the profile frequency. For example: Profile frequency = 1/1,000,000 [implies] LR = P(E|H1) / P(E|H2) = 1 / 1/1,000,000 = 1,000,000/1 = 1,000,000 (or 1 million). This could be expressed in words as, "Given the DNA profile found in the evidence, it is 1 million times more likely that it is from the suspect than from another random person with the same profile." -- Norah Rudin & Keith Inman, An Introduction to Forensic DNA Analysis 148-49 (2d ed. 2002).
Comment: If "another random person" had "the same profile," there would be no genetic basis for distinguishing between this individual and the suspect. So how could the suspect possibly be a million times more likely to be the source?
A likelihood ratio is a ratio that compares the likelihood of two hypotheses in the light of data. [I]n the present case there are two hypotheses: the sperm came from twin A or the sperm came from twin B, and then you calculate the likelihood of each hypotheses in the face or in the light of the data, and then you form the ratio [LR] of the two. So the ratio tells you how much more likely one hypothesis is than the other in the light of the experimental data. --Testimony of Michael Krawczak in a pretrial hearing on a motion to exclude evidence in Commonwealth v. McNair, No. 8414CR10768 (Super. Ct., Suffolk Co., Mass.) (transcript, Feb. 15, 2017).
Comment: Defining "likelihood" as a quantity proportional to the probability of data given the hypothesis, the first sentence is correct. But this definition was not provided, and the second sentence further suggests that the "experimental data" makes one twin LR times more probable to be the source than the other. That conclusion is correct only if the prior odds are equal -- an assumption that does not rest on those data.
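The dependence on prior odds can be made concrete with Bayes' rule in odds form: posterior odds = LR x prior odds. The sketch below is only an illustration. It takes the LR of 1,000,000 from the single-source example, and the three prior odds are hypothetical values picked to show the range of outcomes, not figures from any case. The function name posterior_probability is likewise my own.

```python
from fractions import Fraction


def posterior_probability(likelihood_ratio, prior_odds):
    """Apply Bayes' rule in odds form and convert the result to a probability.

    posterior odds = likelihood ratio * prior odds
    probability    = odds / (1 + odds)
    """
    posterior_odds = Fraction(likelihood_ratio) * Fraction(prior_odds)
    return posterior_odds / (1 + posterior_odds)


LR = 1_000_000  # likelihood ratio from the single-source example

# Hypothetical prior odds in favor of the source hypothesis (illustration only).
for prior_odds in (Fraction(1, 1), Fraction(1, 1_000), Fraction(1, 10_000_000)):
    p = posterior_probability(LR, prior_odds)
    print(f"prior odds {prior_odds}: posterior probability {float(p):.6f}")
```

Only with even prior odds (1 to 1) do the posterior odds equal the LR. With prior odds of 1 to 10,000,000, the same LR yields posterior odds of 1 to 10, a probability of about 0.09 that the suspect is the source. The LR is identical in every row; what changes is information that the DNA data do not supply.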
With this kind of prose and testimony, is it any surprise that courts write that "[t]he likelihood ratio 'compares the probability that the defendant was a contributor to the sample with the probability that he was not a contributor to the sample'"? Commonwealth v. Grinkley, 75 Mass.App.Ct. 798, 803, 917 N.E.2d 236, 241 (Mass. Ct. App. 2009) (quoting Commonwealth v. McNickles, 434 Mass. 839, 847, 753 N.E.2d 131 (2001)).
