Wednesday, January 25, 2017

Statistics for Making Sense of Forensic Genetics

The European Forensic Genetics Network of Excellence (EUROFORGEN-NoE) is a group of “16 partners from 9 countries including leading groups in European forensic genetic research.” In 2016, it approached Sense About Science — “an independent charity that challenges misrepresentation of science and evidence in public life” — to prepare and disseminate a guide to DNA evidence. Within the year, the guide, entitled Making Sense of Forensic Genetics, emerged. The 40-page document has a long list of “contributors,” who, presumably, are its authors. According to EUROFORGEN-NoE, it is “designed to introduce professional and public audiences to the use of DNA in criminal investigations; to understand what DNA can and can’t tell us about a crime, and what the current and future uses of DNA analysis in the criminal justice system might be.”

By and large, it accomplishes this goal, offering well-informed comments and cautions for the general public. Some of the remarks about probabilities and statistics, however, are not as well developed as they could be. The points worth noting have more to do with clarity of expression than with any outright errors.

Statistics do not arise in a vacuum. Proper interpretation requires some understanding of how they came to be produced. Thus, Making Sense correctly observes that:
DNA evidence has a number of limitations: it might be undetectable, overlooked, or found in such minute traces as to make interpretation difficult. Its analysis is subject to error and bias. Additionally, DNA profiles can be misinterpreted, and their importance exaggerated, as illustrated by the wrongful arrest of a British man, ... . Even if DNA is detected at a crime scene, this doesn’t establish guilt. Accordingly, DNA needs to be viewed within a framework of other evidence, rather than as a standalone answer to solving crimes.
With respect to the narrow question of whether two DNA samples originate from the same individual, Making Sense asks, “So what is the chance that your DNA will match that of someone else?” An ambiguity lurks in this question. Does it refer to the probability of a matching profile somewhere in the population, or to the probability of a matching profile in a single, randomly selected individual? Apparently, the authors have the latter question in mind, for Making Sense explains that
It depends on how many locations in the DNA (loci) you look at. If a forensic scientist looked at just one locus, the probability of this matching the same marker in another individual would be relatively high (between 1 in 20 and 1 in 100). ... Since European police forces today typically analyse STRs at 16 or more loci, the probability that two full DNA profiles match by chance is miniscule — in the region of 1 in 10 with 16 zeros after it (or 1 in 100 million billion). ... Although in the UK court, the statistics are always capped at 1 in a billion.
The 1-in-a-billion cap is not seen in the United States, where laboratories toss about estimates in the quintillionths, septillionths, and so on (and on). (Could this be an instance of “America First”?) The naive reader might be forgiven for thinking that when the probability of a match to a randomly selected individual is far less than 1 in a billion, an analyst could conclude that the recovered DNA is either from the defendant or a close relative. But Making Sense rejects this thought, insisting that “DNA doesn’t give a simple ‘yes’ or ‘no’ answer.”
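
The two readings of the question "what is the chance that your DNA will match that of someone else?" can give very different numbers. The following is a minimal sketch of the difference, assuming that each unrelated person matches independently with the same probability p. The function name and the population sizes are illustrative assumptions, not figures from Making Sense; only the 1-in-a-billion value comes from the quoted passage.

```python
import math


def prob_at_least_one_match(p: float, n: int) -> float:
    """Probability that at least one of n unrelated people has a matching
    profile, assuming independent matches, each with probability p.

    Equals 1 - (1 - p)**n, computed with log1p/expm1 so that very small
    values of p do not vanish to rounding error.
    """
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    p = 1e-9  # the capped UK figure quoted above
    # Hypothetical population sizes, chosen only as round numbers.
    for n in (1, 1_000_000, 1_000_000_000):
        print(f"n = {n:>13,}: P(at least one match) = "
              f"{prob_at_least_one_match(p, n):.3g}")
```

For a single randomly selected individual the result is just p. As the number of people considered approaches 1/p, a match somewhere in the group becomes more likely than not, even though the probability for any one named person is unchanged.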

The explanation for its position is muddled. First, the report repeats that “with information available for all 16 markers, ... the risk of DNA retrieved from a crime scene matching someone unrelated to the true source is extremely low (less than 1 in a billion, and often many orders of magnitude lower than this).” So why is this not good enough for a “yes or no answer”? The hesitation, as expressed, is that
However, many of the DNA profiles retrieved from crime scenes aren’t full DNA profiles because they’re missing some genetic markers or there is a mixture of DNA from two or more people. So was it the suspect who left their DNA at the crime scene? The DNA evidence won’t give a ‘yes’ or ‘no’ answer: it can only ever be expressed in terms of probability.
But the conclusion that “it can only ever be expressed in terms of probability” is a non sequitur. The only thing that follows from the fact that not all crime-scene DNA samples lead to 16-locus profiles is that matches to the samples with less complete profiles are less convincing than matches to the samples with more complete profiles.
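
Why a less complete profile is less convincing can be seen in a rough sketch of the product rule. It assumes that the loci are statistically independent and uses a single hypothetical per-locus match probability of 1 in 10 for every locus. Both are simplifying assumptions for illustration; casework calculations rest on population allele frequencies and further adjustments that this sketch ignores.

```python
import math

# Hypothetical value used for every locus; not taken from Making Sense.
HYPOTHETICAL_PER_LOCUS_PROB = 0.1


def profile_match_probability(per_locus_probs):
    """Random-match probability for a profile, assuming independent loci:
    the product of the per-locus match probabilities."""
    return math.prod(per_locus_probs)


if __name__ == "__main__":
    # Compare a full 16-locus profile with two partial profiles.
    for loci_typed in (16, 8, 4):
        p = profile_match_probability(
            [HYPOTHETICAL_PER_LOCUS_PROB] * loci_typed
        )
        print(f"{loci_typed:>2} loci typed: match probability about {p:.0e}")
```

Each locus that drops out removes one factor from the product, so the match probability grows and the match counts for less. The result is still a probability in every case. Nothing in the arithmetic marks a threshold at 16 loci.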

Of course, there is a sense in which all DNA evidence only gives rise to probabilities, and never to categorical conclusions. All empirical evidence only gives probable conclusions rather than certainties. Furthermore, it has been argued that forensic scientists should eschew source attributions because their expertise is limited to evaluating likelihoods: the probability of the match given that the sample came from a named individual and the probability given that it came from a different individual (or individuals). But that is not what Making Sense seems to be saying when it declares yes-or-no answers impossible. The limits on all empirical knowledge and the role of an expert witness do not produce any line between 16-locus matches and less-than-16-locus matches.

Making Sense also points out that
[T]he match probability ... must not be confused (but often is) with how likely the person is to be innocent of the crime. For example, if a DNA profile from the crime scene matches the suspect’s DNA and the probability of such a match is 1 in 100 million if the DNA came from someone else, this does not mean that the chance of the suspect being innocent is 1 in 100 million. This serious misinterpretation is known as the prosecutor’s fallacy.
Conceptually, this transposition is a “serious misinterpretation,” but whether the correct inverse probability (one that is based on a prior probability and a Bayes factor on the order of 100 million) gives a markedly different value is far from obvious. See David H. Kaye, The Interpretation of DNA Evidence: A Case Study in Probabilities, in Making Science-based Policy Decisions: Resources for the Education of Professional School Students, Nat'l Academies of Science, Engineering, and Medicine Committee on Preparing the Next Generation of Policy Makers for Science-Based Decisions ed., Washington, DC, 2016.
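
How far the transposed figure strays from the correct inverse probability depends on the prior. The following minimal sketch applies Bayes' rule in odds form (posterior odds = prior odds × Bayes factor), taking the Bayes factor to be 100 million as in the quoted example. The prior probabilities are hypothetical values chosen for illustration, and the function name is my own.

```python
def posterior_prob_not_source(prior_prob_source: float,
                              bayes_factor: float) -> float:
    """Posterior probability that the suspect is NOT the source of the DNA,
    from Bayes' rule in odds form: posterior odds = prior odds * Bayes factor.
    """
    prior_odds = prior_prob_source / (1.0 - prior_prob_source)
    posterior_odds = prior_odds * bayes_factor
    return 1.0 / (1.0 + posterior_odds)


if __name__ == "__main__":
    bayes_factor = 1e8        # match probability of 1 in 100 million
    transposed = 1.0 / bayes_factor  # the "prosecutor's fallacy" figure

    # Hypothetical priors: even odds, 1 in 1,000, and 1 in 1,000,000.
    for prior in (0.5, 1e-3, 1e-6):
        post = posterior_prob_not_source(prior, bayes_factor)
        print(f"prior P(source) = {prior:g}: "
              f"P(not source | match) = {post:.3g} "
              f"(transposed figure: {transposed:.0e})")
```

With even prior odds, the posterior probability essentially coincides with the transposed figure. With a prior of 1 in a million, it is roughly 1 in 100, about six orders of magnitude larger, yet still small. Whether the difference is "marked" therefore turns on the prior one regards as reasonable.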

A reasonable approach is to have analysts present the two pertinent conditional probabilities mentioned above (the “likelihoods”) to explain how strongly the profiles support one hypothesis over the other. Making Sense refers to this approach in some detail, but it suggests that it is needed only “in more complex cases, such as mixtures of two or more individuals, or when there might be contamination by DNA in the environment.” Compared with the alternative ways of explaining the implications of DNA and other trace evidence, however, the approach is more widely applicable than that suggestion implies.
