Friday, September 23, 2011

"The first experimental study exploring DNA interpretation"

A recent study, entitled “Subjectivity and Bias in Forensic DNA Mixture Interpretation,” proudly presents itself as “the first experimental study exploring DNA interpretation.” The researchers are to be commended for seeking to examine, on at least a limited basis, the variations in the judgments of analysts about a complex mixture of DNA.

In addition to documenting such variation, they suggest that their experiment shows that motivational or contextual bias caused analysts in an unnamed case in Georgia to include one suspect in a rape case as a possible contributor to the DNA mixture. This claim merits scrutiny. The experiment is not properly designed to investigate causation, and the investigators' causal inference lacks the foundation of a controlled experiment. To put it unkindly, if an experiment is an intervention that at least attempts to control for potentially confounding variables so as to permit a secure inference of causation, then this is no experiment.

In the study, Itiel Dror, a cognitive neuroscientist and Honorary Research Associate at University College London, and Greg Hampikian, a Professor of Biology and Criminal Justice at Boise State University, presented electropherograms to 17 “expert DNA analysts ... in an accredited government laboratory in North America.” The electropherograms came from a complex mixture of DNA from at least four or five people recovered in a gang rape in Georgia. The article does not state how many analysts worked on the original case, whether they worked together or separately, the exact information that they received, or whether they peeked at the suspects’ profiles before determining the alleles that were present in the mixture. The authors imply that the Georgia analysts were told that unless they could corroborate the accusations, no prosecution could succeed. Reasonably enough, Dror and Hampikian postulate that such information could bias an individual performing a highly subjective task.

In the actual case, one man pled guilty and accused three others of participating. The three men denied the accusation. The Georgia laboratory found that one of the three could not be excluded. Contrary to the expectations or desires of the police, the analysts either excluded the other two suspects or were unable to reach a conclusion as to them.

The 17 independent analysts shown the electropherograms from the case split on the interpretation of the complex mixture data. The study does not state the analysts’ conclusions for suspects 1 and 2. Presumably, they were consistent with one another and with the Georgia laboratory’s findings. With regard to suspect 3, however, “One examiner concluded that the suspect ‘cannot be excluded’, 4 examiners concluded ‘inconclusive’, and 12 examiners concluded ‘exclude.’”

From these outcomes, the researchers draw two main conclusions. The first is that “even using the ‘gold standard’ DNA, different examiners reach conflicting conclusions based on identical evidentiary data.”

That complex mixture analysis is unreliable (in the technical sense of being subject to considerable inter-examiner variation) is not news to forensic scientists and lawyers. Although the article implies that the NRC report on forensic science presents all DNA analysis as highly objective, the report refers to “interpretational ambiguities,” “the chance of misinterpretation,” and “inexperience in interpreting mixtures” as potential problems (NRC Report 2009, 132). The Federal Judicial Center’s Reference Manual on Scientific Evidence (Kaye & Sensabaugh 2011) explains that “A good deal of judgment can go into the determination of which peaks are real, which are artifacts, which are ‘masked,’ and which are absent for some other reason.” In The Double Helix and the Law of Evidence (2010, 208), I wrote that “As currently conducted ... , most mixture analyses involving partial or ambiguous profiles entail considerable subjectivity.” In 2003, Bill Thompson and his colleagues emphasized the risk of misinterpretation in an article for the defense bar.

These concerns about ambiguity and subjectivity have not escaped the attention of the courts. Supreme Court Justice Samuel Alito, quoting a law review article and a book for litigators, wrote that
[F]orensic samples often constitute a mixture of multiple persons, such that it is not clear whose profile is whose, or even how many profiles are in the sample at all. All of these factors make DNA testing in the forensic context far more subjective than simply reporting test results … .
and that
STR analyses are plagued by issues of suboptimal samples, equipment malfunctions and human error, just as any other type of forensic DNA test.
District Attorney’s Office for Third Judicial Dist. v. Osborne, 557 U.S. __ (2009) (Alito, J., concurring). Dror and Hampikian even quote DNA expert Peter Gill as saying that “If you show 10 colleagues a mixture, you will probably end up with 10 different answers.” Learning that 17 examiners were unanimous as to the presence of two profiles in a complex mixture and that they disagreed as to a third supports the widespread recognition that complex mixtures are open to interpretation, and it adds some more information about just how frequently analysts might differ in evaluating one set of electropherograms.

The second conclusion that the authors draw is that in the Georgia case “the extraneous context appears to have influenced the interpretation of the DNA mixture.” This conclusion may well be true; however, it is all but impossible to draw on the basis of this “first experimental study exploring DNA interpretation.” As noted at the outset, the “experimental study” has no treatment group. The study resembles collaborative exercises in DNA interpretation that have been done over the years. A true experiment, or at least a controlled one, would have included some analysts exposed to potentially biasing extraneous information. Their decisions could then have been compared to those of the unexposed analysts.

Instead of controlling for confounding variables, the researchers compare the outcomes in their survey of analysts’ performance on an abstract exercise to the outcomes for one or two analysts in the original case. This approach does not permit them to exclude even the most obvious rival hypotheses. Perhaps it was not information about the police theory of the case and the prosecution's needs, but a difference in the labs' protocols that caused the difference. Perhaps the examiners outside of Georgia, who knew they were being studied, were more cautious in their judgments. Or, perhaps the police pressure, desires, or expectations really did have the hypothesized effect in Georgia. The study cannot distinguish among these and other possibilities.

In addition, the difference in outcomes between the Georgia group and the subjects in the study seems to be within the range of unbiased inter-examiner variability. How can one conclude that the Georgia analysts would not have included suspect 3 if they had not received the extraneous information and had followed the same protocol as the other 17? If the variability due to ordinary subjectivity in the process is such that 1 time out of 17 an analyst will include the reference profile in question, then the probability that a Georgia analyst would do so is 1/17, or about 0.059. I am not a firm believer in hypothesis testing at the 0.05 level, but I cannot help thinking that even under the hypothesis that the Georgia group was not affected to the slightest degree by the extraneous information, the chance that the result would have been the same is not negligible.
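
To make the arithmetic above concrete, here is a minimal Python sketch. It uses only the counts the study reports for suspect 3 and treats 1/17 as a naive point estimate of the inclusion rate for an unbiased analyst. The function name and the extension to two analysts are my own illustrative assumptions: the article does not say how many Georgia analysts there were, and the calculation assumes that analysts reach their judgments independently.

```python
from fractions import Fraction

# Counts reported in the study for suspect 3 (17 analysts in total).
counts = {"cannot be excluded": 1, "inconclusive": 4, "exclude": 12}
n_analysts = sum(counts.values())

# Naive point estimate of the chance that a single unbiased analyst
# would include suspect 3 as a possible contributor.
p_include = Fraction(counts["cannot be excluded"], n_analysts)


def prob_at_least_one_inclusion(p, k):
    """Chance that at least one of k analysts includes the suspect.

    Assumes (hypothetically) that the k analysts judge independently.
    """
    return 1 - (1 - p) ** k


# One analyst reproduces the 1/17 figure; two analysts is an
# illustrative assumption about the original case.
for k in (1, 2):
    print(k, round(float(prob_at_least_one_inclusion(p_include, k)), 3))
```

With a sample of only 17 analysts from a single laboratory, the estimate itself is very imprecise, so the sketch shows the order of magnitude and nothing more.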

In raising these concerns, I certainly am not claiming that an expectation or motivation effect arising from information about the nature of the crime and the need for incriminating evidence played no role in the Georgia case. But the research reported in the paper does not go very far to establish that it was a significant factor and that it was the cause of the disparity between the Georgia analysts and the 17 others.

The authors’ caveat that “it is always hard to draw scientific conclusions when dealing with methodologies involving real casework” is not responsive to these criticisms. The problem lies with conclusions that outstrip the data when it would have been straightforward to collect better data. The sample size here does not give a reliable estimate of variability in judgments of examiners working in the same laboratory. The sampling plan ignores the possibility of greater variability across laboratories. Perhaps the confined scope of the study reflects a lack of funding or an unwillingness of forensic analysts to cooperate in research because of the pressure of their caseloads or for other reasons -- a complaint aired in Mnookin et al. (2011). Inasmuch as the researchers do not explain how they chose their small sample, it is hard to know.

Beyond the ubiquitous issue of sample size, the subjects in the "experiment" were not assigned to treatment and control groups. No analysts were given the same extraneous information that the Georgia ones had. Of course, extraneous information presented in an experiment could be less influential than it would be in practice. External validity is always a problem with experiments. But there is reason to believe that a controlled experiment could detect an effect in a case like this. Bill Thompson (2009) reported anecdotes suggesting that even outside of actual casework, different DNA examiners presented with electropherograms of mixtures can be induced to reach different conclusions when given extraneous and unnecessary information about the case. That the effect might be less in simulated conditions does not mean that it is undetectable in a controlled experiment.
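
As a rough illustration of the kind of design described above, the following Python sketch randomly assigns analysts to an unexposed (control) group and an exposed (treatment) group, then compares inclusion rates with a one-sided Fisher exact test. Everything in it is hypothetical: the function names, the group sizes, and especially the outcome counts are invented for illustration and are not data from the study or from any laboratory.

```python
import random
from math import comb


def assign_groups(analyst_ids, seed=0):
    """Randomly split analysts into a control group and an exposed group."""
    rng = random.Random(seed)
    shuffled = list(analyst_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fisher_one_sided(incl_exposed, n_exposed, incl_control, n_control):
    """One-sided Fisher exact p-value.

    Probability of seeing at least incl_exposed inclusions in the exposed
    group if exposure had no effect (hypergeometric upper tail).
    """
    total_incl = incl_exposed + incl_control
    total = n_exposed + n_control
    upper = min(total_incl, n_exposed)
    favorable = sum(
        comb(total_incl, k) * comb(total - total_incl, n_exposed - k)
        for k in range(incl_exposed, upper + 1)
    )
    return favorable / comb(total, n_exposed)


# Hypothetical study: 20 analysts, 10 per group.
control, exposed = assign_groups(range(1, 21))

# Invented outcome counts, for illustration only: 5 of 10 exposed analysts
# and 1 of 10 control analysts include the suspect.
p_value = fisher_one_sided(incl_exposed=5, n_exposed=10,
                           incl_control=1, n_control=10)
print(len(control), len(exposed), round(p_value, 3))
```

Random assignment is what separates the effect of the extraneous information from differences in protocols, laboratories, or individual caution. The choice of test is secondary, and a real study would settle on its analysis plan in advance.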

Convincing scientific knowledge flows from a combination of well designed experimental and observational studies. Dr. Dror's work on fingerprint comparisons (e.g., Dror 2006) has contributed to a better understanding of the effect of examiner expectations in that task. Experiments designed to detect the impact of potentially biasing information on interpretation of DNA profiles in a controlled setting also would be worth undertaking.


Itiel E. Dror & Greg Hampikian (2011), Subjectivity and Bias in Forensic DNA Mixture Interpretation, Sci. & Justice, doi:10.1016/j.scijus.2011.08.004

Itiel E. Dror, David Charlton & Ailsa E. Peron (2006), Contextual Information Renders Experts Vulnerable to Making Erroneous Identifications, Forensic Science International, 156(1): 74-78

David H. Kaye (2010), The Double Helix and the Law of Evidence

David H. Kaye & George Sensabaugh (2011), Reference Guide on DNA Identification Evidence, in Reference Manual on Scientific Evidence, 3d ed.

Jennifer L. Mnookin et al. (2011), The Need for a Research Culture in the Forensic Sciences, UCLA Law Review, 58(3): 725-779

National Research Council Committee on Identifying the Needs of the Forensic Sciences Community (2009), Strengthening Forensic Science in the United States: A Path Forward, Washington, DC: National Academies Press

William C. Thompson, Simon Ford, Travis Doom, Michael Raymer & Dan E. Krane (2003), Evaluating Forensic DNA Evidence: Essential Elements of a Competent Defense Review, The Champion, Apr. 2003, at 16

William C. Thompson (2009), Painting the Target Around the Matching Profile: the Texas Sharpshooter Fallacy in Forensic DNA Interpretation, Law, Probability & Risk 8: 257-276

Cross-posted at the Double Helix Law blog
