Thursday, May 30, 2013

People v. Garcia and Low Template DNA (LT-DNA)

Recent postings (March 27, May 25) about the appeal of Amanda Knox and Raffaele Sollecito suggested that the court’s decision not to order more DNA tests on the kitchen knife was less a manifestation of bad judicial mathematics than a judgment about the possible costs and benefits of additional low template DNA (LT-DNA) testing as perceived by court-appointed experts. As a publication of the Royal Statistical Society noted last year, “Many questions at the extremities of LTDNA technology remain unanswered, and scientific disputes between experts are sometimes ventilated in litigation.” (Puch-Solis et al. 2012, at 85 ¶ 60.20, discussing English Court of Appeal cases in ¶¶ 60.21 & 60.22).

In the United States, appellate courts have yet to address the admission of LT-DNA results. The latest opinion I have seen comes from a trial court in Bronx County, New York. In People v. Garcia, 39 Misc.3d 482, 963 N.Y.S.2d 517 (N.Y. Sup. Ct. 2013), a woman was found suffocated, a sock in her mouth and duct tape binding her face and limbs. Pablo Garcia was charged with her murder and related crimes. A piece of the duct tape contained DNA from at least two individuals. The New York City Office of the Chief Medical Examiner (OCME) reported that the mixture “is 586 times more probable if the sample originated from this defendant and one unknown, unrelated person than if it originated from two unknown, unrelated persons.”
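
The 586 figure is a likelihood ratio, and it helps to be explicit about what it does and does not say. In the form reported by the OCME, it compares the probability of the observed mixed profile under two competing hypotheses:

\[
\mathrm{LR} \;=\; \frac{P(\text{mixed profile} \mid \text{defendant and one unknown, unrelated contributor})}{P(\text{mixed profile} \mid \text{two unknown, unrelated contributors})} \;=\; 586.
\]

A ratio of 586 means that the evidence is 586 times more probable under the first hypothesis than under the second. It is not the probability that the defendant contributed to the mixture, which would require combining the likelihood ratio with the other information in the case.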

Now, it is easy to see how a murderer applying the tape would leave some DNA on it, but why would there be additional DNA from an “unknown” rather than from the victim? One would think that duct tape used to bind an individual would include that person’s DNA rather than “one unknown, unrelated person.” However, it could be—I am speculating here—that the sample came from duct tape wrapped on top of other duct tape rather than a piece attached directly to the skin. Would the additional DNA profile have come from someone in the factory where the tape was made? From someone besides the murderer who touched its edges after the package had been opened? These are the kinds of possibilities that need to be considered when dealing with extremely small quantities of “contact DNA.”

Hoping to exclude the LT-DNA evidence, Garcia requested a pretrial hearing on whether the OCME’s protocols for typing LT-DNA and computing likelihood ratios were generally accepted in the relevant scientific community. Judge Nicholas Iacovetta denied the request. He found that Garcia failed to raise much doubt about general acceptance. He wrote that “There is nothing new or novel about LCN DNA profiling. It simply represents the application of accepted and reliable procedures [like PCR and electrophoresis] that are applied in a modified manner.”

This reasoning, which has been used in previous cases, seems far too facile. After all,
PCR-based STR profiling with capillary electrophoresis is generally accepted for the purpose of producing identifying profiles because experiments have demonstrated its validity and it fits into well-established theories of chemistry and biology. It satisfies the validity standard of Daubert for the same reasons. But this foundation might not extend to the domain of the smaller samples. A light microscope works well for studying bacteria, but its magnification is not adequate for viewing much smaller viruses. For that purpose, an electron microscope is required. Radar can track airplanes or flocks of birds, but the signal-to-noise ratio is too low for it to be useful in tracking the flight path of a solitary, high-flying butterfly. The radar equipment is identical, and the operator is no less skilled at interpreting what he sees on the screen, but the procedure has not been validated (and would not be valid) for butterfly tracking.

Likewise, in the case of touch DNA, the relevant question is not whether the instrumentation and chemicals are identical or whether the analysts are using the same standards for interpretation. It is whether the system has been validated in the range in which it is being used. This is not a question about how well a validated or generally accepted procedure worked on a particular occasion ... . It is a ... question about the ability, under the best of conditions, of the equipment and its operators to pick out a weak but true signal from the noise. Until this trans-case question is resolved, admissibility is unjustified under Frye and Daubert.
Kaye, Bernstein and Mnookin (2011, § 9.2.3 at 428).

These observations do not necessarily mean that all forms of LT-DNA profiling, including the OCME's methodology, should be held inadmissible under the general acceptance standard for scientific evidence followed in New York. They simply mean that one must look to suitable experiments published in scientific journals or other places where they would be subject to critical review by other interested scientists, to the testimony or published views of appropriate scientists who have studied the matter, and to other indications of general acceptance.

Judge Iacovetta did some of this scrutiny and concluded “[s]eparately ... that LCN DNA testing conducted by OCME and its [statistical analysis] are both generally accepted as reliable in the forensic scientific community.”

The bases for this separate conclusion, however, are not uniformly compelling. First, the court pointed to “a lengthy Frye hearing” conducted by another trial court that determined “that LCN DNA testing ... when properly performed, ... is generally accepted as reliable in the scientific community.” Before accepting the conclusion of another judge who did conduct an evidentiary hearing, however, a court should ensure that the hearing actually aired the views of a cross-section of the scientific community. Especially in the early days of a scientific technique, imbalanced hearings are not uncommon. For examples in the DNA area, see Kaye (2010).

Second, the Garcia opinion stated that “[o]ther New York trial courts have also admitted LCN DNA results in evidence after denying defense requests for a Frye hearing ... LCN DNA testing has been admitted in New York State trial courts over 125 times, and in a federal district court in the Southern District of New York without a Frye hearing and in courts of multiple other countries including Germany, The United Kingdom, Sweden and Switzerland.” Again, a history of usage—especially without any hearings on the necessary foundational research and with no meaningful appellate review of the trial rulings—is a weak indicator of scientific acceptance.

Third, the court noted that “[a]lthough OCME is the only government facility currently using LCN DNA testing, several private and academic laboratories, such as the University of North Texas, use LCN DNA testing. OCME ... has been certified to conduct LCN DNA testing since 2005, using it to help identify the remains of victims of the World Trade Center terror attack in 2001 ... .” This is better. Usage of a method outside the courtroom, especially in matters that supply feedback on how well the method works, helps establish acceptance. (However, more details demonstrating the equivalence of the other uses to the one in Garcia itself would have been helpful.)

Fourth, the court observed that “OCME's own validation studies of LCN DNA testing ... were examined and certified by the New York State Commission On Forensic Science (NYSCFS) in 2005” and that “OCME is also audited yearly.” Favorable review of a suitable set of the validation studies by the commission’s outside scientific staff and by an auditing organization (as opposed to audits showing only that the laboratory follows its protocols, investigates and corrects problems, and the like) argues in favor of general acceptance.

Finally, in discussing the software the OCME developed to help interpret mixed, partial DNA profiles from small samples, the opinion alludes to "peer reviewed articles in professional journals such as the International Journal of Forensic Genetics." I am not familiar with a journal by this name, and it does not seem to have a web site. Perhaps the court was referring to papers in Forensic Science International: Genetics (Mitchell et al. 2011; Mitchell et al. 2012). See also Caragine et al. (2009). The best evidence of general acceptance of validated technologies is publication in established scientific journals followed (ultimately) by a cessation of debate over the findings. The OCME publications support the laboratory procedures as well as the software for interpreting the data from those procedures.

Focusing on the software (named FST), Garcia states that
Other software programs such as True Allele, Life TD, Forenism, and Locomation, a software tool designed in the 1990's, also use Bayes Theorem to perform functions similar to the FST. The difference is that the FST uses empirically established drop-in and drop-out rates generated by thousands of tests, rather than just estimating them, which makes the FST more accurate as a predictor of likelihood ratios.
At the risk of quibbling (an occupational and personal hazard), these programs do not "predict likelihood ratios." If they predict anything, they predict genotypes, and which program's LRs best express the probative value of the inferred genotypes depends on more than how each program handles drop-in and drop-out probabilities.
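
To see, in schematic form, how drop-out and drop-in probabilities enter this kind of likelihood-ratio calculation, consider a toy single-locus, single-contributor model. This is only a sketch of the general Bayesian approach, not the FST algorithm or any other laboratory's actual method, and the allele frequencies, drop-out probability, and drop-in probability below are made-up numbers.

```python
from itertools import combinations_with_replacement

# Hypothetical allele frequencies at one locus (illustrative only).
FREQS = {"A": 0.10, "B": 0.25, "C": 0.30, "D": 0.35}
P_DROPOUT = 0.30  # chance a true allele fails to appear (hypothetical)
P_DROPIN = 0.05   # chance one spurious allele appears (hypothetical)

def p_evidence(genotype, observed, freqs, d, c):
    """P(observed allele set | a single contributor with this genotype),
    under a toy model: each true allele drops out independently with
    probability d, and at most one spurious allele drops in with
    probability c, in proportion to its population frequency."""
    p = 1.0
    for allele in genotype:
        p *= (1 - d) if allele in observed else d
    extras = [a for a in observed if a not in genotype]
    if not extras:
        p *= (1 - c)
    elif len(extras) == 1:
        p *= c * freqs[extras[0]]
    else:
        p = 0.0  # the toy model allows at most one drop-in
    return p

def likelihood_ratio(suspect, observed, freqs, d, c):
    """LR = P(evidence | suspect is the source) / P(evidence | unknown source)."""
    numerator = p_evidence(suspect, observed, freqs, d, c)
    denominator = 0.0
    for g in combinations_with_replacement(freqs, 2):  # every possible genotype
        prior = freqs[g[0]] * freqs[g[1]] * (1 if g[0] == g[1] else 2)
        denominator += prior * p_evidence(g, observed, freqs, d, c)
    return numerator / denominator

# Example: the electropherogram shows alleles A and B; the suspect is type A,B.
print(likelihood_ratio(("A", "B"), {"A", "B"}, FREQS, P_DROPOUT, P_DROPIN))
```

The point of the sketch is simply that the drop-out and drop-in parameters sit inside every term of the calculation. How a program obtains them (empirically, as the OCME says it does, or by assumption) matters, but so do the genotype probabilities and the rest of the model.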

To sum up, even if it is wrong to regard the application of established technologies to LT-DNA as a trivial variation that requires no further legal scrutiny to establish admissibility, it also appears that one can make a reasonable case for general scientific acceptance of LT-DNA typing as required under New York law. The Garcia opinion shows how one trial judge was convinced.

Saturday, May 25, 2013

Bad Math or Passable Law? DNA Testing in the Continuing Prosecution of Amanda Knox and Raffaele Sollecito

In a previous posting, I raised some questions about an op-ed ("Justice Flunks Math") on the judge's refusal to depart from the court-appointed expert's written report in the prosecution of Amanda Knox and Raffaele Sollecito. This week, a flurry of opinionated comments appeared, and I let those that seemed to have at least some analysis or substance through the gate.

In my previous posting, I took issue with the op-ed's assertion that the trial judge "demonstrated a clear mathematical fallacy: assuming that repeating the test could tell us nothing about the reliability of the original results" and its apparent suggestion that retesting the same DNA sample would be comparable to testing a coin for bias by repeatedly tossing it. I argued that "[w]ithout some specification of precisely what made the initial testing problematic and whether those problems could be reduced sufficiently with retesting, it seems precipitous to convict the judge who overturned the guilty verdict of 'bad math.'"

Whatever the merits of the indictment of the judge, my thanks to those who offered information on whether retesting might be significantly more revealing than the initial testing. That is an interesting question in its own right.

In this regard, an author of the op-ed, Professor Leila Schneps kindly explained that the "confirming retest" (the phrase in her op-ed) did not mean a retest of the same sample (like flipping a coin again) but rather an analysis of a "new knife blade sample," a "rich sample ... from the place where the blade joins the handle of the knife." This new sample, she suggests, might be "positive for Meredith Kercher," in which case, "it would have correctly settled two of the questions left outstanding in the courtroom: was the first electropherogram showing the DNA on the knife correctly interpreted as Meredith's, and was Meredith's DNA actually on the knife?"

If we posit that the new sample is large enough to produce unambiguous results, then it could reveal whether "Meredith's DNA [was] actually on the knife." But Professor Schneps also states that the "rich sample" was "significantly lower than the quantity 'advised' by the kit, although the kit's website shows many examples of tests on smaller samples, some even smaller than the knife blade DNA, that gave positive and accurate results."

If the sample is this impoverished, are we not back in the realm of low-template DNA testing, where the worry is that stochastic effects can be dominant? The mathematical argument here seems to be that even though it might not be surprising to spot, by chance alone, some peaks in a new test that also are present in Meredith's genotype, the probability of those peaks plus the ones seen in the original testing of a different sample from the knife would be negligible unless Meredith's DNA was on the knife. In this way, the additional testing overcomes the low signal-to-noise ratio in each sample. That is a fair argument (as far as it goes), and the same logic underlies some protocols for testing contact DNA.
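
To put the argument in rough, schematic terms (the numbers here are purely hypothetical), suppose the chance of seeing a given set of Meredith-consistent peaks by coincidence in the first low-template run is p1, and the chance of seeing a corroborating set by coincidence in an independent run on a separate sample is p2. If the two runs really are independent, the chance of both coincidences occurring is the product:

\[
P(\text{coincidental peaks in both runs}) = p_1 \times p_2, \qquad \text{e.g., } 0.05 \times 0.05 = 0.0025.
\]

Everything turns on the independence assumption: separate samples, separate extractions, and no common source of contamination. If the same contaminating DNA could have reached both samples, the results are not independent, and multiplying the probabilities would overstate the strength of the combined evidence.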

Still, given the difficulties and the level of discord over the best approaches to conducting and interpreting LT-DNA testing (see, e.g., A. Carracedo, P.M. Schneider, J. Butler & M. Prinz, Focus issue—Analysis and Biostatistical Interpretation of Complex and Low Template DNA Samples, Forensic Science International: Genetics 6 (2012) 677–678), and the court's experts' concerns about contamination, I wonder whether even the most mathematically erudite judge would have been so quick to order additional DNA testing in this case. Consequently, I am not yet prepared to give the judge a flunking grade for "a clear mathematical fallacy."

Tuesday, May 21, 2013

Potshots: “Blank Stares” and “No Data” on Latent Fingerprint Identification

According to as astute an observer as David Faigman,
In ... fields such as latent fingerprint identification, firearms, clinical psychology, and clinical psychiatry ... , if judges ask the question, “Where are the data?” they would be met with blank stares. If you ask a latent fingerprint examiner, “Where are your data?” the answer is likely to be, “Data. We have no data. In fact, we don't need data. We're specialists.” ... Many of these experts have been practicing their trade for twenty-five years; they know it when they see it. ... Under Daubert, however, even if your data happen to be experience, you have to be able to articulate how you came to know what you think you know. (Faigman 2013, 914).
A footnote explains that being “able to articulate how you came to know what you think you know” can be accomplished by “checking the basis for believing that the experience will produce reliable testimony.” (Ibid., 914 n. 64).

Now, I am no fan of claims of fingerprint examiners to be able to match latent prints to reference prints with absolute certainty (NIST 2012) and of lax and superficial court opinions allowing such testimony. (Kaye 2013; Kaye, Bernstein and Mnookin 2011). But the assertion that there are absolutely no data to show that latent print examiners can “produce reliable testimony” is too much for me to swallow. Indeed, in his treatise on scientific evidence, Professor Faigman does not insist that “no data” exist. The treatise correctly recognizes that “[a] few well-designed studies have now been conducted” (Faigman et al. 2012, § 33:56). To the list of six studies noted in the treatise (ibid., § 33.49 n. 10), one can add Tangen, Thompson, and McCarthy (2011). (As explained here last June, this Australian study showed a false negative rate of under 8% and a false positive rate of under 1% (Fingerprinting Error Rates Down Under, June 24, 2012)).

Perhaps Professor Faigman meant to say that even if data exist to support the judgments of fingerprint analysts—as they clearly do at a general level—a particular examiner’s judgments are not based on data, but on standardless, subjective impressions of the degree of similarity that warrants an identification or an exclusion. They just “know it when they see it.” That observation is closer to the mark (no pun intended). It is the gist of David Harris's contention that "most forensic science does not qualify as science in any true sense of that term." (Harris 2012, 36). Like Professor Faigman, Professor Harris complains that "[d]isciplines like fingerprint analysis, firearms tool-mark analysis, and bite-mark analysis have no basis in statistics, and do not originate in inquiry conducted according to scientific principles. Rather, they rely on human judgment grounded in experience ... without reference to rigorous and agreed-upon standards." (Ibid.) Identification experts who do not follow a protocol with quantitative or other external standards to achieve high inter-rater reliability should not insist that they are following the "scientific method." (Compare Kaye 2012, 123).

But "science" is not the only source of useful information, and experiments can measure the levels of accuracy for subjective as well as objective procedures. DNA laboratories have verified that DNA analysis performed in a specified way correctly distinguishes between samples taken from the same source and samples taken from different sources.  Indeed, this is the only sense in which it could be said that “DNA profiling [always] ... had known error rates” (Faigman 2013, 913). Even today, the error rates of DNA laboratories in actual case work is not really known. In the same manner, tests of fingerprint analyses performed by trained examiners show that they are capable of routinely distinguishing between marks taken from the same source and marks taken from different sources (with some errors). Again, however, we do not know the error rates of these examiners in actual case work. (Kaye 2012).

Consequently, appropriately documented latent print comparisons undertaken without unnecessary exposure to biasing information, presented with a recognition of the uncertainty in the largely subjective procedure and verified by an independent examiner blinded to the initial outcome as well as the output of an automated scoring system, should survive the “more rigorous test” (Faigman 2013) established in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), and embellished in later cases. Although there is ample room to improve fingerprint comparisons by human examiners and to implement automated systems for latent print work, “blank stares” and “no data” are no longer the only answers available to an objection under Daubert.

References

Faigman, David L. 2013. “The Daubert Revolution and the Birth of Modernity: Managing Scientific Evidence in the Age of Science.” University of California at Davis Law Review 46:893–930.

Harris, David A. 2012. Failed Evidence: Why Law Enforcement Resists Science. New York and London: New York University Press.

Kaye, David H. 2013. “Experimental and Scientific Evidence: Criminalistics.” In McCormick on Evidence, edited by Kenneth Broun, § 207. Eagan, MN: West Publishing Co.

Kaye, David H., ed. 2012. Expert Working Group on Human Factors in Latent Print Analysis, Latent Print Examination and Human Factors: Improving the Practice Through a Systems Approach. Gaithersburg: National Institute of Standards and Technology.

Kaye, David H., David E. Bernstein, and Jennifer L. Mnookin. 2011. The New Wigmore: A Treatise on Evidence: Expert Evidence, 2d ed. New York: Aspen Publishing Company.

Tangen, Jason M., Matthew B. Thompson, and Duncan J. McCarthy. 2011. “Identifying Fingerprint Expertise.” Psychological Science 22:995 (available online).