Saturday, August 22, 2015

Disentangling Two Issues in the Hair Evidence Debacle

Forensic-science practitioners commonly present findings regarding traces left at crime scenes, on victims or suspects, or on or in their possessions. Such trace evidence can take many forms. Physical traces such as fingerprints, striations on bullets, shoe and tire prints, and handwritten documents are common examples. Biological materials, such as blood, semen, saliva, and hairs, also are fodder for the crime laboratory. Comparisons of a questioned and a known sample can supply valuable information on whether a specific suspect is associated in some manner with a crime. Viewers of the acronymious police procedurals—NCIS, CSI, and Law & Order: SVU—know all this.

For decades, however, legal and other academics have questioned the hoary courtroom claims of absolutely certain identification of one and only one possible source of trace evidence. In the turbulent wake of the 2009 report of a National Research Council committee, these views have slowly gained traction in the forensic-science community. Indeed in the popular press and among investigative reporters, the pendulum may be swinging in favor of uncritical rejection of once unquestioned forensic sciences. Recent months have seen an episode from Frontline presenting DNA evidence as "anything but proven"; 1/ they have included unfounded reports that as many as 15 percent of men and women imprisoned with the help of DNA evidence at trial are wrongfully convicted; 2/ and award-winning journalists have spread the word that the FBI "faked an entire field of forensic science," 3/ placed "pseudoscience in the witness box," 4/ and palmed off "virtually worthless" evidence as scientific truth. 5/

The last set of reports stems from an ongoing review of well over 20,000 cases in which the FBI laboratory issued reports on hair associations. The review spans decades of hair comparisons, and it is showing so many questionable statements that the expert evidence on hair associations stands out as "one of the country's largest forensic scandals." 6/ Its preliminary findings provoked prominent Senators to speak of an "appalling and chilling ... indictment of our criminal justice system" 7/ and to call for a "root cause analysis" of ubiquitous errors. 8/ A distressed Department of Justice and FBI joined with the Innocence Project and the National Association of Criminal Defense Lawyers not only to publicize these failings, but also to call on states "to conduct their own independent reviews where ... examiners were trained by the FBI." 9/ Projecting the outcome of cases that have yet to be reviewed, postconviction petitions refer ominously to "[t]housands of . . . cases the Justice Department now recognizes were infected by false expert hair analysis" 10/ and "pseudoscientific nonsense." 11/

The hair scandal illustrates two related problems with many types of forensic-science testimony. The first is the problem of foundation—What reasons are there to believe that hair or other analysts possess sufficient expertise to produce relevant evidence of associations between known and unknown samples? For years, commentators and some defense counsel have posed legitimate questions (with little impact in the courts) about the reliability and validity of physical comparisons by examiners asked to judge whether known and unknown samples are similar in enough respects—and not too dissimilar in other respects—to support a claim that they could have originated from the same individual. To paraphrase Gertrude Stein, is there enough there there to warrant any form of testimony about a positive association? This is the existential question of whether, in the words of the Court in Daubert v. Merrell Dow Pharmaceuticals, "[t]he subject of an expert's testimony [is] 'scientific . . . knowledge.'" 12/ Or, at the other extreme, is the entire enterprise ersatz—a "fake science" and a "worthless" endeavor?

As I see it, the harsh view that physical hair comparisons are pure pseudoscience, like astrology, graphology, homeopathy, or metoposcopy, is not supportable. The FBI's review project itself rests on the premise that hair evidence has some value. If the comparisons were worthless—like consulting the configurations of the stars or reading Tarot cards—there would be no need to review individual cases; in every case reporting an association, the FBI would have exceeded the limits of science. But this only shows that the FBI thinks that there is a basis for some testimony of a positive association. What evidence supports this belief?

One reason to think that the FBI's hair analysts generally possess some expertise in making associations comes from an intriguing study (usually cited as proof of the failings of microscopic hair comparisons) done more than ten years ago. 13/ FBI researchers took human hairs submitted to the FBI laboratory for analysis between 1996 and 2000 and mitotyped them (a form of mitochondrial DNA testing) whenever possible. The probability of an FBI examiner's finding a positive association when comparing hairs from the same individual (as shown by mitotyping) exceeded the probability of this finding when comparing hairs from different individuals by a factor of 2.9 (with a 95% confidence interval of 1.7 to 4.9). A test with this performance level only supplies evidence that is, on average, weakly diagnostic of an association. Still, it is not simply invalid.
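To see what a figure like 2.9 measures, the sketch below (in Python, using made-up counts rather than the actual data from the Houck and Budowle study) shows how such a diagnosticity ratio, that is, the probability of a reported association for same-source hairs divided by the probability of one for different-source hairs, can be computed along with an approximate confidence interval.

```python
import math

# Hypothetical counts, NOT the data from Houck & Budowle (2002); they are
# chosen only to illustrate how a diagnosticity (likelihood) ratio and an
# approximate confidence interval can be computed.
same_source_pairs = 80    # comparisons where mitotyping showed the same source
same_source_assoc = 72    # of those, the examiner reported a positive association

diff_source_pairs = 60    # comparisons where mitotyping showed different sources
diff_source_assoc = 19    # of those, the examiner still reported an association

p_same = same_source_assoc / same_source_pairs   # P(association | same source)
p_diff = diff_source_assoc / diff_source_pairs   # P(association | different sources)

ratio = p_same / p_diff   # diagnostic value of a reported association

# Rough 95% confidence interval (normal approximation on the log scale)
se_log = math.sqrt((1 - p_same) / same_source_assoc +
                   (1 - p_diff) / diff_source_assoc)
low, high = ratio * math.exp(-1.96 * se_log), ratio * math.exp(1.96 * se_log)

print(f"ratio = {ratio:.2f}, approximate 95% CI ({low:.2f}, {high:.2f})")
```

On these invented counts the ratio comes out near 2.8, which illustrates the point in the text: a reported association is more probable when the hairs really do share a source, but not dramatically so.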

The second problem lies in the presentation of perceived associations. Even if there is a there there, are forensic-science practitioners staying within the boundaries of their demonstrated expertise? Or are they purporting to know more than they do? The FBI's revelations about hair evidence are confined to this issue of overclaiming. What the FBI has uncovered are expert assertions in one case after another that are said to outstrip the core of demonstrated knowledge. Such overclaiming is one form of scientifically invalid testimony, 14/ but it is not the equivalent of an entire invalid science.

In considering the pervasiveness of the problem of overclaiming, the FBI's figure of 90+ percent is startling. It is so startling that one should ask whether it accurately estimates the prevalence of scientifically indefensible testimony. There are reasons to suspect that it might not. After all, the Hair Comparison Review Project was not designed to estimate the proportion of cases in which FBI examiners gave testimony that was, on balance, scientifically invalid. It was intended to spot isolated statements claiming more than an association that, for all we know, could be quite common in the general population. The general criteria for judging whether particular testimony falls into this category have been publicized, but the protocol and specific criteria that the FBI reviewers are using have not been revealed. No representative sample of the reports or transcripts judged to be problematic is available, but a few transcripts in the cases in which the FBI has confessed scientific error indicate that at least some classifications are open to serious question.

It may be instructive to contrast the responses to two very similar statements noted in a posting of May 23, 2015, about the court-martial of Jeffrey MacDonald: (1) "this hair ... microscopically matched the head hairs of Colette MacDonald"; and (2) "[a] forcibly removed Caucasian head hair ... exhibits the same microscopic characteristics as hairs in the K2 specimen. Accordingly, this hair is consistent with having originated from Kimberly MacDonald, the identified source of the K2 specimen." The first statement passed muster. The second did not.

If the review in this case is not aberrant, examiners can say that two hairs share the same features—they "match" and are consistent with one another—but they must not add the obvious (and scientifically undeniable) fact that this observation (if correct) means that they could have had the same origin or that they are "consistent with" this possibility.

Of course, one can criticize phrases like "consistent with" and "match" as creating an unacceptable risk that (in the absence of clarification on direct examination, cross-examination, or by judicial instruction) jurors will think the words connote a source attribution. But arguments of this sort stray from determinations that an examiner has made statements that "exceed the limits of science" (the phrase the Justice Department uses in confessing overclaiming). They represent judgments that an examiner has made statements that are scientifically acceptable but prone to being misunderstood.

To be sure, this latter danger is important to the law. It should inform rulings of admissibility under Rules of Evidence 403 and 702. It is a reason to regulate the manner in which experts testify to scientifically acceptable findings, as some courts have done. Laboratories themselves should adopt and enforce policies to ensure that reports and testimony avoid terminology that is known to convey the wrong impression. But it is misleading to include scientifically acceptable but psychologically dangerous phrasing in the counts of scientifically erroneous statements. Case-review projects ought to flag all instances in which examiners have not presented their findings as they should have, but reports ought to differentiate between statements that directly "exceed the limits of science" and those that risk being misconstrued in a way that would make them "exceed the limits of science." One size does not fit all.

NOTES

This posting is an abridged and modified version of a forthcoming essay about the FBI's Microscopic Hair Comparison Review. A full draft of the preliminary version that is being edited for publication is available. Comments and corrections are welcome, especially before publication, while there is time to improve the essay.

  1. Some inaccuracies in the documentary are noted in a June 24, 2015, posting on this blog.
  2. See the posting of July 3, 2015, on this blog (debunking this initial assertion of the RAND Corporation).
  3. Dahlia Lithwick, Pseudoscience in the Witness Box: The FBI Faked an Entire Field of Forensic Science, Slate (Apr. 22, 2015 5:09 PM).
  4. Id. These "shameful, horrifying errors" comprised "a story so horrifying . . . that it would stop your breath." Id.
  5. Erin Blakemore, FBI Admits Pseudoscientific Hair Analysis Used in Hundreds of Cases: Nearly 3,000 Cases Included Testimony About Hair Matches, a Technique that Has Been Debunked, Smartnews (Apr. 22, 2015) (quoting Ed Pilkington, Thirty Years In Jail For A Single Hair: The FBI's 'Mass Disaster' of False Conviction, Guardian, Apr. 21, 2015).
  6. Spencer S. Hsu, FBI Admits Flaws in Hair Analysis over Decades, Wash. Post, Apr. 18, 2015.
  7. Id. (quoting "Sen. Richard Blumenthal (D-Conn.), a former prosecutor").
  8. Spencer S. Hsu, FBI Overstated Forensic Hair Matches in Nearly All Trials Before 2000, Wash. Post, Apr. 19, 2015 (quoting "Senate Judiciary Committee Chairman Charles E. Grassley (R-Iowa) and the panel's ranking Democrat, Patrick J. Leahy (Vt.)").
  9. FBI, FBI Testimony on Microscopic Hair Analysis Contained Errors in at Least 90 Percent of Cases in Ongoing Review (Apr. 20, 2015).
  10. Petition for a Writ of Certiorari, at 2-3, Ferguson v. Steele, 134 S.Ct. 1581 (2014) (No. 13-1069).
  11. Id. at 19. Justice Breyer saw the “errors” referred to in the press release as emblematic of “flawed forensic testimony” generally and a reason to hold the death penalty unconstitutional. Glossip v. Gross, No. 14-7955 (June 29, 2015) (dissenting opinion).
  12. 509 U.S. 579, 590 (1993).
  13. Max M. Houck & Bruce Budowle, Correlation of Microscopic and Mitochondrial DNA Hair Comparisons, 47 J. Forensic Sci. 1 (2002).
  14. For this vocabulary, see Brandon Garrett & Peter Neufeld, Invalid Forensic Science Testimony and Wrongful Convictions, 95 Va. L. Rev. 1 (2009); cf. Eric S. Lander, Fix the Flaws in Forensic Science, N.Y. Times, Apr. 21, 2015 (“The F.B.I. stunned the legal community on Monday with its acknowledgment that testimony by its forensic scientists about hair identification was scientifically indefensible in nearly every one of more than 250 cases reviewed.”).

Monday, August 17, 2015

First NIST OSAC Forensic Science Standards Up for Public Comment

One of the responses to the 2009 NRC report on forensic science was the creation last year of an Organization of Scientific Area Committees (OSAC) for forensic science organized by the National Institute of Standards and Technology (NIST). This organization is developing new standards for forensic disciplines to follow.

Last week, NIST opened a 30-day public comment period for five standards from the Chemistry Scientific Area Committee. They are continuations or updates of existing ASTM (American Society for Testing and Materials) standards. The NIST OSAC News Release on the public comment period is at http://www.nist.gov/forensics/osac/osac-opens-public-comment.cfm. The five standards under consideration for inclusion on the OSAC Registry of Approved Standards are as follows:
  • ASTM E2329-14 Standard Practice for Identification of Seized Drugs
  • ASTM E2330-12 Standard Test Method for Determination of Concentrations of Elements in Glass Samples Using Inductively Coupled Plasma Mass Spectrometry (ICP-MS) for Forensic Comparisons
  • ASTM E2548-11e1 Standard Guide for Sampling Seized Drugs for Qualitative and Quantitative Analysis
  • ASTM E2881-13e1 Standard Test Method for Extraction and Derivatization of Vegetable Oils and Fats from Fire Debris and Liquid Samples with Analysis by Gas Chromatography-Mass Spectrometry
  • ASTM E2926-13 Standard Test Method for Forensic Comparison of Glass Using Micro X-ray Fluorescence (µ-XRF) Spectrometry
Although they may seem technical and have forbidding names, some of these proposed standards should be of interest to lawyers as well as forensic scientists and statisticians who might want them to address how findings should be presented in court or in reports. For example, one standard involving glass fragments requires a difference of 3 standard deviations before the analyst can reject the hypothesis that the fragments on the suspect came from the crime scene. But a difference of 2.9 standard deviations usually would be pretty good evidence that the suspect's fragments are from some other glass. What should the standard require or allow an expert to report in cases like this, where the suspect's fragments lie within the broad window for measurement error? Should there be an adjustment to the rejection range if more than one fragment has been tested? Should there even be a fixed window, or should the analyst simply report the probability of differences in the measurements at least as extreme as those observed if the fragments on the suspect came from the crime-scene glass? Better still, can a likelihood ratio be provided?
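To make the choice concrete, here is a minimal, purely illustrative sketch (in Python, with invented measurements taken from no ASTM standard and no real case) contrasting a fixed three-standard-deviation window with a report of how surprising the observed difference would be if the recovered fragment really came from the crime-scene glass.

```python
import math
from statistics import mean, stdev

# Hypothetical replicate measurements of one element's concentration in the
# known crime-scene glass, and a single measurement from a fragment recovered
# from the suspect. The numbers are illustrative only.
known_glass = [1.203, 1.198, 1.210, 1.205, 1.201, 1.207]
suspect_fragment = 1.216

m, s = mean(known_glass), stdev(known_glass)
z = abs(suspect_fragment - m) / s          # difference in standard deviations

# A fixed-window rule of the kind described above: "reject" a common source
# only when the difference exceeds 3 standard deviations.
excluded = z > 3

# Alternative report: the probability of a difference at least this extreme
# if the fragment came from the crime-scene glass (two-sided normal approx.).
p_value = math.erfc(z / math.sqrt(2))

print(f"difference = {z:.2f} SDs, excluded under the 3-SD rule: {excluded}")
print(f"approximate p-value if the sources are the same: {p_value:.3f}")
```

In this toy example, a difference of about 2.8 standard deviations would not produce an exclusion under the fixed window, yet on a simple normal model such a difference would arise only about half a percent of the time if the fragments truly shared a source. A likelihood-ratio presentation would require, in addition, some model of how much glass composition varies across unrelated sources.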

It appears that this 30-day period also offers an opportunity to view related ASTM standards on forensic science tests. Normally, ASTM, as the copyright holder, does not make its standards freely available.

Directions for subscribing to the OSAC newsletter and receiving announcements of comment periods, new standards, etc., are at the above URL and at http://www.nist.gov/forensics/osac/osac-launches-monthly-newsletter.cfm.

Disclosure and disclaimer: Although I am a member of the Legal Resource Committee of OSAC, the views expressed here (to the extent I have expressed any) are mine alone. They are not those of any organization. They are not necessarily shared by anyone inside (or outside) of NIST, OSAC, any SAC, any OSAC Task Force, or anyone else in the Legal Resource Committee.