Thursday, February 25, 2016

"Stress Tests" by the Department of Justice and the FBI's "Approved Scientific Standards for Testimony and Reports"

Yesterday, Deputy Attorney General Sally Yates addressed the assembled members of the American Academy of Forensic Sciences at their annual scientific meeting in Las Vegas. 1/ Some excerpts and remarks follow:
In the near future, we expect the FBI to solicit bids for an independent review—or “root cause analysis”—to determine what went wrong and why in the hair analysis field. We hope that this review will help us identify potential blind spots in our own practices and develop effective corrective measures.

But it does not take a root cause analysis to draw some initial conclusions about errors arising in the FBI’s pre-2000 hair cases. It’s clear that, in at least some of the cases reviewed, lab examiners and attorneys either overstated the strength of the forensic evidence or failed to properly qualify the limitations of the forensic analysis. This doesn’t necessarily mean that there were problems with the underlying science—it means that the probative value of the scientific evidence wasn’t always properly communicated to juries. And as you all know, it’s crucial we put this type of evidence in its proper context, given that laypeople can misunderstand the science.
I'll resist the obvious puns about a root cause analysis for hair testimony, but I wonder whether it pays to fund a major study of the culture that produced the overstated hair testimony.  Aren't the solutions to overstated testimony — ultracrepidarianism, as I have called it 2/ — fairly obvious? They are (1) better education and training of criminalists about the limits of their knowledge; (2) clear standards specifying what testimony is permissible; (3) better education and training of prosecutors and defenders about the statements that they as well as the criminalists can make; and (4) comprehensive and reasonably frequent review, not just of expert testimony, but also of the questions and opening and closing arguments of prosecutors, to ensure compliance with testimonial standards.
To address this problem, the FBI is close to finalizing new internal standards for testimony and reporting—which they’re calling “Approved Scientific Standards for Testimony and Reports,” or ASSTR. These documents, designed for almost all forensic disciplines currently practiced by the FBI, will clearly define what statements are supported by existing science. This will guide our lab examiners when they draft reports and take the witness stand, thereby reducing the risk of testimonial overstatement.
That is welcome news about the FBI. More broadly, OSAC needs to develop similar standards, and the Department of Justice should provide better training for its prosecutors. Experts are not the only sources of misstatements at trials. Both prosecutors and defense lawyers can misstate the content or implications of expert testimony. It happens all the time with DNA random-match probabilities, and in the infamous Santae Tribble case, it was the Assistant US Attorney, and not the expert witness, who told the jury that "[t]here is one chance, perhaps for all we know, in 10 million that it could [be] someone else’s hair."3/
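The transposition at work in statements like the Tribble prosecutor's can be made concrete with a toy Bayesian calculation. The numbers below are hypothetical, chosen only to show that a small random-match probability is not the same thing as "the chance it could be someone else's hair":

```python
# Illustration (hypothetical numbers): why a random-match probability (RMP)
# cannot be quoted as "the chance the hair is someone else's".
rmp = 1e-7                   # assumed probability that a random person's hair matches
n_alternatives = 1_000_000   # assumed pool of other possible sources

# Expected number of coincidentally matching people in the pool
expected_matches = n_alternatives * rmp   # 0.1

# With a uniform prior over defendant and pool, the posterior probability
# that the matching hair came from someone other than the defendant is:
p_someone_else = expected_matches / (1 + expected_matches)
print(round(p_someone_else, 3))   # 0.091 -- roughly 1 in 11, not 1 in 10 million
```

Even granting a one-in-ten-million match probability, the chance that the hair "could be someone else's" depends on how many alternative sources there are, and here it is about nine percent, not one in ten million.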
While the FBI is preparing an ASSTR for each discipline, it’s fair to say that the risk of overstatement can vary depending on the discipline. The risk is arguably the lowest in certain types of disciplines, such as those involving chemical analysis. In drug testing, for example, the current technology makes it possible for experts to determine the chemical composition of a controlled substance with a high degree of certainty and with very little human interpretation.
"Arguably" is a key word here. "[D]isciplines ... involving chemical analysis" have the potential for suppressing uncertainty. Take a look at the ASTM E2937-13 Standard Guide for Using Infrared Spectroscopy in Forensic Paint Examinations. This document presupposes that a major task of the chemical analyst is "to determine whether any significant differences exist between the known and questioned samples." It defines a "significant difference" as "a difference between two samples that indicates that the two samples do not have a common origin." The forensic chemist is expected to declare whether "[s]pectra are dissimilar," "indistinguishable," or "inconclusive." The standard offers no guidance on how to explain the significance of spectra that are "indistinguishable." This is precisely the problem that hair analysts faced. Moreover, just as there is subjectivity in deciding whether hairs are indistinguishable, the forensic chemistry standard offers only a self-described "rule of thumb." This rule proposes "that the positions of corresponding peaks in two or more spectra be within ± 5 cm⁻¹," but "[f]or sharp absorption peaks one should use tighter constraints. One should critically scrutinize the spectra being compared if corresponding peaks vary by more than 5 cm⁻¹. Replicate collected spectra may be necessary to determine reproducibility of absorption position."

What is the nature of the "critical scrutiny" that permits an examiner to classify peaks that differ by more than 5 cm⁻¹ as "similar"? When are replicates necessary? How many are needed? What is the accuracy of examiners who follow this open-ended "rule of thumb"? Perusing such standards suggests that forensic chemistry may not be so radically different from the other disciplines in which the risk of ultracrepidarianism is seen as more acute.
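The open-endedness of the rule of thumb is easy to see once it is written down mechanically. The following sketch is purely illustrative (the function, peak values, and pairing procedure are my own assumptions, not part of ASTM E2937-13); everything the standard leaves to examiner judgment collapses into a single adjustable tolerance:

```python
# Hypothetical sketch of the ASTM E2937-13 "rule of thumb": corresponding
# peak positions in two IR spectra should agree to within +/- 5 cm^-1.
# The standard's caveats -- "tighter constraints" for sharp peaks, replicates
# "may be necessary" -- have no fixed values, so here they all reduce to a
# single tolerance parameter that the examiner is free to adjust.

def peaks_correspond(known_peaks, questioned_peaks, tol=5.0):
    """Pair each known peak with the nearest questioned peak and report
    whether every pair falls within the tolerance (in cm^-1)."""
    results = []
    for k in known_peaks:
        nearest = min(questioned_peaks, key=lambda q: abs(q - k))
        results.append((k, nearest, abs(nearest - k) <= tol))
    return all(ok for _, _, ok in results), results

# Two hypothetical spectra: the 1730/1733 pair passes, the 1600/1607 pair fails.
indistinguishable, detail = peaks_correspond([1730.0, 1600.0],
                                             [1733.0, 1607.0])
print(indistinguishable)   # False at the default 5 cm^-1 tolerance
```

Relax the tolerance to 8 cm⁻¹ and the same two spectra become "indistinguishable." Nothing in the rule of thumb says which answer is right, or how often examiners exercising "critical scrutiny" would agree.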
But, as you all know, the degree of certainty may be more difficult to quantify in other forensic disciplines. For example, a relatively small number of disciplines call on forensic professionals to compare two items—such as shoe prints or tire treads—and make judgments about their similarities and differences. These so-called “pattern” or “impression” disciplines present unique challenges, especially when an examiner attempts to assess the likelihood that the two items came from the same source.
As the paint standard exemplifies, forensic professionals compare two items in many fields. The spectra used in forensic chemistry are patterns. DNA profiles are patterns. Even some of the software that is supposed to give objectively established probabilities for the components of a DNA mixture has parameters that analysts can adjust as they see fit. Perhaps the line between the quantified-degree-of-certainty fields and the difficult-to-quantify ones is not entirely congruent with a simple divide between pattern-and-impression evidence and other forensic fields.
In any business, whether it’s medicine or manufacturing, it is standard practice to regularly review your internal procedures to make sure you’re performing at the highest level possible. Our DOJ labs do this all the time, and we plan to do it here, too. The department intends to conduct a quality assurance review of other forensic science disciplines practiced at the FBI—to determine whether the same kind of “testimonial overstatement” we found during our review of microscopic hair evidence could have crept into other disciplines that rely heavily on human interpretation and where the degree of certainty can be difficult to quantify. We’re thinking of it as a forensics “stress test.”
This sounds great, but how is "quality assurance review" a "stress test"? In cardiology, a stress test "determines the amount of stress that your heart can manage before developing either an abnormal rhythm or evidence of ischemia (not enough blood flow to the heart muscle)." 4/ In the banking system, stress testing examines whether banks have "sufficient capital to continue operations throughout times of economic and financial stress and that they have robust, forward-looking capital-planning processes that account for their unique risks." 5/ Checking whether "you're performing at the highest level possible" in providing testimony in the ordinary course of affairs is not a stress test for the FBI or anybody else (although I suppose it could prove to be stressful).

Of course, whether one should call the planned review a "stress test" is purely a matter of terminology. A much more important question is which disciplines will be reviewed. It would be unfortunate if the only recipients of review are the criminalists who examine patterns and impressions in the form of toolmarks, shoeprints, handwriting, and so on. As we have just seen, it is not so easy to specify all the "disciplines that rely heavily on human interpretation and where the degree of certainty can be difficult to quantify."
This is an important moment in forensic science. The rise of new technologies presents both tremendous opportunities and potential challenges. At the same time, we must grapple with some of the most basic questions that lie at the intersection of science and the law: how do we make complex scientific principles understandable for judges, attorneys and jurors? How do we accurately communicate to laypeople the many things that forensic science can teach us—ensuring that we neither overstate the strength of our evidence nor understate the value of this information?

There are no easy answers ...
Amen to that!

  1. Office of the Deputy Attorney General, Justice News: Deputy Attorney General Sally Q. Yates Delivers Remarks During the 68th Annual Scientific Meeting Hosted by the American Academy of Forensic Science, Feb. 24, 2016,
  2. David H. Kaye, Ultracrepidarianism in Forensic Science: The Hair Evidence Debacle, 72 Wash. & Lee L. Rev. Online 227 (2015).
  3. David H. Kaye, The FBI's Worst Hair Days, Forensic Science, Statistics and the Law, July 31, 2014.
  4. WebMD, Heart Disease and Stress Tests,
  5. Board of Governors of the Federal Reserve System, Stress Tests and Capital Planning,

1 comment:

  1. In drug testing, for example, the current technology makes it possible for experts to determine the chemical composition of a controlled substance with a high degree of certainty and with very little human interpretation.

    Excuse me: ASTM E2329-14 and SWGDRUG Recommendations: "It is expected that in the absence of unforeseen error, an appropriate analytical scheme effectively results in no uncertainty in reported identifications."