Monday, August 14, 2017

PCAST's Review of Firearms Identification as Reported in the Press

According to the Washington Post,
The President’s Council of Advisors on Science and Technology [PCAST] said that only one valid study, funded by the Pentagon in 2014, established a likely error rate in firearms testing at no more than 1 in 46. Two less rigorous recent studies found a 1 in 20 error rate, the White House panel said. 1/
The impression that one might receive from such reporting is that errors (false positives? false negatives?) occur in about one case in every 20, or maybe one in 40.

Previous postings have discussed the fact that a false-positive probability is not generally the probability that an examiner who reports an association is wrong. Here, I will indicate how well the numbers in the Washington Post correspond to statements from PCAST. Not all of the Post's numbers can be found in the section on "Firearms Analysis" (§ 5.5) of the September 2016 PCAST report, and that section supplies other numbers that the Post did not report.
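To see why a false-positive rate is not the probability that a reported association is wrong, a back-of-the-envelope Bayes' rule calculation helps. The sketch below is illustrative only: the sensitivity and base-rate values are assumptions chosen for the example, not figures from the PCAST report; the 1-in-66 false-positive rate is the PCAST estimate discussed later in this posting.

    # Illustrative only: the sensitivity and base-rate inputs are assumed
    # values, not figures from the PCAST report.
    def prob_reported_association_is_wrong(fpr, sensitivity, base_rate):
        """P(different source | association reported), via Bayes' rule.

        fpr         -- P(association reported | different source)
        sensitivity -- P(association reported | same source)
        base_rate   -- proportion of same-source pairs among comparisons
        """
        true_pos = sensitivity * base_rate
        false_pos = fpr * (1 - base_rate)
        return false_pos / (true_pos + false_pos)

    # With a 1-in-66 false-positive rate, an assumed 90% sensitivity, and
    # same-source pairs making up half of the caseload:
    print(prob_reported_association_is_wrong(1/66, 0.90, 0.50))  # about 0.02
    # Same error rate, but only 1 in 10 submitted pairs is same-source:
    print(prob_reported_association_is_wrong(1/66, 0.90, 0.10))  # about 0.13

As the toy numbers show, the chance that a reported association is wrong depends heavily on the mix of cases reaching the examiner, which is one reason a bare false-positive rate does not answer the question a factfinder cares about.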

But First, Some Background

By way of background, the 2016 report observes that
AFTE’s “Theory of Identification as it Relates to Toolmarks”—which defines the criteria for making an identification—is circular. The “theory” states that an examiner may conclude that two items have a common origin if their marks are in “sufficient agreement,” where “sufficient agreement” is defined as the examiner being convinced that the items are extremely unlikely to have a different origin. In addition, the “theory” explicitly states that conclusions are subjective. 2/
A number of thoughtful forensic scientists agree that such criteria are opaque or circular. 3/ Despite its skepticism of the Association of Firearm and Tool Mark Examiners' criteria for deciding that components of ammunition come from a particular, known gun, PCAST acknowledged that
relatively recently ... its validity [has] been subjected to meaningful empirical testing. Over the past 15 years, the field has undertaken a number of studies that have sought to estimate the accuracy of examiners’ conclusions.
Unfortunately, PCAST finds almost all these studies inadequate. "While the results demonstrate that examiners can under some circumstances identify the source of fired ammunition, many of the studies were not appropriate for assessing scientific validity and estimating the reliability because they employed artificial designs that differ in important ways from the problems faced in casework." 4/ "Specifically, many of the studies employ 'set-based' analyses, in which examiners are asked to perform all pairwise comparisons within or between small sample sets." Some of these studies -- namely, those with "closed-set" designs -- "may substantially underestimate the false positive rate." The only valid way to study validity and reliability, the report insists, is with experiments that require examiners to examine pairs of items in which the existence of a true association in any one pair is independent of the associations in every other pair.

The False-positive Error Rate in the One Valid Study

According to the Post, the "one valid study ... established a likely error rate in firearms testing at no more than 1 in 46." This sentence is correct. PCAST reported a "bound on rate" of "1 in 46." 5/ This figure is the upper bound of a one-sided 95% confidence interval. Of course, the "true" error rate -- the one that would be found if there were no random sampling error in the selection of examiners and comparisons -- is not guaranteed to fall below this bound; the procedure that generates such bounds captures the true rate only 95% of the time. And the true rate could be much smaller than the bound. 6/ The Post also omits the statistically unbiased "estimated rate" of "1 in 66" given in the PCAST report.
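For readers who want to see where such bounds come from, here is a minimal sketch. Two assumptions are flagged: it treats PCAST's bounds as exact one-sided binomial (Clopper-Pearson) bounds, and the error counts are hypothetical, chosen only so that the point estimate works out to 1 in 66, because the table in the PCAST report is quoted here for its rates rather than its raw counts.

    # A minimal sketch of a one-sided 95% upper confidence bound on an
    # error rate, using the exact binomial (Clopper-Pearson) method.
    # The counts below are hypothetical -- picked so the point estimate
    # equals 1 in 66 -- not the actual counts from the study.
    from scipy.stats import beta

    def upper_95_bound(errors, comparisons):
        """Exact (Clopper-Pearson) one-sided 95% upper bound on a rate."""
        return beta.ppf(0.95, errors + 1, comparisons - errors)

    errors, comparisons = 22, 1452            # hypothetical: 22/1452 = 1/66
    estimate = errors / comparisons
    bound = upper_95_bound(errors, comparisons)
    print(f"estimate 1 in {1/estimate:.0f}; 95% upper bound 1 in {1/bound:.0f}")
    # prints an estimate of 1 in 66 and a bound close to 1 in 46

The daylight between the point estimate and the bound is a reminder that when errors are rare, even a fairly large study pins down the error rate only loosely.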

The 1 in 20 False-positive Error Rate for "Less Rigorous Recent Studies"

The statement that "[t]wo less rigorous recent studies found a 1 in 20 error rate" seems even less complete. The report mentioned five other studies. Four "set-to-set/closed" studies suggested error rates on the order of 1 in 5103 (1 in 1612 for the 95% upper bound). Presumably, the Post did not see fit to mention all the "less rigorous" studies because these closed-set studies were methodologically hopeless -- at least, that is the view of them expressed in the PCAST report.

The Post's "1 in 20" figure apparently came from PCAST's follow-up report of 2017. 7/ The addendum refers to a re-analysis of a 14-year-old study of eight FBI examiners co-authored by Stephen Bunch, who "offered an estimate of the number of truly independent comparisons in the study and concluded that the 95% upper confidence bound on the false-positive rate in his study was 4.3%." 8/ This must be one of the Post's "two less rigorous recent studies." In the 2016 report, PCAST identified it as a "set-to-set/partly open" study with an "estimated rate" of 1 in 49 (1 in 21 for the 95% upper bound). 9/

The second "less rigorous" study is indeed more recent (2014). The 2016 report summarizes its findings as follows:
The study found 42 false positives among 995 conclusive examinations. The false positive rate was 4.2 percent (upper 95 percent confidence bound of 5.4 percent). The estimated rate corresponds to 1 error in 24 cases, with the upper bound indicating that the rate could be as high as 1 error in 18 cases. (Note: The paper observes that “in 35 of the erroneous identifications the participants appeared to have made a clerical error, but the authors could not determine this with certainty.” In validation studies, it is inappropriate to exclude errors in a post hoc manner (see Box 4). However, if these 35 errors were to be excluded, the false positive rate would be 0.7 percent (confidence interval 1.4 percent), with the upper bound corresponding to 1 error in 73 cases.) 10/
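The quoted figures can be checked directly. Assuming, as above, that PCAST's bounds are exact one-sided binomial (Clopper-Pearson) bounds, the 42 false positives among 995 conclusive examinations reproduce the reported percentages, as do the 7 errors in 960 examinations that remain after excluding the 35 possible clerical errors:

    # Checking the quoted figures with exact (Clopper-Pearson) one-sided
    # 95% upper bounds; assumes that is the method behind PCAST's numbers.
    from scipy.stats import beta

    for errors, comparisons in [(42, 995),    # all conclusive examinations
                                (7, 960)]:    # 35 possible clerical errors excluded
        est = errors / comparisons
        ub = beta.ppf(0.95, errors + 1, comparisons - errors)
        print(f"{est:.1%} estimated; {ub:.1%} upper bound (1 in {1/ub:.0f})")
    # roughly 4.2% / 5.4% (1 in 18) and 0.7% / 1.4% (1 in 73),
    # matching the figures PCAST quotes (up to rounding)
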
Another Summary

Questions of which studies count, how much they count, and what to make of their limitations are intrinsic to scientific literature reviews. Journalists limited to a few sentences can hardly be expected to capture all the nuances. Even so, a slightly more complete summary of the PCAST review might read as follows:
The President’s Council of Advisors on Science and Technology said that an adequate body of scientific studies does not yet exist to show that toolmark examiners can associate discharged ammunition with a specific firearm with very high accuracy. Only one rigorous study, involving one type of gun and funded by the Defense Department, has been conducted. It found that examiners who reached firm conclusions falsely associated cartridge cases from different guns about 1 time in 66. Less rigorous studies have found both higher and lower false-positive error rates for conclusions of individual examiners, the White House panel said.
NOTES
  1. Spencer S. Hsu & Keith L. Alexander, Forensic Errors Trigger Reviews of D.C. Crime Lab Ballistics Unit, Prosecutors Say, Wash. Post, Mar. 24, 2017.
  2. PCAST, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (Sept. 2016), at 104 (footnote omitted).
  3. See, e.g., Christophe Champod, Chris Lennard, Pierre Margot & Milutin Stoilovic, Fingerprints and Other Ridge Skin Impressions 71 (2016) (quoted in David H. Kaye, "The Mask Is Down": Fingerprints and Other Ridge Skin Impressions, Forensic Sci., Stat. & L., Aug. 11, 2017, http://for-sci-law.blogspot.com/2017/08/the-mask-is-down-fingerprints-and-other.html).
  4. PCAST, at 105.
  5. Id. at 111, tbl. 2.
  6. The authors of the study had this to say about the false-positive errors:
    [F]or the pool of participants used in this study the fraction of false positives was approximately 1%. The study was specifically designed to allow us to measure not simply a single number from a large number of comparisons, but also to provide statistical insight into the distribution and variability in false-positive error rates. The result is that we can tell that the overall fraction is not necessarily representative of a rate for each examiner in the pool. Instead, examination of the data shows that the rate is a highly heterogeneous mixture of a few examiners with higher rates and most examiners with much lower error rates. This finding does not mean that 1% of the time each examiner will make a false-positive error. Nor does it mean that 1% of the time laboratories or agencies would report false positives, since this study did not include standard or existing quality assurance procedures, such as peer review or blind reanalysis. What this result does suggest is that quality assurance is extremely important in firearms analysis and that an effective QA system must include the means to identify and correct issues with sufficient monitoring, proficiency testing, and checking in order to find false-positive errors that may be occurring at or below the rates observed in this study.
    David P. Baldwin, Stanley J. Bajic, Max Morris, and Daniel Zamzow, A Study of False-Positive and False-Negative Error Rates in Cartridge Case Comparisons, May 2016, at 18, available at https://www.ncjrs.gov/pdffiles1/nij/249874.pdf.
  7. PCAST, An Addendum to the PCAST Report on Forensic Science in Criminal Courts, Jan. 6, 2017.
  8. Id. at 7.
  9. PCAST, at 111, tbl. 2.
  10. Id. at 95 (footnote omitted).
