Tuesday, November 1, 2016

PCAST and the Ames Bullet Cartridge Study: Will the Real Error Rates Please Stand Up?

An article in yesterday’s Boston Globe reports that “the [PCAST] report’s findings have also been widely criticized, especially by those in the forensics field, who argued that the council lacked any representation from ballistics experts. They argued that the council’s findings do not undermine the accuracy of firearms examinations.” 1/

The criticism that “ballistics experts” did not participate in writing the report is unpersuasive. These experts are great at their jobs, but reviewing the scientific literature on the validity and reliability of their toolmark comparisons is not a quotidian task. Would one criticize a meta-analysis of studies on the efficacy of a surgical procedure on the ground that the authors were epidemiologists rather than surgeons?

On the other hand, the argument that the “findings do not undermine the accuracy of firearms examinations” is correct (but inconclusive). True, the President’s Council of Advisors on Science and Technology (PCAST) did not find that toolmark comparisons as currently practiced are inaccurate. Rather, it concluded (on page 112) that
[F]irearms analysis currently falls short of the criteria for foundational validity, because there is only a single appropriately designed study to measure validity and estimate reliability. The scientific criteria for foundational validity require more than one such study, to demonstrate reproducibility.
In other words, PCAST found that existing literature (including that called to its attention by “ballistics experts”) does not adequately answer the question of how accurate firearms examiners are when comparing markings on cartridges—because only a single study that was designed as desired by PCAST provides estimates of accuracy.

Although PCAST’s view is that more performance studies are necessary to satisfy Federal Rule of Evidence 702, PCAST uses the single study to derive a false-positive error rate for courtroom use (just in case a court disagrees with its understanding of the rule of evidence, or the science, or in case the jurisdiction follows a different rule).

To evaluate PCAST's proposal, it will be helpful first to describe what the study itself found. Although “it has not yet been subjected to peer review and publication” (p. 111), the “Ames study,” as PCAST calls it, is available online. 2/ The researchers enrolled 284 volunteer examiners in the study, and 218 submitted answers (raising an issue of selection bias). The 218 subjects (who obviously knew they were being tested) “made ... 15 comparisons of 3 knowns to 1 questioned cartridge case. For all participants, 5 of the sets were from known same-source firearms [known to the researchers but not the firearms examiners], and 10 of the sets were from known different-source firearms.” 3/ Ignoring “inconclusive” comparisons, the performance of the examiners is shown in Table 1.

Table 1. Outcomes of comparisons
(derived from pp. 15-16 of Baldwin et al.)

           ~S       S    Total
–E       1421       4     1425
+E         22    1075     1097
Total    1443    1079     2522

–E is a negative finding (the examiner decided there was no association).
+E is a positive finding (the examiner decided there was an association).
S indicates that the cartridge cases were fired in the same gun.
~S indicates that the cartridge cases were fired in different guns.

False negatives. Of the 4 + 1075 = 1079 judgments in which the gun was the same, 4 were negative. The false-negative rate is Prop(–E|S) = 4/1079 = 0.37%. ("Prop" is short for "proportion," and "|" can be read as "given" or "out of all.") Treating the examiners tested as a random sample of all examiners of interest, and viewing the performance in the experiment as representative of the examiners' behavior in casework with materials comparable to those in the experiment, we can estimate the proportion of false negatives for all examiners. The point estimate is 0.37%. A 95% confidence interval is 0.10% to 0.95%. These numbers provide an estimate of how frequently all examiners would declare a negative association in all similar cases in which the association actually is positive. Instead of false negatives, we also can describe true positives, or sensitivity. The observed sensitivity is Prop(+E|S) = 1075/1079 = 99.63%. The 95% confidence interval around this estimate is 99.05% to 99.90%.

False positives. The observed false-positive rate is Prop(+E|~S) = 22/1443 = 1.52%, and the 95% confidence interval is 0.96% to 2.30%. The observed true-negative rate, or specificity, is Prop(–E|~S) = 1421/1443 = 98.48%, and its 95% confidence interval is 97.70% to 99.04%.

Taken at face value, these results seem rather encouraging. On average, examiners displayed high levels of accuracy, both for cartridge cases from the same gun (better than 99% sensitivity) and from different guns (better than 98% specificity).
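For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the error rates from the Table 1 counts. The post does not say which interval method was used, so the exact (Clopper-Pearson) binomial interval here is an assumption; it lands close to the figures quoted above. The function and variable names are illustrative only.

```python
from math import comb

# Counts from Table 1 (inconclusive comparisons excluded).
FALSE_NEG, TRUE_POS = 4, 1075    # same-source comparisons (S)
FALSE_POS, TRUE_NEG = 22, 1421   # different-source comparisons (~S)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); intended for small k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n.

    Assumption: the post does not name its method; this is one common choice.
    """

    def root(f):
        # Bisection on [0, 1] for a function that rises from negative to positive.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else root(lambda p: 1 - binom_cdf(k - 1, n, p) - alpha / 2)
    upper = 1.0 if k == n else root(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


n_same = FALSE_NEG + TRUE_POS    # 1079 same-source judgments
n_diff = FALSE_POS + TRUE_NEG    # 1443 different-source judgments

fn_rate = FALSE_NEG / n_same
fp_rate = FALSE_POS / n_diff
fn_lo, fn_hi = clopper_pearson(FALSE_NEG, n_same)
fp_lo, fp_hi = clopper_pearson(FALSE_POS, n_diff)

# Each complementary rate is one minus the error rate, so its interval is
# one minus the error-rate interval with the endpoints reversed.
print(f"Prop(-E|S)  = {fn_rate:.2%}, 95% CI {fn_lo:.2%} to {fn_hi:.2%}")
print(f"Prop(+E|S)  = {1 - fn_rate:.2%}, 95% CI {1 - fn_hi:.2%} to {1 - fn_lo:.2%}")
print(f"Prop(+E|~S) = {fp_rate:.2%}, 95% CI {fp_lo:.2%} to {fp_hi:.2%}")
print(f"Prop(-E|~S) = {1 - fp_rate:.2%}, 95% CI {1 - fp_hi:.2%} to {1 - fp_lo:.2%}")
```

The intervals treat every comparison as an independent trial with a common error probability. That is the same simplification behind the pooled figures in the text, and it is questionable when a handful of examiners account for most of the errors.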

Applying such numbers to individual examiners and particular cases obviously is challenging. The PCAST report largely elides the difficulties. (See Box 1.) It notes (on page 112) that "20 of the 22 false positives were made by just 5 of the 218 examiners — strongly suggesting that the false positive rate is highly heterogeneous across the examiners"; however, it does not discuss the implications of this fact for testimony about "the error rates" that it wants "clearly presented." It calls for "rigorous proficiency testing" of the examiner and disclosure of those test results, but it does not consider how the examiner’s level of proficiency maps onto the distribution of error rates seen in the Ames study. Neither does it consider how testimony should address the impact of verification by a second examiner. If the errors occur independently across examiners (as might be the case if the verification is truly blind), then the relevant false-positive error rate drops to (1.52%)² = 0.0231%. Is omitting some correction for verification an appropriate way to present the results of a rigorously verified examination? Indeed, is a false-positive error rate enough to convey the probative value of a positive finding? I will discuss the last question later.
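The two points in this paragraph, verification and heterogeneity, can be made concrete with a rough Python sketch. It rests on assumptions that go beyond the study: that a verifier errs independently at the same pooled rate, and that each examiner made at most 10 conclusive different-source comparisons (the study design quoted above gives 10 such sets per participant). The variable names are illustrative only.

```python
# Figures quoted in the post.
FALSE_POS, N_DIFF = 22, 1443          # conclusive different-source comparisons
DIFF_SETS_EACH = 10                   # different-source sets given to each examiner
HIGH_ERR_EXAMINERS, HIGH_ERR_FALSE_POS = 5, 20

fp_rate = FALSE_POS / N_DIFF

# Blind verification: a false positive survives only if both examiners err.
# Assumes independent errors and the same pooled rate for each examiner.
# (The post's 0.0231% squares the rounded 1.52%, so the last digit differs.)
verified_fp_rate = fp_rate ** 2

# Heterogeneity: the five examiners made at most 5 * 10 = 50 conclusive
# different-source comparisons, so their pooled rate is at least 20/50.
max_calls_high = HIGH_ERR_EXAMINERS * DIFF_SETS_EACH
min_rate_high = HIGH_ERR_FALSE_POS / max_calls_high

# The remaining examiners made 2 false positives in at least 1443 - 50
# conclusive comparisons, so their pooled rate is at most 2/1393.
rest_false_pos = FALSE_POS - HIGH_ERR_FALSE_POS
max_rate_rest = rest_false_pos / (N_DIFF - max_calls_high)

print(f"pooled false-positive rate:      {fp_rate:.2%}")
print(f"after independent verification:  {verified_fp_rate:.4%}")
print(f"five high-error examiners:       at least {min_rate_high:.0%}")
print(f"all other examiners:             at most {max_rate_rest:.2%}")
```

On these assumptions, the pooled 1.52% describes neither group well, which is why a single "error rate" is hard to apply to a given examiner.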


Foundational validity. PCAST finds that firearms analysis currently falls short of the criteria for foundational validity, ... . If firearms analysis is allowed in court, the scientific criteria for validity as applied should be understood to require clearly reporting the error rates seen in appropriately designed black-box studies (estimated at 1 in 66, with a 95 percent confidence limit of 1 in 46, in the one such study to date). [P. 112.]

Validity as applied. If firearms analysis is allowed in court, validity as applied would, from a scientific standpoint, require that the expert: (1) has undergone rigorous proficiency testing on a large number of test problems to evaluate his or her capability and performance, and discloses the results of the proficiency testing ... . [P. 113.]

[The] false-positive rate for examiner cartridge case comparisons ... was measured and for the pool of participants used in this study the fraction of false positives was approximately 1%. The study was specifically designed to allow us to measure not simply a single number from a large number of comparisons, but also to provide statistical insight into the distribution and variability in false-positive error rates. The ... overall fraction is not necessarily representative of a rate for each examiner in the pool. Instead, ... the rate is a highly heterogeneous mixture of a few examiners with higher rates and most examiners with much lower error rates. This finding does not mean that 1% of the time each examiner will make a false-positive error. Nor does it mean that 1% of the time laboratories or agencies would report false positives, since this study did not include standard or existing quality assurance procedures, such as peer review or blind reanalysis. [P. 18.]

  1. Milton J. Valencia, Scrutiny over Forensics Expands to Ballistics, Boston Globe, Oct. 31, 2016, https://www.bostonglobe.com/metro/2016/10/31/firearms-examinations-forensics-come-under-review/zJnaTjiGxCMuvdStkuSvyO/story.html
  2. David P. Baldwin, Stanley J. Bajic, Max Morris & Daniel Zamzow, A Study of False-positive and False-negative Error Rates in Cartridge Case Comparisons, Ames Laboratory, USDOE, Technical Report #IS-5207 (2014), at https://afte.org/uploads/documents/swggun-false-postive-false-negative-usdoe.pdf 
  3. Id. at 10.
