Sunday, June 24, 2012

Fingerprinting Error Rates Down Under

How accurate is latent fingerprint identification? Considering that fingerprints are the most common form of trace evidence (for crimes in general), this is a vital question. The other day, I mentioned one court’s view that a false identification rate of 0.01% is only marginally higher than 1 in 11 million. The former statistic comes from a well designed experiment—the Noblis-FBI study published in 2011.\1/ The latter comes from a U.S. Court of Appeals opinion citing the earlier testimony of an FBI fingerprint supervisor.
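Taking the two figures as quoted, a quick back-of-the-envelope calculation (in Python, purely illustrative) shows just how far apart they actually are:

    # Compare the study's false identification rate (as quoted above)
    # with the 1-in-11-million figure from the FBI supervisor's testimony.
    study_rate = 0.01 / 100          # 0.01% expressed as a proportion
    testimony_rate = 1 / 11_000_000
    print(study_rate / testimony_rate)   # ~1100: roughly three orders of magnitude apart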

A few months after the publication of the Noblis-FBI study, a short report of a second controlled experiment on the ability of latent print examiners to match latent prints to exemplars appeared, this time in the journal Psychological Science.\2/ University of Queensland psychology lecturer Jason M. Tangen and two co-authors recruited 37 “qualified practicing fingerprint experts from five police organizations” and 37 college students to see how they would do in evaluating pairs of prints—some from the same fingers (mates) and some from different fingers (nonmates). Id. at 995.

The Task

Although the Australian researchers wrote that the task they gave their subjects “emulates the most forensically relevant aspect of the identification process,” id., the professional examiners and the students did not have the options of declaring a pair of images unsuitable for analysis or ultimately inconclusive.\3/ Instead, these “[p]articipants were asked to judge whether the prints in each pair matched, using a confidence rating scale ranging from 1 (sure different) to 12 (sure same) ... [with] ratings of 1 through 6 indicat[ing] no match [and] ratings of 7 through 12 indicat[ing] a match.” Id.
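Because the design forced a binary call, each confidence rating maps mechanically onto a decision. A minimal sketch of that mapping (the function name and types are mine, not the authors'):

    def decision(rating: int) -> str:
        """Map a 1-12 confidence rating (1 = sure different, 12 = sure same)
        to the forced binary decision used in the study."""
        if not 1 <= rating <= 12:
            raise ValueError("rating must be between 1 and 12")
        return "match" if rating >= 7 else "no match"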

The members of the two groups each received “36 simulated crime-scene prints that were paired with fully rolled prints.” Id. Some of the nonmates were supposed to be similar to the latent print of the pair because they were the closest matches (according to a computer program) in the Australian National Automated Fingerprint Identification System (ANAFIS). The other nonmates were plucked at random from the research database and ANAFIS. Each participant received 12 latent prints paired with “similar” nonmates, 12 paired with “nonsimilar” nonmates, and 12 paired with mates. Id. at 996.
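For concreteness, one participant's set of trials under this design might be assembled as follows; the data structure and the shuffling are my own illustration (the paper does not spell out the presentation order), not the authors' protocol:

    import random

    def build_trials(mates, similar_nonmates, nonsimilar_nonmates):
        """Assemble one participant's 36 pairs: 12 mated, 12 similar
        nonmated (nearest ANAFIS candidates), and 12 nonsimilar
        nonmated (randomly drawn exemplars)."""
        assert len(mates) == len(similar_nonmates) == len(nonsimilar_nonmates) == 12
        trials = (
            [(pair, "mate") for pair in mates]
            + [(pair, "similar nonmate") for pair in similar_nonmates]
            + [(pair, "nonsimilar nonmate") for pair in nonsimilar_nonmates]
        )
        random.shuffle(trials)  # assumed: presentation order randomized per participant
        return trials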

How They Did

False negative rate. The “experts performed exceedingly well.” Id. at 997. In the 12 × 37 = 444 trials of mates, “experts correctly identified 92.12% of the pairs, on average, as matches (hits), misidentifying 7.88% as nonmatches (misses).” Id.

False positive rate. For the pairs intended to be difficult (the similar nonmates), “experts correctly declared nearly all of the pairs (99.32%) to be nonmatches (correct rejections); only 3 pairs (0.68%) out of the 444 in this condition were incorrectly declared to be matches (false alarms).” Id. at 997. Furthermore, not a single expert “misidentified any of the 12 nonsimilar distractor prints as matches.” Id.

Students. The undergraduates did not fare nearly as well as the practitioners. For example, they “mistakenly identified 55.18% of the [pairs of] similar ... [nonmates] as matches.” Id.

The following two tables list more or less comparable error rates from both studies.\4/


Table 1. False negative rates

                       Professionals (US)    Professionals (Australia)    Undergraduates
Mates                  450/4113 (10.9%)      35/444 (7.88%)               113/444 (25.45%)


Table 2. False positive rates

                       Professionals (US)    Professionals (Australia)    Undergraduates
Similar nonmates       6/3628 (0.17%)        3/444 (0.68%)                245/444 (55.18%)
Nonsimilar nonmates    n/a                   0/444 (0.00%)                102/444 (22.97%)
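The percentages in both tables follow directly from the raw counts; a short script (mine, added only to check the arithmetic) reproduces them:

    # (errors, trials) as reported in the two studies
    counts = {
        "FN, US professionals (mates)": (450, 4113),
        "FN, Australian professionals (mates)": (35, 444),
        "FN, undergraduates (mates)": (113, 444),
        "FP, US professionals (similar nonmates)": (6, 3628),
        "FP, Australian professionals (similar nonmates)": (3, 444),
        "FP, undergraduates (similar nonmates)": (245, 444),
        "FP, Australian professionals (nonsimilar nonmates)": (0, 444),
        "FP, undergraduates (nonsimilar nonmates)": (102, 444),
    }
    for label, (errors, trials) in counts.items():
        print(f"{label}: {errors}/{trials} = {100 * errors / trials:.2f}%")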

Discussion

As both sets of researchers appreciate, these rates do not necessarily generalize to casework. They simply exemplify what can be achieved under the experimental conditions, in which the subjects knew they were being tested. Nevertheless, the sensitivity and specificity of professional examiners in ascertaining when a pair of prints emanates from the same finger should help mute the most extreme criticism of the field. By the same token, they should prompt investigations of the conditions under which misclassifications tend to occur and, as Tangen et al. note, they “should affect the testimony of forensic examiners and the assertions that they can reasonably make.” Id. at 997.

The Australian researchers are impressed by the extent to which professional examiners outperformed undergraduate students. But some of the gap could stem from a difference in motivation. The students received course credit for turning in answers, but they may have had little incentive to agonize over the best classification in every one of the 36 comparisons they were asked to make. This confounding variable should be considered before making unequivocal claims of “a real performance benefit” that “may satisfy legal admissibility criteria.” Id.

Notes

1. Bradford T. Ulery, R. Austin Hicklin, JoAnn Buscaglia & Maria Antonia Roberts, Accuracy and Reliability of Forensic Latent Fingerprint Decisions, 108 Proc. Nat’l Acad. Sci. 7733 (2011), available at http://www.pnas.org/content/108/19/7733.full.pdf.

2. Jason M. Tangen, Matthew B. Thompson & Duncan J. McCarthy, Identifying Fingerprint Expertise, 22 Psychol. Sci. 995 (2011), available at http://mbthompson.com/wp-content/uploads/2011/03/TangenThompsonMcCarthyIdentifyingFingerprintExpertisePsycScience2011.pdf.

3. The third author, a latent print analyst, believed that all the simulated latent prints were of value for identification. Id. at 996.

4. The studies differ in significant ways, limiting the number of comparisons that one can make and the confidence that one can have in direct comparisons of the statistics. The Noblis-FBI study used no randomly selected nonmate exemplars—all the nonmate pairs were “similar” within the meaning of the Australian study. Also, only practicing fingerprint examiners participated, so the student-professional dichotomy has not been replicated. To account as much as possible for the fact that all the pairs of prints had to be compared and an ultimate conclusion had to be drawn in the Australian experiment, the rates from the U.S. study are based on instances in which the subjects deemed the prints to be of value for individualization and reached a conclusion about the comparison. Even if this adjustment is adequate for some purposes, however, it does not eliminate the possibility that the inability to use the “inconclusive” category contributed to the larger false positive rate of the Australian subjects.
