Saturday, April 30, 2011

Part III of Fingerprinting Under the Microscope: Error Rates and Predictive Value

The Noblis-FBI experiment presented 169 relatively experienced and proficient latent fingerprint examiners (LPEs), who knew they were being tested, with unusually challenging pairs of latent and exemplar prints. The LPEs worked through a total of 17,121 presentations of 744 image pairs (100 pairs per examiner). How did they do? Table S5 of the study gives the answers, but it is a little hard to read. I have extracted parts of it to produce two simpler tables. Table 1 summarizes the results of the LPEs’ efforts for those pairs of prints that they initially deemed of value for individualization (VIn). Table 2 does the same for the pairs that they initially deemed of value for exclusion only (VExO). I ignore the cases in which an LPE judged the latent print unsuitable for comparison and terminated the process at that point.

Table 1. Outcomes for Pairs Judged To Be of Value for Individualization (VIn)

Table 2. Outcomes for Pairs Judged To Be of Value for Exclusion Only (VExO)

Table 3 adds the numbers in Tables 1 and 2 to describe the outcomes for all pairs judged to be of any value for comparisons.

Table 3. Outcomes for Pairs Judged To Be of Value (VIn or VExO)

Many books describe the interpretation of simpler two-by-two tables of binary decisions (like exclusion and identification) for two states of nature (such as nonmates and mates). For discussion in a legal context, see David H. Kaye et al., The New Wigmore on Evidence: Expert Evidence (2d ed. 2011). The row for inconclusives complicates the analysis slightly, as indicated below.

1. False Positives and Sensitivity

A false positive is an opinion that the pair of prints originated from the same finger of the same individual (an inclusion or identification) when, in fact, the exemplar and the latent came from different sources (nonmated pairs).

Only 10,052 (59%) of the presentations were deemed of value for individualization (VIn). Of these, 4,083 were nonmates, and 5,969 were mates. Five examiners (5/169 = 3%) made false identifications. Their answers to a questionnaire did not indicate anything unusual in their backgrounds. Three of them said they were certified (one did not respond to the background survey).

One of the five examiners made two false identifications, making the false positive rate for pairs of prints deemed VIn

FPR_VIn = P(identification | nonmate & VIn) = 6/4083 = 0.1%.
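Conditioned on the nonmated VIn pairs, the rate works out as follows (a quick check in Python; the variable names are mine, not the study's):

```python
# False positive rate among nonmated pairs deemed of value for
# individualization (VIn): 6 erroneous identifications out of 4,083 pairs.
nonmated_vin = 4083          # nonmated VIn presentations (from the text)
false_identifications = 6    # erroneous identifications among them

fpr_vin = false_identifications / nonmated_vin
print(f"FPR_VIn = {fpr_vin:.2%}")  # 0.15%, i.e., about 0.1%
```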

In clinical medicine, “sensitivity” denotes the probability that a diagnostic or screening test (such as a blood test for a disease) will give a positive result when the disease is present. If the test includes a quantitation that gives an “inconclusive” reading when the blood sample is too small, this would reflect an inherent limitation of the test rather than a lack of sensitivity to the disease when applied to an adequate sample. Analogously, the sensitivity of the examiners is the proportion of identifications among those mated VIn pairs for which the LPEs reached a definite conclusion. By this reasoning,

Sensitivity = P(identification | mate & VIn & conclusion) = 3663/4113 = 89.1%.

If the examiners’ inability to reach a conclusion after declaring a pair of prints as of value for individualization were treated as detracting from sensitivity, then their sensitivity in this experiment was only

Sensitivity = P(identification | mate & VIn) = 3663/5969 = 61.4%.
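Both sensitivity figures can be reproduced directly from the counts in the text (a minimal sketch; the variable names are mine):

```python
# Sensitivity on mated VIn pairs, computed two ways.
mated_vin = 5969                 # mated pairs judged VIn
identifications = 3663           # correct identifications
erroneous_exclusions = 450       # false negatives
conclusions = identifications + erroneous_exclusions  # definite conclusions (4,113)

# Inconclusives excluded from the denominator:
sensitivity_conclusive = identifications / conclusions
# Inconclusives counted against sensitivity:
sensitivity_overall = identifications / mated_vin

print(f"{sensitivity_conclusive:.1%}")  # 89.1%
print(f"{sensitivity_overall:.1%}")     # 61.4%
```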

Most examiners did not describe the 6 pairs that produced false positive errors as difficult comparisons, and in only 2 of the 6 cases did the LPE who made the error call the comparison difficult.

In no case did two examiners make the same false positive error. Five of the errors occurred on image pairs for which a large majority of examiners made correct exclusions; the remaining one occurred on a pair that the majority of examiners judged inconclusive. Thus, the six erroneous identifications probably would have been detected had blind verification been part of the operational examination process.

Two of the false positive errors involved the same latent print paired with exemplars from different subjects. Four of the five distinct latents on which false positives occurred were deposited on a galvanized metal substrate processed with cyanoacrylate and light gray powder (a substrate accounting for only 18% of the nonmated latents in the study). These images were often partially or fully tonally reversed (light ridges instead of dark) and appeared on a complex background.

2. False Negatives and Specificity

Whereas the false positive rate was only FPR_VIn = 0.1%, the false negative rates, both for the VIn pairs alone and for all pairs deemed of value for comparison, were much larger:

FNR_VIn = 450/5969 = 7.5%;  FNR_VIn+VExO = 611/8169 = 7.5%.
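The two false negative rates follow from the same counts (a quick check; names are mine):

```python
# False negative (erroneous exclusion) rates on mated pairs.
fnr_vin = 450 / 5969        # mated VIn pairs only
fnr_all_value = 611 / 8169  # all mated pairs of value (VIn or VExO)

print(f"{fnr_vin:.1%}")        # 7.5%
print(f"{fnr_all_value:.1%}")  # 7.5%
```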

The specificity of a clinical test is the probability that it will report that the disease is absent when the disease actually is absent. Here, if the calculation is limited to pairs that produced definite conclusions,

Specificity = P[exclusion | nonmate & (VIn or VExO) & conclusion] = (3622 + 325) / (3622 + 6 + 325) = 99.8%.

If we regard inconclusives as a sign of the inability to exclude when an exclusion is warranted, however, we get a smaller value:

Specificity = P[exclusion | nonmate & (VIn or VExO)] = (3622 + 325) / (3622 + 6 + 325 + 455 + 576) = 79.2%.
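The two specificity figures can be checked the same way. The nonmated inconclusive counts used below (455 for VIn, 576 for VExO) are not quoted directly in the passage; treat them as my reconstruction, implied by the other totals and the reported 79.2% figure:

```python
# Specificity on nonmated pairs of value, with and without inconclusives.
correct_exclusions = 3622 + 325   # VIn + VExO nonmated exclusions
false_identifications = 6         # erroneous identifications
inconclusives = 455 + 576         # nonmated inconclusives (my reconstruction)

spec_conclusive = correct_exclusions / (correct_exclusions + false_identifications)
spec_overall = correct_exclusions / (
    correct_exclusions + false_identifications + inconclusives
)

print(f"{spec_conclusive:.1%}")  # 99.8%
print(f"{spec_overall:.1%}")     # 79.2%
```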

Eighty-five percent of examiners made at least one false negative error, distributed across half of the image pairs that were compared. Awareness of previous errors was not correlated with the false negative errors; indeed, 65% of participants said that they were unaware of ever having made an erroneous exclusion after training.

Years of experience were at best weakly correlated with FNR_VIn. The correlation coefficient was only 0.15 (p = 0.063). The correlation with certification was not even close to statistical significance (p = 0.871).

3. Posterior Probabilities

False negative and positive rates tell us how LPEs responded to mates and nonmates, but they are not direct measures of the probability that an identification or an exclusion is correct. This posterior probability also depends on the prior probability that a pair is from the same source. The formula that gives the posterior probabilities is Bayes’ rule. Using the proportion of mates in the paired prints deemed VIn (59%), the predictive values were

PPV = P(mate | identification & VIn) = 3663/3669 = 99.8%

NPV = P(nonmate | exclusion & VIn) = 3622/4072 = 88.9%.

Using the proportion for all pairs designated as of value for either individualization or exclusion gives essentially the same values:

PPV = P[mate | identification & (VIn or VExO)] = 3701/3709 = 99.8%

NPV = P[nonmate | exclusion & (VIn or VExO)] = 3947/4558 = 86.6%.
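Bayes' rule makes the role of the prior explicit. A minimal sketch (the function and the alternative prior below are illustrative, not from the study):

```python
def ppv(prior_mate, p_id_mate, p_id_nonmate):
    """Posterior probability that a pair is mated, given an identification."""
    numerator = prior_mate * p_id_mate
    return numerator / (numerator + (1 - prior_mate) * p_id_nonmate)

# Conditional identification rates from the VIn pairs above:
p_id_mate = 3663 / 5969      # P(identification | mate & VIn)
p_id_nonmate = 6 / 4083      # P(identification | nonmate & VIn)

# With the experiment's prevalence of mates (5969/10052), Bayes' rule
# reproduces the PPV computed directly from the table:
ppv_experiment = ppv(5969 / 10052, p_id_mate, p_id_nonmate)
print(f"{ppv_experiment:.1%}")  # 99.8%

# With a lower prior, as might arise after an AFIS trawl, the PPV drops:
print(f"{ppv(0.10, p_id_mate, p_id_nonmate):.1%}")
```

The same calculation with the exclusion rates in place of the identification rates yields the NPVs quoted above.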

In casework, the prevalence of mated pair comparisons varies substantially among organizations, by case type, and by how candidates are selected. Mated comparisons are far more prevalent when the exemplars come from individuals suspected of leaving the latent print because of nonfingerprint evidence than when candidates come from an AFIS trawl. The predictive values given above therefore would not apply in most cases. The final installment will discuss a much better way to use the experiment to inform a judge or jury about the value of a fingerprint identification.
