Sunday, February 15, 2015

"Remarkably Accurate": The Miami-Dade Police Study of Latent Fingerprint Identification (Pt. 2)

A week ago, I noted the Justice Department’s view that a “study of ... latent print examiners ... found that examiners make extremely few errors. Even when examiners did not get an independent second opinion about the decisions, they were remarkably accurate.” 1/ But just how accurate were they?

The police who conducted the study “[p]resented the data to a professor from the Department of Statistics at Florida International University” (p. 39), and this “independent statistician performed a statistical analysis from the data generated” (p. 45). The first table in the report (Table 4, p. 53) contains the following data (in slightly different form):

Table 1. Classifications of Pairs

Examiner's Statement    Nonmates (N)    Mates (M)
–                            953            235
+                             42          2,457
?                            403            446

Here, “+” stands for a positive opinion of identity between a pair of prints (same source), “–” denotes a negative opinion (an exclusion), and “?” indicates a refusal to make either judgment (an inconclusive) even though the examiner initially deemed the prints sufficient for comparison.

What do the numbers in Table 1 mean? As noted in my previous posting, they pertain to the judgments of 109 examiners with regard to various pairings of 80 latent prints with originating friction ridge skin (mates) and nonoriginating skin (nonmates). A total of 3,138 pairs were mates; of these, the examiners reached a positive or negative conclusion in 2,692 instances. Another 1,398 were nonmates; of these, the examiners reached a conclusion in 995 instances. Given that examiners were presented with mates and that they reached a conclusion of some sort, the proportion of matches declared was P(+|M & not-?) = 2,457/2,692 = 91.3%. These were correct matches. For the pairings in which the examiners reached a conclusion, they declared nonmates to match in P(+|N & not-?) = 42/995 = 4.2% of the pairs. These were false positives. With respect to all the comparisons (including the ones that they found to be inconclusive), the true positive rate was P(+|M) = 2,457/3,138 = 78.3%, and the false positive rate was P(+|N) = 42/1,398 = 3.0%. Similar reasoning applies to the exclusions. Altogether, we can write:

Table 2. Conditional Error Rates

           Excluding inconclusives        Including inconclusives
False +    P(+ | N & not-?) = 4.2%        P(+ | N) = 3.0%
False –    P(– | M & not-?) = 8.7%        P(– | M) = 7.5%
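To make the arithmetic easy to check, here is a minimal Python sketch (mine, not the study's) that reproduces the rates in Table 2 from the raw counts in Table 1:

    # Counts from Table 1 (the study's Table 4, p. 53)
    neg = {"N": 953, "M": 235}   # declared exclusions
    pos = {"N": 42,  "M": 2457}  # declared matches
    inc = {"N": 403, "M": 446}   # inconclusives

    total = {k: neg[k] + pos[k] + inc[k] for k in ("N", "M")}  # 1,398 and 3,138
    concl = {k: neg[k] + pos[k] for k in ("N", "M")}           # 995 and 2,692

    print(pos["N"] / concl["N"])  # P(+|N & not-?) ~ 0.042, false + excluding inconclusives
    print(pos["N"] / total["N"])  # P(+|N)         ~ 0.030, false + including inconclusives
    print(neg["M"] / concl["M"])  # P(-|M & not-?) ~ 0.087, false - excluding inconclusives
    print(neg["M"] / total["M"])  # P(-|M)         ~ 0.075, false - including inconclusives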


These error rates, which are clearly reported in the study, do not strike me as "remarkably small," especially considering that they cover the full spectrum of pairs, easy as well as difficult comparisons. Of course, they do not include blind verification of the conclusions, a matter addressed in another part of the study.

The authors report more reassuring values for “Positive Predictive Value” (PPV) and “Negative Predictive Value” (NPV). These were 98.3% and 92.4%, respectively. But these quantities depend on the proportions of mates (69%) and nonmates (31%) among the test pairs. The prevalence of mates in casework—or the “prior probability” in a particular case—might be quite different. 2/
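That dependence is easy to demonstrate. The following sketch (again mine; the helper ppv is only illustrative) recovers the study's PPV from its true- and false-positive rates, then shows how the same rates yield very different predictive values at other prevalences:

    # PPV depends on the prevalence of mates, not just on examiner accuracy.
    def ppv(tpr, fpr, prev):
        """P(mate | declared match) at a given prevalence of mates."""
        return tpr * prev / (tpr * prev + fpr * (1 - prev))

    tpr = 2457 / 3138  # P(+|M) = 0.783
    fpr = 42 / 1398    # P(+|N) = 0.030

    print(ppv(tpr, fpr, 0.69))  # ~0.983, the study's reported PPV at 69% mates
    print(ppv(tpr, fpr, 0.50))  # ~0.963 at a 50/50 mix
    print(ppv(tpr, fpr, 0.01))  # ~0.21 at the 1% prior discussed below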

A better statistic for thinking about the probative value of an examiner’s conclusion is the likelihood ratio (LR). Are matches declared more frequently when examiners encounter mated pairs than nonmates? How much more frequent are these correct classifications? Are declared exclusions more frequent when examiners encounter nonmates than mates? How much more frequent are these correct classifications?

The LR answers these questions. For declared matches, the LR is P(+|M) / P(+|N) = 0.783 / 0.030 = 26. For declared exclusions, it is P(–|N) / P(–|M) = 9. 3/ These values support the claim that, on average, examiners can distinguish paired mates from paired nonmates. If all the examiners were flipping fair coins to decide, the LRs would be expected to be 1. The examiners did much better than that.
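The same counts give both ratios directly; a short sketch (mine) also shows the variants in note 3 that drop the inconclusives from the denominators:

    # Likelihood ratios from the Table 1 counts
    lr_match = (2457 / 3138) / (42 / 1398)   # P(+|M) / P(+|N) ~ 26
    lr_excl  = (953 / 1398) / (235 / 3138)   # P(-|N) / P(-|M) ~ 9

    # With inconclusives removed from the denominators (note 3)
    lr_match_c = (2457 / 2692) / (42 / 995)  # ~ 22
    lr_excl_c  = (953 / 995) / (235 / 2692)  # ~ 11

    print(round(lr_match), round(lr_excl))      # 26 9
    print(round(lr_match_c), round(lr_excl_c))  # 22 11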

Nevertheless, claims of overwhelming confidence across the board do not seem to be justified. If examiners were presented with equal numbers of mates and nonmates, one would expect a declared match to be a correct match in P(M|+) = 26/27 = 96% of the cases in which a match is declared. 4/ Likewise, a declared exclusion would be a correct classification in P(N|–) = 9/10 = 90% of the instances in which an exclusion is declared. The PPV and NPV in the Miami-Dade study are a little higher because the prevalence of mates was 69% instead of 50%, and the examiners were cautious: they were less likely to err when making positive identifications than negative ones.

Suppose, however, that in a case of average difficulty, an average examiner declared a match when the defendant had strong evidence that he never had been in the room where the fingerprints were found. Let us say that a judge or juror, on the basis of the non-fingerprint evidence in the case, would assign a probability of 1% rather than 50% or 69% to the hypothesis that the defendant is the source of the latent print. The examiner, properly blinded to this evidence, would not know of this small prior probability. An LR of 26 would raise the prior probability of 1% (odds of 1 to 99) to a posterior probability of only about 21% (odds of 26 to 99). Informing the judge or juror of the reported PPV of 98.3% from the study without explaining that it does not imply a “predictive value” of 98.3% in this case would be very dangerous. It would lead the factfinder to regard the examiner's conclusion as far more powerful than it actually is.
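In odds form, Bayes' rule makes the computation transparent: posterior odds equal prior odds times the LR. A sketch (the helper posterior is my own, assuming the LR of 26 for declared matches):

    # Posterior probability of a mate, given a declared match, via the
    # odds form of Bayes' rule: posterior odds = prior odds * LR.
    def posterior(prior, lr=26):
        odds = lr * prior / (1 - prior)
        return odds / (1 + odds)

    print(posterior(0.01))  # ~0.21: a 1% prior rises only to about 21%
    print(posterior(0.50))  # ~0.96: LR / (LR + 1) = 26/27, as in note 4
    print(posterior(0.69))  # ~0.98: close to the study's reported PPV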

Notes

  1. David H. Kaye, "Remarkably Accurate": The Miami-Dade Police Study of Latent Fingerprint Identification (Pt. 1), Forensic Science, Statistics, and the Law, Feb. 8, 2015.
  2. In addition, the NPV has been adjusted upward from 80% “[i]n [that] consideration was given to the number of standards presented to the participant.” P. 53.
  3. Removing nondeclarations of matches or exclusions (inconclusives) from the denominators of the LRs does not change the ratios very much. They become 22 and 11, respectively.
  4. This result follows immediately from Bayes' rule with a prevalence of P(M) = P(N) = 1/2, since P(M|+) = P(+|M)P(M) / [P(+|M)P(M) + P(+|N)P(N)] = P(+|M) / [P(+|M) + P(+|N)] = LR / (LR + 1) = 26/27.

1 comment:

  1. I'd be very interested in your views, as "Pt 3," on the final aspect of this study, namely the examination of verification results when conducted in a context-biased fashion, particularly comparing them statistically to the initial results.
