Monday, July 9, 2012

If the Shoe Fits, You Must Not Calculate It (Part I)

In R. v. T., [2010] EWCA Crim. 2439, the Court of Appeal of England and Wales wrote an opinion that dismayed, if not enraged, leading forensic scientists across the globe. The brouhaha began with testimony in a murder trial that there was "a moderate degree of scientific evidence to support the view that the [Nike trainers recovered from the appellant] had made the footwear marks."

This evidence came from "Mr Ryder of the Forensic Science Service (FSS)." Mr. Ryder compared four aspects of the footwear marks from a murder scene and a pair of Nike "trainers found in the appellant's house after his arrest," namely:
  • Pattern (p). The FSS maintained a database of the characteristics of the shoes it inspected. About 20% had the pattern of the soles of the Nikes—the same pattern seen in the shoeprints. The probability of the pattern in a pair of shoes not worn at the crime scene (-W) would be P(p | -W) = 1/5, where the vertical line stands for "given" or "conditional on."
  • Size (s). According to another database, 3% of shoes sold with that pattern were size 11 (UK). Given the uncertainty in the precise size of a shoe that might have left the marks and in the effects of wear, the examiner adjusted this last figure upward. He estimated that as many as 10% of shoes sold would be in the right size range. Hence, the probability of the size given the pattern is P(s | p & -W) = 1/10.
  • Wear (w). He estimated (somehow) that about 50% of relevant shoes would show as much wear as was indicated by the impression and the shoes themselves. P(w | p & s & -W) = 1/2.
  • Damage (d). Finally, he felt that marks indicative of damage to the shoes added almost nothing to the other information. P(d | p & s & w & -W) = 1.
It follows that if the marks did not come from the defendant's shoes, the probability that they would be comparable to the ones at the murder scene in these four respects would be P(p & s & w & d | -W) = (1/5)(1/10)(1/2)(1) = 1/100.
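
The multiplication above is just the chain rule for conditional probabilities. As a minimal sketch (the dictionary labels and the function name joint_probability are mine, not Mr. Ryder's or the FSS's), the arithmetic can be checked with exact fractions:

```python
from fractions import Fraction

# Conditional probabilities quoted in the post, each conditional on -W
# (the marks were NOT made by the defendant's shoes) and on the
# features listed before it. The labels are illustrative only.
factors = {
    "pattern": Fraction(1, 5),
    "size, given pattern": Fraction(1, 10),
    "wear, given pattern and size": Fraction(1, 2),
    "damage, given pattern, size and wear": Fraction(1, 1),
}


def joint_probability(conditionals):
    """Multiply a chain of conditional probabilities (the chain rule)."""
    result = Fraction(1)
    for p in conditionals:
        result *= p
    return result


p_e_given_not_w = joint_probability(factors.values())
print(p_e_given_not_w)  # 1/100
```

The product is a legitimate joint probability only because each factor is conditioned on the features that precede it. Multiplying unconditional frequencies would require an assumption that the features are independent.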

A frequentist statistician might say that the similarities between the impressions and the defendant’s shoes are good evidence that the shoes left the marks because a p-value of 0.01 is small.

A likelihoodist statistician would want to know more. It is not enough to believe that an outcome is improbable under the defense’s hypothesis that the defendant’s shoes did not leave the marks (-W). One also must consider the probability of the marks under the prosecution’s hypothesis that the defendant’s shoes left the marks (W). The "law of likelihood" postulates that when the probability of the evidence under one hypothesis exceeds that under the competing, simple hypothesis, it supports the former over the latter to a degree given by the ratio of the conditional probabilities. If the two probabilities in the "likelihood ratio" are equal, then the evidence is to be expected to the same extent under both hypotheses. It cannot help us discriminate between them. Thus, some law review article writers have called the likelihood ratio a "relevance ratio."

Here, the probability that the impressions would match the shoes if they had indeed come from the defendant's Nike trainers was almost 100%, so Mr. Ryder concluded that the evidence (E = p & s & w & d) was about 100 times more probable if the marks came from the defendant's shoes (W) than if they came from other shoes (-W). In symbols, the likelihood ratio (LR) for his conditional probabilities is

LR = P(E | W) / P(E | -W) = 1 / (1/100) = 100.
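
For readers who like to see the arithmetic run, here is a small sketch of the same ratio. The function name likelihood_ratio is hypothetical, and P(E | W) is set to exactly 1 as a stand-in for "almost 100%":

```python
from fractions import Fraction


def likelihood_ratio(p_e_given_w, p_e_given_not_w):
    """Return P(E | W) / P(E | -W) as an exact fraction."""
    if p_e_given_not_w == 0:
        raise ValueError("P(E | -W) must be positive for the ratio to be defined")
    return Fraction(p_e_given_w) / Fraction(p_e_given_not_w)


# Figures from the post: P(E | W) is about 1 and P(E | -W) = 1/100.
lr = likelihood_ratio(1, Fraction(1, 100))
print(lr)  # 100

# Evidence that is equally probable under both hypotheses has LR = 1
# and so cannot discriminate between them.
print(likelihood_ratio(Fraction(1, 4), Fraction(1, 4)))  # 1
```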

Mr. Ryder made this rough estimate "to confirm an opinion substantially based on his experience and so that it could be expressed in a standardised form." He wrote three reports and testified, but never once did he mention these numbers. Rather, he testified that "In my opinion there is a moderate degree of scientific support [for] the view that the [Nike trainers] made those marks."

He chose the word "moderate" from a table that the Forensic Science Service had selected for ranges of the likelihood ratio. The table, which he did not mention at trial or in his written reports, classified LRs from 10-100 as providing “moderate support.” The use of a standard table of "verbal equivalents" finds approval in reports of the European Association of Forensic Service Providers and a committee of the US National Research Council.

A Bayesian statistician would agree that a likelihood ratio of 100 supports the prosecution's theory substantially more than the defense's. But this statistician would not stop here. He would argue that the LR is a "Bayes factor." It multiplies the prior odds on W by 100. A juror willing to post prior odds of only 1 to 10 for the prosecution's hypothesis before hearing Mr. Ryder's evidence (and harboring no doubts about the veracity and accuracy of that evidence) now should be willing to revise the odds upward. Specifically, Bayes' rule gives posterior odds of LR x prior odds = 100 x 1/10 = 10, that is, 10 to 1 in favor of W. Whatever the value V of the prior odds, the posterior odds for this evidence are 100V.
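
The juror's update can be sketched in a few lines. The function names are hypothetical, and the conversion from odds to probability, odds / (1 + odds), is a standard identity that I have added for illustration; it is not part of Mr. Ryder's analysis:

```python
from fractions import Fraction


def posterior_odds(prior_odds, lr):
    """Odds form of Bayes' rule: posterior odds = LR x prior odds."""
    return Fraction(lr) * Fraction(prior_odds)


def odds_to_probability(odds):
    """Convert odds in favor of W into a probability of W."""
    return odds / (1 + odds)


# The juror in the post: prior odds of 1 to 10 on W, and LR = 100.
post = posterior_odds(Fraction(1, 10), 100)
print(post)                       # 10, i.e. 10 to 1 in favor of W
print(odds_to_probability(post))  # 10/11

# A more skeptical juror starting at 1 to 1000 ends up at only 1 to 10.
print(posterior_odds(Fraction(1, 1000), 100))  # 1/10
```

The second juror shows why the likelihood ratio alone does not settle the matter: the same evidence leaves different jurors at very different posterior odds, depending on where they started.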

Mr. Ryder stopped with the verbiage derived from the likelihoods and the FSS table. He did not give a Bayesian interpretation to the evidence -- something that the Court of Appeal had strongly disapproved of in earlier cases. Even so, the court in R. v. T. held that his testimony of "a moderate degree of scientific evidence to support the [prosecution's] view" rendered the conviction unsafe and therefore required a new trial. The next posting on the topic will explain why.
