Sunday, May 19, 2019

Shoeprints in Indiana: Confronting a "Skilled Witness" with the PCAST Report

Last week, in Hughes v. State, 1/ the Indiana Court of Appeals wrote an opinion on the admissibility of shoeprint evidence and a defense attempt to present part of the 2016 PCAST report on feature-matching evidence. Mark Adrian Hughes was convicted of breaking into two newly constructed homes and stealing the appliances in them. "Sean Matusko, a forensic scientist with the ISP laboratory's latent-print unit" 2/ testified "that shoeprints found at both crime scenes were made by Hughes's shoes." The trial court overruled defendant's objection to this testimony and barred him from introducing into evidence a part of the PCAST report and from cross-examining Matusko about the content of the report. It reasoned that Matusko was a "skilled witness" but not an expert one (preventing cross-examination), and that the report was hearsay (preventing its use as evidence).

The unpublished court of appeals opinion, penned by Judge Robert R. Altice, Jr., reversed defendant's convictions, but not because of these rulings. The appellate court determined that the prosecutor improperly introduced evidence of earlier, similar crimes. In remanding the case for a new trial, the court of appeals also discussed the shoeprint rulings. Its analysis is puzzling. The court wrote that
Hughes challenges the trial court's treatment of the State's shoeprint examiner, Matusko, as a skilled witness. Here, Matusko did not simply testify based on his personal experience ... . Rather, ... Matusko identified himself as a forensic scientist assigned to the latent print identification unit of the Indiana State Police, set out his academic background, detailed his training with regard to shoeprint identification, and explained in detail the process he used to identify shoeprints at both crime scenes as being made by Hughes's shoes. [O]ur Supreme Court has indicated that it is not inclined to consider all testimony relating to shoeprint identification to be opinion testimony governed by Evid. R. 702. In light of such precedent and our standard of review, we cannot say that the trial court abused its discretion in admitting Matusko's testimony under Evid. R. 701.
It is inconceivable that a witness who represents himself as a scientist applying a process with which lay jurors are unfamiliar and thereby deducing that a specific pair of shoes left the impressions is not testifying as an expert under Rule 702. He was not there, and he did not see what happened. If he knows anything about the source of the shoeprints, it is because he possesses special knowledge and skill beyond the ken of ordinary witnesses. Indiana Rule of Evidence 702(a) governs all witnesses with "specialized knowledge" who rely on their unusual "knowledge, skill, experience, training, or education [to] help the trier of fact to understand the evidence or to determine a fact in issue." Rule 701, on the other hand, governs opinions from "lay witnesses." It limits them to inferences that would be difficult or tedious to present as more primitive statements of the details the witness perceived. 3/ The division these rules create reflects an ancient distinction in the common law between ordinary fact witnesses -- the Rule 701 category -- and expert witnesses -- the Rule 702 group.

To be sure, a witness sometimes can testify in both capacities. A physician can be an ordinary fact witness in part -- "I saw that the patient was having trouble breathing" -- and a skilled witness in part -- "My diagnosis was pneumonia." But the latter opinion must satisfy Rule 702 to be admissible, and the doctor is subject to cross-examination to suggest that his or her diagnosis is unfounded. If a medical journal states that the diagnosis is not warranted without additional symptoms, the doctor can be asked about that as long as "the publication is established as a reliable authority" under Rule 803(18)(c), for "learned treatises."

Similarly, Matusko could testify -- as an ordinary fact witness under Rule 701 -- that the shoes he was asked to compare to the shoeprints were "Nike Air Jordan athletic shoes with a Jumpman logo molded into the soles." But Rule 701 would not let him speak as a "skilled witness" giving an opinion as to the origin of the shoeprints based on his special skill as a shoeprint examiner. That task always has been reserved for expert witnesses. 4/

Now there might be a reason to present an expert (who does not appear as a "scientist") as merely a "skilled" witness. Indiana Rule of Evidence 702(b) codifies the rule of heightened scrutiny for scientific expert testimony articulated by the U.S. Supreme Court for the federal courts in Daubert v. Merrell Dow Pharmaceuticals. 5/ Like Daubert, Indiana Rule 702 specifies that "[e]xpert scientific testimony is admissible only if the court is satisfied that the expert testimony rests upon reliable scientific principles."

Whether Matusko's source attribution can pass that bar is doubtful. At least, the scientists and engineers on the President's Council of Advisors doubted it. They concluded that identifying a particular shoe as the source of a print has yet to be scientifically validated. Because "[t]he entire process—from choice of features to include (and ignore) and the determination of rarity—relies entirely on an examiner’s subjective judgment," PCAST wanted to see studies that tested the performance of criminalists with impressions from the same shoes and from different shoes. Because no such studies exist, PCAST reported that
there are no appropriate empirical studies to support the foundational validity of footwear analysis to associate shoeprints with particular shoes based on specific identifying marks (sometimes called “randomly acquired characteristics"). Such conclusions are unsupported by any meaningful evidence or estimates of their accuracy and thus are not scientifically valid.
The last sentence, if correct, means that the shoeprint testimony cannot be introduced against Hughes on retrial -- not if the witness claims to be acting as a scientist applying a scientific procedure. Furthermore, even if the police criminalist distances himself from scientific titles and airs, the court must decide whether footwear identification is reliable enough to qualify as a different form of expert testimony. The absence of scientific validation will not be determinative, but it is a relevant factor.

Let's assume, though, that the trial court decides that a highly "de-scientized" version of Mr. Matusko's opinion is admissible under Rule 702. Can he be impeached with a scientific report? The Indiana Court of Appeals thought so. A footnote states that
As set out above, Hughes sought but was denied the opportunity to cross-examine Matusko with findings set out in the PCAST publication concerning reliability of shoeprint identification. Although we did not reach the merits of the admissibility of the PCAST publication, cross-examination regarding the findings therein was permissible regardless of whether Matusko was an expert or a skilled witness.
If correct, this conclusion -- that a criminalist can be impeached by confronting him with the PCAST report -- would be a boon to defendants across the nation. Without the trouble and expense of calling an expert witness, defense counsel can wave the devastating critique in front of the witness (and the jury). But the rule on introducing "learned treatises" requires proof that the material is authoritative before it can be used for impeachment. 6/ This limitation makes sense because, unlike impeachment by self-contradiction, the out-of-court statement -- what the President's advisors had to say -- has value only to the extent that it is true. Thus, the document is hearsay.

Nevertheless, even without an expert to establish that PCAST is a reliable authority, the report could be admissible over a hearsay objection. It is, after all, a government report. The public records exception to the rule against hearsay extends to "factual findings from a legally authorized investigation," 7/ as long as "neither the source of information nor other circumstances indicate a lack of trustworthiness." 8/ Law enforcement groups have loudly proclaimed that this particular report is not trustworthy, but much of the criticism is more reflexive than reasoned. Very little of it focuses on footwear analysis. 9/

NOTES
  1. No. 18A-CR-1007, 2019 WL 2094045 (Ind. Ct. App. May 14, 2019) (unreported, available at https://www.in.gov/judiciary/opinions/pdf/05141901rra.pdf).
  2. The witness is featured in an educational state police YouTube video.
  3. Indiana Rule of Evidence 701 applies to "Opinion Testimony by Lay Witnesses." It provides that "If a witness is not testifying as an expert, testimony in the form of an opinion is limited to one that is: (a) rationally based on the witness's perception; and (b) helpful to a clear understanding of the witness's testimony or to a determination of a fact in issue."
  4. Indeed, in Buchman v. State, 59 Ind. 1, 26 Am.Rep. 75 (Ind. 1877), the Indiana Supreme Court held that a physician could not be compelled to testify to a professional opinion without special compensation. Its opinion used the words "skilled witness" and "expert" as synonyms.
  5. 509 U.S. 579 (1993).
  6. David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore on Evidence: Expert Evidence ch. 5 (2d ed. 2011).
  7. Ind. R. Evid. 803(8)(A)(i)(c).
  8. Ind. R. Evid. 803(8)(A)(ii).
  9. The Department of Justice's more thoughtful disagreements with the report's general approach to ascertaining scientific validity are presented in Ted Robert Hunt, Scientific Validity and Error Rates: A Short Response to the PCAST Report, 86 Fordham L. Rev. Online 24 (2018). After the initial frosty reception of its report from prosecutors, police, and forensic practitioners, PCAST requested input from the forensic-science community for a second time. It then issued an addendum to its report. With regard to shoeprints, this document stated that
        In its report, PCAST considered feature-comparison methods for associating a shoeprint with a specific shoe based on randomly acquired characteristics (as opposed to with a class of shoes based on class characteristics). PCAST found no empirical studies whatsoever that establish the scientific validity or reliability of the method.
        The President of the International Association for Identification (IAI), Harold Ruslander, responded to PCAST’s request for further input. He kindly organized a very helpful telephonic meeting with IAI member Lesley Hammer. (Hammer has conducted some of the leading research in the field—including a 2013 paper, cited by PCAST, that studied whether footwear examiners reach similar conclusions when they are presented with evidence in which the identifying features have already been identified.)
        Hammer confirmed that no empirical studies have been published to date that test the ability of examiners to reach correct conclusions about the source of shoeprints based on randomly acquired characteristics. Encouragingly, however, she noted that the first such empirical study is currently being undertaken at the West Virginia University. When completed and published, this study should provide the first actual empirical evidence concerning the validity of footwear examination. The types of samples and comparisons used in the study will define the bounds within which the method can be considered reliable.
    An Addendum to the PCAST Report on Forensic Science in Criminal Courts, Jan. 2017, at 5-6.

Saturday, May 11, 2019

Likelihoods, Paternity Probabilities, and the Presumption of Innocence in People v. Gonis

In People v. Gonis, 1/ Illinois prosecuted Kenneth Gonis for sexual penetration with his daughter, T.G., when she was 16 years old. T.G. had two children. The first, J.G., was born when she was 17 years old; A.G. was born two years later. To investigate the sexual assault charge, the Illinois State Police Joliet laboratory conducted DNA tests of Gonis, T.G., and the two children. The lab sent the results to the Northeastern Illinois Regional Crime Laboratory for interpretation. That laboratory’s DNA technical leader, Kenneth Pfoser, “entered the DNA profiles into a computer containing a statistical calculator.” He learned that
  • “at least 99.9999% of the North American Caucasian/White men would be excluded as being the biological father of [J.G. and A.G.]”;
  • the “paternity index” with respect to J.G. was about 195,000,000 and with respect to A.G., it was 26,000,000; and
  • “the probability that defendant was the biological father of J.G. and A.G. was 99.9999%.”
In a bench trial before Judge Lance Peterson, the court admitted these findings and convicted Gonis. On appeal, Gonis argued that the trial court erred in denying a pretrial motion to exclude the DNA test results. In an opinion written by Justice Daniel Schmidt, Illinois’ intermediate court of appeals described the motion as asserting only that
[T]he tests were inconsistent with the presumption of innocence because a statistical formula used in the testing assumed a prior probability of paternity. Specifically, the motion alleged:
Assuming that the Northeastern Illinois Regional Crime Laboratory tested the DNA sample using widely accepted practices in the scientific community, said testing was conducted using a statistical mathematical formula. These formulae, as their basis, include a component to determine paternity which by its nature ‘assumes’ that sexual intercourse has in fact taken place.
In other words,
The motion alleged that to allow such paternity test results would violate the presumption of innocence because “the state would be allowed to introduce statistical evidence presuming sexual intercourse, in order to prove an act of sexual intercourse.”
The argument is fallacious for three reasons. First, the probability pertains to the chance that the child was conceived by the mother and the accused man. Conception—the fertilization of an ovum—can occur without penetration. In the hearing on the motion to exclude, the technical leader referred to artificial insemination, but insemination, and hence pregnancy, can occur without penetration by natural mechanisms as well.

Second, even if conception without penetration were not merely improbable but impossible, it would not follow that a probability of paternity presumes penetration. After all, a probability is not a certainty. To say that an electron has a probability Ψ*ΨdV of being located in a small volume dV is not to presume that the electron is actually located there. To say that the probability of an extended trade war between the U.S. and China is 0.5 (or some other number less than 1) does not presume that this event will occur. That the paternity probability for the defendant is 0.5 (or 0.99999, or any other number less than 1) also does not presume that the defendant truly is the source of the fertilizing spermatozoon.

Finally, the evidentiary aspect of the presumption of innocence merely directs the judge or jury not to use the fact of the indictment as evidence of guilt. The probabilities in question do not change depending on whether or not a man is indicted.

The opinion in Gonis seems to rely on the activity-level possibility of artificial insemination to reject the defendant's presumption-of-innocence objection. It also comes close to recognizing the second rejoinder, for it states that "Logically, since Bayes's Theorem allowed for the possibility that defendant may not be the father of T.G.'s children, it did not assume that defendant necessarily had sexual intercourse with T.G."

But the court thought that the details of Bayes' Theorem rather than the very definition of probability made the computation compatible with the presumption of innocence. The opinion states that
Pfoser testified that Bayes's Theorem was a likelihood ratio based on two competing hypotheses: (1) defendant was the father, or (2) a random, unrelated individual was the father. Pfoser stated that Bayes's Theorem took “the assumed probability that the person in question is the father of the child” and divided it “by the probability that some unrelated person within the same race group in the general population is the father of the child.” Thus, Pfoser's testimony indicated that Bayes's Theorem posited that either defendant or an individual other than defendant could have been the father of T.G.'s children. Logically, since Bayes's Theorem allowed for the possibility that defendant may not be the father of T.G.'s children, it did not assume that defendant necessarily had sexual intercourse with T.G.
Although a likelihood ratio appears in Bayes' rule, that is not all there is to it, and the description of how the rule works is garbled. The probability that the defendant is the father is not obtained by dividing a probability that he is the father by the probability that an unrelated man is the father. If the expert knew the probability that an unrelated man is the father (and no other alternatives to the defendant's paternity were worthy of consideration), Bayes' rule would be superfluous. The probability not assigned to the random man is the defendant's probability, so if we have the random-man probability, all we need to do is subtract it from 1. What remains is the defendant's probability.

The technical leader used Bayes' rule because he did not know the probability that a random man was the father. Let’s look at his explanation of the computation, as presented by the appellate court. The court starts by recounting that
Pfoser testified that DNA paternity testing had three components. The first component of the test involved an exclusion analysis where Pfoser entered the DNA profiles into a computer containing a statistical calculator. If there were any inconsistencies between the alleged father and the child, the computer would give a result of “0” for paternity index.
Apparently, each child shared at least one allele per locus with the defendant, so the computer program did not report an approximate probability of zero, 2/ and the opinion continued:
The next stage involved the calculation of the paternity index, which was a formula used to determine “the likelihood that the assumed alleged father in question is in fact a father as opposed to a random individual that's unrelated in the general population.”
If "likelihood" has the technical meaning of statistical "support" for a hypothesis, this statement could be literally true. But if "likelihood" means probability, as the court evidently and understandably thought, then the explanation is either meaningless or misleading. There is no probability that the defendant is the father "as opposed to a random individual that's unrelated." There is a probability that the defendant is the father (as opposed to everyone else in the population, given all the evidence in the case). And the paternity index is not even a probability, let alone that one. It is a ratio of two different probabilities. As the court wrote, "the paternity index is the ratio of 'the probability of the alleged father transmitting the alleles and the probability of selecting these alleles at random from the gene pool.'" ... (quoting Ivey v. Commonwealth, 486 S.W.3d 846, 851 (Ky. 2016) (quoting D.H. Kaye, The Probability of an Ultimate Issue: The Strange Cases of Paternity Testing, 75 Iowa L. Rev. 75, 89 (1989))).

The important aspect of the paternity index is that it is a likelihood ratio that expresses the support that the reported DNA profiles of the mother-child-defendant trio provide (if correctly determined) for the hypothesis that the defendant is the biological father relative to the hypothesis that an unrelated man is the father. The idea is that if the profiles are some number L times more probable under one hypothesis than the other, then they support that hypothesis L times more than they support the alternative. This ratio does not assume that one hypothesis is true and the other false. Rather, it treats both hypotheses as equally worthy of consideration and addresses the probability of the evidence under each one. Thus, the use of the ratio to describe the strength of the evidence for the better supported hypothesis does not conflict with the presumption of innocence. Had the expert simply given the paternity index and spoken of relative support, the defendant's objection would have had even less traction than it did.
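To make this likelihoodist reading concrete, here is a minimal sketch, in Python, of how a single-locus paternity index can be computed. The genotypes and the 5% allele frequency are hypothetical values of my own choosing; real casework multiplies such single-locus ratios across many loci and adjusts for complications (mutation, relatedness, population structure) that this toy calculation ignores.

```python
# A toy single-locus paternity index (PI): the probability that the
# alleged father would transmit the child's paternal allele, divided by
# the probability that a random unrelated man would contribute it
# (approximated here by the allele's population frequency).
# All numbers are hypothetical.

def transmission_prob(father_genotype: tuple, paternal_allele: str) -> float:
    """Chance that a man with this genotype passes the given allele to a child."""
    return father_genotype.count(paternal_allele) / 2.0

def single_locus_pi(father_genotype: tuple, paternal_allele: str,
                    allele_freq: float) -> float:
    x = transmission_prob(father_genotype, paternal_allele)  # "father" hypothesis
    y = allele_freq                                          # "random man" hypothesis
    return x / y

# A heterozygous alleged father ("b", "c") and a paternal allele "b"
# carried by 5% of the population: PI = 0.5 / 0.05 = 10. The profiles
# are 10 times more probable if he is the father than if a random,
# unrelated man is.
print(single_locus_pi(("b", "c"), "b", 0.05))
```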

But the technical leader did not describe the paternity index in this “likelihoodist” way. Instead, to quote from the opinion,
Pfoser testified that the third component of DNA paternity testing converted the paternity index into a probability of paternity percentage using a statistical, mathematical formula called “Bayes' Theorem.” Pfoser explained:
“Bayes' Theorem is essentially a basis for a likelihood ratio. Like I kind of described before, you're basing it on two conflicting hypotheses or two conflicting assumptions. One is that the individual in question is in fact the father as opposed to a completely random unrelated individual could be the father.”
Pfoser further explained:
“[S]o you're taking two, essentially two, calculations, one calculation is * * * the prior probability or the assumed probability that the person in question is the father of the child and that is divided by the probability that some unrelated person within the same race group in the general population is the father of the child.”
And “Pfoser testified that the prior probability of paternity was set at 50%.”

As the earlier remarks indicate, this explanation of Bayes' rule cries out for corrections at every turn, but I will focus on the last sentence because the concept of a prior probability is what triggers worries about the presumption of innocence. The idea of a prior probability is intuitive but not easily mapped onto the legal setting. If I want to infer whether a furry animal that I glimpsed running outside my window is a squirrel (as opposed to a groundhog, a rabbit, a chipmunk, a possum, a cat, a skunk, a bear, or any other furry critter in this neck of the woods), I can start by asking how often various creatures go by. Based on my past observations, I would order the possibilities as squirrel, chipmunk, rabbit, groundhog, cat, and so on. If squirrels account for half of the past sightings, I might select 50% for the probability of a squirrel. This is my prior probability.

Now I think about the details of what I saw in the periphery of my vision. How small was it? What color? Did it seem to have short legs? Was it scurrying or hopping? To the extent that the set of characteristics I was able to discern are more probable for squirrels than for other creatures, I should adjust my probability upwards to arrive at my posterior probability.

Bayes’ rule is a prescription for making the adjustment. It instructs us to multiply the prior odds by the likelihood ratio. Then, voilà, the posterior odds emerge. Suppose my likelihood ratio is 3. I think the characteristics I perceived are 3 times as probable when a squirrel zips by as when the average non-squirrel does. 3/ If the prior probability is ½, the prior odds are 1 to 1, and the posterior odds are 3 × 1:1 = 3:1. Odds of 3:1 correspond to a posterior probability of 3/(3+1) = 3/4. Following Bayes’ rule, I moved from a prior probability of ½ to a posterior of 3/4.
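The arithmetic is simple enough to put in a few lines of code. This sketch just converts between probabilities and odds and applies the multiplication rule, using the squirrel-sighting numbers from the text.

```python
# Bayes' rule as odds updating: posterior odds = likelihood ratio x prior odds.

def prob_to_odds(p: float) -> float:
    return p / (1 - p)

def odds_to_prob(odds: float) -> float:
    return odds / (1 + odds)

def posterior_probability(prior_prob: float, likelihood_ratio: float) -> float:
    return odds_to_prob(likelihood_ratio * prob_to_odds(prior_prob))

# Prior probability 1/2 (odds 1:1) and likelihood ratio 3 give posterior
# odds 3:1, i.e., a posterior probability of 3/4.
print(posterior_probability(0.5, 3))  # 0.75
```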

The expert in Gonis arrived at his posterior probability of paternity by making up a set of prior odds — he chose 1:1 — for defendant’s paternity and multiplying them by the paternity index. This looks like a Bayesian calculation. 4/ But in the squirrel-sighting case, there was an empirical basis for the prior odds. I know something about the animals in my neighborhood. The DNA technical leader apparently offered no such justification for his choice of the same number. And how could he? His expertise does not extend to the sexual and criminal conduct of the defendant and everyone else in the male population. The judge or jury, not the DNA profiling expert, is supposed to consider the nongenetic evidence and to rely on its general background information in processing the totality of the evidence in the case to reach its best verdict.

In Gonis, the trial judge, who was the factfinder in the case, was explicit about why he found the prior probability of ½ to be acceptable:
The court noted that the cases cited by the State explained why “the .5 number presumption that they start off with is actually just a truly neutral number. It assumes the same likelihood that the defendant was not the father of the child as it does that he would be the father of the child.”
This rationale is specious. For a Bayesian, starting with a probability of ½ amounts to believing, before learning about the DNA profiles, that the defendant owns half the probability and that the other half is distributed across everyone else in the population. Maybe the other evidence in the case would justify that belief, but it hardly seems “neutral” toward the defendant. It treats him very differently from every other man in the population. The more “neutral” position might be to assign the same per capita probability to everyone, including the defendant, and then make adjustments according to the specifics of the case.

The appellate court took no stand on whether the trial court’s conception of neutrality was scientifically or legally tenable. Construing the defendant’s objection narrowly, the court did "not reach the issue of whether a 50% prior probability is a neutral number."

A bona fide Bayesian procedure would be to display the posterior probability for many values of the prior probability. This “variable prior odds approach” avoids the need for the expert to tell the judge or jury which prior probability is correct. 5/

That said, the uncontested likelihood ratios in Gonis, as Justice Schmidt observed, would swamp most prior probabilities. Even if we regarded all the men in the Chicago metropolitan area as equally likely, a priori, to have fathered the two children, the posterior odds of paternity still would be substantial. There are fewer than five million men (of all ages) living in the metropolitan area. So the per capita prior odds are 1:5 million. For the likelihood ratios of 195 million and 26 million, the posterior odds would be more than 39:1 for the paternity of J.G. and 5:1 for the paternity of A.G.
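A short script can reproduce these figures and, at the same time, illustrate the variable-prior-odds display suggested above: show the posterior for a range of priors and let the factfinder choose. The tabulated priors are my own illustrative choices, with the last entry corresponding to the per capita 1:5,000,000 odds just discussed.

```python
# Posterior probability of paternity for several prior probabilities,
# using the paternity indexes reported in Gonis. The priors are
# illustrative choices, not values endorsed by the court or the expert.

def posterior_prob(prior_prob: float, likelihood_ratio: float) -> float:
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = likelihood_ratio * prior_odds
    return post_odds / (1 + post_odds)

for child, pi_value in [("J.G.", 195_000_000), ("A.G.", 26_000_000)]:
    for prior in (0.5, 0.01, 1 / 5_000_001):  # last entry: prior odds of 1:5,000,000
        print(f"{child}: prior {prior:.10f} -> posterior {posterior_prob(prior, pi_value):.4f}")
```

Even at the per capita prior, the posterior probabilities are about 0.975 for J.G. and 0.84 for A.G.: substantial, as Justice Schmidt observed, though far short of the 99.9999% reported at trial.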

NOTES
  1. 2018 IL App (3d) 160166, No. 3-16-0166, 2018 WL 6582850 (Ill. App. Ct. Dec. 13, 2018).
  2. Particularly at a single locus, an exclusion does not mean that the probability of paternity is strictly zero. Mutations at some of the STR loci are known to occur at nonzero rates.
  3. The phrasing about an "average non-squirrel" is imprecise. There are n+1 mutually exclusive hypotheses H0, H1, H2, ..., Hn, about the animal. Each Hj has a prior probability Pr(Hj) and a likelihood Pr(E|Hj). Let H0 be the squirrel hypothesis. The appropriate factor for the multiplication of the prior odds is the squirrel likelihood Pr(E|H0) divided by a weighted average of the other likelihoods. The weight for each non-squirrel hypothesis Hj (j = 1, ..., n) is my prior probability on that hypothesis, renormalized to reflect that it is conditional on ~H0. In other words, the Bayes factor is Pr(E|H0) × [1 − Pr(H0)] divided by [Pr(H1) × Pr(E|H1) + ... + Pr(Hn) × Pr(E|Hn)].
  4. By limiting attention to an unrelated man as the only possible alternative, the technical leader was ignoring the terms in the denominator of the Bayes factor for possible related men. See supra note 3. As a result, the Bayesian interpretation he provided was not strictly correct.
  5. For discussions of such proposals and their reception in court and in the scholarly literature, see David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore on Evidence: Expert Evidence ch. 15 (2d ed. 2011) (updated annually).

Sunday, May 5, 2019

State v. Sharpe: What If Other Forensic Science Methods Were Given the Same Scrutiny as Polygraph Evidence?

Earlier this year, the Alaska Supreme Court adopted the majority rule excluding polygraph evidence. That outcome is not surprising, but how the court reached this result merits attention. The court's careful opinion varies from the insightful to the misconceived. If some of the same reasoning were applied to other parts of forensic science, judicial opinions would improve. But one part of the court's analysis of "error rates" cannot be reconciled with Daubert and reproduces an error exposed in the legal and statistical literature over thirty years ago.

Chief Justice Craig Stowers' analysis for a unanimous court begins with the somewhat technical legal issue of the standard of review on appeal. Does the appellate court have to defer to the trial judge's determination of whether the evidence constitutes "scientific knowledge" within the meaning of Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993), unless that determination is an "abuse of discretion"? Or does the appellate court review the record and literature for itself in a "de novo" review? Before the several cases decided along with State v. Sharpe, 435 P.3d 887 (Alaska 2019), Alaska, like the federal courts, used the former standard.

In Sharpe, however, the court overruled State v. Coon, 974 P.2d 386 (Alaska 1999), to adopt the minority rule of de novo review. I think that is the right result. For one thing (that the court does not discuss), the demeanor of the expert witnesses in a pretrial hearing on the state of the science is less important than the expert's articulated reasoning and the pertinent studies. The latter can be assessed almost as well on a cold record as they can be after listening to the witnesses.

Testing the Technique

Applying the de novo standard, the Sharpe opinion moves through the usual Daubert factors. To begin with, it concludes that the testing of "the psychological hypotheses that serve as the underlying premise of polygraph testing" is insufficient and that some of them "may not be readily testable." The problem here seems to be that it is hard to know from low-stakes experiments whether "a truthful person will respond more strongly to the comparison questions [and] a deceptive person will have a stronger reaction to the relevant questions," while "field studies have difficulties establishing the 'ground truth' of whether an examined person was actually lying." Hence, "this factor weighs decidedly against admitting polygraph testimony as scientific evidence."

The court did not apply so exacting an analysis in Coon. There, it upheld a determination that voice spectrographic identification of speakers was scientifically valid without discussing if or how the physiological assumptions of that technique had been tested. In Sharpe, the court observed that "a 2003 review of the scientific evidence on polygraphy by the National Research Council concluded that '[p]olygraph research has not developed and tested theories of the underlying factors that produce the observed responses.'" In Coon, it ignored a 1979 NRC report that stated that spectrographic voice identification "lacks a solid theoretical basis" and that its most crucial assumption had not been adequately tested. In Sharpe, the court agonized over the limited ability of laboratory studies to replicate real-world conditions. In Coon, it paid no attention to the difficulties in simulating factors of ambient noise, other sounds, transmission channels, and mismatched recording conditions.

Peer Review and Publication

The Sharpe court gave "little weight" to the existence of a substantial body of peer-reviewed publications on polygraphy. Considering the tendency of some proponents of criminalistics methods to provide long lists of publications as if the sheer number and age of the writings prove scientific validity, this part of the opinion is refreshing. The court explained that "the mere fact of publication in a peer-reviewed journal is not itself probative of a technique’s validity." "Most of the studies cited by Dr. Raskin in support of the technique are from the 1980s and 1990s, with some dated as far back as the late 1970s." "Thus, although studies regarding CQT polygraphy have been published in peer-reviewed journals, it does not appear that this has resulted in the kind of refinement and development that makes publication and peer review relevant to a Daubert analysis."

Error Rates

The court's analysis of error rates is less perceptive. It begins as follows:
[T]he studies cited by Dr. Raskin showed an accuracy rate of 89% to 98%, while those cited by Dr. Iacono had accuracy rates from 51% to 98%, with an average of 71%. Dr. Raskin estimated that the overall accuracy rate of CQT polygraph testing was around 90%. 
Dr. David Raskin, a professor emeritus of psychology at the University of Utah, who testified in support of the validity of polygraph procedure, is well aware that it takes two probabilities or statistics--sensitivity and specificity--to define the accuracy of a test with a yes-or-no outcome. Dr. William Iacono, a psychology professor at the University of Minnesota, who testified for the state, also knows this. Sensitivity is the probability of a positive result (here, a finding that the subject is consciously lying) given that the condition (conscious deception) is really present. It can be abbreviated as P(+ | D). Specificity is the probability of a negative result (here, a finding that the subject is not consciously lying) given that the condition is not present: P(– | ~D). A highly accurate test is both very sensitive and very specific. When confronted with conscious deception, the examiner almost always detects it (high sensitivity); when confronted with truthful responses, the examiner rarely diagnoses deception (high specificity). High sensitivity corresponds to a small false-negative error probability (because P(– | D) + P(+ | D) = 1); high specificity corresponds to a low false-positive probability (because P(+ | ~D) + P(– | ~D) = 1).

I am not sure what "the overall accuracy rate" means here, but to try to unpack the court's reasoning, I am going to assume that the best studies established the figure of "around 90%" for both sensitivity and specificity. It follows that both the false-negative and the false-positive error rate are around 10%. Are those error probabilities so high that they counsel against admission under Daubert? I would argue that they are sufficient for "evidentiary reliability" as defined in Daubert -- if the evidence can be presented so that the jury gives the polygraph findings the limited weight they deserve. Some lawyers and scientists would disagree and say that higher accuracy than "about 90%" is necessary. Statistically, the best way to express the lawyer's concept of probative value of a binary test finding is with the likelihood ratio L = P(+ | D) / P(+ | ~D) for a positive finding or L = P(– | ~D) / P(– | D) for a negative finding. In Sharpe and its companion cases, the findings were negative -- no deception -- with L = 90% / 10% = 9. In other words, the report of no deception was nine times more probable when the subject is truthful than when the subject is lying. A diagnosing physician might want to order a test for cancer that is this discerning, even though it would be far from conclusive.
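In code, the computation is trivial. This sketch assumes, as the text does, that both sensitivity and specificity are around 90%.

```python
# Likelihood ratios for a binary test from its sensitivity and specificity.
# D = deception; + = a "deceptive" finding; - = a "truthful" finding.

def likelihood_ratios(sensitivity: float, specificity: float):
    l_pos = sensitivity / (1 - specificity)  # P(+|D) / P(+|~D)
    l_neg = specificity / (1 - sensitivity)  # P(-|~D) / P(-|D)
    return l_pos, l_neg

l_pos, l_neg = likelihood_ratios(0.90, 0.90)
# Both ratios are 9: a "no deception" report is nine times more probable
# for a truthful subject than for a lying one.
print(round(l_pos, 2), round(l_neg, 2))  # 9.0 9.0
```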

Rather than conclude that 90% accuracy is mildly supportive of validity, the Sharpe court took a different tack. First, it pointed to Dr. Iacono's criticisms that the laboratory experiments lacked realism and that the field studies suffered from selection bias and inadequate knowledge of "ground truth." Those are important points. If the studies do not apply to criminal cases or do not prove what they are supposed to, then who cares about the numbers they generate? To that extent, the court is again saying that the method has not been adequately tested and is difficult to test.

However, the court's discussion of error rates did not stop here. The opinion muddied the waters by bringing up "base rates" as a necessary component of probative value. The opinion reads:
[T]he empirical basis for polygraph examinations suffers from another fault: the lack of a reliable “base rate.” In the three cases currently before this court, each defendant was said to have passed his polygraph test; the relevant question for the factfinder is whether, given this fact, the defendant was likely truthful or whether the test was a false negative. To determine this likelihood, more information is required; specifically, information about the base rate of deceptive and truthful subjects.
The lack of a reliable base rate estimate was the underlying reason for the Connecticut Supreme Court upholding its traditional per se ban on admitting polygraph evidence in State v. Porter. Noting “wide disagreement” about the accuracy rates for “a well run polygraph exam,” the court decided that, even if the estimates of polygraph proponents were accepted, the technique would still be “of questionable validity.” ... The court ... reasoned that, even if a test is accurate, its probative value as scientific evidence depends on its “predictive value”—the likelihood “that a person really is lying given that the polygraph labels the subject as deceptive” and the likelihood “that a subject really is truthful given that the polygraph labels the subject as not deceptive.” This predictive value, the court explained, depends not only on the accuracy of the test but also “on the ‘base rate’ of deceptiveness among the people tested by the polygraph.” Because the Porter court found a “complete absence of reliable data on base rates,” it concluded that it had no possible way of assessing the test’s probative value. With that in mind, the court concluded that even if polygraph evidence satisfies the Daubert standard, which it assumed without deciding, the probative value of such evidence is very low and substantially outweighed by its prejudicial effects.

As in Porter, the record before us is devoid of reliable data about the base rate of deceptiveness among polygraph examinees outside of lab tests; we also have not found such data in academic literature. Absent some reliable estimate of this base rate there is no way to estimate the reliability of polygraph results, and thus no way to determine whether any particular accuracy rate is acceptable. We conclude that the superior court clearly erred in finding the error rate of CQT polygraph testing to be “sufficiently reliable.” Accordingly, this factor weighs against admitting polygraph evidence.
If the error-rate factor of Daubert "weighs against admitting ... evidence" unless there is a “reliable estimate of the base rate,” then back in Coon, the Alaska Supreme Court was wrong to rely on claims of small error rates to uphold the admission of voice spectrographic identification. There was no testimony, let alone scientific knowledge, of the "base rate" of matching spectrographs in the relevant suspect population. That also was true of the case the Supreme Court cited when it invoked "error rates" as a factor in Daubert. United States v. Smith, 869 F.2d 348, 353-54 (7th Cir. 1989), listed studies such as one in which "the error rate for false identifications was 2.4% and the error rate for false eliminations was about 6%." It did not mention "base rates" or "predictive value" -- terms that are defined in the box below:
Terminology for Accuracy and Probative Value of Tests that Classify Things into Two Categories

Operating Characteristics (How accurate is the test itself?)
Sensitivity P(+ | D), probability of a positive finding (e.g., "the suspect is lying") given that the condition (e.g., conscious deception) is present
False negative probability P(– | D) = 1 - P(+ | D) = 1 - sensitivity
Specificity P(– | ~D), probability of a negative finding (e.g., "the subject is not lying") given that the condition is not present
False positive probability P(+ | ~D) = 1 – P(– | ~D) = 1 – specificity

Efficacy (How dispositive are the test findings?)
Prevalence or base rate F(D), relative frequency of the condition in the group being tested
Prior odds Odds(D), odds of the condition in an individual being tested
Positive predictive value PPV = P(D | +), probability of the condition given the positive test finding
Negative predictive value NPV = P(~D | –), probability of the absence of the condition given the negative test finding
Posterior odds Odds(D | +) or Odds(D| –), odds of the condition given the test finding

Probative value (How much relative support does the result provide?)
Likelihood ratio L (How many times more probable is the test result for the different possibilities?)
● For a positive finding, Lpos = P(+ | D) / P(+ | ~D)
● For a negative finding, Lneg = P(– | ~D) / P(– | D)

Bayes' rule (How much does the test finding change the prior odds?)
● For a positive finding, Odds(D | +) = Lpos × Odds(D)
● For a negative finding, Odds(~D | –) = Lneg × Odds(~D)
The opinion in Sharpe has confused probative value -- the extent to which evidence tends to prove the proposition that it is offered to prove -- with the probability that the proposition is true. The latter is surely what the jury wants to know, but it gets to that probability by considering all the evidence that supports or undermines the proposition in question. The likelihood ratio (rather than the "predictive value") for an item of evidence expresses its probative value. The error-rate factor in Daubert requires courts to ask whether false-positive and false-negative error probabilities are so large that the test has too little probative value to justify its admission as scientific evidence.

Scientific evidence need not be conclusive to be valid and admissible. If only 1 out of 91 polygraphed people would tell the truth -- that is the base rate -- and if no other evidence that the defendant would lie to the polygrapher were available, then the prior odds that the defendant was truthful arguably would be 1 to 90. The posterior odds then would still be low—namely, 9 to 90, for a “predictive value” or posterior probability of only 9/99 = 1/11. On the other hand, if the base rate and the prior odds were higher, say 1/2 and 1 to 1, respectively, then the predictive value and posterior probability would be 9/10. But in both cases, the finding is probative and worth knowing.
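A minimal sketch, assuming the likelihood ratio of 9 derived earlier, confirms both computations and shows how the same test result moves different base rates:

```python
# Predictive value of a "passed" (negative) polygraph with likelihood
# ratio 9, for two base rates of truthfulness among examinees.

def predictive_value(base_rate: float, likelihood_ratio: float) -> float:
    prior_odds = base_rate / (1 - base_rate)   # Odds(truthful)
    post_odds = likelihood_ratio * prior_odds  # Bayes' rule
    return post_odds / (1 + post_odds)

print(predictive_value(1 / 91, 9))  # 9/99 = 1/11, about 0.091
print(predictive_value(1 / 2, 9))   # 9/10 = 0.9
```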

In sum, whether the base rate of lying among criminal suspects in general is high or low does not alter the extent to which the evidence tends to prove that the suspect in a particular case is or is not lying. The odds of lying change in the same ratio. A test that produces results that are strongly indicative of the presence or absence of a condition—compared to what was known beforehand—is a valid classifier regardless of the base rate for the condition in some population (or the prior probability in the case at bar).

Controlling Standards

Daubert spoke of “the existence and maintenance of standards controlling the technique’s operation.” Courts tend to cite any kind of a standard (such as one prescribing the educational qualifications of a practitioner) as if it controls how the test is to be performed. The Sharpe court noted that "many states ... have statutes governing polygraph test administration, examinees’ privacy rights, and licensing of examiners," but it also pointed out that "the formulation and ordering of questions, the conducting of the pretest interview, the choice of scoring system, and the evaluation of the examinee’s demeanor leave much to the examiner’s discretion." Consequently, it concluded that "the lack of clear controlling standards for CQT administration weighs against its admissibility."

General Acceptance

Among other things, the court wrote that in light of "the apparently lackluster support for the technique outside the community of practicing polygraph examiners, we conclude that this factor also weighs against admitting polygraph evidence." In contrast, when "outside" bodies review identification methods in forensic science, practitioners invariably complain (if the reviews are unflattering) that they were not adequately represented in the process.

Financial Interest

The factors enumerated in Daubert are not exhaustive. Going one step beyond them, the Sharpe court expressed concern over “the danger of a hidden litigation motive” behind research. It cautioned that "[m]any of the studies cited as approving polygraph testing as scientifically valid were performed by ... practicing examiners, and a number of the studies were published in polygraph industry publications." This, too, has implications for much of the research in other areas of forensic science.
