Sunday, May 5, 2019

State v. Sharpe: What If Other Forensic Science Methods Were Given the Same Scrutiny as Polygraph Evidence?

Earlier this year, the Alaska Supreme Court adopted the majority rule excluding polygraph evidence. That outcome is not surprising, but how the court reached this result merits attention. The court's careful opinion ranges from the insightful to the misconceived. If some of the same reasoning were applied to other parts of forensic science, judicial opinions would improve. But one part of the court's analysis of "error rates" cannot be reconciled with Daubert and reproduces an error exposed in the legal and statistical literature over thirty years ago.

Chief Justice Craig Stowers' analysis for a unanimous court begins with the somewhat technical legal issue of the standard of review on appeal. Does the appellate court have to defer to the trial judge's determination of whether the evidence constitutes "scientific knowledge" within the meaning of Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993), unless that determination is an "abuse of discretion"? Or does the appellate court review the record and literature for itself in a "de novo" review? Before the several cases decided along with State v. Sharpe, 435 P.3d 887 (Alaska 2019), Alaska, like the federal courts, used the former standard.

In Sharpe, however, the court overruled State v. Coon, 974 P.2d 386 (Alaska 1999), to adopt the minority rule of de novo review. I think that is the right result. For one thing (which the court does not discuss), the demeanor of the expert witnesses in a pretrial hearing on the state of the science is less important than the experts' articulated reasoning and the pertinent studies. The latter can be assessed almost as well on a cold record as they can be after listening to the witnesses.

Testing the Technique

Applying the de novo standard, the Sharpe opinion moves through the usual Daubert factors. To begin with, it concludes that the testing of "the psychological hypotheses that serve as the underlying premise of polygraph testing" is insufficient and that some of them "may not be readily testable." The problem here seems to be that it is hard to know from low-stakes experiments whether "a truthful person will respond more strongly to the comparison questions [and] a deceptive person will have a stronger reaction to the relevant questions," while "field studies have difficulties establishing the 'ground truth' of whether an examined person was actually lying." Hence, "this factor weighs decidedly against admitting polygraph testimony as scientific evidence."

The court did not apply so exacting an analysis in Coon. There, it upheld a determination that voice spectrographic identification of speakers was scientifically valid without discussing whether or how the physiological assumptions of that technique had been tested. In Sharpe, the court observed that "a 2003 review of the scientific evidence on polygraphy by the National Research Council concluded that '[p]olygraph research has not developed and tested theories of the underlying factors that produce the observed responses.'" In Coon, it ignored a 1979 NRC report that stated that spectrographic voice identification "lacks a solid theoretical basis" and that its most crucial assumption had not been adequately tested. In Sharpe, the court agonized over the limited ability of laboratory studies to replicate real-world conditions. In Coon, it paid no attention to the difficulty of simulating such factors as ambient noise, competing sounds, transmission channels, and mismatched recording conditions.

Peer Review and Publication

The Sharpe court gave "little weight" to the existence of a substantial body of peer-reviewed publications on polygraphy. Considering the tendency of some proponents of criminalistics methods to provide long lists of publications as if the sheer number and age of the writings prove scientific validity, this part of the opinion is refreshing. The court explained that "the mere fact of publication in a peer-reviewed journal is not itself probative of a technique’s validity." "Most of the studies cited by Dr. Raskin in support of the technique are from the 1980s and 1990s, with some dated as far back as the late 1970s." "Thus, although studies regarding CQT polygraphy have been published in peer-reviewed journals, it does not appear that this has resulted in the kind of refinement and development that makes publication and peer review relevant to a Daubert analysis."

Error Rates

The court's analysis of error rates is less perceptive. It begins as follows:
[T]he studies cited by Dr. Raskin showed an accuracy rate of 89% to 98%, while those cited by Dr. Iacono had accuracy rates from 51% to 98%, with an average of 71%. Dr. Raskin estimated that the overall accuracy rate of CQT polygraph testing was around 90%. 
Dr. David Raskin, a professor emeritus of psychology at the University of Utah who testified in support of the validity of the polygraph procedure, is well aware that it takes two probabilities or statistics -- sensitivity and specificity -- to define the accuracy of a test with a yes-or-no outcome. Dr. William Iacono, a psychology professor at the University of Minnesota who testified for the state, also knows this. Sensitivity is the probability of a positive result (here, a finding that the subject is consciously lying) given that the condition (conscious deception) is really present. It can be abbreviated as P(+ | D). Specificity is the probability of a negative result (here, a finding that the subject is not consciously lying) given that the condition is not present: P(– | ~D). A highly accurate test is both very sensitive and very specific. When confronted with conscious deception, the examiner almost always detects it (high sensitivity); when confronted with truthful responses, the examiner rarely diagnoses deception (high specificity). High sensitivity corresponds to a small false-negative error probability (because P(– | D) + P(+ | D) = 1); high specificity corresponds to a small false-positive error probability (because P(+ | ~D) + P(– | ~D) = 1).
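
To make these definitions concrete, here is a minimal Python sketch that computes the four quantities from a purely hypothetical set of validation-study counts (the numbers are illustrative, not figures from the Sharpe record):

    # Hypothetical validation study: 100 deceptive and 100 truthful subjects.
    # (Illustrative counts only -- not data from Sharpe or any real study.)
    true_pos = 90    # deceptive subjects whom the examiner labels "deceptive"
    false_neg = 10   # deceptive subjects labeled "truthful"
    true_neg = 90    # truthful subjects labeled "truthful"
    false_pos = 10   # truthful subjects labeled "deceptive"

    sensitivity = true_pos / (true_pos + false_neg)   # P(+ | D)  = 0.90
    specificity = true_neg / (true_neg + false_pos)   # P(- | ~D) = 0.90

    # The complementary error probabilities follow from the identities above:
    false_neg_prob = 1 - sensitivity                  # P(- | D)  = 0.10
    false_pos_prob = 1 - specificity                  # P(+ | ~D) = 0.10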

I am not sure what "the overall accuracy rate" means here, but to try to unpack the court's reasoning, I am going to assume that the best studies established the figure of "around 90%" for both sensitivity and specificity. It follows that both the false-negative and the false-positive error rates are around 10%. Are those error probabilities so high that they counsel against admission under Daubert? I would argue that they are sufficient for "evidentiary reliability" as defined in Daubert -- if the evidence can be presented so that the jury gives the polygraph findings the limited weight they deserve. Some lawyers and scientists would disagree and say that higher accuracy than "about 90%" is necessary. Statistically, the best way to express the lawyer's concept of the probative value of a binary test finding is with the likelihood ratio L = P(+ | D) / P(+ | ~D) for a positive finding or L = P(– | ~D) / P(– | D) for a negative finding. In Sharpe and its companion cases, the findings were negative -- no deception -- with L = 90% / 10% = 9. In other words, the report of no deception is nine times more probable when the subject is truthful than when the subject is lying. A diagnosing physician might want to order a test for cancer that is this discerning, even though it would be far from conclusive.
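
Assuming, as above, that sensitivity and specificity are both 90%, the likelihood ratios can be computed in the same fashion (again a sketch, not a calculation from any actual study):

    # Assumed operating characteristics, reading "around 90%" as applying
    # to both sensitivity and specificity (an assumption, as noted above).
    sensitivity = 0.90   # P(+ | D)
    specificity = 0.90   # P(- | ~D)

    # Likelihood ratio for a negative ("no deception") finding: how many
    # times more probable the finding is for a truthful subject than for
    # a deceptive one.
    L_neg = specificity / (1 - sensitivity)   # 0.90 / 0.10 = 9.0

    # Likelihood ratio for a positive finding, for comparison:
    L_pos = sensitivity / (1 - specificity)   # 0.90 / 0.10 = 9.0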

Rather than conclude that 90% accuracy is mildly supportive of validity, the Sharpe court took a different tack. First, it pointed to Dr. Iacono's criticisms that the laboratory experiments lacked realism and that the field studies suffered from selection bias and inadequate knowledge of "ground truth." Those are important points. If the studies do not apply to criminal cases or do not prove what they are supposed to, then who cares about the numbers they generate? To that extent, the court is again saying that the method has not been adequately tested and is difficult to test.

However, the court's discussion of error rates did not stop there. The opinion muddied the waters by bringing up "base rates" as a necessary component of probative value:
[T]he empirical basis for polygraph examinations suffers from another fault: the lack of a reliable “base rate.” In the three cases currently before this court, each defendant was said to have passed his polygraph test; the relevant question for the factfinder is whether, given this fact, the defendant was likely truthful or whether the test was a false negative. To determine this likelihood, more information is required; specifically, information about the base rate of deceptive and truthful subjects.
The lack of a reliable base rate estimate was the underlying reason for the Connecticut Supreme Court upholding its traditional per se ban on admitting polygraph evidence in State v. Porter. Noting “wide disagreement” about the accuracy rates for “a well run polygraph exam,” the court decided that, even if the estimates of polygraph proponents were accepted, the technique would still be “of questionable validity.” ... The court ... reasoned that, even if a test is accurate, its probative value as scientific evidence depends on its “predictive value”—the likelihood “that a person really is lying given that the polygraph labels the subject as deceptive” and the likelihood “that a subject really is truthful given that the polygraph labels the subject as not deceptive.” This predictive value, the court explained, depends not only on the accuracy of the test but also “on the ‘base rate’ of deceptiveness among the people tested by the polygraph.” Because the Porter court found a “complete absence of reliable data on base rates,” it concluded that it had no possible way of assessing the test’s probative value. With that in mind, the court concluded that even if polygraph evidence satisfies the Daubert standard, which it assumed without deciding, the probative value of such evidence is very low and substantially outweighed by its prejudicial effects.

As in Porter, the record before us is devoid of reliable data about the base rate of deceptiveness among polygraph examinees outside of lab tests; we also have not found such data in academic literature. Absent some reliable estimate of this base rate there is no way to estimate the reliability of polygraph results, and thus no way to determine whether any particular accuracy rate is acceptable. We conclude that the superior court clearly erred in finding the error rate of CQT polygraph testing to be “sufficiently reliable.” Accordingly, this factor weighs against admitting polygraph evidence.
If the error-rate factor of Daubert "weighs against admitting ... evidence" unless there is a “reliable estimate of the base rate,” then back in Coon, the Alaska Supreme Court was wrong to rely on claims of small error rates to uphold the admission of voice spectrographic identification. There was no testimony, let alone scientific knowledge, of the "base rate" of matching spectrographs in the relevant suspect population. That also was true of the case the Supreme Court cited when it invoked "error rates" as a factor in Daubert. United States v. Smith, 869 F.2d 348, 353-54 (7th Cir. 1989), listed studies such as one in which "the error rate for false identifications was 2.4% and the error rate for false eliminations was about 6%." It did not mention "base rates" or "predictive value" -- terms that are defined in the box below:
Terminology for Accuracy and Probative Value of Tests that Classify Things into Two Categories

Operating Characteristics (How accurate is the test itself?)
Sensitivity P(+ | D), probability of a positive finding (e.g., "the suspect is lying") given that the condition (e.g., conscious deception) is present
False negative probability P(– | D) = 1 – P(+ | D) = 1 – sensitivity
Specificity P(– | ~D), probability of a negative finding (e.g., "the subject is not lying") given that the condition is not present
False positive probability P(+ | ~D) = 1 – P(– | ~D) = 1 – specificity

Efficacy (How dispositive are the test findings?)
Prevalence or base rate F(D), relative frequency of the condition in the group being tested
Prior odds Odds(D), odds of the condition in an individual being tested
Positive predictive value PPV = P(D | +), probability of the condition given the positive test finding
Negative predictive value NPV = P(~D | –), probability of the absence of the condition given the negative test finding
Posterior odds Odds(D | +) or Odds(D| –), odds of the condition given the test finding

Probative value (How much relative support does the result provide?)
Likelihood ratio L (How many times more probable is the test result under one hypothesis than under the other?)
● For a positive finding, Lpos = P(+ | D) / P(+ | ~D)
● For a negative finding, Lneg = P(– | ~D) / P(– | D)

Bayes rule (How much does the test finding change the prior odds?)
● For a positive finding, Odds(D | +) = Lpos × Odds(D)
● For a negative finding, Odds(~D | –) = Lneg × Odds(~D)
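
For readers who prefer code to notation, the relationships in the box can be captured in a few lines of Python. The sketch is mine, and the function names are invented for exposition:

    def posterior_odds(likelihood_ratio, prior_odds):
        # Bayes rule in odds form: the test finding multiplies the prior
        # odds by the likelihood ratio for that finding.
        return likelihood_ratio * prior_odds

    def odds_to_prob(odds):
        # Convert odds of x to 1 (written as the number x) to a probability.
        return odds / (1 + odds)

    # Example: a base rate F(D) of 1/2 means prior odds of 1 to 1 for
    # deception. A positive finding with Lpos = 9 gives posterior odds of
    # 9 to 1, i.e., a positive predictive value P(D | +) of 0.9.
    ppv = odds_to_prob(posterior_odds(9.0, 1.0))   # 0.9
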
The opinion in Sharpe has confused probative value -- the extent to which evidence tends to prove the proposition that it is offered to prove -- with the probability that the proposition is true. The latter is surely what the jury wants to know, but it gets to that probability by considering all the evidence that supports or undermines the proposition in question. The likelihood ratio (rather than the "predictive value") for an item of evidence expresses its probative value. The error-rate factor in Daubert requires courts to ask whether false-positive and false-negative error probabilities are so large that the test has too little probative value to justify its admission as scientific evidence.

Scientific evidence need not be conclusive to be valid and admissible. If only 1 out of 91 polygraphed people would be truthful -- a base rate for truthfulness of 1/91 -- and if no other evidence that the defendant would lie to the polygrapher were available, then the prior odds that the defendant was truthful arguably would be 1 to 90. The posterior odds then would still be low -- namely, 9 to 90, for a "predictive value" or posterior probability of only 9/99 = 1/11. On the other hand, if the base rate and the prior odds were higher, say 1/2 and 1 to 1, respectively, then the predictive value and posterior probability would be 9/10. But in both cases, the finding is probative and worth knowing.
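
The arithmetic in these two scenarios can be checked mechanically (a sketch that again assumes a likelihood ratio of 9 for the no-deception finding):

    L_neg = 9.0   # likelihood ratio for the "no deception" finding, as above

    # Scenario 1: only 1 of 91 examinees is truthful, so the prior odds
    # of truthfulness are 1 to 90.
    post_odds_1 = L_neg * (1 / 90)            # 9 to 90, i.e., 0.1
    npv_1 = post_odds_1 / (1 + post_odds_1)   # 9/99 = 1/11, about 0.09

    # Scenario 2: half of the examinees are truthful, so the prior odds
    # of truthfulness are 1 to 1.
    post_odds_2 = L_neg * 1.0                 # 9 to 1
    npv_2 = post_odds_2 / (1 + post_odds_2)   # 9/10 = 0.9

In both scenarios the finding multiplies the prior odds of truthfulness by the same factor of nine; only the starting point differs.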

In sum, whether the base rate of lying among criminal suspects in general is high or low does not alter the extent to which the evidence tends to prove that the suspect in a particular case is or is not lying. Whatever the prior odds of lying, the test finding changes them by the same factor. A test that produces results that are strongly indicative of the presence or absence of a condition -- compared to what was known beforehand -- is a valid classifier regardless of the base rate for the condition in some population (or the prior probability in the case at bar).

Controlling Standards

Daubert spoke of “the existence and maintenance of standards controlling the technique’s operation.” Courts tend to cite any kind of standard (such as one prescribing the educational qualifications of a practitioner) as if it controls how the test is to be performed. The Sharpe court noted that "many states ... have statutes governing polygraph test administration, examinees’ privacy rights, and licensing of examiners," but it also pointed out that "the formulation and ordering of questions, the conducting of the pretest interview, the choice of scoring system, and the evaluation of the examinee’s demeanor leave much to the examiner’s discretion." Consequently, it concluded that "the lack of clear controlling standards for CQT administration weighs against its admissibility."

General Acceptance

Among other things, the court wrote that in light of "the apparently lackluster support for the technique outside the community of practicing polygraph examiners, we conclude that this factor also weighs against admitting polygraph evidence." In contrast, when "outside" bodies review identification methods in forensic science, practitioners invariably complain (if the reviews are unflattering) that they were not adequately represented in the process.

Financial Interest

The factors enumerated in Daubert are not exhaustive. Going one step beyond them, the Sharpe court expressed concern over “the danger of a hidden litigation motive” behind research. It cautioned that "[m]any of the studies cited as approving polygraph testing as scientifically valid were performed by ... practicing examiners, and a number of the studies were published in polygraph industry publications." This, too, has implications for much of the research in other areas of forensic science.

