Wednesday, September 26, 2012

True Lies: fMRI Evidence in United States v. Semrau

This month, the U.S. Court of Appeals for the Sixth Circuit issued an opinion on “a matter of first impression in any jurisdiction.” The case is United States v. Semrau, No. 11-5396, 2012 WL 3871357 (6th Cir. Sept. 7, 2012). Its subject is the admissibility of the latest twist, the ne plus ultra, in lie detection—functional magnetic resonance imaging (fMRI).

In several ways, the case resembles what may well be the single most cited case on scientific evidence—namely, Frye v. United States, 293 F. 1013 (D.C. Cir. 1923). Frye instituted a special test for admitting scientific evidence. In Frye, a defense lawyer asked a psychologist, Dr. William Moulton Marston, who had developed and published studies of a systolic blood pressure test for conscious deception, to examine a young man accused of murdering a prominent physician. Dr. Marston came to Washington and was prepared to testify that the accused was truthful in retracting his confession to the murder. The trial court would not hear of it. The jury convicted. The defendant appealed. In a short opinion pregnant with implications, the Court of Appeals for the District of Columbia affirmed the exclusion of the expert’s opinion that the defendant was not lying to him.

In United States v. Semrau, defense counsel invited Dr. Steven Laken to examine the owner and CEO of two firms accused of criminal fraud in billing Medicare and Medicaid for psychiatric services that the firms supplied in nursing homes. Like Marston, Dr. Laken had invented and published on an impressive method of lie detection. Following three sessions with the defendant, Dr. Laken concluded that the accused “was generally truthful as to all of his answers collectively.” As in Frye, the district court excluded such testimony. As in Frye, a jury convicted. As in Frye, the defendant appealed. As in Frye, the court of appeals affirmed.

Dr. Marston held degrees from Harvard in law and in psychology. He worked hard to develop and popularize psychological theories (and he created the comic book character, Wonder Woman). Like Marston, Dr. Laken is highly creative, productive, and enterprising. Dr. Laken started his scientific career in genetics and cellular and molecular medicine. He achieved early fame for discovering a genetic marker and developing a screening test for an elevated risk of a form of colon cancer. For that accomplishment, MIT’s Technology Review recognized him as one of the most important 35 innovators under the age of 35 and noted that “Laken believes his methods could spot virtually any illness with a genetic component, from asthma to heart disease.” I do not know if that happened. After four years as Director of Business Development and Intellectual Asset Management at Exact Sciences, a “molecular diagnostics company focused on colorectal cancer,” Laken left genetic science to found Cephos, “the world-class leader in providing fMRI lie detection, and in bringing fMRI technology to commercialization.”1/

Despite these parallels, Laken is not Marston, and Semrau is not Frye. For one thing, in Frye, the trial judge excluded the evidence without an explanation. In Semrau, the trial judge had a magistrate conduct a two-day hearing. Two highly qualified experts called by the government challenged the validity of Dr. Laken’s theories, and the magistrate judge wrote a 43-page report recommending exclusion of the fMRI testimony from the trial.2/

Furthermore, in Frye, there was no previous body of law imposing a demanding standard on the proponents of scientific evidence—the Frye court created from whole cloth the influential “general acceptance” test.3/ In Semrau, the court began with the Federal Rules of Evidence, ornately embroidered with the Supreme Court's opinions in Daubert v. Merrell Dow Pharmaceuticals and two related cases and with innumerable lower court opinions applying the Daubert trilogy. This legal tapestry requires a showing of “reliability” rather than “general acceptance,” and it usually involves attending to four or five factors relating to scientific validity enumerated in Daubert.3/ I want to look briefly at a few of these in the context of Semrau.

* * *

Even though the only judges to address fMRI-based lie detection (those in Semrau) have deemed it inadmissible under the Daubert standard (and under the Frye criterion of general acceptance), Cephos continues to advise potential clients that “[t]he minimum requirements for admissibility of scientific evidence under the U.S. Supreme Court ruling Daubert v. Merrell Dow Pharmaceuticals, are likely met.” One can only wonder whether its “legal advisors,” such as Dr. Henry Lee (see note 1), are comfortable with Cephos’s reasoning that
According to a PubMed search, using the keywords “fMRI” or “functional magnetic resonance imaging” yields over 15,000 fMRI publications. Therefore, the technique from which the conclusions are drawn is undoubtedly generally accepted.
The reasoning is peculiar, or at least incomplete. The sphygmomanometer that Dr. Marston used also was “undoubtedly generally accepted.” This pressure meter was invented in 1881, improved in 1896, and modernized in 1901, when Harvey Cushing popularized the device in the medical community. However, the acknowledged ability to measure systolic blood pressure reliably and accurately does not validate the theory (which predated Marston) that blood pressure is a reliable and valid indicator of conscious deception. Likewise, the number of publications about fMRI in general, and even particular evidence that it is a wonderful instrument with which to measure blood oxygenation levels in parts of the brain, reveals very little about the validity of the theory that these levels are well correlated with conscious deception. To be sure, there is more research on this association than there was on the blood pressure theory in Frye, but the Semrau courts were not overly impressed with the applicability of the experimentation to the examination conducted in the case before them.4/

* * *

In addition to directing attention to general acceptance, Daubert v. Merrell Dow Pharmaceuticals identifies “the known or potential rate of error in using a particular scientific technique” as a factor to consider in determining “evidentiary reliability.” The Daubert Court took this factor from circuit court cases involving polygraphy and “voiceprints.” Unfortunately, the ascertainment of meaningful error rates has long confused the courts,5/ and the statistics in Semrau are not presented as clearly as one might hope.

According to Cephos, “[p]eer review results support high accuracy,” but this short statement begs vital questions. Accuracy under what conditions? How “high” is it? Higher for diagnoses of conscious deception than for diagnoses of truthfulness, or vice versa? The court of appeals began its description of Semrau’s evidence on this score as follows:
Based on these studies, as well as studies conducted by other researchers, Dr. Laken and his colleagues determined the regions of the brain most consistently activated by deception and claimed in several peer-reviewed articles that by analyzing a subject's brain activity, they were able to identify deception with a high level of accuracy. During direct examination at the Daubert hearing, Dr. Laken reported these studies found accuracy rates between eighty-six percent and ninety-seven percent. During cross-examination, however, Dr. Laken conceded that his 2009 “Mock Sabotage Crime” study produced an “unexpected” accuracy decrease to a rate of seventy-one percent. ...
But precisely what do these “accuracy rates” measure? By “identify deception,” does the court mean that 71%, 86%, and 97% are the proportions of subjects who were diagnosed as deceptive out of those whom the experimenters asked to lie? If we denote a diagnosis of deception as a “positive” finding (like testing positive for a disease), then such numbers are observed values for the sensitivity of the test. They indicate the probability that given a lie, the fMRI test will detect it—in symbols, P(diagnose liar | liar), where “|” means “given.” The corresponding conditional error probability is the false negative probability P(diagnose truthful | liar) = 1 – sensitivity. It is the probability of missing the act of lying when there is a lie.

So far so good. But it takes two probabilities to characterize the accuracy of a diagnostic test. The other conditional probability is known as specificity. Specificity is the probability of a negative result when the condition is not present. In symbols that apply here, the specificity is P(diagnose truthful | truthful). Its complement, 1 – specificity, is the false positive, or false alarm, probability, P(diagnose liar | truthful). That is, the false alarm probability is the probability of diagnosing the condition as present (the subject is lying) when it is absent (the subject actually is not lying). What might the specificity be? According to the court,
Dr. Laken testified that fMRI lie detection has “a huge false positive problem” in which people who are telling the truth are deemed to be lying around sixty to seventy percent of the time. One 2009 study was able to identify a “truth teller as a truth teller” just six percent of the time, meaning that about “nineteen out of twenty people that were telling the truth we would call liars.” . . .
Why was this not a problem for Dr. Laken in this case? Well, the fact that the technique has a high false positive error probability (that it classifies most truthful subjects as liars) does not mean that it also has a high false negative probability (that it classifies most lying subjects as truthful). Dr. Laken conceded that the false positive probability, P(diagnose liar | truthful), is large (around 0.65, from the paragraph quoted immediately above). Indeed, the reference to 6% accuracy for classifying truth tellers (the technique’s specificity in that study) corresponds to a false positive probability of 100% – 6% = 94%. The average figure for this false alarm probability, according to Dr. Laken’s statements in the preceding quoted paragraph, is lower, but it is still a whopping 0.65. Nevertheless, if the phrase “accuracy rates” in the first quoted paragraph refers to sensitivity, then the estimates of sensitivity that he provided are respectable. The average of 0.71, 0.86, and 0.97 is 0.85, which implies a false negative probability of about 0.15.
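The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: it takes the rates reported in the opinion at face value, assumes that the 71%, 86%, and 97% "accuracy rates" are sensitivities in the sense defined above, and uses 0.65 as the midpoint of "sixty to seventy percent." The variable and function names are illustrative.

```python
from statistics import mean

# Rates as reported in the court of appeals opinion (taken at face value).
# Assumption: the "accuracy rates" are sensitivities, P(diagnose liar | liar).
reported_sensitivities = [0.71, 0.86, 0.97]
# Assumption: 0.65 is the midpoint of "sixty to seventy percent".
false_alarm_typical = 0.65      # P(diagnose liar | truthful)
specificity_2009 = 0.06         # "truth teller as a truth teller" in one 2009 study


def complement(p):
    """Return 1 - p, the probability of the opposite diagnosis."""
    return 1.0 - p


avg_sensitivity = mean(reported_sensitivities)
false_negative = complement(avg_sensitivity)         # P(diagnose truthful | liar)
specificity_typical = complement(false_alarm_typical)  # P(diagnose truthful | truthful)
false_alarm_2009 = complement(specificity_2009)

print(f"average sensitivity:           {avg_sensitivity:.2f}")
print(f"implied false negative rate:   {false_negative:.2f}")
print(f"typical specificity:           {specificity_typical:.2f}")
print(f"false alarm rate (2009 study): {false_alarm_2009:.2f}")
```

The point of the sketch is that each pair of rates (sensitivity with false negative, specificity with false alarm) sums to one, while the two pairs are independent of each other: a test can do well on one and poorly on the other.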

What do these numbers prove? One answer is that they apply only under the conditions of the experiments and only to subjects of the type tested in these experiments. The opinions take this strict view of the data, pointing out that the experimental subjects were younger than Semrau and that they faced low penalties for lying. Indeed, the court explained that
Dr. Peter Imrey, a statistician, testified: “There are no quantifiable error rates that are usable in this context. The error rates [Dr. Laken] proposed are based on almost no data, and under circumstances [that] do not apply to the real world [or] to the examinations of Dr. Semrau.”
These remarks go largely to the Daubert question. If the experiments are of little value in estimating an error rate in populations that would be encountered in practice, then the validity of the technique is difficult to gauge, and Cephos’s assurance that this factor weighs in favor of admissibility is vacuous. If there is no way to estimate the conditional error probability for the examination of Semrau, then it is hard to conclude that the test has been validated for its use in the case.

* * *

Fair enough, but I want to go beyond this easy answer. Psychologists often are willing to generalize from laboratory conditions to the real world and from young subjects (usually psychology students) to members of the general public. So let us indulge, at least arguendo, the heroic assumption that the ballpark figures for the sensitivity and the false alarm probability apply to defendants asserting innocence in cases like Semrau. On this assumption, how useful is the test?

Judging from the experiments as described in the court of appeals opinion, if Semrau lies in denying any intent to defraud, there is roughly a 0.85 probability of detecting it, and thus only about a 0.15 probability of misdiagnosing him as truthful. If he is truthful, however, there is only about a 0.35 probability (1 – 0.65) of diagnosing him as truthful. So the evidence (the diagnosis of truthfulness) is only a little more than twice as probable when he is truthful as when he is lying. As such, the fMRI diagnosis of truthfulness has modest probative value at best. (The likelihood ratio is .35/.15 ≈ 2.3; for a diagnosis of deception, it would be a mere .85/.65 ≈ 1.3.)
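For readers who want to check the likelihood ratios, here is a minimal sketch. It assumes, consistent with the definitions given earlier, that 0.85 is the sensitivity, P(diagnose liar | liar), and that 0.65 is the false alarm probability, P(diagnose liar | truthful). The function name is hypothetical.

```python
def likelihood_ratios(sensitivity, false_alarm):
    """Return (LR for a diagnosis of deception, LR for a diagnosis of truthfulness).

    sensitivity = P(diagnose liar | liar)
    false_alarm = P(diagnose liar | truthful)
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < false_alarm < 1.0):
        raise ValueError("both rates must lie strictly between 0 and 1")
    specificity = 1.0 - false_alarm        # P(diagnose truthful | truthful)
    false_negative = 1.0 - sensitivity     # P(diagnose truthful | liar)
    # How much more probable is a "liar" diagnosis for a liar than for a truth teller?
    lr_deception = sensitivity / false_alarm
    # How much more probable is a "truthful" diagnosis for a truth teller than for a liar?
    lr_truthful = specificity / false_negative
    return lr_deception, lr_truthful


# Ballpark figures from the opinion, under the heroic assumption that they generalize.
lr_deception, lr_truthful = likelihood_ratios(sensitivity=0.85, false_alarm=0.65)
print(f"LR for a diagnosis of deception:    {lr_deception:.1f}")
print(f"LR for a diagnosis of truthfulness: {lr_truthful:.1f}")
```

A likelihood ratio near 1 means the diagnosis is about equally probable whether the subject is lying or not, so it barely shifts the odds either way.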

That a diagnosis of deception is almost as probable for truthful subjects as for mendacious ones bears mightily on the Rule 403 balancing of prejudice against probative value. The court held that this balancing justified exclusion of Dr. Laken’s testimony, largely for reasons that I won’t go into.6/ It referred to questions about “reliability” in general, but it did not use the error probabilities to shed a more focused light on the probative value of the evidence.

However, it seems from the opinion that Dr. Laken offered at least one probability to show that his diagnosis was correct. The court noted that
Dr. Imrey also stated that the false positive accuracy data reported by Dr. Laken does not “justify the claim that somebody giving a positive test result ... [h]as a six percent chance of being a true liar. That simply is mathematically, statistically and scientifically incorrect.”
It is hard to understand what the “six percent chance” for “somebody giving a positive test result” had to do with the negative diagnosis (not lying) for Semrau. A jury provided with negative fMRI evidence (“He was not lying”) must decide whether the result is a true negative or a false negative—not what might have happened had there been a positive diagnosis.

As for the 6% solution, it is impossible to know from the opinion how Dr. Laken arrived at such a number for the probability that a subject is lying given a positive diagnosis. The conditional probabilities from the experiments run in the opposite direction. They address the probability of evidence (a diagnosis) given an unknown state of the world (a liar or a truthful subject). If Dr. Laken really opined on the probability of the state of the world (a liar) given the fMRI signals, then he either was naively transposing a conditional probability (a no-no discussed many times in this blog) or he was using Bayes’ rule. In light of Dr. Imrey’s impeccable credentials as a biostatistician and his unqualified dismissal of the number as “mathematically, statistically and scientifically incorrect,” I would not bet on the latter explanation.
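The difference between transposing a conditional probability and applying Bayes’ rule can be illustrated with a short sketch. The sensitivity (0.85) and false alarm probability (0.65) are the ballpark figures discussed above; the prior probabilities are purely hypothetical values chosen for illustration, and the function name is my own.

```python
def posterior_liar_given_positive(prior_liar, sensitivity, false_alarm):
    """Bayes' rule: P(liar | diagnose liar).

    prior_liar  = P(liar) before the test (hypothetical; not in the opinion)
    sensitivity = P(diagnose liar | liar)
    false_alarm = P(diagnose liar | truthful)
    """
    true_positive_mass = sensitivity * prior_liar
    false_positive_mass = false_alarm * (1.0 - prior_liar)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


sensitivity, false_alarm = 0.85, 0.65
posteriors = {}
for prior in (0.1, 0.5, 0.9):  # hypothetical prior probabilities of lying
    posteriors[prior] = posterior_liar_given_positive(prior, sensitivity, false_alarm)
    print(f"prior {prior:.1f} -> P(liar | positive diagnosis) = {posteriors[prior]:.2f}")
```

Under these assumptions, the posterior probability moves with the prior and does not coincide with the sensitivity or with any other single rate from the experiments. That is why a figure for P(liar | positive diagnosis) cannot simply be read off the experimental error rates.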

Notes

1. If the firm’s website is any indication, it is not an equivalent leader in good grammar. Apparently seeking the attention of wayward lawyers, it advertises that “[i]f you or your client professes their innocence, we may provide pro bono consulting.” The website also offers intriguing reasons to believe in the company’s prowess: it is “represented by one of the top ten intellectual property law firms”; it has “been asked to present to the ... Sandra Day O’Connor Federal Courthouse”; and its legal advisors include Dr. Henry C. Lee (whose website includes “recent sightings of Dr. Lee.”). In addition to its lie-detection work, Cephos offers DNA testing, so perhaps I should not say that Dr. Laken has withdrawn entirely from genetic science.

2. The court of appeals buttressed its approval of the report with the observation that “Professor Owen Jones, who observed the hearing” and is on the faculties of law and biology at Vanderbilt University, stated in an interview with Wired that the report was “carefully done.”

3. For elaboration, see David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence—Expert Evidence (2d ed. 2011) http://www.aspenpublishers.com/product.asp?catalog_name=Aspen&product_id=0735593531

4. For a short discussion of validity in this context, see Francis X. Shen & Owen D. Jones, Brain Scans as Evidence: Truths, Proofs, Lies, and Lessons, 62 Mercer L. Rev. 861 (2011).

5. See David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence—Expert Evidence (2d ed. 2011).

6. The court of appeals wrote that “the district court did not abuse its discretion in excluding the fMRI evidence pursuant to Rule 403 in light of (1) the questions surrounding the reliability of fMRI lie detection tests in general and as performed on Dr. Semrau, (2) the failure to give the prosecution an opportunity to participate in the testing, and (3) the test result's inability to corroborate Dr. Semrau's answers as to the particular offenses for which he was charged.”

3 comments:

  1. Joe Cecil called my attention to this year's Ig Nobel Prize in Neuroscience, awarded to Craig Bennett, Abigail Baird, Michael Miller, and George Wolford [USA], "for demonstrating that brain researchers, by using complicated instruments and simple statistics, can see meaningful brain activity anywhere — even in a dead salmon." Yes, it's true (see http://www.improbable.com/ig/winners/), and the paper makes a serious point. See "Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Multiple Comparisons Correction," Journal of Serendipitous and Unexpected Results, vol. 1, no. 1, 2010, pp. 1-5. The work is well known in the relevant scientific community, and the backdrop to the research is described at http://prefrontal.org/blog/2009/09/the-story-behind-the-atlantic-salmon/.

  2. This is a great post. Could you say more about the “heroic assumption[s]” of generalizing laboratory error rates to real-world applications? It seems to me that ascertainment of error rates (and at least the Court implicitly acknowledged there are two types of errors!) virtually necessitates the use of experimentation, which will never yield a 1 to 1 correspondence with any real-world application. Paul Meehl noted that this argument is often used to dismiss statistical predictions, despite the fact that statistical predictions generally outperform predictions made by other methods. And this seems to be a common argument made by courts when rejecting scientific evidence or error rates. Generalizability is, of course, an empirical question, but I fear that courts, expecting perfect correspondence, will set a bar that is impossibly high. Thoughts?

    1. External validity is always an issue. Perfect correspondence between the conditions of an experiment and the circumstances of a case is unattainable. It may take a variety of experiments to identify the important factors that affect the performance of the system. At some point -- the exact location of which will be subject to reasonable argument -- confidence in the results of the technique as applied in the circumstances of a particular case will be justified. How high to set the bar should depend on the nature of the evidence and its potential to assist or mislead the factfinder. I know that is vague, but it is hard to be more specific.
