Voice Stress Analysis and the Investigation of Trayvon Martin's Death

Today’s New York Times has a lengthy article [1] on the flaws in the Sanford, Florida, police investigation of the shooting of the black teenager Trayvon Martin by the 28-year-old neighborhood watch volunteer George Zimmerman. When the 16-day investigation did not produce any charges, Florida’s governor responded to a national outcry by appointing an aggressive prosecutor. Now charged, Zimmerman continues to maintain that he shot Martin in self-defense.

Tucked away at the end of the article is the single sentence: “The police conducted a lie-detection procedure, known as voice stress analysis, on Mr. Zimmerman that [sic] he passed.” A voice-stress test? By “a small city police department that does not even have a homicide unit and typically deals with three or four murder cases a year”? [1]

Yes, law enforcement agencies across the country have invested millions of dollars in voice stress analysis (VSA) software programs, despite a widely known lack of evidence that they work. For example, after conducting a field study that used urinalysis to check jail inmates’ statements about whether they had used specific drugs, a University of Oklahoma researcher wrote that “two of the most popular VSA programs in use by police departments across the country are no better than flipping a coin when it comes to detecting deception regarding recent drug use.” [2] 1/

The response of the National Institute for Truth Verification—the company that bills itself as “the world leader in voice stress analysis”—is instructive. The company’s website insists that
“the vast majority of VSA studies funded by pro-polygraph elements of the US Government were significantly flawed. One of the many flaws of these studies . . . was that they lacked real-life consequence and thus lacked jeopardy. . . . [C]onsequence and jeopardy found in ‘high stakes lies’ are required to accurately and consistently detect deception. [¶] . . . VSA research conducted by the University of Florida, and a second study conducted by researchers from the University of Oklahoma, both utilized ‘low stakes lies’ in an attempt to measure the results of various VSA instruments.” [3]
Another webpage on validity lists many studies, but the descriptions indicate that they merely demonstrate that VSA can detect stress and anxiety. [4] The question, as with the polygraph, is whether an examiner can ascertain the cause of the stress. Thus, the defense of VSA seems to be this: we have no scientifically respectable body of proof showing that VSA is highly sensitive and specific in detecting deception, but, then again, nobody has proven to our satisfaction that it does not work for this purpose.

Despite the inability to validate VSA, some police love it. A detective in the Sex Crimes Section of the Metropolitan Nashville Police Department explained that:
“We purchased 10 CVSA's and trained 20 examiners and in my opinion, the instruments and training have been one of the greatest assets our department has ever acquired. Not only has it helped us solve many crimes from major thefts to homicides, but has also helped expose false reports from victims, thus saving our department many man-hours of investigation.” [5]
Considering that the fundamental question about VSA is whether it can distinguish between stress caused by intentional deception and stress caused by other factors (think about the rape victim who has to describe the events to the police), this kind of screening is a little frightening. It helped the Sanford police and prosecutor, though. Or did it?


1. The comparison to a coin may be misleading. Consider Table 12 in the Oklahoma study. Kelly R. Damphousse et al., Assessing the Validity of Voice Stress Analysis Tools in a Jail Setting, Mar. 31, 2007, at 53 (NCJRS doc. no. 219031). It indicates that 87 subjects tested positive for cocaine use within the past 72 hours. Of these, 40 deceptively stated that they had not used cocaine in this period. The VSA programs correctly indicated deception for only eight of the 40, a sensitivity of 20%: given an individual lying about recent cocaine use, the programs had only one chance in five of recognizing the deception. Their specificity was much better. Of the 47 respondents who were not deceptive, the programs correctly classified 42 (89%) as truthful. A coin flip, on the other hand, would have a sensitivity and specificity of 50%. See, e.g., David H. Kaye, The Validity of Tests: Caveant Omnes, 27 Jurimetrics J. 349 (1987).
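
As a quick check of the arithmetic in this note, the short Python sketch below recomputes the two rates from the four counts quoted above (40 deceptive subjects, 8 of them flagged; 47 truthful subjects, 42 of them cleared). The function and variable names are illustrative only and do not come from the Oklahoma report.

```python
# Counts as quoted in note 1 (cocaine use within the past 72 hours).
# The names below are hypothetical labels chosen for this sketch.
DECEPTIVE_TOTAL = 40      # tested positive but denied use
DECEPTIVE_FLAGGED = 8     # of those, flagged as deceptive by the VSA programs
TRUTHFUL_TOTAL = 47       # tested positive and admitted use
TRUTHFUL_CLEARED = 42     # of those, classified as truthful by the VSA programs


def rate(hits: int, total: int) -> float:
    """Return hits / total, rejecting an empty group."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Sensitivity: share of deceptive subjects the test flags as deceptive.
sensitivity = rate(DECEPTIVE_FLAGGED, DECEPTIVE_TOTAL)
# Specificity: share of truthful subjects the test classifies as truthful.
specificity = rate(TRUTHFUL_CLEARED, TRUTHFUL_TOTAL)

print(f"sensitivity = {sensitivity:.0%}")  # sensitivity = 20%
print(f"specificity = {specificity:.0%}")  # specificity = 89%
print("coin flip   = 50% for both")
```

The point of separating the two rates is the one the note makes: a single "no better than a coin" summary hides the fact that the programs rarely flagged anyone, which produces low sensitivity alongside high specificity.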


  2. FWIW, the comparison of polygraphic lie detection to flipping coins predates Scheffer. See, e.g., David Lykken, A Tremor in the Blood 149 (1981).