Saturday, November 24, 2018

Breaking the Promise of Confrontation in Stuart v. Alabama

Denials of cert usually are not worth mentioning, but the one in Stuart v. Alabama is notable. In that case, Alabama courts relied on a surrogate-witness theory to admit into evidence two laboratory reports of high blood-alcohol concentration without any testimony from the technician who wrote the reports or from anyone else involved in their preparation -- even though in Bullcoming v. New Mexico the Supreme Court rejected this theory as a justification for not affording a defendant the opportunity to confront the author of the report.

In response to a petition for a writ of certiorari, the state contended that the Bullcoming violation did not matter because the numbers in the reports were only the basis for an expert's extrapolation to the concentration at the time of the accident that gave rise to the negligent homicide and drunken driving prosecution. Stuart described this representation of the record as "not candid," but the state insisted that the admission of the reports could be upheld by piecing together votes from Williams v. Illinois. It dismissed as a mere technicality the fact that, unlike in Williams, the reports were introduced into evidence (and the reported concentrations were described as far in excess of the legal limit).

The Supreme Court denied the petition. The state's tortured argument about Williams provoked Justice Gorsuch, together with Justice Sotomayor, to file a dissenting opinion maintaining that cross-examination is needed to expose bias and error in forensic science reports and expressing strong disagreement with the plurality and Justice Thomas's opinions in Williams v. Illinois. More details follow.
At around 11:00 p.m., April 1, 2015, police found Vanessa Stuart's vehicle off the steep shoulder of the road. Inside, Stuart was talking on the telephone. Another vehicle sat at the edge of the woods with Tiffany Howell's dead body inside. A traffic-homicide investigator determined that Stuart’s vehicle, traveling at 90 to 100 miles per hour, had struck Howell’s from behind, spinning it and causing it to roll several times before striking a tree.

At the hospital, Stuart refused a blood-alcohol test and tried to leave. Police arrested her and took her to jail. After acquiring a search warrant for her blood, they took her back to the hospital to secure vials of her blood. By that time, four hours had passed. A second sample was taken half an hour later. The vials went to the Alabama Department of Forensic Sciences, where Belicia Sutton wrote reports about the alcohol levels in the samples from "the suspect."

At Stuart's trial for negligent homicide and driving under the influence, the state did not call Sutton to the witness stand. Instead, it offered the reports themselves into evidence and then had Dr. James Hudson, the laboratory's toxicology section chief, extrapolate backwards from the already high level of 0.174 recorded in the first report to conclude that Stuart’s blood-alcohol level at the time of the wreck was a whopping 0.234.

Stuart appealed her resulting convictions, arguing in part that the state deprived her of her Sixth Amendment right to confront the witnesses against her. Dr. Hudson, she pointed out, was not involved in the testing and did not even work for the state at the time of the accident. In an unpublished opinion, the Alabama Court of Criminal Appeals rejected the argument on the theory that Hudson could stand in as a surrogate for Sutton. It wrote that:
Dr. Hudson gave extensive testimony regarding the policies and procedures of the DFS’s toxicology laboratory. This included controls in the analysis and the laboratory’s standard practice of having the results of the analysis independently reviewed. Dr. Hudson testified that “as the [toxicology] section chief, I’m fundamentally the toxicology supervisor so I’m responsible for the day-to-day workflow in the laboratory, testing assignments for cases, as well as personnel management.” (R. 630.) “This testimony provided [Stuart] with ample opportunity to cross-examine [Dr. Hudson] regarding the [blood]-analysis report.” Taylor v. State [Ms. CR-15-0354, Sept. 9, 2016] __ So. 3d __, __ (Ala. Crim. App. 2016). This Court holds that Stuart’s right to confront the witnesses against her was not violated by the circuit court’s allowing Dr. Hudson to testify to the results of her blood analysis. As such, this issue does not entitle Stuart to any relief.
The Alabama Supreme Court declined to review the case, and Stuart petitioned the U.S. Supreme Court for a writ of certiorari on the ground that Sutton's reports were received as evidence of Stuart's blood-alcohol level through Hudson's testimony in stark violation of the Confrontation Clause as applied in Bullcoming v. New Mexico, 564 U.S. 647 (2011).

She had a point. Bullcoming was another DUI case in which a suspect's blood was taken at a hospital and sent to the state forensic laboratory for analysis. As in Stuart, "the State called another analyst who was familiar with the laboratory's testing procedures, but had neither participated in nor observed the test on [the] blood sample." In an opinion joined in relevant part by four members of the Court, Justice Ginsburg rejected the surrogate theory in sweeping terms:
The question presented is whether the Confrontation Clause permits the prosecution to introduce a forensic laboratory report containing a testimonial certification — made for the purpose of proving a particular fact — through the in-court testimony of a scientist who did not sign the certification or perform or observe the test reported in the certification. We hold that surrogate testimony of that order does not meet the constitutional requirement. The accused's right is to be confronted with the analyst who made the certification, unless that analyst is unavailable at trial, and the accused had an opportunity, pretrial, to cross-examine that particular scientist.
Id. at 652. Justice Sotomayor concurred, highlighting the circumstances that made the surrogate witness's testimony an unacceptable substitute: "the person testifying [was not] a supervisor, reviewer, or someone else with a personal, albeit limited, connection to the scientific test at issue" and "an expert witness was [not] asked for his independent opinion about underlying testimonial reports that were not themselves admitted into evidence."

In response to the Bullcoming argument, the state abandoned the surrogacy theory of its trial and appellate courts. It argued that Hudson's testimony about the reports was not subject to the confrontation requirement because the blood-alcohol level of 0.174 (like the slightly lower reading from the later sample) was not introduced to prove that Stuart was driving with a blood-alcohol concentration above the legal limit, but rather was a hypothetical assumption made solely to arrive at the extrapolated figure of 0.234. So characterized, Hudson's testimony did not offend the Confrontation Clause because "[t]he Clause ... does not bar the use of testimonial statements for purposes other than establishing the truth of the matter asserted," Crawford v. Washington, 541 U.S. 36, 53 n.4 (2004), and the state was not trying to prove that Stuart's blood alcohol concentration was 0.174 hours after the accident. At least, that is what the state claimed.

That a majority of the Justices of the Supreme Court rejected just such an argument (in separate opinions that conflicted in another respect) in Williams v. Illinois, 567 U.S. 50 (2012), did not faze Alabama's Attorney General. His brief contended that because four Justices propounded the hypothetical-assumption argument in Williams, and because one of the Justices who rejected it also maintained that most laboratory reports lack the formality necessary to be statements that trigger a right to confront their authors, Hudson's testimony was constitutionally admitted into evidence.

Aside from its inherent artificiality, this reasoning overlooks the fact that, as in Bullcoming (but not Williams), the laboratory reports were explicitly admitted into evidence. Their admission and publication to the jury without an opportunity to cross-examine their author violated the Confrontation Clause even if Hudson's reiteration of their content was permissible under the plurality's opinion in Williams. Apparently, the jury was not instructed that it was not to rely on the numbers in the reports as true, but only to use Dr. Hudson's opinion -- that is, his extrapolation from them -- as evidence against the accused. Indeed, the state had Dr. Hudson testify that the laboratory's findings of 0.174 and 0.158 greatly exceeded the legal limit of 0.08 (prompting Stuart to describe Alabama's argument as "not candid"). In contrast, the Williams plurality noted that the trier of fact there was a learned judge who could be expected (somehow) not to rely on the laboratory report for its truth but to consider it only as an explanation of how the testifying expert reached her "independent" conclusion. Alabama dismissed these differences as mere technicalities.

The Supreme Court denied the petition in Stuart. Of course, in itself a denial of such a petition has no precedential effect and is not even an expression of views on the merits of the case. The Court grants cert for but a small fraction of the many petitions it receives, rarely giving a reason for denying the petitions.

Nevertheless, the inaction in Stuart may seem disappointing. With its four inconclusive and conflicting opinions, Williams has licensed chaos in the lower courts. But Stuart may not have been a suitable vehicle for re-examining the not-for-the-truth reasoning of the Williams plurality. Had it granted certiorari, the Court might have written a two-sentence opinion remanding the case for a determination of whether the violation of Bullcoming was harmless error. (Well, maybe more than two, just to point out that the not-for-the-truth reasoning, already rejected by a majority of the Court in Williams, cannot possibly be extended to cases in which laboratory reports are admitted into evidence without limitation.) Or, the Court could have used Stuart to overrule the 5-4 decision in Bullcoming in order to affirm. But the case was not ideally suited to cleaning up the mess left by Williams.

Even so, two Justices dissented from the denial of certiorari and issued a substantial opinion on the merits -- an unusual action. Justice Gorsuch, who was not on the Court for its trilogy of opinions on the Confrontation Clause and laboratory reports (Melendez-Diaz, Bullcoming, and Williams), wrote this dissenting opinion. Justice Sotomayor joined it. The opinion begins with a paean to cross-examination:
More and more, forensic evidence plays a decisive role in criminal trials today. But it is hardly “immune from the risk of manipulation.” Melendez-Diaz v. Massachusetts, 557 U.S. 305, 318 (2009). A forensic analyst “may feel pressure—or have an incentive—to alter the evidence in a manner favorable to the prosecution.” Ibid. Even the most well-meaning analyst may lack essential training, contaminate a sample, or err during the testing process. ... To guard against such mischief and mistake and the risk of false convictions they invite, our criminal justice system depends on adversarial testing and cross-examination. Because cross-examination may be “the greatest legal engine ever invented for the discovery of truth,” ... the Constitution promises every person accused of a crime the right to confront his accusers. ... [¶] That promise was broken here.
Whether cross-examination is generally effective at exposing inadequate training, contamination, or error is open to question, but it certainly can complement the scientific engine for discovering truths about alcohol levels, trace evidence, and the like.

With this introduction in place, Justice Gorsuch observed that "the State of Alabama introduced in evidence the results of a blood-alcohol test conducted hours after [Stuart's] arrest [but] refused to bring to the stand the analyst who performed the test." But the opinion does not note that the state was seeking to extend the plurality's rule in Williams to a laboratory report actually admitted into evidence and presented to the jury as proof of what it asserts. Rather, Justice Gorsuch simply endorsed the position taken in Williams by the five dissenting Justices and Justice Thomas. They maintained that the not-for-the-truth theory is untenable because the testifying expert's opinion cannot be credited unless the missing witness's report is true. As Justice Gorsuch put it,
The whole point of the exercise was to establish—because of the report’s truth—a basis for the jury to credit the testifying expert’s estimation of Ms. Stuart’s blood-alcohol level hours earlier. As the four dissenting Justices in Williams explained, “when a witness . . . repeats an out-of-court statement as the basis for a conclusion, . . . the statement’s utility is then dependent on its truth.” 567 U. S., at 126 (opinion of KAGAN, J.). With this JUSTICE THOMAS fully agreed, observing that “[t]here is no meaningful distinction between disclosing an out-of-court statement so that the factfinder may evaluate the [testifying] expert’s opinion and disclosing that statement for its truth.”  Id., at 106 (opinion concurring in judgment).
Although this is the better understanding of the situation even when, as in Williams, the report is not introduced into evidence, in Stuart, the Williams plurality could adhere to their more contrived analysis while agreeing with Justice Gorsuch that no "prosecutor [would] bother to offer in evidence the nontestifying analyst’s report in this case except to prove the truth of its assertions about the level of alcohol in Ms. Stuart’s blood at the time of the test" (emphasis added).

The opinion concludes with a short analysis of Alabama's additional claim that the laboratory report was not "testimonial" because it lacked the formality of depositions, affidavits, certificates, or similar instruments. Here Justice Gorsuch joins the ranks of nearly every other Justice. Only Justice Thomas contends that police laboratory reports prepared for criminal investigations and possible prosecutions are not sufficiently formal to be testimonial unless they are sworn certificates.

The Stuart dissent is a clear and well-warranted plea for a clarification of the Williams decision. Significantly, it places Justice Gorsuch on the side of those who oppose insulating the authors of a laboratory report from cross-examination simply by presenting those reports as the basis for some other expert's opinion. Laboratory reports raise special issues for Confrontation Clause jurisprudence, but they should be faced more directly. See Jennifer L. Mnookin & David H. Kaye, Confronting Science: Expert Evidence and the Confrontation Clause, 2012 Sup. Ct. Rev. 99 (2013).

Friday, November 23, 2018

Cheapskate's DNA Could Be His Undoing

Wisconsin was the first state to issue criminal complaints "naming" the suspect through a DNA profile so as to avoid the statute of limitations. The state court of appeals upheld the practice in State v. Dabney, 663 N.W.2d 366 (Wisc. Ct. App. 2003). Today, there are some 23 such DNA complaints pending in Wisconsin. Most are for burglaries. Some are for unsolved sexual assaults. One is for an armed robbery.

The most recent complaint addressed to an unknown defendant, however, is for threatening a county judge in 2012. It is captioned
State of Wisconsin, Plaintiff
v.
Doe, John, Unknown Male, with Matching Deoxyribonucleic Acid (DNA) Profile at Genetic Locations D3S1358 (15, 18), TH01(6, 9.3), D21S11 (29, 31.2), D18S51 (13, 15), Penta E (12), D5S818 (11, 13), D13S17 (11, 14), D7S820 (10, 11), D16S539 (13, 14), CSF1PO (11, 12), Penta D (9, 12), Amelogenin (X, Y), vWA (17), D8S1179 (12, 13), TPOX (9, 11), and FGA (22, 22.2), Defendant
The list is not just the genetic locations. (That would be useless, since everyone has these genetic locations.) The identification of the individual comes from the DNA features -- the "alleles" -- at these "loci." The identifying alleles are designated by the numbers in parentheses.

The DNA that produced this profile came from a nine-cent stamp affixed to the envelope containing the threatening letter. Presumably, the individual making the threat licked the stamp. Indeed, the same profile was found for DNA recovered from threatening letters mailed to three other public officials in Wisconsin. Whether the sender was able to get away with using nine-cent stamps in these other incidents has not been reported. If he is ever caught, postal fraud will be the least of his problems.

SOURCES
  1. Ed Treleven, With Clock Ticking, DOJ Charges Unidentified Suspect for Threatening Judge in 2012, Wisc. State J., Oct. 9, 2018.
  2. Meagan Flynn, The Culprit’s Name Remains Unknown. But He Licked a Stamp, and Now His DNA Stands Indicted, Wash. Post, Oct. 17, 2018.

Muddling Through the Measurement of IQ

IQ scores are a critical component in the diagnosis of intellectual disability. That measurements of IQ are subject to various sources of measurement error is widely appreciated, but by and large, lawyers and psychologists have supplied rather imprecise -- and sometimes incorrect -- explanations of the statistics involved. A recent example is Intellectual Disability and the Death Penalty: Current Issues and Controversies, a book intended as "a valuable resource for mental health experts, attorneys, investigators, mitigation specialists, and other members of legal teams, as well as judges." 1/ The authors explain the "standard scores" that put the mean IQ score in the population at 100 as follows:
A person's standard score on a test is calculated by transforming the individual's obtained raw score on Test A (e.g., the sum of the number of correct responses on a test) using the population's known mean and standard deviation on Test A, which transforms the individual's test performance onto a common metric allowing us to compare his or her score to anyone else tested with Test A. Standard scores are possible only for tests where, if administered to the entire population, the distribution of all test scores on said test would be normally distributed ... . A percentile score is one form of a standard score that permits the interpretation of a person's performance in relation to a reference group. Although not a requirement, in the case of many psychological tests the scale for standard scores is set to have a mean or average score of 100 and a standard deviation of 15. Thus, a test performance that results in a standard score of 70 is said to be "significantly" below average or approximately two standard deviations below the population mean. A standard deviation is a unit of measure that indicates the distance from the average. During the standardization phase of the development of a standardized test, the test and its items are administered to a large and representative sample of the reference group of interest or population. This is generally referred to as the standardization sample or norming group. From this norming group, the test developers compute the population's mean score and standard deviation on the test. The mean score and standard deviation are essential to transforming subsequently obtained raw scores (i.e., the sum of the number of correct items) on said test to a standard scale score (e.g., intelligence quotient, or IQ). 2/
Percentiles and Standard Scores

Standard scores have some value in "compar[ing one individual's] score to anyone else tested with Test A." Unlike raw scores, they incorporate the variance in the scores across different test-takers into the reported score. They are perhaps more useful for comparing scores from different tests (or different forms of the same test, or from tests administered to populations that are changing over time).

But whatever the motivation for a standardized reporting scale, it is strange to describe percentiles as standard scores. A standard score is just a particular linear transformation of a raw score that specifies "the number of standard deviations above (+) or below (-) the mean you are." 3/ As an example, suppose that the raw-score population mean for "Test A" is 60; that the population standard deviation is 12; and that a test taker has a raw score of 50. The standard score is 5/6 of a standard deviation below the mean: z = (50 - 60)/12 = -5/6 = -0.83.
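The arithmetic can be written out in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the rescaling to a mean of 100 and a standard deviation of 15 simply applies the convention mentioned in the quoted passage to the hypothetical Test A.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def to_scale(z, scale_mean=100.0, scale_sd=15.0):
    """Re-express a z-score on a reporting scale (here the conventional 100/15 scale)."""
    return scale_mean + scale_sd * z


# Hypothetical Test A from the text: mean 60, standard deviation 12, raw score 50.
z = z_score(50, 60, 12)
scaled = to_scale(z)

print(round(z, 2))       # -0.83
print(round(scaled, 1))  # 87.5
```

Both steps are linear transformations of the raw score. Nothing in them refers to the shape of the score distribution.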

To translate the raw score (or the corresponding z-score of -0.83) into a percentile, we need to know how the raw scores are distributed. For example, if raw scores were uniformly distributed from about 39 to 81, then some 26% of them would be 50 or less. If the raw scores were normally distributed (with the same mean and standard deviation), then 20% of the population would have a raw score of 50 (or less). Other distributions would produce other percentiles. Consequently, the percentile is not "one form of a standard score." At best, the percentile can be deduced from the standard score and other information.
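The two percentiles can be checked with the standard library. The sketch below assumes, as the example does, a mean of 60 and a standard deviation of 12; the uniform range is derived from the fact that a uniform distribution of width w has a standard deviation of w/sqrt(12), so its half-width is sqrt(3) standard deviations.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD, RAW = 60.0, 12.0, 50.0

# Uniform distribution with the same mean and SD: half-width = SD * sqrt(3).
half_width = SD * sqrt(3)
low, high = MEAN - half_width, MEAN + half_width
p_uniform = (RAW - low) / (high - low)  # share of scores at or below RAW

# Normal distribution with the same mean and SD.
p_normal = NormalDist(mu=MEAN, sigma=SD).cdf(RAW)

print(f"uniform range: {low:.1f} to {high:.1f}")
print(f"share at or below 50: uniform {p_uniform:.0%}, normal {p_normal:.0%}")
```

The same z-score of -0.83 lands at roughly the 26th percentile in one case and the 20th in the other, which is why the percentile cannot be read off the standard score alone.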

A Standardized Scale Does Not Require Normality

Why are "[s]tandard scores ... possible only for tests where, if administered to the entire population, the distribution of all test scores on said test would be normally distributed"? Standard scores can be constructed for any distribution of test scores with a defined mean and standard deviation. Normality may be convenient or common, but it is not essential to a standardized score scale.
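A small illustration, using an invented and deliberately skewed set of raw scores, shows that the transformation goes through without any normality assumption. The data and variable names are hypothetical.

```python
from statistics import fmean, median, pstdev

# Hypothetical raw scores, strongly right-skewed (clearly not normal).
raw_scores = [1, 1, 2, 2, 2, 3, 3, 5, 8, 13]

mu = fmean(raw_scores)
sigma = pstdev(raw_scores)  # population standard deviation

# Standardize, then place on a scale with mean 100 and SD 15.
scaled = [100 + 15 * (x - mu) / sigma for x in raw_scores]

print(round(fmean(scaled), 6), round(pstdev(scaled), 6))  # 100.0 15.0
print(median(scaled) < 100)  # True: the skew survives the linear rescaling
```

The rescaled scores have the desired mean and standard deviation, yet the distribution is exactly as skewed as before. Standardizing changes the units, not the shape.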

So What?

Not much turns on these corrections to the explanation in Intellectual Disability and the Death Penalty. IQ scores are more or less normally distributed, and the use of IQ scores of 70 and below (z ≤ -2) as the range in which an individual can be diagnosed as intellectually disabled limits the diagnosis to no more than roughly 2.3% of the general population.
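The 2.3% figure follows from the normal curve. A quick check, assuming that IQ scores are exactly normal with mean 100 and standard deviation 15 (an idealization):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # idealized IQ distribution

share_at_or_below_70 = iq.cdf(70)  # equivalent to z <= -2
print(f"{share_at_or_below_70:.1%}")  # 2.3%
```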

But why should "a standard score of 70 [be] said to be 'significantly' below average"? Why is not an IQ score of 71 -- or even 80 -- significantly below the mean of 100? There is no statistical reason to focus on 70 as a cutoff. In Hall v. Florida, 572 U.S. 701 (2014), a majority of the Supreme Court was content with categorically excluding from the zone of intellectual disability (for the purpose of deciding potential eligibility for capital punishment) all defendants with true IQ scores above 70. Yet, no one could explain the basis for this fundamental choice. It is a convention currently in vogue among experts who want to have some such threshold. 4/

Quantifying Measurement Error

At the same time that the Court limited eligibility for the constitutional exemption from capital punishment because of intellectual disability to a small fraction of the population by approving of the z ≤ -2 range for true scores, it held that a slightly higher cutoff for observed scores was constitutionally necessary to ensure that random error in measuring IQ does not preclude too many defendants with true scores of 70 or less from consideration. Intellectual Disability and the Death Penalty explained this refinement as follows:
The Supreme Court of the United States in Hall v. Florida ruled that states must consider the test's standard error of measurement when interpreting obtained IQ scores in cases where the defendant is making an intellectual disability claim. ...
The standard error of measurement (SEM) is a direct measure of the test's reliability and is computed by administering the test to a large and representative sample of the population to be assessed on the test and computing the test's reliability coefficient, which can then be translated into an average error of measurement for the population ... . Generally, the SEM is computed and then used to create confidence intervals around the obtained standard scores (e.g., 95% certainty). A confidence interval of 95% represents a statistical certainty that, based on the knowledge of this test's reliability coefficient, there is a 95% chance that the person's true score falls within a confidence interval that is +/-2 times the test's SEM. Thus, a professional reporting on an assessed individual's "obtained" full-scale IQ score of 70 on IQ Test A and knowing that Test A has a SEM of 2.5 around its full-scale IQ score, he would report that there is a 95% certainty that the assessed person's "true" full-scale IQ score falls within the range of 65-75 (i.e., 2x2.5= +/-5 points). 5/
This passage is garbled in two ways. To begin with, SEM is not "a direct measure of the test's reliability." It is a statistic derived from "the test's reliability coefficient." There are many ways to estimate reliability, and the logic behind the move from reliability to SEM is subtle. A better statistic for estimating the uncertainty in the observed score would be the standard error of estimate (SEE). The SEM is an average across all scores. The SEE takes into account the fact that uncertainty increases as one moves away from the population mean (IQ = 100). A description of the SEE can be found elsewhere. 6/

Second, the 95% in a 95% confidence interval is neither a "statistical certainty" nor "a 95% chance that the person's true score falls within [the computed] confidence interval." This interpretation of "confidence" is ubiquitous -- and widely known (to statisticians) to be wrong. The misinterpretation was apparent in the dissenting opinion written for four Justices by Justice Alito. It probably was implicit in the majority opinion penned by Justice Kennedy. Although we would expect (in the long run) 95% of all 95% confidence intervals to contain the true value, the probability that a particular interval covers the true score cannot be computed with the machinery of confidence intervals. 7/ Interval estimates that can be said to provide such probabilities require Bayes' theorem. Again, discussion and examples for IQ scores are available elsewhere. 8/
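Both points can be made concrete with a rough sketch. It rests on assumptions that are not in the book: the textbook classical-test-theory relation SEM = SD x sqrt(1 - reliability), normally distributed measurement error with a constant standard deviation equal to the SEM, and an arbitrarily chosen true score of 72. The function name and parameters are invented for the illustration.

```python
import random

SD = 15.0   # standard deviation of the IQ scale
SEM = 2.5   # standard error of measurement from the quoted example

# Assumed classical-test-theory relation: SEM = SD * sqrt(1 - reliability).
# Solving for the reliability coefficient that corresponds to SEM = 2.5:
reliability = 1 - (SEM / SD) ** 2
print(round(reliability, 3))  # 0.972

# The interval reported in the quoted example: observed score +/- 2 SEM.
observed = 70
interval = (observed - 2 * SEM, observed + 2 * SEM)
print(interval)  # (65.0, 75.0)


def coverage(true_score, sem, trials=100_000, seed=0):
    """Share of repeated +/- 2 SEM intervals that contain a FIXED true score.

    Assumes (hypothetically) normal measurement error with standard deviation sem.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        obs = rng.gauss(true_score, sem)  # one simulated administration of the test
        if obs - 2 * sem <= true_score <= obs + 2 * sem:
            hits += 1
    return hits / trials


rate = coverage(true_score=72.0, sem=SEM)
print(f"long-run coverage: {rate:.1%}")  # close to 95%
```

The simulated figure describes the procedure over many repeated testings of a person whose true score is held fixed. It says nothing about the probability that the single interval of 65 to 75 contains a particular defendant's true score. That probability requires a prior distribution for true scores and Bayes' theorem.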

Clinical psychologists, lawyers, and judges are not statisticians. They do not have to compute means, standard deviations, standard errors, confidence intervals, or Bayesian credible regions. Nevertheless, to become more astute users of such statistics, they need a better understanding of the reasoning behind standard scores and expressions for measurement error.

NOTES
  1. Marc J. Tassé & John H. Blume, Intellectual Disability and the Death Penalty: Current Issues and Controversies vii (Praeger 2018).
  2. Id. at 87.
  3. Penn State University Eberly College of Science, STAT 100: Statistical Concepts and Reasoning § 5.2 (2018), https://onlinecourses.science.psu.edu/stat100/node/13/
  4. David H. Kaye, Deadly Statistics: Quantifying an "Unacceptable Risk" in Capital Punishment, 16 Law, Probability & Risk 7-34 (2017), http://ssrn.com/abstract=2788377.
  5. Tassé & Blume, supra note 1, at 90.
  6. Kaye, supra note 4.
  7. For an elaboration in legal settings, see David H. Kaye, Apples and Oranges: Confidence Coefficients and the Burden of Persuasion, 73 Cornell L. Rev. 54 (1987).
  8. Kaye, supra note 4.
POSTINGS ON IQ SCORES AND CAPITAL PUNISHMENT