Forensic Science, Statistics & the Law: death penalty

Friday, November 23, 2018

Muddling Through the Measurement of IQ

IQ scores are a critical component in the diagnosis of intellectual disability. That measurements of IQ are subject to various sources of measurement error is widely appreciated, but by and large, lawyers and psychologists have supplied rather imprecise -- and sometimes incorrect -- explanations of the statistics involved. A recent example is Intellectual Disability and the Death Penalty: Current Issues and Controversies, a book intended as "a valuable resource for mental health experts, attorneys, investigators, mitigation specialists, and other members of legal teams, as well as judges." 1/ The authors explain the "standard scores" that put the mean IQ score in the population at 100 as follows:

A person's standard score on a test is calculated by transforming the individual's obtained raw score on Test A (e.g., the sum of the number of correct responses on a test) using the population's known mean and standard deviation on Test A, which transforms the individual's test performance onto a common metric allowing us to compare his or her score to anyone else tested with Test A. Standard scores are possible only for tests where, if administered to the entire population, the distribution of all test scores on said test would be normally distributed ... . A percentile score is one form of a standard score that permits the interpretation of a person's performance in relation to a reference group. Although not a requirement, in the case of many psychological tests the scale for standard scores is set to have a mean or average score of 100 and a standard deviation of 15. Thus, a test performance that results in a standard score of 70 is said to be "significantly" below average or approximately two standard deviations below the population mean. A standard deviation is a unit of measure that indicates the distance from the average. During the standardization phase of the development of a standardized test, the test and its items are administered to a large and representative sample of the reference group of interest or population. This is generally referred to as the standardization sample or norming group. From this norming group, the test developers compute the population's mean score and standard deviation on the test. The mean score and standard deviation are essential to transforming subsequently obtained raw scores (i.e., the sum of the number of correct items) on said test to a standard scale score (e.g., intelligence quotient, or IQ). 2/

Percentiles and Standard Scores

Standard scores have some value in "compar[ing one individual's] score to anyone else tested with Test A." Unlike raw scores, they incorporate the variance in the scores across different test-takers into the reported score. They are perhaps more useful for comparing scores from different tests (or different forms of the same test, or from tests administered to populations that are changing over time).

But whatever the motivation for a standardized reporting scale, it is strange to describe percentiles as standard scores. A standard score is just a particular linear transformation of a raw score that specifies "the number of standard deviations above (+) or below (-) the mean you are." 3/ As an example, suppose that the raw-score population mean for "Test A" is 60; that the population standard deviation is 12; and that a test taker has a raw score of 50. The standard score is 5/6 of a standard deviation below the mean: z = (50 - 60)/12 = -5/6 ≈ -0.83.

To translate the raw score (or the corresponding z-score of -0.83) into a percentile, we need to know how the raw scores are distributed. For example, if raw scores were uniformly distributed from about 39 to 81 (the range that gives a uniform distribution a mean of 60 and a standard deviation of 12), then some 26% of them would be 50 or less. If the raw scores were normally distributed (with the same mean and standard deviation), then about 20% of the population would have a raw score of 50 or less. Other distributions would produce other percentiles. Consequently, the percentile is not "one form of a standard score." At best, the percentile can be deduced from the standard score and other information.
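
The arithmetic in this example can be checked with a short Python sketch. It uses only the numbers given above (mean 60, standard deviation 12, raw score 50); the variable names are illustrative, and the uniform range is derived from the fact that a uniform distribution with standard deviation s extends s times the square root of 3 on either side of its mean.

```python
import math

MEAN, SD, RAW = 60.0, 12.0, 50.0  # values from the example in the text

# Standard (z) score: a linear transformation of the raw score.
z = (RAW - MEAN) / SD

# Normal distribution with this mean and SD: P(X <= 50) = Phi(z).
p_normal = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Uniform distribution with the same mean and SD: the half-width of
# the range is SD * sqrt(3), which gives roughly 39.2 to 80.8.
half_width = SD * math.sqrt(3.0)
low, high = MEAN - half_width, MEAN + half_width
p_uniform = (RAW - low) / (high - low)

print(f"z = {z:.2f}")
print(f"uniform on [{low:.1f}, {high:.1f}]: {p_uniform:.0%} at or below {RAW:.0f}")
print(f"normal: {p_normal:.0%} at or below {RAW:.0f}")
```

The same z-score maps to different percentiles under the two distributions, which is why the percentile cannot be read off the standard score alone.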

A Standardized Scale Does Not Require Normality

Why are "[s]tandard scores ... possible only for tests where, if administered to the entire population, the distribution of all test scores on said test would be normally distributed"? Standard scores can be constructed for any distribution of test scores with a defined mean and standard deviation. Normality may be convenient or common, but it is not essential to a standardized score scale.

So What?

Not much turns on these corrections to the explanation in Intellectual Disability and the Death Penalty. IQ scores are more or less normally distributed, and the use of IQ scores of 70 and below (z ≤ -2) as the range in which an individual can be diagnosed as intellectually disabled limits the diagnosis to no more than roughly 2.3% of the general population.

But why should "a standard score of 70 [be] said to be 'significantly' below average"? Why is not an IQ score of 71 -- or even 80 -- significantly below the mean of 100? There is no statistical reason to focus on 70 as a cutoff. In Hall v. Florida, 572 U.S. 701 (2014), a majority of the Supreme Court was content with categorically excluding from the zone of intellectual disability (for the purpose of deciding potential eligibility for capital punishment) all defendants with true IQ scores above 70. Yet, no one could explain the basis for this fundamental choice. It is a convention currently in vogue among experts who want to have some such threshold. 4/

Quantifying Measurement Error

At the same time that the Court limited eligibility for the constitutional exemption from capital punishment because of intellectual disability to a small fraction of the population by approving of the z ≤ -2 range for true scores, it held that a slightly higher cutoff for observed scores was constitutionally necessary to ensure that random error in measuring IQ does not preclude too many defendants with true scores of 70 or less from consideration. Intellectual Disability and the Death Penalty explained this refinement as follows:

The Supreme Court of the United States in Hall v. Florida ruled that states must consider the test's standard error of measurement when interpreting obtained IQ scores in cases where the defendant is making an intellectual disability claim. ...
The standard error of measurement (SEM) is a direct measure of the test's reliability and is computed by administering the test to a large and representative sample of the population to be assessed on the test and computing the test's reliability coefficient, which can then be translated into an average error of measurement for the population ... . Generally, the SEM is computed and then used to create confidence intervals around the obtained standard scores (e.g., 95% certainty). A confidence interval of 95% represents a statistical certainty that, based on the knowledge of this test's reliability coefficient, there is a 95% chance that the person's true score falls within a confidence interval that is +/-2 times the test's SEM. Thus, a professional reporting on an assessed individual's "obtained" full-scale IQ score of 70 on IQ Test A and knowing that Test A has a SEM of 2.5 around its full-scale IQ score, he would report that there is a 95% certainty that the assessed person's "true" full-scale IQ score falls within the range of 65-75 (i.e., 2x2.5= +/-5 points). 5/

This passage is garbled in two ways. To begin with, SEM is not "a direct measure of the test's reliability." It is a statistic derived from "the test's reliability coefficient." There are many ways to estimate reliability, and the logic behind the move from reliability to SEM is subtle. A better statistic for estimating the uncertainty in the observed score would be the standard error of estimate (SEE). The SEM is an average across all scores. The SEE takes into account the fact that uncertainty increases as one moves away from the population mean (IQ = 100). A description of the SEE can be found elsewhere. 6/
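
To make the step from reliability to SEM concrete, here is a minimal sketch using the textbook formulas of classical test theory. These formulas are an assumption on my part about what is meant; the treatment cited in note 6 may be formulated differently. Under them, SEM = SD × sqrt(1 − r) for a reliability coefficient r, the estimated true score regresses the observed score toward the population mean (Kelley's formula), and the standard error of estimate is SD × sqrt(r(1 − r)). The function names are illustrative.

```python
import math

SD = 15.0         # standard deviation of the IQ scale
POP_MEAN = 100.0  # mean of the IQ scale
SEM = 2.5         # SEM quoted in the book's example

# Classical test theory: SEM = SD * sqrt(1 - reliability), so the
# quoted SEM implies the following reliability coefficient.
reliability = 1.0 - (SEM / SD) ** 2


def sem_from_reliability(sd: float, r: float) -> float:
    """Standard error of measurement for reliability coefficient r."""
    return sd * math.sqrt(1.0 - r)


def estimated_true_score(observed: float, r: float, mean: float = POP_MEAN) -> float:
    """Kelley's estimate: the observed score shrunk toward the mean."""
    return mean + r * (observed - mean)


def see_from_reliability(sd: float, r: float) -> float:
    """Standard error of estimate of the true score given the observed score."""
    return sd * math.sqrt(r * (1.0 - r))


observed = 70.0
center = estimated_true_score(observed, reliability)
see = see_from_reliability(SD, reliability)

print(f"implied reliability: {reliability:.4f}")
print(f"SEM recovered from reliability: {sem_from_reliability(SD, reliability):.2f}")
print(f"estimated true score for an observed 70: {center:.2f}")
print(f"standard error of estimate: {see:.2f}")
```

In this formulation the interval for the true score is centered on the regressed estimate, not on the observed score, and the shift grows as the observed score moves away from 100.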

Second, the 95% in a 95% confidence interval is neither a "statistical certainty" nor "a 95% chance that the person's true score falls within [the computed] confidence interval." This interpretation of "confidence" is ubiquitous -- and widely known (to statisticians) to be wrong. The misinterpretation was apparent in the dissenting opinion written for four Justices by Justice Alito. It probably was implicit in the majority opinion penned by Justice Kennedy. Although we would expect (in the long run) 95% of all 95% confidence intervals to contain the true value, the probability that a particular interval covers the true score cannot be computed with the machinery of confidence intervals. 7/ Interval estimates that can be said to provide such probabilities require Bayes' theorem. Again, discussion and examples for IQ scores are available elsewhere. 8/
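
The long-run property can be illustrated with a small simulation. This is a sketch under stated assumptions: a single hypothetical test-taker with a fixed true score of 72 (an arbitrary choice), normally distributed measurement error with an SEM of 2.5, and intervals of plus-or-minus two SEMs around each observed score.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

SEM = 2.5
TRUE_SCORE = 72.0   # hypothetical fixed true score
N_TESTS = 100_000   # imagined repeated administrations

covered = 0
for _ in range(N_TESTS):
    observed = random.gauss(TRUE_SCORE, SEM)
    lower, upper = observed - 2 * SEM, observed + 2 * SEM
    if lower <= TRUE_SCORE <= upper:
        covered += 1

coverage = covered / N_TESTS
print(f"share of intervals containing the true score: {coverage:.3f}")
```

Roughly 95% of the intervals cover the true score, but that figure describes the procedure across many repetitions. Each individual interval either contains 72 or it does not, and nothing in the simulation assigns a probability to a single computed interval.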

Clinical psychologists, lawyers, and judges are not statisticians. They do not have to compute means, standard deviations, standard errors, confidence intervals, or Bayesian credible regions. Nevertheless, to become more astute users of such statistics, they need a better understanding of the reasoning behind standard scores and expressions for measurement error.

NOTES

1. Marc L. Tassé & John H. Blume, Intellectual Disability and the Death Penalty: Current Issues and Controversies vii (Praeger 2018).
2. Id. at 87.
3. Penn State University Eberly College of Science, STAT 100: Statistical Concepts and Reasoning § 5.2 (2018), https://onlinecourses.science.psu.edu/stat100/node/13/
4. David H. Kaye, Deadly Statistics: Quantifying an "Unacceptable Risk" in Capital Punishment, 16 Law, Probability & Risk 7-34 (2017), http://ssrn.com/abstract=2788377.
5. Tassé & Blume, supra note 1, at 90.
6. Kaye, supra note 4.
7. For an elaboration in legal settings, see David H. Kaye, Apples and Oranges: Confidence Coefficients and the Burden of Persuasion, 73 Cornell L. Rev. 54 (1987).
8. Kaye, supra note 4.

POSTINGS ON IQ SCORES AND CAPITAL PUNISHMENT

Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 1), May 29, 2014 (introduction)
Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 2), June 2, 2014 (on standard deviation)
Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 3), June 4, 2014 (on validity and the stability of the APA's diagnostic criteria)
After Moore v. Texas Is a Single IQ Score Really Determinative?, March 29, 2017
Muddling Through the Measurement of IQ, November 23, 2018

Wednesday, March 29, 2017

After Moore v. Texas Is a Single IQ Score Really Determinative?

Bobby J. Moore has been on death row for the last 37 years. On Monday, the Supreme Court ruled that the Texas Court of Criminal Appeals (the state’s highest court for criminal cases) erred in finding that Moore is not intellectually disabled. Justice Ginsburg wrote for the five-member majority. The Chief Justice wrote a strong dissent for the other three justices. Neither opinion (on my quick reading at least) comes to grips with an obvious statistical principle—that combining information reduces uncertainty.

Moore v. Texas is the third case to try to clarify the rule in Atkins v. Virginia, 536 U.S. 304 (2002). There, the Supreme Court held that the Eighth Amendment’s Cruel and Unusual Punishment Clause prevents a state from executing an intellectually disabled offender, but it left the states with latitude in defining the disability. In Moore, the Court held that the Texas tribunal applied a medically outdated—and (hence?) constitutionally impermissible—standard in rejecting Moore’s claim of disability. Most of the majority opinion concerns “adaptive functioning,” which must be substantially impaired for a diagnosis of intellectual disability to be made.

However, the Court in Hall v. Florida, 572 U.S. 701 (2014), allowed a state to refuse to inquire into adaptive functioning if an offender’s true IQ score is at least 70. Hall explicitly stated that the following statutory definition of intellectual disability was constitutionally acceptable:

“significantly subaverage general intellectual functioning existing concurrently with deficits in adaptive behavior and manifested during the period from conception to age 18,” where “significantly subaverage general intellectual functioning” is “performance that is two or more standard deviations from the mean score on a standardized intelligence test.”

Because IQ scores for the whole population are roughly normally distributed with a mean of approximately 100 and a standard deviation of about 15, Hall allows the state to execute offenders whose "true scores" are above 70.

In deciding whether a true score is above 70, Hall demanded that the state attend to the error of measurement. As the Moore Court, quoting from Hall, explained, "'[f]or purposes of most IQ tests,' [the] imprecision in the testing instrument 'means that an individual’s score is best understood as a range of scores on either side of the recorded score . . . within which one may say an individual’s true IQ score lies.'" For a single test with a standard error of 2.5 IQ points, it follows (for normally distributed errors) that the measured score must be greater than or equal to 75 (= 70 + two standard errors) to avoid "an unacceptable risk that persons with intellectual disability will be executed." 1/

But what about multiple scores? In that common situation, the Hall Court seemed conflicted. Justice Kennedy opaquely opined that “[e]ven when a person has taken multiple tests, each separate score must be assessed using the SEM, and the analysis of multiple IQ scores jointly is a complicated endeavor.” Does this mean that no matter how many IQ tests have been administered and no matter how many of them lie above 70, a single score of 75 or less makes a conclusive case for “significantly subaverage general intellectual functioning”?

From a statistical perspective, a lowest-single-score rule seems very strange indeed. If I want to know whether I have a fever and I take ten measurements of my temperature (with ten thermometers), I would not say that I have a fever just because one thermometer gives a high reading. I would use an average, and the mean temperature would have greater precision (smaller standard error) than the single highest reading of the ten.
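
A quick simulation illustrates the thermometer point. All of the numbers are hypothetical: a true temperature of 98.6 degrees, ten thermometers whose independent, normally distributed errors have a standard deviation of 0.5 degrees, and 20,000 repetitions of the experiment.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

TRUE_TEMP = 98.6      # hypothetical true temperature (degrees F)
READING_SD = 0.5      # hypothetical error SD of a single thermometer
N_THERMOMETERS = 10
N_TRIALS = 20_000

means, maxima = [], []
for _ in range(N_TRIALS):
    readings = [random.gauss(TRUE_TEMP, READING_SD) for _ in range(N_THERMOMETERS)]
    means.append(statistics.fmean(readings))
    maxima.append(max(readings))

# Spread of the ten-reading average across trials; theory says
# READING_SD / sqrt(10), or about 0.16 degrees.
sd_of_mean = statistics.stdev(means)

# Average amount by which the highest of ten readings overshoots the truth.
bias_of_max = statistics.fmean(maxima) - TRUE_TEMP

print(f"SD of a single reading:     {READING_SD:.2f}")
print(f"SD of the mean of ten:      {sd_of_mean:.2f}")
print(f"average overshoot of max:   {bias_of_max:.2f}")
```

The average of ten readings is both centered on the true value and less variable than any one reading, while the single most extreme reading is systematically off in the direction of the extreme. The same logic applies in mirror image to the lowest of several IQ scores.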

Justice Ginsburg’s opinion in Moore seems to fly in the face of this common-sense statistical point. The Texas court focused on two test scores — "a 78 in 1973 and 74 in 1989." It pointed to factors that might have biased the latter score toward the low end, leaving the higher one as entitled to more weight. Specifically, it wrote that there was expert testimony that Moore might not have been putting much effort into answering the questions in the lower-scoring test, which was given to him in prison, and that he "also took the WAIS–R under adverse circumstances; he was on death row and facing the prospect of execution, and he had exhibited withdrawn and depressive behavior." Ex Parte Moore, 470 S.W.3d 481, 519 (Tex. Ct. Crim. App. 2015). Thus, the court concluded,

These considerations might tend to place his actual IQ in a somewhat higher portion of that 69 to 79 range. ... Considering these factors together, we find no reason to doubt that applicant's [higher] WAIS–R score accurately and fairly represented his intellectual functioning as being above the intellectually disabled range.

The Supreme Court assumed that it was necessary to consider each test in isolation and without making a clinical adjustment to the statistically determined plus-or-minus-five-point margin of error. Justice Ginsburg called the statistical range of error "clinically established." She described and condemned the Texas court's evaluation of the clinical testimony as follows:

Based on the two scores, but not on the lower portion of their ranges, the court concluded that Moore’s scores ranked “above the intellectually disabled range” (i.e., above 70). ... But the presence of other sources of imprecision in administering the test to a particular individual, cannot narrow the test-specific standard-error range. [W]e require that courts continue the inquiry and consider other evidence of intellectual disability where an individual’s IQ score, adjusted for the test’s standard error, falls within the clinically established range for intellectual-functioning deficits.

Thus, she insisted that just because "Moore’s score of 74, adjusted for the standard error of measurement, yields a range of 69 to 79" so that "the lower end ... falls at or below 70, the [Court of Criminal Appeals] had to move on to consider Moore’s adaptive functioning." (Emphasis added.)

In sum, Moore seems to say that a clinician cannot tinker with the statistical margin of error (two standard errors as constitutionalized in Hall). The dissent vigorously disagreed with this rule and maintained that the Constitution permits states to make adjustments for individual circumstances that experts agree affect performance. A statistical argument for the dissent's position would be this: Computationally, the standard error reflects the variation in performance of a population of test-takers. This population-based figure is then applied to all individuals regardless of how strongly the sources of error apply to them. IQ tests administered in prison to inmates exhibiting signs of depression may not be part of that population. Those scores might have a larger or a smaller standard error, and they are generally lower than the true score for a person taking the test in normal circumstances. In other words, the clinician is not modifying the margin of error as much as adjusting the entire estimate upward.

This analysis does not necessarily render Moore's rule legally faulty. It might be undesirable to give clinicians this latitude to adjust scores. Under the majority's approach to the Eighth Amendment, the issue becomes whether the clinical guidelines for diagnosing disability allow individualized modifications of the statistical rule. The guidelines discussed in Hall are not completely clear. The dissent reads them as requiring an expert or a court to take the usual standard error seriously in interpreting an IQ score, but permitting reasoned and reasonable departures from it.

Even if Moore forbids individual adjustments to the statistical rule of plus-or-minus two standard errors (for the general population), why allow the confidence interval for a single test score to be dispositive when multiple test scores are present? The clinical guidelines do not mandate this rule, and it is not so obvious that Moore does. Texas apparently made no effort to combine the two scores into a single point estimate with a margin of error applicable to the combined statistic. Hall claimed that combining scores from different IQ test forms was "complicated," although the literature it cited gave a simple procedure for doing so. So neither Hall nor Moore can be said to firmly establish that an appropriately averaged score is impermissible. After all, neither case presented the Court with an interval estimate for the true IQ score derived from multiple scores by an accepted statistical procedure, and the many-thermometer example given above illustrates the statistical deficiency in a rule that looks to every measured IQ score in isolation.
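
As a purely illustrative sketch of what a combined estimate might look like, the snippet below averages the two scores the Texas court considered (78 and 74). It is not the procedure from the literature cited in Hall. It assumes that both tests measure the same unchanging true score, that each has an SEM of 2.5, and that the measurement errors are independent and unbiased, and it sets aside the questions about effort and testing conditions that the court raised.

```python
import math

scores = [78.0, 74.0]  # the two scores discussed by the Texas court
SEM = 2.5              # assumed SEM for each test (illustrative)

# Simple average of the observed scores.
mean_score = sum(scores) / len(scores)

# With independent errors of equal size, the standard error of the
# average shrinks by the square root of the number of scores.
se_mean = SEM / math.sqrt(len(scores))

# Plus-or-minus two standard errors around the combined estimate.
lower, upper = mean_score - 2 * se_mean, mean_score + 2 * se_mean

print(f"combined estimate: {mean_score:.1f}")
print(f"standard error of the combined estimate: {se_mean:.2f}")
print(f"interval: {lower:.1f} to {upper:.1f}")
```

Under these assumptions the combined interval is narrower than the ten-point band around either single score and lies entirely above 70. Different assumptions about the tests, their norms, or the conditions of administration could change that result.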

The single-score-too-low rule bends over backward to avoid misclassifying a disabled offender as normal. The rule might be defended on exactly that ground. But that is not the logic of Moore, which only asks what clinical guidelines for interpreting IQ scores allow. Moreover, if the real objective is to make determinations of intellectual disability as fully informed as possible, it would seem more direct just to demand the inquiry into adaptive functioning along with IQ scores in all cases. On the other hand, if true IQ scores matter as a threshold to a richer inquiry into both intellectual and adaptive functioning, then statistically sound procedures for integrating all the IQ test results ought to be followed.

Further reading: David H. Kaye, Deadly Statistics: Quantifying an "Unacceptable Risk" in Capital Punishment, 16 Law, Probability & Risk 7 (2017).

NOTE

If the standard error were substantially less than 2.5, then the measured score would not have to be all five points above 70. The use of two standard errors also is on the high side; 1.96 standard errors provides 95% coverage. Justice Kennedy's opinion in Hall was not as clear as it should have been on these points, but this is the only interpretation consistent with the concept of confidence intervals and standard errors used in the opinion. In Brumfield v. Cain, 135 S.Ct. 2269 (2015), however, the Court wrote that after "[a]ccounting for this margin of error, Brumfield's reported IQ test result of 75 was squarely in the range of potential intellectual disability." Id. at 2278. The Court did not disclose the standard error of measurement for the test.
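
The dependence of the observed-score cutoff on the standard error and on the multiplier can be sketched as follows. The function names are illustrative, and normally distributed measurement error is assumed.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def coverage(k: float) -> float:
    """Probability that a normal error falls within k standard errors."""
    return normal_cdf(k) - normal_cdf(-k)


def observed_cutoff(sem: float, k: float = 2.0, true_cutoff: float = 70.0) -> float:
    """Observed score lying k standard errors above the true-score cutoff."""
    return true_cutoff + k * sem


print(f"coverage of +/-2 SE:    {coverage(2.0):.4f}")
print(f"coverage of +/-1.96 SE: {coverage(1.96):.4f}")
print(f"cutoff, SEM 2.5, k = 2:    {observed_cutoff(2.5):.1f}")
print(f"cutoff, SEM 2.5, k = 1.96: {observed_cutoff(2.5, 1.96):.1f}")
print(f"cutoff, SEM 2.0, k = 2:    {observed_cutoff(2.0):.1f}")
```

Two standard errors give slightly more than 95% coverage, and a smaller SEM pulls the observed-score cutoff below 75.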

Sunday, July 19, 2015

What the FBI Hair Examiner Said About Race in State v. Manning

Two previous postings discussed the meaning and import of FBI testimony about hair in a trial of Willie Manning. I questioned Justice Breyer's suggestion that the exoneration in one of these cases came about in part because "the evidence against him, including flawed testimony from an FBI hair examiner, was severely undermined." 1/ The evidence in the case that led to the overturned conviction did not include any hair evidence at all.

The verdict in the other case -- the one with hair evidence -- is still in question. Cellmark Forensics is trying to find and analyze DNA from a rape kit and fingernail scrapings. At a later point, the laboratory may turn to "other items of evidence," including, presumably, the hair fragments from the car of one of the victims. 2/

Testimony about this hair figured prominently in the trial. In closing, the prosecutor asked "how many [people] could leave hair fragments in the car, hair fragments that came from a member of the African-American race because that's what they find when they vacuum the sweepings of the car, that's what they find in both significantly the passenger's seat and the driver's seat, just like it would be if the man rode out there as a passenger and came back as a driver . . . ." 3/

The prosecutor was able to make this argument because of testimony from an FBI examiner about the perceived racial characteristics of the hair. In my posting of July 16, I wrote that the Department of Justice's letter of May 4 confessing scientific error

is hardly a repudiation of the testimony that the hairs from the car "exhibited characteristics associated with the black race." To the contrary, it endorses this testimony as a permissible "scientific analysis." ... What "would be error," in the DOJ's view, is "any statement of probability whether the hair is from a particular racial group." But it is impossible to tell from the letter whether the FBI agent gave any such testimony. The federal district court's opinion denying Manning's habeas corpus petition makes it sound like the testimony was not of the sort later deprecated by the FBI. 4/

Today, I obtained the motion that persuaded the Mississippi Supreme Court to stay the execution and to order DNA testing. 5/ The FBI agent's testimony is appended to it. The examiner not only presented the characteristics of the hairs as associated with race ("valid" according to the DOJ), but added that "these hairs were hairs from an individual of the black race" 6/ -- not "valid."

NOTES

1. David H. Kaye, Justice Breyer in Glossip v. Gross on "flawed testimony from an FBI hair examiner", July 16, 2015, Forensic Science, Statistics & the Law

2. Letter from Cellmark Forensics to Robert L. Mink, June 5, 2015, available at http://courts.ms.gov/Images/Orders/dc00001_live.SCT.13.DR.491.35115.0.pdf.

3. Motion to Stay Execution and Set Aside Convictions, Second Motion for Leave to File Successive Petition for Post-Conviction Relief, and Motion in the Alternative for Other Forms of Relief, at 5, Manning v. Mississippi, No. 2013-DR-00491-SCT (May 6, 2013).

4. Kaye, supra note 1.

5. Motion, supra note 3.

6. Transcript at 1048.

Thursday, July 16, 2015

Justice Breyer in Glossip v. Gross on "flawed testimony from an FBI hair examiner"

Justice Stephen Breyer recently proposed that the Supreme Court reconsider the constitutionality of the death penalty. One of his concerns is that despite the level of scrutiny capital sentences are supposed to receive, innocent defendants may be executed. Justice Breyer's worry on this score cannot be dismissed as historically unfounded. Neither can it be written off as the predictable moaning of a bleeding-heart liberal, coming as it does from the "high court's raging pragmatist."

A related concern articulated by Justice Breyer is the "uncertainty as to whether a death sentence will in fact be carried out" and when that day will arrive. "Willie Manning," for example, "was four hours from his scheduled execution before the Mississippi Supreme Court stayed the execution. See Robertson, With Hours to Go, Execution is Postponed, N.Y. Times, Apr. 8, 2015, p. A17. Two years later, Manning was exonerated after the evidence against him, including flawed testimony from an FBI hair examiner, was severely undermined. Nave, Why Does the State Still Want to Kill Willie Jerome Manning? Jackson Free Press, Apr. 29, 2015."

Without questioning the main point about the uncertainty of confinement, I want to probe the use of the Manning case -- or rather cases -- as an exoneration in which "flawed testimony from an FBI hair examiner was severely undermined." Justice Breyer has been called a "technocrat," and it is not surprising that he would call attention to the hair evidence against Manning and to "the more general problem of flawed forensic testimony" as illustrated by reports that "FBI Testimony on Microscopic Hair Analysis Contained Errors in at Least 90 Percent of Cases in Ongoing Review," including capital ones like Manning's. But what was the erroneous hair evidence against Manning, and in what sense was it erroneous?

The answers are complicated, and it turns out that hair evidence played no role in the case in which Manning was exonerated. Manning was convicted in not one, but two, capital cases. The hair evidence applies to a different pair of murders for which Manning has yet to be exonerated, and it is hard to see how the hair testimony there has been "severely undermined." For clarity, I discuss the separate cases separately.

I. The Jimmerson-Jordan Murders

According to the article cited by Justice Breyer,

An Oktibbeha County jury convicted Manning for killing nonagenarian Emmoline Jimmerson and her daughter, Alberta Jordan, in the winter of 1992. The women were beaten and their throats slashed during an apparent robbery attempt at their Brookville Gardens apartment in Starkville. Manning was convicted of the crime at age 26 and sentenced to death.

The state's star witness, a man named Kevin Lucious, told police and later testified in court, that he saw Manning enter the victims' apartment from his own apartment, but police found the apartment where Lucious claimed to live was vacant at the time of the crime. The apartment manager also had no record of Lucious being a tenant.

Presiding Justice Michael K. Randolph, on behalf of the supreme court's majority, ordered the case back to circuit court for a new trial, agreeing with Manning's attorneys that "there is no question that defense counsel would have had the opportunity to meaningfully impeach Lucious' testimony that he lived in the apartment at the time of the crime and saw Manning enter the victims' apartment." ...

[Later], Lucious recanted most of his statements, saying he only testified because he feared being charged with the crime himself. ... Lucious claimed that he told Sheriff Bryan that another man, Tyrone Smith, had confessed to the murders. With the state's material witness now changing material parts of his story, the case had to be thrown out [rather than retried].

Neither this article nor the entry in the University of Michigan's National Registry of Exonerations refers to any forensic-science evidence in the case. The Registry lists the factors that contributed to the conviction as follows: "perjury or false accusation, official misconduct."

II. The Miller-Steckler Murders

In 1992, two Mississippi State University students, Tiffany Miller and Jon Steckler, who were dating, were shot and killed near the fraternity house in which one of them lived. Steckler's fraternity brother had a car that was burglarized at around the same time. The state produced evidence that Manning, who had a record of convictions for theft and other crimes, was distributing items stolen from the car. Manning conceded that he was selling stolen goods but said that he did not know who stole them. Among the other evidence introduced against Manning was hair found in Miller's car. The car was found near campus, and it had been used to run over Steckler.

An FBI criminalist testified that he could "microscopically determine if the hairs look alike and determine with some degree of certainty, although not absolutely, but with some degree of certainty if hairs, for example, found in vacuum sweepings from an automobile originated from a particularly named individual." Manning v. State, 726 So.2d 1152, 1180 (Miss. 1998). He also testified that hairs from the car "exhibited characteristics associated with the black race." Id. The examiner "went on to testify that as the hairs were only fragments, he could not compare the hairs to a known sample, and that he was limited to a determination as to the racial characteristics of the hair." Manning v. Epps, 695 F.Supp.2d 323, 380 (N.D. Miss. 2009). This determination was significant because the two victims were white, and Manning is black. Knowing that the hair had the latter "racial characteristics" made it more probable that it was Manning's and thus that he was in the car.

The DOJ issued not one, but two letters about the 1994 testimony of its agent. The first letter, dated May 2, 2013, reported that

the microscopic hair comparison analysis testimony or laboratory report presented in this case included statements that exceeded the limits of science and was, therefore, invalid. While this case did not involve a positive association of an evidentiary hair to an individual, the examiner stated or implied in a general explanation of microscopic hair comparison analysis that a questioned hair could be associated with a specific individual to the exclusion of all others -- this type of testimony exceeded the limits of the science. The examiner also assigned a statistical weight or probability or provided a likelihood that, through microscopic hair comparison analysis, the examiner could determine that a questioned hair originated from a particular source, or an opinion as to the likelihood or rareness of a positive association that could lead the jury to believe that valid statistical weight can be assigned to a microscopic hair association -- this type of testimony exceeded the limits of the science. (A copy of the documents upon which this determination was based is enclosed.)

I have not seen a copy of the testimony, but the letter does not suggest that the testimony used to link Manning (and all other African Americans) to the hair in the car was erroneous or undermined. Apparently, the criminalist overstated the power of hair features as the basis for a probabilistic or categorical statement that an individual was in fact the source of a hair, but as we saw, the expert here shied away from either of those statements. He testified only to the race of the unknown individual whose hair was in the car.

Two days later, the DOJ distributed a second letter referring to an "additional error." This letter addressed the racial identification. One might well wonder about the "racial characteristics" of hair. How definitive of race are these features? Can criminalists really zero in on African Americans this way? The letter had this to say:

We have determined that the microscopic hair comparison analysis testimony or laboratory report presented in this case included additional statements that exceeded the limits of science and was, therefore, invalid. In response to inquiries regarding whether the errors identified in the notification letter had any bearing on the examiner’s opinion regarding the racial classification of the hair, the FBI states the following: The scientific analysis of hair evidence permits an examiner to offer an opinion that a questioned hair possesses certain traits that are associated with a particular racial group. However, since a statistical probability cannot be determined for classification of hair into a particular racial group, it would be error for an examiner to testify that he can determine that the questioned hairs were from an individual of a particular racial group. Thus, an examiner cannot testify with any statement of probability whether the hair is from a particular racial group, but can testify that a hair exhibits traits associated with a particular racial group. (A copy of the FBI Microscopic Hair Analysis Report, dated May 4, 2013, is attached.)

This paragraph is hardly a repudiation of the testimony that the hairs from the car "exhibited characteristics associated with the black race." To the contrary, it endorses this testimony as a permissible "scientific analysis." (It is not clear to me that this assessment of the science is correct, but I have not researched the scientific literature.) What "would be error," in the DOJ's view, is "any statement of probability whether the hair is from a particular racial group." But it is impossible to tell from the letter whether the FBI agent gave any such testimony. The federal district court's opinion denying Manning's habeas corpus petition makes it sound like the testimony was not of the sort later deprecated by the FBI. Manning v. Epps, 695 F.Supp.2d 323, 380 (N.D. Miss. 2009).

The letter added that the FBI was prepared to perform DNA tests on the hairs or other biological material if desired. The Mississippi Supreme Court called off the execution to allow Manning "to proceed in the circuit court with his request for DNA testing ... ." Manning v. State, 119 So.3d 293, 293 (Miss. 2013). In 2015, Manning's lawyer "said several items have been sent to a lab in Houston, Texas, for analysis" and that "the timing of the testing and issuing of results is up to the lab and the FBI." 1/

It is entirely possible that DNA testing will soon exonerate Manning in the Miller-Steckler murder case (to the extent of showing that the hair in the car was not his). As of May, 2015, this had not happened. 2/ If it does, it would mean that either the original determination of the racial characteristics of the hair was wrong -- something that the FBI has not conceded -- or that the determination was correct but that Manning was not the person whose hairs were in the car -- something that the criminalist never purported to resolve.

NOTES

1. R. I. Nave, Why Does the State Still Want to Kill Willie Jerome Manning? Jackson Free Press, Apr. 29, 2015, available at http://www.jacksonfreepress.com/news/2015/apr/29/why-does-state-still-want-kill-willie-jerome-manni/.

2. Maurice Possley, Willie Manning, Nat'l Registry of Exonerations, https://www.law.umich.edu/special/exoneration/pages/casedetail.aspx?caseid=4679 ("Manning remains on Mississippi’s Death Row for the Miller-Steckler murders as the physical evidence in that case was still undergoing DNA testing as of April 2015.").

RELATED POSTINGS

What the FBI Hair Examiner Said About Race in State v. Manning, Forensic Science, Statistics & the Law, July 19, 2015
Validity, Overclaiming, and Error: More on Willie Manning's Exoneration, Forensic Science, Statistics & the Law, July 17, 2015
No Relief for Jeffrey MacDonald After FBI Declares It “Exceeded the Limits of Science” with Hair Analysis, Forensic Science, Statistics & the Law, May 23, 2015
The FBI's Worst Hair Days, Forensic Science, Statistics & the Law, July 31, 2014

Tuesday, July 14, 2015

Two Scientific Issues in Glossip v. Gross

In Glossip v. Gross, the Supreme Court narrowly upheld a district court's refusal to issue a preliminary injunction against a three-drug protocol being used to kill capital offenders. The majority of five Justices, led by Justice Alito, was emphatic. Their opinion offered not one, but two "independent reasons." The second was that the district court's findings about the effect and dosage of the sedative midazolam were not clearly erroneous. Although the Court cautioned that "federal courts should not embroil themselves in ongoing scientific controversies beyond their expertise" (internal quotation marks omitted), the factual question that ultimately must be answered correctly is whether this sedative, administered to achieve deep unconsciousness, really blocks the pain caused by paralytic and heart-stopping agents. The opinion does not resolve this question, and even if the forgiving "clearly erroneous" standard was met in this case, Justice Sotomayor's opinion for the four dissenting Justices contains more than enough technical material to make one nervous about the conclusion that midazolam works as the state of Oklahoma hopes it does.

A second controversy involves social, not medical, science. Justice Breyer, joined by Justice Ginsburg, wanted the Court to ask for "full briefing on ... whether the death penalty violates the Constitution." He maintained that "the death penalty, in and of itself, now likely constitutes a legally prohibited 'cruel and unusual punishmen[t].'" One aspect of this tentative conclusion involved "the death penalty's deterrent effect" -- the subject of innumerable studies and two skeptical reports from the National Academy of Sciences.

Justice Scalia, joined by Justice Thomas, was upset at what he called the "speculat[ion] that it does not 'seem likely' that the death penalty has a 'significant' deterrent effect." His approach to this empirical question was quintessentially legal, not scientific -- pick the answer you want (or think you know) and look only for confirming evidence. Justice Scalia's argument that the existence of the death penalty (even if rarely used and long delayed) deters significantly more than life imprisonment does consists of a single sentence: "It seems very likely to me, and there are statistical studies that say so." What studies?

See, e.g., Zimmerman, State Executions, Deterrence, and the Incidence of Murder, 7 J. Applied Econ. 163, 166 (2004) (“[I]t is estimated that each state execution deters approximately fourteen murders per year on average”); Dezhbakhsh, Rubin, & Shepherd, Does Capital Punishment Have a Deterrent Effect? New Evidence from Postmoratorium Panel Data, 5 Am. L. & Econ. Rev. 344 (2003) (“[E]ach execution results, on average, in eighteen fewer murders” per year); Sunstein & Vermeule, Is Capital Punishment Morally Required? Acts, Omissions, and Life-Life Tradeoffs, 58 Stan. L. Rev. 703, 713 (2005) (“All in all, the recent evidence of a deterrent effect from capital punishment seems impressive, especially in light of its ‘apparent power and unanimity’”).

Justice Scalia would have done better to have followed the example of Andrew Lang, the novelist who tried "not to use statistics as a drunken man uses lamp-posts, for support rather than for illumination." 1/ Carelessly or cavalierly, Justice "Scalia cites a paper by Cass Sunstein for a second time, even though after the first such Scalia citation in an earlier lethal injection case, Cass Sunstein (writing with Justin Wolfers) affirmed his view that there is no credible evidence that the death penalty is a deterrent." 2/ As Professor John Donohue, who penned these words, added, "[o]ne would hope for more from a Supreme Court justice than citations to junk science and to a paper withdrawn based on more informed consideration – especially on a matter of life and death." 3/

NOTES

1. On the origin of this aphorism, see Quote Investigator, Jan. 15, 2014, http://quoteinvestigator.com/2014/01/15/stats-drunk/.

2. John J. Donohue, Glossip v. Gross: Examining Death Penalty Data for Clarity, Stanford Lawyer, June 29, 2015, https://stanfordlawyer.law.stanford.edu/2015/06/glossip-v-gross-examining-death-penalty-data-for-clarity/. Justice Breyer is both less dogmatic and more complete in his description of the body of social science research:

Many studies have examined the death penalty's deterrent effect; some have found such an effect, whereas others have found a lack of evidence that it deters crime. [Citations omitted.] Recently, the National Research Council ... reviewed 30 years of empirical evidence and concluded that it was insufficient to establish a deterrent effect and thus should "not be used to inform" discussion about the deterrent value of the death penalty. National Research Council, Deterrence and the Death Penalty 2 (D. Nagin & J. Pepper eds. 2012).

I recognize that a 'lack of evidence' for a proposition does not prove the contrary. [Citation omitted.] But suppose that we add to these studies the fact that, today, very few of those sentenced to death are actually executed, and that even those executions occur, on average, after nearly two decades on death row. ... Then, does it still seem likely that the death penalty has a significant deterrent effect?
