Forensic Science, Statistics & the Law: Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 3)

After explaining that Florida’s statutory cutoff of –2σ_x corresponds to an IQ score of 70 (because IQ tests are normed to have a mean of 100 and a standard deviation of 15), Justice Kennedy observes that:

Florida's rule disregards established medical practice in two interrelated ways. It takes an IQ score as final and conclusive evidence of a defendant's intellectual capacity, when experts in the field would consider other evidence. It also relies on a purportedly scientific measurement of the defendant's abilities, his IQ score, while refusing to recognize that the score is, on its own terms, imprecise.

Here, I show that these two limitations on IQ scores are less “interrelated” than Justice Kennedy suggests.

The first issue: validity

The first limitation involves what social scientists call “validity”—the extent to which something measures the real quantity of interest. For example, measuring the volume of a box by attending only to one dimension, such as height, is invalid because it ignores the two other determinative variables of width and depth.

The second limitation concerns the precision or “reliability” of the measurement, regardless of validity. Being able to measure the height of a box to the nearest millimeter time after time achieves precision and reliability, but it still lacks validity (with respect to the variable of volume). Moreover, measuring width or depth, even somewhat imprecisely, will add to validity but will do nothing to enhance the precision of the measurement of height.
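For readers who like to see the box analogy in numbers, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the box dimensions (10 to 100 cm), the measurement-error sizes (0.1 cm for height, 5 cm for width and depth), and the function names `pearson` and `simulate_boxes`. It uses repeat-measurement correlation as a rough stand-in for reliability and correlation with true volume as a rough stand-in for validity.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_boxes(n=5000, seed=0):
    """Simulate hypothetical boxes and compare reliability with validity."""
    rng = random.Random(seed)

    # Assumed population of boxes: each dimension drawn independently (cm).
    heights = [rng.uniform(10, 100) for _ in range(n)]
    widths = [rng.uniform(10, 100) for _ in range(n)]
    depths = [rng.uniform(10, 100) for _ in range(n)]
    volumes = [h * w * d for h, w, d in zip(heights, widths, depths)]

    # Two repeated, very precise height measurements (assumed error SD 0.1 cm).
    height_1 = [h + rng.gauss(0, 0.1) for h in heights]
    height_2 = [h + rng.gauss(0, 0.1) for h in heights]

    # Rough width and depth measurements (assumed error SD 5 cm).
    width_m = [w + rng.gauss(0, 5) for w in widths]
    depth_m = [d + rng.gauss(0, 5) for d in depths]
    volume_est = [h * w * d for h, w, d in zip(height_1, width_m, depth_m)]

    return {
        # Reliability stand-in: do repeated height measurements agree?
        "height_repeatability": pearson(height_1, height_2),
        # Validity stand-in: does height alone track true volume?
        "height_vs_volume": pearson(height_1, volumes),
        # Validity after adding the imprecise width and depth measurements.
        "estimate_vs_volume": pearson(volume_est, volumes),
    }


if __name__ == "__main__":
    for name, value in simulate_boxes().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the repeated height measurements agree almost perfectly, yet height alone tracks volume only moderately. Adding the rough width and depth measurements brings the volume estimate much closer to the truth without making the height measurement any more precise.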

Likewise, the “other evidence” to which the Court referred does not make IQ scores any more precise. Rather, it relates to what is known in the trade as “adaptive functioning.” The opinion defines adaptive functioning as “the inability to learn basic skills and adjust behavior to changing circumstances.” The Court disparages the “mandatory cutoff” of –2σ_x because this cut-score means that

sentencing courts cannot consider even substantial and weighty evidence of intellectual disability as measured and made manifest by the defendant's failure or inability to adapt to his social and cultural environment, including medical histories, behavioral records, school tests and reports, and testimony regarding past behavior and family circumstances. This is so even though the medical community accepts that all of this evidence can be probative of intellectual disability, including for individuals who have an IQ test score above 70.

If one were to follow this we-need-another-variable theory of “intellectual disability” to its logical limit, no IQ score could preclude the more comprehensive assay of all forms of “substantial and weighty evidence of intellectual disability.” An individual with above-average IQ scores also might “manifest [a] failure or inability to adapt to his social and cultural environment [as shown by] medical histories, behavioral records, school tests and reports, and testimony regarding past behavior and family circumstances.”

But surely the Court cannot claim that the execution of intellectually gifted but maladapted criminals is cruel and unusual while the execution of intellectually gifted and socially well adjusted criminals is not. To avoid such anomalies, the Court follows the contemporary (and prior) mental health practice of limiting “intellectual disability” to “concurrent deficits in intellectual and adaptive functioning” (emphasis added), which requires “significantly subaverage intellectual functioning” in addition to mere “deficits in adaptive functioning.” If it is clear that an individual is able to function intellectually within a broad but “normal” range, then a state need not entertain a claim of “intellectual disability” based solely on problems in adaptive functioning. Therefore, an IQ within the normal range should suffice to displace the offender from the potentially death-disqualified group.

So the possibility of weighty evidence of deficits in adaptive functioning, although relevant to clinicians, turns out to be no explanation for why Florida cannot draw the line at –2σ_x. If evidence of an adaptive-function deficit does not bar the state from executing criminals with IQs in the broad range of normalcy, why cannot the state define all IQs above 70 (–2σ_x) as lying within that range? That “experts in the field would consider other evidence” than IQ scores is not an answer. The answer has to be that (1) there is a range above which IQ, in and of itself, is a valid measure of the absence of “intellectual disability,” and (2) this range does not extend all the way down to 70. When these conditions hold, experts would not (or need not) consider “other evidence.”

Ironically, the Court’s opinion contradicts the second proposition. It clearly implies that the state could use a perfectly precise IQ measurement just above –2σ_x as conclusive evidence of the absence of intellectual disability. But if that is so, then the problem is not the failure to allow evidence of adaptive functioning. It is solely the existence of nonzero measurement error of IQ alone.

The second issue: precision (reliability)

Apparently (and dubiously) reserving the term “scientific” for precise measurements, Justice Kennedy stated that the “purportedly scientific measurement of the defendant's abilities, his IQ score, ... is, on its own terms, imprecise.” The problem is that although “there is evidence that Florida's Legislature intended to include the measurement error in the calculation ... the Florida Supreme Court ... has held that a person whose test score is above 70, including a score within the margin for measurement error, does not have an intellectual disability ... .”

In other words, a legislature that wants to preclude the more elaborate evaluations of all offenders with IQ scores above 70 could do so if only it had a way to measure IQs with perfect accuracy. Because of the “measurement error” of IQ tests, this legislature must adopt a higher cutoff. The Court, relying on the diagnostic literature, repeatedly refers to a cutoff of 75 as assuring an adequate safety margin.
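The difference between the two cutoffs can be sketched in a few lines. This is a deliberately simplified illustration of the rules as described in this post, not a statement of the full legal test. The mean of 100 and standard deviation of 15 come from the norming convention mentioned at the outset, the 5-point margin is the figure given in the DSM passages quoted in this post, and the function names and sample scores are hypothetical.

```python
MEAN = 100   # IQ tests are normed to a mean of 100
SD = 15      # and a standard deviation of 15
MARGIN = 5   # measurement-error margin quoted from the DSM (illustrative use)


def strict_cutoff(mean=MEAN, sd=SD):
    """Cutoff two standard deviations below the mean."""
    return mean - 2 * sd


def forecloses_inquiry(score, cutoff):
    """Simplified rule: an obtained score above the cutoff ends the inquiry,
    so no evidence of adaptive functioning is considered."""
    return score > cutoff


if __name__ == "__main__":
    rigid = strict_cutoff()        # the fixed -2 sigma line
    buffered = rigid + MARGIN      # the same line plus the safety margin
    for score in (68, 72, 78):     # hypothetical obtained scores
        print(score,
              forecloses_inquiry(score, rigid),
              forecloses_inquiry(score, buffered))
```

A hypothetical obtained score of 72 ends the inquiry under the rigid line but not under the buffered one. Scores of 68 and 78 are treated the same way under both, which is why the whole dispute concerns the narrow band between the two cutoffs.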

The dissent had harsh words for the choice of 75, and I will get to those later, after examining where the figure of 75 comes from. At this point, no excursion into statistical theory is required to recognize that there is something weird about saying that IQ scores are problematic because they are an incomplete measure of “intellectual disability,” but then using them—and only them—within a band that accounts only for the error in measuring IQ. By definition, this band does not attend to the other factors that should be part of the full analysis. To put it another way, if the problem lies with using IQ alone, the solution lies in defining the range of IQ scores in which the other factors realistically could produce a different diagnosis. However, the error in IQ measurements has no clear connection to the range in which the failure to look beyond IQ makes a difference.

The majority’s response is essentially that if the mental health profession generally agrees that incompleteness is only a significant concern within the logically unrelated range of IQ-score error, then that is all that the Cruel and Unusual Punishments Clause demands. To which the dissent replies that abdicating the line drawing to the professionals makes no constitutional sense and “will also lead to serious practical problems.”

The dissent’s peculiar proof of “instability”

The first such problem is “instability.” According to Justice Alito:

This danger is dramatically illustrated by the most recent publication of the APA, on which the Court relies. This publication fundamentally alters the first prong of the longstanding, two-pronged definition of intellectual disability that was embraced by Atkins and has been adopted by most States. In this new publication, the APA discards “significantly subaverage intellectual functioning” as an element of the intellectual-disability test. Elevating the APA's current views to constitutional significance therefore throws into question the basic approach that Atkins approved and that most of the States have followed. ^1/

The American Psychiatric Association’s latest version of its venerable Diagnostic and Statistical Manual of Mental Disorders, the DSM-5, “was published in May 2013 amid a storm of controversy and bitter criticism.” ^2/ In general, critics maintain that “D.S.M.’s diagnostic categories lacked validity, that they were not ‘based on any objective measures,’ and that, ‘unlike our definitions of ischemic heart disease, lymphoma or AIDS,’ which are grounded in biology, they were nothing more than constructs put together by committees of experts.” ^3/ Neither opinion even hints at such turmoil. The majority genuflects to clinical expertise and guidelines. The dissent raises no questions about validity and subjectivity, but objects to substituting “the standards of professional associations, which at best represent the views of a small professional elite” for “the standards of the American people.”

As for “instability,” the DSM-5 has brought a profusion of new or redefined disorders, but it does not radically change the definition of “intellectual disability” or dispense with the criterion of “significantly subaverage intellectual functioning.” It simply substitutes the word “intellectual ... deficit” for “significantly subaverage.” The diagnostic criteria have remained remarkably similar over the 19 years between the DSM-4 and the DSM-5.

The DSM-5 specifies that the first diagnostic criterion that “must be met” is “A. Deficits in intellectual functions ... confirmed by ... both clinical assessment and standardized intelligence testing.” If the intelligence testing does not demonstrate subaverage performance, it is hard to see how it could confirm the existence of a meaningful deficit. Moreover, the DSM-5 elaborates, making it plain that significantly subaverage IQ remains a sine qua non for the diagnosis:

The essential features ... are deficits in general mental abilities (Criterion A) and impairment in everyday adaptive functioning ... (Criterion B) [with o]nset ... during the developmental period (Criterion C). The diagnosis of ... is based on both clinical assessment and standardized testing ... . Intellectual functioning is typically measured with ... tests of intelligence. Individuals with intellectual disability have scores of approximately two standard deviations or more below the population mean, including a margin for measurement error (generally +5 points). On tests with a standard deviation of 15 and a mean of 100, this involves a score of 65–75 (70 ± 5).

Compare this to the DSM-4 (or the DSM-4-TR cited by Justice Alito, which uses the same words):

The essential feature of Mental Retardation is significantly subaverage general intellectual functioning (Criterion A) that is accompanied by significant limitations in adaptive functioning ... (Criterion B) [with] onset ... before age 18 years (Criterion C). ... General intellectual functioning is defined by the intelligence quotient ... obtained by assessment with ... intelligence tests ... . Significantly subaverage intellectual functioning is defined as an IQ of about 70 or below (approximately 2 standard deviations below the mean). It should be noted that there is a measurement error of approximately 5 points in assessing IQ, although this may vary from instrument to instrument ... . Thus, it is possible to diagnose Mental Retardation in individuals with IQs between 70 and 75 who exhibit significant deficits in adaptive behavior. Conversely, Mental Retardation would not be diagnosed in an individual with an IQ lower than 70 if there are no significant deficits or impairments in adaptive functioning.

Thus, there are wording changes over the 19 years from 1994 to 2013, but Criterion A remains Criterion A, IQ tests remain critical to the diagnosis, and the range of test scores that lend themselves to the diagnosis is the same. The APA has changed the emphasis somewhat, and it has spelled out the constructs a little more (in words not quoted here). Nevertheless, to claim that the shift “dramatically illustrate[s a] fundamental[] alter[ation in] ... the longstanding ... definition of intellectual disability” seems, well, melodramatic.

State laws that rely on –2σ_x plus a margin of safety for measurement error are compatible with Atkins, Hall, DSM-4, and DSM-5. Of course, whether this is a logically or functionally appropriate manner of defining “intellectual disability” for purposes of capital punishment is open to debate. Resolving this debate requires a more detailed and accurate understanding of the concept of measurement error than the Hall opinions provide.

Footnotes

1. The second problem is that “changes adopted by professional associations are sometimes rescinded.” This problem is just a form of instability. The third problem is hypothetical (thus far) as it relates to intellectual disability determinations: “what if professional organizations disagree? The Court provides no guidance for deciding which organizations' views should govern.” The fourth and final “practical problem” is actually conceptual, and quite important. “[D]efinitions of intellectual disability ... are promulgated for use in making a variety of decisions that are quite different from the decision whether the imposition of a death sentence in a particular case would serve a valid penological end. ... [I]n determining eligibility for social services, adaptive functioning may be much more important.”
2. Nat’l Health Service Choices, Controversy over DSM-5: New Mental Health Guide, Aug. 15, 2013.
3. Gary Greenberg, The Rats of N.I.M.H., New Yorker, May 16, 2013 (quoting Thomas Insel, the director of the National Institute of Mental Health).

POSTINGS ON IQ SCORES AND CAPITAL PUNISHMENT

Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 1), May 29, 2014 (introduction)
Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 2), June 2, 2014 (on standard deviation)
Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 3), June 4, 2014 (on validity and the stability of the APA's diagnostic criteria)
After Moore v. Texas Is a Single IQ Score Really Determinative?, March 29, 2017
Muddling Through the Measurement of IQ, November 23, 2018

Wednesday, June 4, 2014
