Monday, June 2, 2014

Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 2)

Justice Kennedy described the Florida law that prompted the trial court to reject Hall’s claim of intellectual disability as follows:
Florida's statute defines intellectual disability for purposes of an Atkins proceeding as “significantly subaverage general intellectual functioning existing concurrently with deficits in adaptive behavior and manifested during the period from conception to age 18.” Fla. Stat. § 921.137(1) (2013). The statute further defines “significantly subaverage general intellectual functioning” as “performance that is two or more standard deviations from the mean score on a standardized intelligence test.” Ibid. The mean IQ test score is 100. The concept of standard deviation describes how scores are dispersed in a population. Standard deviation is distinct from standard error of measurement, a concept which describes the reliability of a test and is discussed further below. The standard deviation on an IQ test is approximately 15 points, and so two standard deviations is approximately 30 points. Thus a test taker who performs “two or more standard deviations from the mean” will score approximately 30 points below the mean on an IQ test, i.e., a score of approximately 70 points.
Because standard deviations are fundamental to the Florida law, to the Court’s conclusions about it, and to its dicta regarding the lowest mandatory IQ cut-off that a state can use, I am going to be persnickety in unpacking this paragraph.

Although the Court is only discussing the standard deviation of IQ scores, “standard deviation” (SD) has a much broader meaning. When one considers the general meaning of the term, it becomes clear that the SD of the scores does not quite describe “how scores are dispersed in a population.” It merely indicates how much they are dispersed in either a population or a sample.

For example, the trial court heard testimony about at least four IQ test scores for Hall—71, 72, 73, and 80. (It declined to consider a score of 69 on another test because the psychologist who administered and scored the test was dead and Hall’s counsel had violated an order “to provide the State with the [underlying] testing materials and raw data.” Indeed, Hall had taken as many as nine IQ tests over a 40-year period.)  The standard deviation of the scores considered by the trial court is the square root of the average squared deviation from the mean—namely,

$$\mathrm{SD} = \sqrt{\frac{(71-74)^2 + (72-74)^2 + (73-74)^2 + (80-74)^2}{4}} = \sqrt{12.5} \approx 3.54.$$

Tossing in the excluded score of 69 increases the SD to 3.74. The SD increases because the additional score is below the range of the other four, thus creating more variability in the sample (and lowering the mean from 74 to 73).
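For readers who want to check the arithmetic, here is a minimal sketch using Python's standard `statistics` module. The variable and function names are mine, and I use `pstdev` because the calculation above divides by the number of scores (n) rather than by n − 1.

```python
from statistics import mean, pstdev

# IQ scores the trial court considered, plus the excluded score of 69.
considered = [71, 72, 73, 80]
with_excluded = considered + [69]

def describe(scores):
    """Return (mean, SD), where the SD divides by n as in the text."""
    return mean(scores), pstdev(scores)

m4, sd4 = describe(considered)
m5, sd5 = describe(with_excluded)

print(f"four scores: mean = {m4}, SD = {sd4:.2f}")
print(f"five scores: mean = {m5}, SD = {sd5:.2f}")
```

Swapping in `statistics.stdev`, which divides by n − 1, would give somewhat larger values. That is the usual choice when a handful of scores is treated as a sample from a larger set rather than as the whole batch of interest.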

Of course, the Court’s number of 15 for the SD of IQ scores does not come from Hall’s scores. At this point, I use his scores only to elaborate on the Court’s observation that a standard deviation is a statistic that indicates how much the numbers in some set of numbers fluctuate around their mean. The standard deviation of 15 IQ points is an estimate of how much the scores of everyone in the general population—a large batch of numbers indeed—would vary if everyone took the test. The average score would be approximately 100, and there would be a lot of scatter around this mean. (In fact, the raw scores on the test are transformed in light of their mean and SD to force them to have a desired mean and SD near 100 and 15, respectively.) And, yes, 100 – (2×15) = 70, so Florida’s choice of 2 SDs to demarcate “significantly subaverage general intellectual functioning” translates into a score of 70 on a test with this mean and SD.
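The rescaling mentioned in the parenthetical can be sketched in a few lines. This is only an illustration under simplifying assumptions: the raw scores are invented, the function name `to_iq_scale` is mine, and the transformation is a plain linear one. Actual test publishers norm against large standardization samples, and their procedures are more elaborate than this.

```python
from statistics import mean, pstdev

def to_iq_scale(raw_scores, target_mean=100.0, target_sd=15.0):
    """Linearly rescale raw scores so they have the target mean and SD."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]

# Hypothetical raw scores (say, items answered correctly), for illustration only.
raw = [12, 18, 25, 31, 34, 40, 47]
scaled = to_iq_scale(raw)

print([round(x, 1) for x in scaled])
print(round(mean(scaled), 6), round(pstdev(scaled), 6))

# Florida's two-SD line on a scale with mean 100 and SD 15.
cutoff = 100 - 2 * 15
print(cutoff)
```

Whatever the raw scores look like, the rescaled batch has a mean of 100 and an SD of 15 by construction, which is why the two-SD line lands at 70.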

But the fact that every batch of numbers has a SD does not tell us “how [these numbers] are dispersed.” The numbers could be highly concentrated around a single value, with outliers on the flanks. Their distribution could be flat, with an equal fraction of the numbers spread out everywhere. The distribution might show clustering at several locations, and so on.
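To make the point concrete, here is a small sketch with two made-up batches of numbers. I constructed them so that both have a mean of 100 and an SD of 15, yet they are spread out in very different ways. Neither is meant to resemble real IQ data.

```python
from statistics import mean, pstdev

# Hypothetical batches, each built to have mean 100 and SD 15.
peaked = [100] * 16 + [55, 145]    # concentrated at one value, outliers on the flanks
clustered = [85] * 9 + [115] * 9   # two clusters, nothing at the mean itself

for name, batch in [("peaked", peaked), ("clustered", clustered)]:
    print(name, mean(batch), pstdev(batch))
```

The mean and SD are identical for the two batches, so those two numbers alone cannot reveal which pattern of dispersion one is looking at.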

IQ scores, however, are dispersed approximately according to a “normal” or “Gaussian” curve. This distribution is the bell-shaped one prominent in elementary statistics courses. There are other bell-shaped curves, and all kinds of other interesting and important families of curves, but IQ scores, like many physical variables (such as weight and height), tend to be normally distributed across the members of a population (and hence in representative samples of that population).

The exact shape of all such normal distributions can be determined from two numbers: the mean and the standard deviation. The mean states where the bell sits, and the standard deviation determines how steeply its sides flow down from the top. You can see for yourself by entering your favorite means and standard deviations into the demonstration program in the OnlineStatBook.

Using the variable X to denote IQ scores and the symbol σx to designate their standard deviation, the particular normal distribution used in the Court’s calculation is such that about 2.28% of the scores lie below 70 (which, as the Court calculated it, corresponds to –2σx), and roughly 4.8% fall below 75 (which, for the mean of 100 and standard deviation σx of 15, corresponds to about –1.67σx). The latter IQ score, x = 75, is significant because Hall conceded (and the Court seemed to agree) that Florida could have chosen this score as its cut-off. For example, the Court expressed dissatisfaction that, in light of its calculations, the effect of Florida’s cut-off of –2σx was to preclude legally effective “professional[] diagnose[s of] intellectual disability [in a case like Hall’s, for which] the individual's IQ score is 75 or below.”
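These tail percentages can be reproduced with `NormalDist` from Python's standard library. The sketch assumes, as the Court does, a mean of 100 and an SD of 15, and it treats IQ scores as exactly normal, which is only an approximation.

```python
from statistics import NormalDist

# Idealized IQ distribution: mean 100, SD 15 (the Court's figures).
iq = NormalDist(mu=100, sigma=15)

for cutoff in (70, 75):
    z = (cutoff - iq.mean) / iq.stdev   # distance from the mean in SD units
    share = iq.cdf(cutoff)              # fraction of scores below the cut-off
    print(f"cut-off {cutoff}: z = {z:.2f}, share below = {share:.2%}")
```

With the unrounded z-score for 75 (–1.667 rather than –1.67), the share below 75 comes out at about 4.78%.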

The dissent insisted that states should have more discretion to set cut-off scores. Unless –1.67σx (or 75) corresponds to the level of impairment that justifies a categorical rule, the majority has no satisfying reason to select one cut-off over the other. Why is the Court’s choice of 1.67 standard deviations below the mean the highest that the Constitution permits? Why is Florida’s two-standard-deviation rule insufficient?

The Court’s answer leans heavily on the standard error of measurement — another technical term that appears in the paragraph quoted above: “Standard deviation is distinct from standard error of measurement, a concept which describes the reliability of a test and is discussed further below.” But the standard error of measurement is also a standard deviation, one that is estimated, almost magically, from test reliability statistics. Thus, a more precise sentence would have been: “The standard deviation of all test scores is distinct from another standard deviation known as the standard error of measurement, which depends on the reliability of the test. We discuss the standard error of measurement below.”
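As a rough illustration of how a standard error of measurement falls out of a reliability figure, the sketch below uses the textbook formula from classical test theory, SEM = SD × √(1 – reliability). The formula is standard, but the reliability values are hypothetical ones I picked for illustration, not figures from the opinion or from any particular IQ test, and the function name is mine.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)

# Hypothetical reliability coefficients, chosen only for illustration.
for r in (0.90, 0.95, 0.98):
    sem = standard_error_of_measurement(15, r)
    print(f"reliability {r:.2f}: SEM = {sem:.2f} IQ points")
```

The more reliable the test, the smaller this second standard deviation becomes. A perfectly reliable test would have a standard error of measurement of zero.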

Other postings in this series

  • Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 1), May 29, 2014 (introduction)
  • Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 2), June 2, 2014 (on standard deviation)
  • Quarreling and Quibbling over Psychometrics in Hall v. Florida (part 3), June 4, 2014 (on validity and the stability of the APA's diagnostic criteria)
