Wednesday, November 20, 2013

The Significance of Significance

Today’s issue of Nature includes a cautionary essay entitled “Twenty Tips for Interpreting Scientific Claims.”  The essay is written by two conservation biologists (William J. Sutherland and Mark Burgman) and one statistician (David Spiegelhalter)—all eminent in their fields. The essay lists “20 concepts that should be part of the education of civil servants, politicians, policy advisers and journalists—and anyone else who may have to interact with science or scientists.”

Increasing statistical and scientific literacy is a laudable goal, but it is not at all easy to achieve. The late David Freedman and I struggled to describe some 16 of the 20 concepts in a reference manual for judges. By and large, our expositions are consistent with the short ones in Twenty Tips, but two tips seem less useful than the others.

First, how many policy makers, journalists, or other consumers of scientific information need to be told that smaller samples tend to be less representative (assuming that everything else is the same)?    Twenty Tips seems to suggest that sample size usually should be on the order of "tens of thousands":
Bigger is usually better for sample size. The average taken from a large number of observations will usually be more informative than the average taken from a smaller number of observations. That is, as we accumulate evidence, our knowledge improves. This is especially important when studies are clouded by substantial amounts of natural variation and measurement error. Thus, the effectiveness of a drug treatment will vary naturally between subjects. Its average efficacy can be more reliably and accurately estimated from a trial with tens of thousands of participants than from one with hundreds.
Nobody can dispute the truth of the bolded heading if “better” means more likely to produce an estimate of a population parameter that is close to its true value. The problem I have seen with judges, however, is not that they do not appreciate that large samples usually are preferable to small ones when accuracy is the only criterion of what is “better.” It is that they are overly impressed with the perceived need for very large samples when smaller ones would be quite satisfactory. They do not recognize that doubling the sample size rarely doubles the precision of an estimate. They think a fixed percentage of a large population needs to be sampled to obtain a good estimate.
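To put rough numbers on this point, here is a minimal sketch in Python. It assumes a simple random sample used to estimate a proportion, and it uses the textbook standard-error formula with the finite population correction. The function name, the proportion of 0.5, and the sample and population sizes are illustrative choices of mine, not figures from Twenty Tips.

```python
import math

def standard_error(p, n, population=None):
    """Standard error of a sample proportion from a simple random sample of size n.

    If a population size is given, apply the finite population correction
    for sampling without replacement.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se

# Doubling the sample size does not halve the standard error;
# under these assumptions it takes four times the sample to do that.
for n in (500, 1000, 2000, 4000):
    print(f"n = {n:>5}: standard error = {standard_error(0.5, n):.4f}")

# The same sample of 1,000 drawn from populations of very different sizes.
# The fraction of the population sampled barely matters once the population is large.
for population in (100_000, 10_000_000, 300_000_000):
    se = standard_error(0.5, 1000, population)
    print(f"population = {population:>11,}: standard error = {se:.4f}")
```

Under these assumptions the standard error shrinks with the square root of the sample size, and a sample of 1,000 is nearly as precise for a population of hundreds of millions as for one of a hundred thousand.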

Although this reaction merely concerns the understandable incompleteness of a short tip, the second tip I will mention contains more of an invitation to misunderstanding or misinterpretation. According to Twenty Tips:
Significance is significant. Expressed as P, statistical significance is a measure of how likely a result is to occur by chance. Thus P = 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly, and in truth there was no effect at all. Typically, scientists report results as significant when the P-value of the test is less than 0.05 (1 in 20).
The explanation of this call for “significant” results invites confusion. First, “statistical significance” is not “expressed as P.” Rather, a P-value is (arbitrarily) translated into a yes-no statement of “significance.” Second, “P = 0.01” does not mean “there is a 1-in-100 probability that . . . in truth there was no effect at all.” It means that if “in truth there was no effect at all,” differences denominated “significant” at the 0.01 level would be seen about 1 time in 100 in a large number of repeated experiments.
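The distinction can be illustrated with a small simulation. This is only a sketch under assumptions of my own: each experiment is a one-sample z-test on 25 normally distributed observations with known standard deviation 1, and, in the second part, a hypothetical world in which 90 percent of the treatments tested do nothing and 10 percent shift the mean by 0.3. None of these numbers comes from Twenty Tips.

```python
import math
import random

def p_value(sample):
    """Two-sided P-value for the hypothesis of no effect (mean 0),
    assuming the observations are normal with known standard deviation 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))

def run_experiments(count, true_effect, rng, n=25):
    """Simulate `count` experiments of n observations each; return their P-values."""
    return [
        p_value([rng.gauss(true_effect, 1) for _ in range(n)])
        for _ in range(count)
    ]

rng = random.Random(2013)  # fixed seed so the run is repeatable

# Part 1: no effect at all. How often is a result "significant" at 0.01?
null_p = run_experiments(18_000, 0.0, rng)
rate_null = sum(p < 0.01 for p in null_p) / len(null_p)
print(f"Share of no-effect experiments with P < 0.01: {rate_null:.4f}")  # near 0.01

# Part 2: hypothetical mix of 18,000 useless treatments and 2,000 real ones.
effect_p = run_experiments(2_000, 0.3, rng)
sig_null = sum(p < 0.01 for p in null_p)
sig_effect = sum(p < 0.01 for p in effect_p)
share_null = sig_null / (sig_null + sig_effect)
print(f"Share of 'significant' results with no true effect: {share_null:.2f}")
```

The first figure is the one the P-value speaks to: how often a “significant” difference turns up when there is no effect. The second figure is the transposed quantity, the probability of no effect given a “significant” result. It depends on how many treatments work and how sensitive the experiments are, and in this hypothetical mix it comes out far above 1 in 100.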

I am being picky, but that comes from being a lawyer who worries about the choice of words. The paragraph on the significance of significance certainly could be read more charitably, but I suspect that the policy-makers it is intended to educate easily could misunderstand it. Indeed, judicial opinions are replete with transpositions of the P-value into posterior probabilities, and Twenty Tips offers little immunity against this common mistake.


David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence, National Academy Press, 3d ed., 2011, pp. 211-302; Federal Judicial Center, 2d ed., 2000, pp. 83-178; Federal Judicial Center, 1st ed., 1994, pp. 331-414.

William J. Sutherland, David Spiegelhalter & Mark Burgman, Policy: Twenty Tips for Interpreting Scientific Claims, Nature, Nov. 20, 2013,
