The issue before the Court was “[w]hether a plaintiff can state a claim under § 10(b) of the Securities Exchange Act and SEC Rule 10b-5 based on a pharmaceutical company's nondisclosure of adverse event reports even though the reports are not alleged to be statistically significant.” In the case, the manufacturer of the Zicam nasal spray for colds issued reassuring press releases at a time when it was receiving case reports from physicians of loss of smell (anosmia) in Zicam users. The pharmaceutical company, Matrixx Initiatives, succeeded in getting a securities-fraud class action dismissed in the district court on the ground that the plaintiffs failed to plead “statistical significance.”
Because case reports are just a series of anecdotes, it is not immediately obvious how they could be statistically significant, but a determined statistician could compare the number of reports in the relevant time period to the number that would be expected under some model of the world in which Zicam is neither a cause nor a correlate of anosmia. If the observed number departed from the expected number by a large enough amount—one that would occur no more than about 5% of the time when the assumption of no association is true (along with all the other features of the model)—then the observed number would be statistically significant at the 0.05 level.
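To make the computation just described concrete, here is a minimal sketch of one way a statistician might do it. It assumes, purely for illustration, that the number of reports under the no-association model follows a Poisson distribution and that the test is one-sided. The expected and observed counts are hypothetical numbers, not figures from the Matrixx record.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) when X ~ Poisson(expected).

    This is a one-sided p-value: the probability of seeing at least
    `observed` reports if the no-association model (with mean `expected`)
    were true.
    """
    # Probability of seeing fewer than `observed` reports under the model.
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical inputs, chosen only to illustrate the arithmetic.
expected_reports = 4.0  # reports expected if Zicam is unrelated to anosmia
observed_reports = 9    # reports actually received in the period

p_value = poisson_upper_tail(observed_reports, expected_reports)
print(f"one-sided p-value: {p_value:.4f}")
print("significant at 0.05" if p_value <= 0.05 else "not significant at 0.05")
```

With these made-up inputs the p-value is about 0.02, so the count would be called statistically significant at the 0.05 level. The result depends entirely on the assumed model and expected count, which is part of why such a computation is hard to demand at the pleading stage.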
The Court rejected any rule that would require securities-fraud plaintiffs to engage in such statistical modeling or computation before filing a complaint. This result makes sense because a reasonable investor might want to know about case reports that do not cross the line for significance. Such anecdotal evidence could be an impetus for further research, FDA action, or product liability claims—any of which could affect the value of the stock. In rejecting a bright-line rule of p < 0.05, the Court made several peculiar statements about statistical significance and the design of studies, but these are not my subject for today. (An older posting, on March 25, has some comments on this issue.)
Instead, I want to look at a small part of an amicus brief from “statistics experts” filed on behalf of the plaintiffs. There is much in this brief, which really comes from two economists (or perhaps these eclectic scholars should be designated historians or philosophers of economics and statistics), with which I would agree (for whatever my agreement is worth). But I was shocked to find the following text in the “Brief of Amici Curiae Statistics Experts Professors Deirdre N. McCloskey and Stephen T. Ziliak in Support of Respondents”:
The 5 percent significance rule insists on 19 to 1 odds that the measured effect is real.[26] There is, however, a practical need to keep wide latitude in the odds of uncovering a real effect, which would therefore eschew any bright-line standard of significance. Suppose that a p-value for a particular test comes in at 9 percent. Should this p-value be considered “insignificant” in practical, human, or economic terms? We respectfully answer, “No.” For a p-value of .09, the odds of observing the AER [adverse event report] is 91 percent divided by 9 percent. Put differently, there are 10-to-1 odds that the adverse effect is “real” (or about a 1 in 10 chance that it is not). Odds of 10-to-1 certainly deserve the attention of responsible parties if the effect in question is a terrible event. Sometimes odds as low as, say, 1.5-to-1 might be relevant.[27] For example, in the case of the Space Shuttle Challenger disaster, the odds were thought to be extremely low that its O-rings would fail. Moreover, the Vioxx matter discussed above provides an additional example. There, the p-value in question was roughly 0.2,[28] which equates to odds of 4 to 1 that the measured effect — that is, that Vioxx resulted in increased risk of heart-related adverse events — was real. The study in question rejected these odds as insignificant, a decision that was proven to be incorrect.

Why is this explanation out of whack? The fundamental problem is that, within the framework of classical (Neyman-Pearson) hypothesis testing, hypotheses like “the adverse effect is real” or “a measured effect being real” do not have odds or probabilities attached to them. In Bayesian inference, statements like “the probability that the measured effect is ‘real’ is 95 percent, whereas the probability that it is false is 5 percent” are meaningful, but frequentist p-values play no role in that framework.
Equating the p-value with the probability that a null hypothesis is true, and treating the complement of the p-value as the probability that the alternative hypothesis is true (that something is “real”), is known as the transposition fallacy. That two “statistics experts” would rely on this crude reasoning to make an otherwise reasonable point is depressing.
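A rough numerical sketch may help show why the transposition fails. It uses Bayes' rule to compute the probability that an effect is real given a significant result, and sets that beside the brief's (1 - p) / p arithmetic. The prior probabilities and the 80 percent power are hypothetical assumptions introduced only for illustration, and conditioning on "significant at level alpha" rather than on an exact p-value is a simplification.

```python
def posterior_real(prior_real: float, power: float, alpha: float) -> float:
    """P(effect is real | result significant at level alpha), by Bayes' rule.

    prior_real: assumed probability, before the data, that the effect is real
    power:      assumed probability of a significant result when it is real
    alpha:      probability of a significant result when it is not real
    """
    true_positive = power * prior_real
    false_positive = alpha * (1.0 - prior_real)
    return true_positive / (true_positive + false_positive)


def transposed_odds(p_value: float) -> float:
    """The brief's arithmetic: (1 - p) / p, misread as odds the effect is real."""
    return (1.0 - p_value) / p_value


alpha = 0.05
power = 0.80  # hypothetical assumption

print(f"brief's 'odds' at p = {alpha}: {transposed_odds(alpha):.0f} to 1")
for prior in (0.50, 0.10, 0.01):  # hypothetical priors
    post = posterior_real(prior, power, alpha)
    print(f"prior {prior:.2f} -> P(real | significant) = {post:.2f}")
```

With a prior of 10 percent, for instance, the posterior probability is 0.64, not 0.95. The same significance level is compatible with very different probabilities that the effect is real, because that probability depends on the prior and on the power of the test, neither of which a p-value supplies.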
26. At a 5 percent p-value, the probability that the measured effect is “real” is 95 percent, whereas the probability that it is false is 5 percent. Therefore, 95 / 5 equals 19, meaning that the odds of finding a “real” effect are 19 to 1.
27. Odds of 1.5 to 1 correspond to a p-value of 0.4. That is, the odds of the measured effect being real would be 0.6 / 0.4, or 1.5 to 1.
28. Lisse et al., supra note 14, at 543-44.
The discussion of the transposition fallacy above is a little technical. Soon, I shall post a simple example that should make the point more concretely and with less jargon.
1. Transcript of Oral Argument, Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309 (2011) (No. 09-1156), 2011 WL 65028, at *12 & *16 (Kagan, J.).
2. Petition for Writ of Certiorari at i, Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309 (2011) (No. 09-1156), 2010 WL 1063936.
3. David H. Kaye, David E. Bernstein & Jennifer L. Mnookin, The New Wigmore: A Treatise on Evidence: Expert Evidence (2d ed. 2011).