Saturday, March 30, 2019

Can P-values or Confidence Intervals Prove Non-association?

Last week, the scientific community heard two prominent calls to end the labeling of results of statistical studies as statistically "significant" or "not significant." One of the manifestos (Amrhein et al.) is particularly irate about negative claims -- conclusions of "no association" based on sample data that are not extreme enough to reject the null hypothesis. The authors "are frankly sick of seeing such nonsensical 'proofs of the null' and claims of non-association in presentations, research articles, reviews and instructional materials." The "pervasive problem," as they articulate it, appears in the box below. It is a recurring issue in toxic tort and other litigation.
PERVASIVE PROBLEM
Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions. 1/
It is hard to disagree with the first observation in the box. Logically, we cannot conclude that the null hypothesis is true "just because a P value is larger than a threshold." For one thing, if the study lacks power, a real association plausibly could fail to produce a statistically significant difference. However, this is not a sufficient reason to abjure hypothesis tests. It is a reason to attend to power. If a study has ample power and is well designed, it is not so obvious that researchers waste effort or misinform policy by treating the failure to achieve statistical significance at a rather undemanding level as demonstrating the lack of a true association.
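To make the role of power concrete, here is a minimal sketch in Python (the standard errors are hypothetical, chosen only for illustration and not taken from any particular study) of how often a two-sided test would reach p < 0.05 when the true relative risk is 1.2:

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sided(true_rr, se_log_rr, z_crit=1.96):
    """Approximate power of a two-sided Wald test of RR = 1,
    given a true risk ratio and the standard error of log(RR)."""
    shift = log(true_rr) / se_log_rr  # expected z-statistic
    return (1.0 - norm_cdf(z_crit - shift)) + norm_cdf(-z_crit - shift)

# Hypothetical standard errors, from a large study to a small one.
for se in (0.05, 0.10, 0.20):
    print(f"SE(log RR) = {se:.2f}: power to detect RR = 1.2 ~ "
          f"{power_two_sided(1.2, se):.0%}")
```

With the largest standard error, the chance of detecting a true 20% increase in risk is only about 15%, so a non-significant result says almost nothing about whether an association exists; with the smallest, power is about 95%, and a negative result carries real evidential weight.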

The second observation also is literally correct. Different outcomes of a null hypothesis significance test are not logically sufficient to establish a conflict between two studies. In 2013, a paper in the International Journal of Cardiology reported "that the use of selective COX-2 inhibitors [such as Vioxx] was not associated with atrial fibrillation risk" because the researchers did not find a statistically significant association. 2/ But a 2011 study in the British Medical Journal had found just such an association. 3/ It might seem that the two studies are in conflict, but as Amrhein et al. explain:
The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).

It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us ... . 4/
So the results of the 2013 study do not undermine those of the 2011 study. The two studies provide the same point estimate; the 2013 study merely has a larger standard error. As such, it does not merit as much weight, but it is fully consistent with the earlier result. 5/
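The "our calculation" p-values in the quoted passage can be reconstructed from the reported intervals. Here is a minimal sketch in Python, assuming (as is standard, though the original papers' methods may have differed in detail) that the intervals were computed on the log-risk-ratio scale with a normal approximation:

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_value_from_ci(rr, lower, upper):
    """Two-sided p-value for the null RR = 1, recovering the standard
    error of log(RR) from the reported 95% confidence limits."""
    se = (log(upper) - log(lower)) / (2 * 1.96)
    z = log(rr) / se
    return 2.0 * (1.0 - norm_cdf(abs(z)))

# 2013 study: RR 1.2, 95% CI from a 3% decrease to a 48% increase.
print(round(p_value_from_ci(1.2, 0.97, 1.48), 3))   # ~0.091
# 2011 study: RR 1.2, 95% CI from a 9% to a 33% increase.
print(round(p_value_from_ci(1.2, 1.09, 1.33), 4))   # ~0.0003
```

Run this way, the two intervals return p ≈ 0.091 and p ≈ 0.0003, matching the figures Amrhein et al. report; the studies differ only in precision, not in the estimated effect.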

What happens, however, when the point estimates of the relative risk are different? Is "no association" a reasonable statement when the confidence interval is tightly centered around RR = 1? Suppose, for example, that the 2013 study had produced a 95% confidence interval of RR = 1.0 ± 0.2. From a maximum likelihood standpoint, "no association" is the best estimate. Would it still be ludicrous to describe the observed RR of 1 (no difference in the risk) as proof of "no association"?
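A rough way to see the maximum-likelihood point: treat the hypothetical interval as implying a point estimate of RR = 1 with a standard error of about 0.1 on the log scale (an approximation imposed on the hypothetical here, since an interval symmetric on the ratio scale is not exactly symmetric on the log scale), and compare the likelihood of candidate risk ratios under a normal model:

```python
from math import exp, log

SE = 0.10  # assumed standard error of log(RR), roughly matching RR = 1.0 +/- 0.2

def relative_likelihood(candidate_rr, observed_rr=1.0, se=SE):
    """Likelihood of a candidate RR relative to the observed RR = 1,
    under a normal model for log(RR)."""
    z = (log(candidate_rr) - log(observed_rr)) / se
    return exp(-0.5 * z * z)

for rr in (1.0, 1.1, 1.2, 1.5):
    print(f"RR = {rr}: relative likelihood {relative_likelihood(rr):.2f}")
```

On these assumptions, no association (RR = 1) is the single best-supported value, although a 20% risk increase still retains roughly a fifth of the maximum likelihood.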

Amrhein et al. do not answer this specific question, but they grudgingly accept a qualified form of proof of "no association" -- a showing that any true association is of no practical importance. As stated at the outset, they are "frankly sick of seeing such nonsensical 'proofs of the null' ... ." They argue that all values inside a confidence interval are "compatible" with the data (as are some outside of the interval), implying that no single value can be picked out to the exclusion of the others. But they also propose that some values are more compatible than others. Is compatibility a likelihood? A probability? Something else? In the end, their answer to the question of whether the data can be said to prove a negative -- that there is no true association -- is this: "if you deem all of the values inside the interval to be practically unimportant, you might then be able to say something like ‘our results are most compatible with no important effect’." 6/
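One way to read "compatibility" is as the two-sided p-value that each candidate risk ratio would receive if it were treated as the null hypothesis -- the p-value function. A minimal sketch, again using the hypothetical interval of RR = 1.0 ± 0.2 and the same assumed standard error of about 0.1 on the log scale:

```python
from math import erf, log, sqrt

SE = 0.10           # assumed standard error of log(RR)
OBSERVED_RR = 1.0   # hypothetical point estimate

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def compatibility(candidate_rr):
    """Two-sided p-value for the hypothesis that the true RR equals
    candidate_rr, given the observed estimate and standard error."""
    z = (log(OBSERVED_RR) - log(candidate_rr)) / SE
    return 2.0 * (1.0 - norm_cdf(abs(z)))

for rr in (0.9, 1.0, 1.1, 1.2, 1.3, 1.5):
    print(f"RR = {rr}: p = {compatibility(rr):.3f}")
```

On this reading, compatibility is graded rather than all-or-nothing: RR = 1 is the most compatible value, values near the ends of the interval are only marginally compatible, and whether the whole compatible range counts as "practically unimportant" is a judgment about the magnitude of the risks, not a purely statistical one.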

NOTES
  1. Valentin Amrhein, Sander Greenland, Blake McShane et al., Comment, Retire Statistical Significance, 567 Nature 305, 305-06 (2019).
  2. T.-F. Chao, C.-J. Liu, S.-J. Chen, K.-L. Wang, Y.-J. Lin, S.-L. Chang, et al., The Association Between the Use of Non-steroidal Anti-inflammatory Drugs and Atrial Fibrillation: a Nationwide Case–control Study, 168 Int’l J. Cardiology 312 (2013).
  3. M. Schmidt, C.F. Christiansen, F. Mehnert, K.J. Rothman, & H.T. Sørensen, Non-steroidal Anti-inflammatory Drug Use and Risk of Atrial Fibrillation or Flutter: Population Based Case-control Study, 343 Brit. Med. J. d3450 (2011).
  4. Amrhein, supra note 1, at 306.
  5. In fact, it lends strength to the conclusion that the true relative risk exceeds 1. Morten Schmidt & Kenneth J. Rothman, Mistaken Inference Caused by Reliance on and Misinterpretation of a Significance Test, 177(3) Int'l J. Cardiology 1089, 1090 (2014) (a meta-analysis gives a CI of 1.1 to 1.3).
  6. Amrhein, supra note 1, at 307.
