Monday, September 17, 2018

P-values in "the home for big data in biology"

A p-value indicates the significance of the difference in frequency of the allele tested between cases and controls i.e. the probability that the allele is likely to be associated with the trait. 1/
Such is the explanation of "p-value" provided by "the EMBL-EBI Training Programme [which] is celebrating 10 amazing years of providing onsite, offsite and online training in bioinformatics" for "scientists at all levels." 2/ The explanation appears in the answer to the question "What are genome wide association studies (GWAS)?" 3/ It comes from people in the know -- "The home for big data in biology" 4/ at "Europe's flagship laboratory for the life sciences." 5/

Above the description is a Manhattan plot of the p-values for the differences between the frequencies of the single nucleotide alleles comprising a genome-wide SNP array in samples of "cases" (individuals with a trait) and "controls" (individuals without the trait). Sadly, the p-values in the plot cannot be equated to "the probability that the allele is likely to be associated with the trait."

I. Transposition

To begin with, a p-value is the probability of a difference between the sample of "cases" and the sample of "controls" at least as large as the one observed -- computed on the assumption that, in the entire population, a random case and a random control are equally likely to have the allele. For a single allele, the expected value of the difference in the sample proportions is then zero, but the observed value will vary from one pair of samples to the next. Because most pairs will have small differences, a big difference in a single study is evidence against the hypothesis that there is no difference at the population level. Differences that rarely would occur by chance alone are taken to indicate a true association rather than a false positive. At least, that is the theory behind the p-value.
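A minimal Python sketch of that definition, using made-up counts (160 of 500 allele copies in cases, 130 of 500 in controls) chosen only for illustration. The first function uses a two-proportion z-test, one common normal approximation; it is not necessarily the test behind the EMBL-EBI plot. The second function applies the definition directly by simulating pairs of samples from a population in which cases and controls share one allele frequency (estimated from the pooled counts, which is itself an assumption).

```python
import math
import random


def z_test_p_value(x_cases, n_cases, x_controls, n_controls):
    """Two-sided p-value for a difference in allele proportions, using the
    normal approximation. Under the null hypothesis both groups share one
    allele frequency, so that frequency is estimated from the pooled counts."""
    pooled = (x_cases + x_controls) / (n_cases + n_controls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    z = (x_cases / n_cases - x_controls / n_controls) / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_p_value(x_cases, n_cases, x_controls, n_controls, sims=2000, seed=1):
    """Monte Carlo version of the same idea: draw many pairs of samples from a
    population in which cases and controls have the same allele frequency and
    count how often the difference is at least as large as the observed one."""
    rng = random.Random(seed)
    pooled = (x_cases + x_controls) / (n_cases + n_controls)
    # Compare a/n1 - b/n2 as the integer a*n2 - b*n1 to avoid floating-point ties
    observed = abs(x_cases * n_controls - x_controls * n_cases)
    hits = 0
    for _ in range(sims):
        a = sum(rng.random() < pooled for _ in range(n_cases))
        b = sum(rng.random() < pooled for _ in range(n_controls))
        if abs(a * n_controls - b * n_cases) >= observed:
            hits += 1
    return hits / sims


# Hypothetical counts: copies carrying the tested allele, total copies
print(z_test_p_value(160, 500, 130, 500))      # normal approximation
print(simulated_p_value(160, 500, 130, 500))   # simulation under "no difference"
```

The two numbers should roughly agree. The simulation counts ties at exactly the observed difference, so it tends to come out a little higher than the approximation.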

For example, a difference with a p-value of 1/100,000 would be a rare occurrence if the two samples came from a population in which the probability of the trait is the same with or without the allele. Consequently, such a difference is thought to be strong evidence that the allele really is more (or less) common among cases than controls.
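How rare is "rare"? Under a normal approximation (an assumption; exact tests give slightly different numbers), a two-sided p-value of 1/100,000 corresponds to a difference of about 4.4 standard errors. The sketch below also converts that into a difference in allele proportions for a hypothetical study with 2,000 allele copies per group and an allele frequency near 30%; those figures are invented for illustration.

```python
import math
from statistics import NormalDist

p_threshold = 1 / 100_000   # the p-value used as an example in the text

# Standardized difference |z| whose two-sided p-value is 1/100,000
z_needed = NormalDist().inv_cdf(1 - p_threshold / 2)

# Hypothetical study: 2,000 allele copies per group, allele frequency near 30%
n_per_group = 2_000
freq = 0.30
se = math.sqrt(freq * (1 - freq) * (2 / n_per_group))
min_difference = z_needed * se   # difference in proportions needed to reach p = 1e-5

print(f"|z| needed: {z_needed:.2f}")
print(f"difference needed: {min_difference:.3f} (about {100 * min_difference:.1f} points)")
```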

But even if the reasoning that "we would not expect it if H is true, therefore H is likely to be false" is correct, the p-value of 1/100,000 is not "the probability that the allele is likely to be associated with the trait." It is the probability of a discrepancy at least that large if the allele has absolutely no association with the trait. In contrast, the probability of "no association" is not even defined in the statistical framework that gives us p-values.

Another way to say it: The p-value is a statement about the evidence (the observed difference) given the hypothesis of no association. It does not represent the probability of the hypothesis of zero association given an observed association. Equating the probability associated with the evidence with the probability associated with the hypothesis is so common that it has a name -- the transposition fallacy.
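The gap between the two probabilities can be made concrete, but only by stepping outside the framework that produces p-values and assuming things a p-value does not supply: a prior probability that a SNP is truly associated and the power of the test. In the Bayes'-rule sketch below, both numbers are hypothetical, and the calculation conditions on "p ≤ 1/100,000" rather than on one exact p-value.

```python
def prob_null_given_significant(alpha, prior_assoc, power):
    """P(no association | p <= alpha) by Bayes' rule.

    alpha:       P(p <= alpha | no association)
    prior_assoc: assumed (hypothetical) probability that a SNP is truly associated
    power:       assumed (hypothetical) P(p <= alpha | association)
    """
    false_pos = alpha * (1 - prior_assoc)
    true_pos = power * prior_assoc
    return false_pos / (false_pos + true_pos)


alpha = 1 / 100_000
for prior in (1e-3, 1e-4, 1e-6):
    post = prob_null_given_significant(alpha, prior, power=0.5)
    print(f"prior={prior:g}: P(no association | p <= alpha) = {post:.3f}")
```

With these made-up inputs, the probability of no association among SNPs that pass the cutoff ranges from about 2% to about 95% depending on the assumed prior, while the p-value cutoff stays fixed at 1/100,000.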

II. Multiple Comparisons

A second defect of the EMBL-EBI's definition is that a p-value of, say, 1/100,000 is not a good measure of surprise for GWAS data. Suppose there are 500,000 SNPs in the array and none of their alleles has any true association with the trait. If all apparent associations are independent, the expected number of alleles with a p-value of 1/100,000 or less is 500,000 × (1/100,000) = 5. Because of the many opportunities for individually impressive differences to appear, it is no surprise that some alleles have this small a p-value. The p-value would have to be much smaller than 1/100,000 for the apparent association to be as surprising as the reported p-value would suggest. P-values that would produce a reasonable false discovery rate could be very small indeed.
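The arithmetic can be checked in a few lines of Python. The sketch assumes, as the text does, 500,000 SNPs with no true associations. It also names two standard corrections that the post does not discuss, the Bonferroni cutoff (0.05 divided by the number of tests) and the Benjamini-Hochberg procedure for controlling the false discovery rate; neither is presented as what EMBL-EBI does. The simulated p-values are drawn uniformly, which is how p-values from a continuous test behave when there is no association.

```python
import random

M = 500_000          # SNPs on the array, none truly associated (as in the text)
ALPHA = 1 / 100_000  # per-SNP p-value cutoff

# Linearity of expectation: this count does not require independence
expected_false_hits = M * ALPHA
# Chance of at least one false hit, assuming independent tests
p_at_least_one = 1 - (1 - ALPHA) ** M
# Bonferroni cutoff that keeps the chance of any false hit at or below 5%
bonferroni_cutoff = 0.05 / M


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure, which
    controls the false discovery rate at q for independent tests."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank   # keep the largest rank that passes
    return sorted(order[:k])


# Under the null hypothesis, p-values are uniformly distributed on (0, 1)
rng = random.Random(42)
null_p = [rng.random() for _ in range(M)]
naive_hits = sum(p <= ALPHA for p in null_p)
bh_hits = len(benjamini_hochberg(null_p))

print(f"expected false hits at p <= 1e-5: {expected_false_hits:.1f}")
print(f"P(at least one false hit): {p_at_least_one:.4f}")
print(f"Bonferroni cutoff: {bonferroni_cutoff:.0e}")
print(f"simulated: {naive_hits} hits at p <= 1e-5, {bh_hits} after BH at q = 0.05")
```

The simulated count of hits at p ≤ 1/100,000 will scatter around 5 from seed to seed; with independent tests, the chance of seeing at least one such hit is over 99%.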

An oversimplified analogy 6/ is this: Flipping a fair coin 18 times and getting all heads or all tails has a probability on the order of 1/100,000. 7/ Flipping 500,000 coins 18 times each and finding that some of them came up all heads or all tails is not strong evidence against the proposition that all the coins are fair. It is just what we would expect to see if the coins are all fair.
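The coin analogy is easy to simulate. In this sketch each coin's 18 flips are the 18 bits of one random integer, so a coin that comes up all heads or all tails is an integer whose bits are all 1 or all 0. The seed is arbitrary.

```python
import random
from fractions import Fraction

FLIPS = 18
N_COINS = 500_000

# Exact chance that one fair coin flipped 18 times gives all heads or all tails
p_run = Fraction(2, 2 ** FLIPS)          # reduces to 1/131,072 (see note 7)
expected = N_COINS * p_run               # expected number of such coins

rng = random.Random(2018)
all_same = (0, 2 ** FLIPS - 1)           # 18 bits all 0 (tails) or all 1 (heads)
observed = sum(1 for _ in range(N_COINS) if rng.getrandbits(FLIPS) in all_same)

print(f"P(all heads or all tails) = {p_run} ~ {float(p_run):.2e}")
print(f"expected coins: {float(expected):.2f}, simulated: {observed}")
```

The expected number of such coins is 500,000/131,072, or about 3.8, so a handful of "impressive" coins is the normal outcome when every coin is fair.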

Of course, the bioinformaticists at EMBL-EBI are acutely aware of the effect of multiple comparisons. Their catalog of findings includes only "variant-trait associations ... if they have a p-value <1.0 × 10⁻⁵ in the overall (initial GWAS + replication) population." 8/ But why insist on so small a number if this p-value is "the probability that the allele is likely to be associated with the trait"? With multiple comparisons, a p-value of 1/100,000 is not the measure of surprise that it is supposed to be.
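Note 8 suggests that the catalog's p-value may come from combining the initial and replication data. If that means simply pooling the counts (an assumption; the catalog's wording does not say), the effect is easy to see with invented numbers: in the sketch below, neither hypothetical study reaches p < 1/100,000 on its own, but the pooled counts do. The two-proportion z-test is a normal-approximation stand-in, not necessarily the test the studies used.

```python
import math


def z_test_p_value(x_cases, n_cases, x_controls, n_controls):
    """Two-sided p-value for a difference in allele proportions
    (pooled-proportion normal approximation)."""
    pooled = (x_cases + x_controls) / (n_cases + n_controls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    z = (x_cases / n_cases - x_controls / n_controls) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: (allele copies in cases, total case copies,
#                       allele copies in controls, total control copies)
initial = (650, 2_000, 550, 2_000)
replication = (660, 2_000, 560, 2_000)
combined = tuple(a + b for a, b in zip(initial, replication))

for label, counts in (("initial", initial), ("replication", replication), ("combined", combined)):
    print(f"{label:>11}: p = {z_test_p_value(*counts):.1e}")
```

Simple pooling also ignores any differences between the two study populations; a stratified analysis would be the more cautious way to combine them.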

  1. EMBL-EBI, What Are Genome Wide Association Studies (GWAS)?, last visited Sept. 16, 2018.
  2. EMBL-EBI, EMBL-EBI Training, last visited Sept. 16, 2018.
  3. EMBL-EBI, supra note 1.
  4. EMBL-EBI, last visited Sept. 16, 2017.
  5. Id. 
  6. It is oversimplified because not all associations are independent. Nearby SNPs tend to be inherited together, but methods that take account of dependencies and enable researchers to pick out the associations that are real have been studied. 
  7. The more exact probability is 1/131,072.
  8. EMBL-EBI, Where Does the Data Come From?,, last visited Sept. 17, 2018. The phrase "the overall (initial GWAS + replication) population" is puzzling. It sounds like the data from an exploratory study are combined with those from the replication study to give a p-value for a larger sample (not a population). If so, the p-values for each study could be more than 1/100,000.
