Monday, September 28, 2020

Terminology Department: Significance

Inns of Court College of Advocacy, Guidance on the Preparation, Admission and Examination of Expert Evidence § 5.2 (3d ed. 2020)
Statisticians, for example, use what appear to be everyday words in specific technical senses. 'Significance' is an example. In everyday language it carries associations of importance, something with considerable meaning. In statistics it is a measure of the likelihood that a relationship between two or more variables is caused by something other than random chance.
Welcome to the ICCA

The Inns of Court College of Advocacy ... is the educational arm of the Council of the Inns of Court. The ICCA strives for ‘Academic and Professional Excellence for the Bar’. Led by the Dean, the ICCA has a team of highly experienced legal academics, educators and instructional designers. It also draws on the expertise of the profession across the Inns, Circuits, Specialist Bar Associations and the Judiciary to design and deliver bespoke training for student barristers and practitioners at all levels of seniority, both nationally, pan-profession and on an international scale.

How good is the barristers' definition of statistical significance? In statistics, an apparent association between variables is said to be significant when it lies outside the range that one would expect to see in some large fraction of repeated, identically conducted studies in which the variables are in fact uncorrelated. Sir Ronald Fisher articulated the idea as follows:

[I]t is convenient to draw the line at about the level at which we can say: ‘Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials.’ This level ... we may call the 5 per cent. point .... If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent. point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach that level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance. [1]

For Fisher, a "significant" result would occur by sheer coincidence no "more than once in twenty trials" (on average). 

Is such statistical significance the same as the barristers' "likelihood" that an observed "relationship ... is caused by something other than random chance"? One might object to the appearance of the term "likelihood" in the definition because it too is a technical term with a specialized meaning in statistics, but that is not the main problem. The vernacular likelihood that X is the cause of extreme data (where X is anything other than random chance) is not a "level of significance" such as 5%, 2%, or 1%. These levels are conditional error probabilities: If the variables are uncorrelated and we use a given level to call the observed results significant, then, in the (very) long run, we will label coincidental results as significant no more often than that level specifies. For example, if we always use a 0.01 level, we will call coincidences "significant" no more than 1% of the time (in the limit).
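The long-run reading of a significance level can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: every "study" draws observations that truly have no effect (standard normal noise with a known standard deviation of 1), a two-sided z-test is applied, and the function name false_positive_rate, the sample size, and the number of studies are all hypothetical choices, not anything taken from the ICCA guidance or from Fisher.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(alpha, n_studies=20000, n_obs=30, seed=0):
    """Fraction of no-effect studies that a two-sided z-test calls significant.

    Assumption: each study observes n_obs independent standard normal values,
    so the null hypothesis (true mean 0, known standard deviation 1) holds.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    significant = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # Under the null, sqrt(n) * sample mean is a standard normal z statistic.
        z = fmean(sample) * n_obs ** 0.5
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            significant += 1
    return significant / n_studies


if __name__ == "__main__":
    for level in (0.05, 0.02, 0.01):
        print(level, false_positive_rate(level))
```

With many simulated studies, the printed fractions should sit close to the chosen levels. That is all the level promises: a cap on how often pure coincidence is labeled "significant" when there is nothing to find.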

The probability (the vernacular likelihood) "that a relationship between two or more variables is caused by something other than random chance" is quite different. [2, p.53] Everything else being equal, significant results are more likely to signal a true relationship than are nonsignificant results, but the significance level itself refers to the probability of data that are uncommon when there is no true relationship, and not to the probability that the apparent relationship is real. In symbols, Pr(relationship | extreme data) is not Pr(extreme data | relationship). Naively swapping the terms in the expressions for the conditional probabilities is known as the transposition fallacy. In regard to criminal cases involving statistical evidence, it often is called the "prosecutor's fallacy." Perhaps "barristers' fallacy" can be added to the list.
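A short Bayes' rule calculation shows how far apart the two quantities can be. This is a sketch under assumptions that are not in the ICCA guidance: the prior probability that a tested relationship is real, and the power of the study (the probability of a significant result when the relationship is real), are hypothetical inputs chosen only for illustration, as is the function name.

```python
def prob_real_given_significant(prior, power, alpha):
    """Pr(relationship is real | significant result), by Bayes' rule.

    prior: assumed Pr(relationship is real) before seeing the data
    power: assumed Pr(significant result | relationship is real)
    alpha: significance level, Pr(significant result | no relationship)
    """
    true_positive = prior * power
    false_positive = (1.0 - prior) * alpha
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Same significance level each time; only the hypothetical prior changes.
    for prior in (0.5, 0.1, 0.01):
        posterior = prob_real_given_significant(prior, power=0.8, alpha=0.05)
        print(f"prior={prior:.2f}  Pr(real | significant)={posterior:.2f}")
```

The significance level stays at 0.05 throughout, yet the probability that a significant result reflects a real relationship moves with the assumed prior and power. The level alone therefore cannot be read as the probability that a relationship "is caused by something other than random chance."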

REFERENCES

  1. Ronald Fisher, The Arrangement of Field Experiments, 33 J. Ministry Agric. Gr. Brit. 503-515, 504 (1926).
  2. David H. Kaye, Frequentist Methods for Statistical Inference, in Handbook of Forensic Statistics 39-72 (David Banks et al. eds. 2021).

ACKNOWLEDGMENT: Thanks to Geoff Morrison for alerting me to the ICCA definition.