Why many statisticians reject “null hypothesis”/“p-value” testing


You need to know about H0/H1 testing, if only because it is so widespread (and in any case it is the basis of the Edexcel exam). For your own education, though not for the Edexcel exam, you should also be aware that many statisticians consider the whole approach via “null hypothesis” and “p-value” unsound.

One: the null hypothesis H0 is, in practice, almost never exactly true. No real coin has a probability of heads of exactly 0.5, no real die has a probability of a six of exactly 1/6, etc. Presumably what you mean by “there is evidence to reject the null hypothesis” is “there is evidence that the null hypothesis is seriously out”. But “seriously” is vague, isn’t it?

Two: the p-value measures the size of the sample as much as the size of any deviation from the null hypothesis. If you have a coin for which P(heads) = 0.501, so that the coin is very nearly unbiased, then with a big enough sample you are still likely to get a “statistically significant” deviation. (Some refinements of H0/H1 testing deal with this objection by also reporting a measure of “effect size”.)
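To see how much the sample size matters, here is a rough sketch in Python. It assumes a two-sided test of H0: P(heads) = 0.5 using the normal approximation to the binomial, and it assumes the sample proportion of heads comes out at exactly 0.501 every time. The function name two_sided_p_value is made up for this illustration.

```python
import math

def two_sided_p_value(heads: int, flips: int) -> float:
    """Two-sided p-value for H0: P(heads) = 0.5, by normal approximation."""
    expected = 0.5 * flips
    sd = math.sqrt(flips * 0.25)          # standard deviation of the head count under H0
    z = abs(heads - expected) / sd
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))

# Same observed proportion (0.501) each time; only the sample size changes.
for flips in (1_000, 100_000, 1_000_000, 10_000_000):
    heads = round(0.501 * flips)
    print(flips, two_sided_p_value(heads, flips))
```

With 1,000 flips the result is nowhere near significant. With a million flips the same proportion gives z = 2 and a p-value of about 0.046, which is “significant at the 5% level”, although the deviation from 0.5 is no bigger than before.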

Three: in fact, if you keep on taking bigger and bigger samples, testing again each time, you would be certain to get a “significant” deviation eventually even if the coin were exactly unbiased.
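A small simulation can illustrate this. The sketch below flips an exactly fair coin, runs a two-sided 5% test (normal approximation, critical value 1.96) after every flip, and records whether the test ever comes out “significant” before a cap on the number of flips is reached. The function names, the number of trials, the seed and the minimum of 10 flips before testing starts are all assumptions made for the illustration. A simulation with a finite cap can only suggest the trend, not prove the “certain eventually” claim.

```python
import math
import random

def ever_significant(max_flips: int, rng: random.Random,
                     z_crit: float = 1.96, min_flips: int = 10) -> bool:
    """Flip a fair coin up to max_flips times, testing after every flip.

    Returns True if |z| exceeds z_crit at any point from min_flips onwards.
    """
    heads = 0
    for n in range(1, max_flips + 1):
        if rng.random() < 0.5:
            heads += 1
        if n >= min_flips:
            z = abs(heads - 0.5 * n) / math.sqrt(0.25 * n)
            if z > z_crit:
                return True
    return False

def false_alarm_rate(max_flips: int, trials: int = 200, seed: int = 1) -> float:
    """Fraction of simulated fair coins that ever look 'significantly' biased."""
    rng = random.Random(seed)
    hits = sum(ever_significant(max_flips, rng) for _ in range(trials))
    return hits / trials

for max_flips in (100, 1_000, 10_000):
    print(max_flips, false_alarm_rate(max_flips))
```

A single test at a fixed sample size would give a false alarm about 5% of the time. Testing repeatedly as the sample grows gives the fair coin many chances to look biased, so the printed rates should come out well above 5%.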

Four: the cut-offs of p = 0.05 and p = 0.01 have no real justification. They are conventions, just figures plucked from the air.

Five: the logic is the wrong way round. The fact that a sample result is very unlikely if the null hypothesis is true does not necessarily imply that the null hypothesis is very unlikely given the sample result. For example, there may be something “very unlikely” about the sample which has escaped our notice. (If someone is a Year 13 Further Maths student in England in 2017, then the probability that she or he knows the geometric method for Möbius transformations is about 0.001, since only eight students know it out of about 7,000. It doesn’t follow that if someone knows the geometric method, then it is very unlikely that she or he is a Year 13 Further Maths student.)
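Bayes’ theorem makes the gap between the two probabilities explicit. In the sketch below, H is “is a Year 13 Further Maths student” and E is “knows the geometric method”. The figure P(E given H) = 8/7,000 is from the example above. The population size and the numbers of people outside that group who know the method are invented purely for illustration, and the function name posterior is hypothetical.

```python
def posterior(p_e_given_h: float, prior_h: float, p_e_given_not_h: float) -> float:
    """P(H given E) by Bayes' theorem."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1 - prior_h))

population = 50_000_000          # assumed population size (illustrative only)
students = 7_000                 # Year 13 Further Maths students (from the text)
p_e_given_h = 8 / students       # about 0.001 (from the text)
prior_h = students / population

# Two invented scenarios: 2 other people know the method, or 2,000 do.
for others_who_know in (2, 2_000):
    p_e_given_not_h = others_who_know / (population - students)
    print(others_who_know, posterior(p_e_given_h, prior_h, p_e_given_not_h))
```

P(E given H) is the same tiny number in both scenarios, yet P(H given E) works out to 8/(8 + 2) = 0.8 in the first and 8/(8 + 2,000), about 0.004, in the second. The reverse probability depends on information that P(E given H) alone does not contain.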

Summary by Robert Coe

Summary by Jacob Cohen

And here’s a classic discussion by John Tukey, the guy who invented the box-plot and coined the words “bit” and “software”.

Here’s a practical example. In marketing, because marketing people don’t know much statistics, tests of whether one approach is better than another usually give “false positives”: see here.