Statistics Foundations¶

5. Significance¶

What is a hypothesis?¶

In interpreting the results of a scientific study, we usually want to assess the "agreement" between theory and observation, in order to decide whether the theory stands up to the experimental evidence.

The idea of a hypothesis test is to provide a principled way to reject that theory when there is enough evidence to suggest that it is incorrect.

Null hypothesis¶

The null hypothesis in a hypothesis test represents a theoretical scenario that we can model in some way. It is usually given the symbol $H_{0}$.

$H_{0}$: The coin is fair, i.e. the probability of obtaining heads in any trial is 0.5

$H_{0}$: The mean amount of coffee dispensed by the machine is 200g.

$H_{0}$: The survival rates of the treatment group and the placebo group are the same.

We usually think about these models in the form of probability distributions, but sometimes it is easier to work with a simulation. The important thing is to be able to estimate what the results of the experiment would be, if the null hypothesis were actually true.

A simple binomial simulation¶

import numpy as np
rng = np.random.default_rng(0)

# simulate 10 tosses of a fair coin - how many heads?
rng.binomial(n=10, p=0.5)
6

P-value¶

Imagine that I toss a coin 10 times and get only two heads. Is the coin fair?

Clearly it is not impossible for a fair coin to produce only two heads in ten trials, but intuitively it seems quite unlikely.

The p-value is a way to quantify how unexpected a particular outcome is, under the assumption that the null hypothesis is true. This gives us a mechanism for using our observed data to test the hypothesis.

P-value by simulation¶

The null hypothesis is

$H_{0}$: The coin is fair, i.e. the probability of obtaining heads in any trial is 0.5

The experiment is

Toss the coin 10 times and record the number of heads obtained.

I can run my simulated experiment assuming $H_{0}$ as many times as I like.
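The repeated simulation can be sketched like this (the number of repeats and the seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# run the 10-toss experiment 100,000 times, assuming H0 (a fair coin)
heads = rng.binomial(n=10, p=0.5, size=100_000)

# tabulate how often each head count (0 to 10) occurred
counts = np.bincount(heads, minlength=11)
for k, c in enumerate(counts):
    print(k, c)
```

Plotting `counts` as a bar chart gives the empirical distribution of head counts under $H_{0}$.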

*[Figure: histogram of head counts from many simulated 10-toss experiments under $H_{0}$]*

The p-value is defined as the probability of obtaining a result as extreme or more extreme than the one observed. We have to decide what that means in each situation.

In our case, a result of 0, 1 or 2 heads would be "as extreme or more extreme" - i.e. the lower tail of the distribution.

I can therefore calculate the empirical p-value from my simulation.
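A sketch of that calculation, reusing the same kind of simulation (seed and simulation size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate the 10-toss experiment many times under H0
heads = rng.binomial(n=10, p=0.5, size=100_000)

# empirical p-value: fraction of simulated experiments that were
# "as extreme or more extreme" than 2 heads, i.e. the lower tail
p_value = np.mean(heads <= 2)
print(p_value)  # should land close to the exact lower-tail probability
```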

*[Figure: the simulated distribution with the lower tail (0, 1 or 2 heads) highlighted]*

Significance level¶

The p-value gives me a readout of how surprising the observed result is, assuming that $H_{0}$ is true.

  • A tiny p-value means "very surprising".
  • A p-value close to 0.5 means "not surprising at all".

To complete the hypothesis test, we need to make a decision about whether the p-value is small enough to lead us to doubt the validity of $H_{0}$.

This decision is based on a threshold value known as $\alpha$, the significance level.

If p < $\alpha$, we say that we reject the null hypothesis as a reasonable explanation of the mechanism that generated the observed data.

The appropriate significance level depends a lot on the kind of science being done.

  • In a small wet-lab biology experiment, we may have a lot of uncontrollable sources of variability and very few repeats. In general, an $\alpha$ of 0.05 (5%) would be commonly accepted as appropriate.
  • In a high-precision physics experiment, we may be able to eliminate a huge amount of variability, making much lower values of $\alpha$ appropriate.

Because many possible values of $\alpha$ could be justified, it is essential for the integrity of the hypothesis test to decide this value before making the observation and calculating the p-value.

Before doing any calculations, you should get into the habit of writing down

  • the null hypothesis.
  • whether the p-value corresponds to the lower or the upper tail.
  • the chosen $\alpha$ value.

Type I and Type II errors¶

Notice that there are two ways that the hypothesis test can give the incorrect result:

  • $H_{0}$ is actually true, but I rejected it. (Type I error)
  • $H_{0}$ is actually false, but I did not reject it. (Type II error)

We have direct control over the probability of a Type I error - this is simply the $\alpha$ value itself.

e.g. if I set $\alpha = 0.05$ then I accept that if the null hypothesis is true, I will reject it 5% of the time.

1 - $\alpha$ is known as the specificity of the study.

The probability of making a Type II error (known as $\beta$) is in general difficult to estimate accurately, as it may depend on unknown population parameters.

1 - $\beta$ is known as the power of the study.
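To make $\beta$ concrete, here is a sketch for the coin example; the alternative value $p = 0.25$ and the lower-tail test are my own choices. The power is the probability that the test rejects $H_{0}$ when that alternative is actually true.

```python
import math

n, alpha = 10, 0.05

def p_lower(k, p=0.5):
    # exact lower-tail p-value under H0 for k heads out of n tosses
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# suppose the coin is actually biased: true probability of heads is 0.25
p_true = 0.25

# power = P(reject H0 | true p), summing over the head counts that get rejected
power = sum(
    math.comb(n, k) * p_true**k * (1 - p_true)**(n - k)
    for k in range(n + 1)
    if p_lower(k) < alpha  # the test rejects for these outcomes
)
print(round(power, 3))  # about 0.244, so beta is about 0.756
```

With only 10 tosses, just 0 or 1 heads lead to rejection at $\alpha = 0.05$, so the test misses this bias roughly three quarters of the time; this is why power depends on the (unknown) true parameter.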

P-values from probability distributions¶

If the theoretical situation described by the null hypothesis corresponds to a known probability distribution over the outcomes of the experiment, then we can calculate the p-value exactly instead of using a simulation.
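For the coin example, the number of heads in 10 fair tosses follows a Binomial(10, 0.5) distribution, so the lower-tail p-value for observing 2 heads can be computed exactly (a sketch using only the standard library):

```python
import math

n, p_heads, observed = 10, 0.5, 2

# P(X <= 2) for X ~ Binomial(10, 0.5): sum the point probabilities
p_value = sum(
    math.comb(n, k) * p_heads**k * (1 - p_heads)**(n - k)
    for k in range(observed + 1)
)
print(p_value)  # 0.0546875, i.e. 56/1024
```

Note that 0.0547 > 0.05, so at $\alpha = 0.05$ this outcome would not lead us to reject $H_{0}$.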

*[Figure: the Binomial(10, 0.5) probability distribution with the lower tail shaded]*

Summary: basic procedure for a hypothesis test¶

  1. Formulate the null hypothesis $H_{0}$
  2. Decide the significance level $\alpha$
  3. Make the measurement / observation
  4. Calculate the p-value (by simulation or using a known probability distribution)
  5. If p < $\alpha$, reject $H_{0}$
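The five steps can be put together end-to-end for the coin example (the value of $\alpha$, the simulation size and the seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. H0: the coin is fair (probability of heads is 0.5)
# 2. significance level, chosen before looking at the data
alpha = 0.05

# 3. the observation: 2 heads in 10 tosses
observed = 2

# 4. empirical p-value by simulation under H0 (lower tail)
heads = rng.binomial(n=10, p=0.5, size=100_000)
p_value = np.mean(heads <= observed)

# 5. decision
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(p_value, decision)
```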

Next time we will introduce the idea of test statistics and look at some parametric hypothesis tests in more detail.