12  Hypothesis tests

A statistical hypothesis is a claim about the value of a parameter or population characteristic. In any hypothesis-testing problem there are always two competing hypotheses under consideration:

  1. The null hypothesis $H_0$, representing the status quo.
  2. The alternative hypothesis $H_1$, representing the research claim.

The objective of hypothesis testing is to decide, based on sample information, whether the alternative hypothesis is actually supported by the data. One usually does new research to challenge existing beliefs.

Is there strong evidence for the alternative?

Let’s consider that you want to establish whether the null hypothesis $H_0$ is not supported by the data. One usually works under $H_0$: if the sample does not strongly contradict $H_0$, we continue to believe in the plausibility of the null hypothesis. There are only two possible conclusions: reject $H_0$ or fail to reject $H_0$.

Definition 12.1 The test statistic $T(x_n)$ is a function of the sample and is used to decide whether the null hypothesis should be rejected or not. In theory, an infinite number of possible tests could be devised, so the choice of a particular test procedure must be based on the probability that the test produces incorrect results. In general, two kinds of errors are associated with a test, i.e.

  1. A type I error occurs when the null hypothesis is rejected although it is true.
  2. A type II error occurs when the null hypothesis is not rejected although it is false.

The p-value is in general related to the probability of a type I error: the smaller the p-value, the more evidence there is in the sample data against the null hypothesis and in favour of the alternative hypothesis.

In general, before performing a test one fixes a significance level $\alpha$ (the desired type I error probability), which defines the rejection region. The decision rule is then: reject $H_0$ if the p-value $\le \alpha$; do not reject $H_0$ if the p-value $> \alpha$. The p-value can be thought of as the smallest significance level at which $H_0$ can be rejected, and its calculation depends on whether the test is upper-, lower-, or two-tailed.
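For instance, a minimal R sketch of this decision rule (with purely illustrative values for the p-value and $\alpha$) reads:

# Decision rule: reject H0 when the p-value is at most alpha
alpha <- 0.05   # significance level (illustrative)
p_value <- 0.03 # hypothetical p-value returned by some test
if (p_value <= alpha) "Reject H0" else "Fail to reject H0"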

For example, let’s consider a sample $x_n$ of data. Then, the general procedure for a statistical hypothesis test can be summarized as follows:

  1. an assumption about the distribution of the data, often expressed in terms of a statistical model;
  2. a null hypothesis $H_0$ and an alternative hypothesis $H_1$ which make specific statements about the data;
  3. a test statistic $T(x_n)$ which is a function of the data and whose distribution under the null hypothesis is known;
  4. a significance level $\alpha$ which imposes an upper bound on the probability of rejecting $H_0$, given that $H_0$ is true.

Given that $T(X_n)$ under $H_0$ has a known distribution function $F_T$, the value $q_\alpha$ is computed with the quantile function $F_T^{-1}$, where $F_T: \mathbb{R} \to [0,1]$ and $F_T^{-1}: [0,1] \to \mathbb{R}$. Mathematically, the p-value is related to the test performed. In general, two kinds of tests are available:

A two-tailed test is appropriate if the estimated value may be greater or less than a certain range of values, for example, whether a test taker may score above or below a specific range of scores. In this case the p-value is related to the probabilities $$P(T(X_n) \le q_{\alpha/2}) = \frac{\alpha}{2}, \quad \text{and} \quad P(T(X_n) \ge q_{1-\alpha/2}) = \frac{\alpha}{2},$$ where $q_{\alpha/2} = F_T^{-1}(\alpha/2)$ and $q_{1-\alpha/2} = F_T^{-1}(1-\alpha/2)$.

A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both. For a left-tailed test the p-value is related to $P(T(X_n) \le q_\alpha) = \alpha$, while for a right-tailed test to $P(T(X_n) \ge q_{1-\alpha}) = \alpha$. If the distribution function is symmetric, then for a two-tailed test $q_{\alpha/2} = -q_{1-\alpha/2}$ and the formulas simplify.
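As a short sketch, assuming a Student-t null distribution, the three p-values can be computed in R with the distribution function pt (the values of t_obs and df below are purely illustrative):

# p-values for an observed statistic t_obs under a t(df) null distribution
t_obs <- -1.9 # observed test statistic (illustrative)
df <- 40      # degrees of freedom (illustrative)
p_left  <- pt(t_obs, df)           # left-tailed: P(T <= t_obs)
p_right <- 1 - pt(t_obs, df)       # right-tailed: P(T >= t_obs)
p_two   <- 2 * pt(-abs(t_obs), df) # two-tailed (symmetric distribution)
c(left = p_left, right = p_right, two = p_two)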

12.1 Tests for the means

Proposition 12.1 Let’s consider an IID, normally distributed sample, i.e. $X_n = (X_1, \dots, X_i, \dots, X_n)$ with $X_i \sim N(\mu, \sigma^2)$, and the set of hypotheses $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$. Then, given the sample mean $\hat\mu$ () and the corrected sample standard deviation $\hat s$ (), under the null hypothesis $H_0$ the test statistic $$T_n(X_n) = \sqrt{n}\,\frac{\hat\mu(X_n) - \mu_0}{\hat s(X_n)} \overset{H_0}{\sim} t(n-1)$$ is Student-t distributed with $n-1$ degrees of freedom. Moreover, as $n \to \infty$ the distribution of $T_n(X_n)$ converges () to that of a standard Normal, i.e. $$T_n(X_n) \overset{d}{\to} N(0,1) \iff \lim_{n \to \infty} F_{T_n}(t) = \Phi(t),$$ where $\Phi(t)$ denotes the distribution function of a standard Normal.

Proof. If the sample is normally distributed, the sample mean is also normally distributed, i.e. $$M(X_n) = \sqrt{n}\,\frac{\hat\mu(X_n) - \mu_0}{\sigma} \sim N(0,1).$$ Under normality the sample variance, being a sum of squares of independent and normally distributed random variables, follows a $\chi^2$ distribution with $n-1$ degrees of freedom, i.e. $$V(X_n) = \frac{(n-1)\,\hat s^2(X_n)}{\sigma^2} \sim \chi^2(n-1).$$ Notably, the ratio of a standard Normal and the square root of a $\chi^2$ random variable divided by its degrees of freedom is exactly the definition of a Student-t random variable as in . Hence, the ratio between the statistic $M$ and the square root of $V$ divided by its degrees of freedom reads $$\frac{M(X_n)}{\sqrt{V(X_n)/(n-1)}} = \sqrt{n}\,\frac{\hat\mu(X_n) - \mu}{\sigma}\sqrt{\frac{\sigma^2}{\hat s^2(X_n)}} = \sqrt{n}\,\frac{\hat\mu(X_n) - \mu}{\sqrt{\hat s^2(X_n)}} \sim t_{n-1}.$$ Hence, under $H_0$ the test statistic follows a Student-t distribution with $n-1$ degrees of freedom.
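The distributional claim can also be checked numerically. The following sketch (with an arbitrary, illustrative choice of n, mu_0 and sigma) simulates the statistic under $H_0$ and compares its empirical quantiles with those of a Student-t with $n-1$ degrees of freedom:

# Monte Carlo check: under H0 the statistic T_n is t(n-1)-distributed
set.seed(1)
n <- 30; mu_0 <- 2; sigma <- 2 # illustrative values
T_sim <- replicate(10000, {
  x <- rnorm(n, mean = mu_0, sd = sigma) # data generated under H0
  sqrt(n) * (mean(x) - mu_0) / sd(x)     # test statistic
})
# Compare empirical and theoretical quantiles
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
round(rbind(empirical   = quantile(T_sim, probs),
            theoretical = qt(probs, df = n - 1)), 3)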

Exercise 12.1 Let’s consider a sample of $n = 500$ IID random variables, where each observation is drawn from a Normal distribution with mean $\mu = 2$ and variance $\sigma^2 = 4$. Then, evaluate with an appropriate test, at significance level $\alpha = 10\%$, the following three sets of hypotheses, i.e. $$(1)\; H_0: \mu = 2.3,\; H_1: \mu \neq 2.3, \qquad (2)\; H_0: \mu = 2.2,\; H_1: \mu \neq 2.2, \qquad (3)\; H_0: \mu = 2.1,\; H_1: \mu \neq 2.1.$$

Solution 12.1. Since we are dealing with a single Normal population we can apply a t-test (). More precisely, the statistic is Student-t distributed with $n-1$ degrees of freedom and the critical value is such that $q_{\alpha/2} = F_T^{-1}(\alpha/2)$, where $F_T^{-1}$ is the quantile function of a Student-t. Therefore, if the statistic computed on a sample $x_n$ lies inside the acceptance region, i.e. $$H_0 \text{ is not rejected} \iff -|q_{\alpha/2}| < T_n(x_n) < |q_{\alpha/2}|,$$ we do not reject $H_0$; otherwise we reject $H_0$ at significance level $\alpha$ and conclude that the mean is significantly different from $\mu_0$.

Solution
library(dplyr)
set.seed(1) # random seed 
# ============================================
#                   Inputs
# ============================================                 
# Dimension of the sample 
n <- 500 # sample size 
# Means for the tests
mu_0 <- c(2.3, 2.2, 2.1) 
# true mean
mu <- 2 
# true variance
sigma2 <- 4
alpha <- 0.1 # significance level
# ============================================
# Simulated random variable 
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))
# Sample mean
mu_hat <- mean(x)
# Corrected sample variance
s2_hat <- (mean(x^2) - mu_hat^2) * n / (n - 1)
# Statistic T (1)
T_1 <- sqrt(n) * (mu_hat - mu_0[1]) / sqrt(s2_hat)
# Statistic T (2)
T_2 <- sqrt(n) * (mu_hat - mu_0[2]) / sqrt(s2_hat)
# Statistic T (3)
T_3 <- sqrt(n) * (mu_hat - mu_0[3]) / sqrt(s2_hat)
# Degrees of freedom 
nu <- n - 1
# Critical value
q_alpha_2 <- abs(qt(alpha / 2, df = nu))
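As a cross-check (reusing the objects x and mu_0 defined above), the same statistics can also be obtained with R's built-in one-sample t-test; the t statistic returned by t.test coincides with T_1, T_2 and T_3:

# Built-in one-sample t-test: its t statistic matches T_1, T_2, T_3
t.test(x, mu = mu_0[1])$statistic
t.test(x, mu = mu_0[2])$statistic
t.test(x, mu = mu_0[3])$statistic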
| $\mu_0$ | $\alpha$ | $q_{\alpha/2}$ | $T_n(x_n)$ | $q_{1-\alpha/2}$ | $H_0$ |
|---------|----------|----------------|------------|------------------|--------------|
| 2.3     | 0.1      | -1.648         | -2.814     | 1.648            | Rejected     |
| 2.2     | 0.1      | -1.648         | -1.709     | 1.648            | Rejected     |
| 2.1     | 0.1      | -1.648         | -0.604     | 1.648            | Not rejected |
Table 12.1: t-tests on the mean of a Normal sample.
Figure 12.1: t-tests on the mean of a Normal sample.

12.1.1 Test for two means and equal variances

Proposition 12.2 Let’s consider two IID Gaussian samples with unknown means $\mu_1$ and $\mu_2$ and unknown equal variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$, i.e. $$X_{n_1} \sim N(\mu_1, \sigma^2), \quad X_{n_2} \sim N(\mu_2, \sigma^2),$$ where $n_1$ and $n_2$ are the numbers of observations in each sample, and the set of hypotheses $H_0: \mu_1 - \mu_2 = \mu_\Delta$, $H_1: \mu_1 - \mu_2 \neq \mu_\Delta$. Then, given the sample means $\hat\mu$ (), under the null hypothesis $H_0$ the test statistic is Student-t distributed with $n_1 + n_2 - 2$ degrees of freedom, i.e. $$T(X_{n_1}, X_{n_2}) = \frac{\hat\mu(X_{n_1}) - \hat\mu(X_{n_2}) - \mu_\Delta}{\sqrt{\hat s^2(X_{n_1}, X_{n_2})}} \overset{H_0}{\sim} t(n_1 + n_2 - 2), \tag{12.1}$$ where $$\hat s^2(X_{n_1}, X_{n_2}) = \frac{(n_1 - 1)\,\hat s^2(X_{n_1}) + (n_2 - 1)\,\hat s^2(X_{n_2})}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \tag{12.2}$$ and $\hat s^2$ is the corrected sample variance () computed on each of the two samples.

Exercise 12.2 Let’s consider two samples extracted from Normal distributions with $n_1 = 100$ and $n_2 = 200$ and $$X_{100} \sim N(2, 4), \quad X_{200} \sim N(1, 4).$$ Then, let’s evaluate with an appropriate test, at significance level $\alpha = 10\%$, the following sets of hypotheses, i.e. $$(1)\; H_0: \mu_\Delta = \mu_1 - \mu_2 = 0.5,\; H_1: \mu_1 - \mu_2 \neq 0.5, \quad (2)\; H_0: \mu_\Delta = \mu_1 - \mu_2 = 0.75,\; H_1: \mu_1 - \mu_2 \neq 0.75, \quad (3)\; H_0: \mu_\Delta = \mu_1 - \mu_2 = 1,\; H_1: \mu_1 - \mu_2 \neq 1.$$

Solution 12.2. Since the samples are normally distributed with equal population variances, we can consider the test statistic in . More precisely, the test statistic is Student-t distributed, $$T(x_{100}, x_{200}) = \frac{\hat\mu(x_{100}) - \hat\mu(x_{200}) - \mu_\Delta}{\sqrt{\hat s^2(x_{100}, x_{200})}} \overset{H_0}{\sim} t_{298},$$ where $\hat s^2(x_{100}, x_{200})$ is computed as in . Since it is a two-tailed test, the critical value for a significance level $\alpha$ is $q_{\alpha/2} = F_T^{-1}(\alpha/2)$, where $F_T^{-1}$ is the quantile function of a Student-t, which is symmetric. Therefore, if $T$ lies inside the acceptance region, i.e. $$H_0 \text{ is not rejected} \iff -|q_{\alpha/2}| < T(x_n) < |q_{\alpha/2}|,$$ then $H_0$ is not rejected and one concludes that the difference between the sample means is not statistically different from $\mu_\Delta$. Otherwise $H_0$ is rejected at significance level $\alpha$ and one concludes that the difference between the sample means is significantly different from $\mu_\Delta$.

Solution
set.seed(1)
# ============================================
#                   Inputs
# ============================================ 
n1 <- 100
n2 <- 200
# Significance level 
alpha <- 0.10
# True means 
mu <- c(X_n1 = 2, X_n2 = 1)
# True variances 
sigma2 <- 4
# Tests
mu_delta <- c(0.5, 0.75, 1)
# ============================================
# Simulated populations
X_n1 <- rnorm(n1, mu[1], sqrt(sigma2))
X_n2 <- rnorm(n2, mu[2], sqrt(sigma2))
# Sample means
mu_X_n1 <- mean(X_n1)
mu_X_n2 <- mean(X_n2)
# Corrected sample variances
s2_X_n1 <- (mean(X_n1^2) - mu_X_n1^2) * n1 / (n1 - 1)
s2_X_n2 <- (mean(X_n2^2) - mu_X_n2^2) * n2 / (n2 - 1)
# Pooled variance term as in (12.2)
s2_n1_n2 <- ((n1 - 1) * s2_X_n1 + (n2 - 1) * s2_X_n2)/(n1+n2-2) * (1/n1 + 1/n2)
# Degrees of freedom 
nu <- n1 + n2 - 2
# Critical value
q_alpha_2 <- abs(qt(alpha / 2, df = nu))
# Test 1 
T_1 <- (mu_X_n1 - mu_X_n2 - mu_delta[1]) / sqrt(s2_n1_n2)
# Test 2
T_2 <- (mu_X_n1 - mu_X_n2 - mu_delta[2]) / sqrt(s2_n1_n2)
# Test 3
T_3 <- (mu_X_n1 - mu_X_n2 - mu_delta[3]) / sqrt(s2_n1_n2)
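As a cross-check (reusing X_n1, X_n2 and mu_delta defined above), the pooled-variance test is also available through t.test with var.equal = TRUE; its t statistic coincides with T_1 (and analogously for the other hypotheses):

# Built-in two-sample t-test with pooled variance: matches T_1
t.test(X_n1, X_n2, mu = mu_delta[1], var.equal = TRUE)$statistic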
| $\alpha$ | $\mu_\Delta$ | $q_{\alpha/2}$ | $T(x_{n_1}, x_{n_2})$ | $q_{1-\alpha/2}$ | $H_0$ |
|----------|--------------|----------------|-----------------------|------------------|--------------|
| 0.1      | 0.50         | -1.65          | 3.075                 | 1.65             | Rejected     |
| 0.1      | 0.75         | -1.65          | 2.016                 | 1.65             | Rejected     |
| 0.1      | 1.00         | -1.65          | 0.957                 | 1.65             | Not rejected |
Table 12.2: Tests on the difference between the means of two Normal populations with equal variances.
Figure 12.2: Tests on the difference between the means of two Normal populations with equal variances.

12.1.2 Test for two means and unequal variances

Proposition 12.3 Let’s consider two IID Gaussian samples with unknown means $\mu_1$ and $\mu_2$ and unknown variances $\sigma_1^2$ and $\sigma_2^2$, i.e. $$X_{n_1} \sim N(\mu_1, \sigma_1^2), \quad X_{n_2} \sim N(\mu_2, \sigma_2^2),$$ where $n_1$ and $n_2$ are the numbers of observations in each sample, and the set of hypotheses $H_0: \mu_1 - \mu_2 = \mu_\Delta$, $H_1: \mu_1 - \mu_2 \neq \mu_\Delta$. Then, given the sample means $\hat\mu$ () and corrected sample variances (), Welch () proposed a test statistic that under the null hypothesis $H_0$ is approximately Student-t distributed with $\nu$ degrees of freedom, i.e. $$T(X_{n_1}, X_{n_2}) = \frac{\hat\mu(X_{n_1}) - \hat\mu(X_{n_2}) - \mu_\Delta}{\sqrt{\frac{\hat s^2(X_{n_1})}{n_1} + \frac{\hat s^2(X_{n_2})}{n_2}}} \overset{H_0}{\sim} t_\nu,$$ where the number of degrees of freedom $\nu$ is not necessarily an integer and is computed with the Welch–Satterthwaite approximation. More precisely, it is defined as a weighted combination of the degrees of freedom of each group, reflecting the uncertainty due to unequal variances, i.e. $$\nu = \frac{\left(\frac{\hat s^2(X_{n_1})}{n_1} + \frac{\hat s^2(X_{n_2})}{n_2}\right)^2}{\frac{\left(\hat s^2(X_{n_1})\right)^2}{n_1^2 (n_1 - 1)} + \frac{\left(\hat s^2(X_{n_2})\right)^2}{n_2^2 (n_2 - 1)}}.$$

Exercise 12.3 Let’s consider two samples extracted from Normal distributions with $n_1 = 100$ and $n_2 = 200$ and $$X_{100} \sim N(2, 4), \quad X_{200} \sim N(1, 9).$$ Then, let’s evaluate with an appropriate test, at significance level $\alpha = 10\%$, the sets of hypotheses in .

Solution 12.3. Since the populations are Normal with unknown and unequal variances, we apply Welch’s test (): the statistic is approximately Student-t distributed with $\nu$ degrees of freedom given by the Welch–Satterthwaite approximation, and the two-tailed decision rule is the same as in .

Solution
set.seed(1)
# ============================
#           Inputs
# ============================
n1 <- 100
n2 <- 200
# Significance level 
alpha <- 0.10
# True means 
mu <- c(X_n1 = 2, X_n2 = 1)
# True variances 
sigma2 <- c(X_n1 = 4, X_n2 = 9)
# Tests
mu_delta <- c(0.5, 0.75, 1)
# ============================
# Simulated populations
X_n1 <- rnorm(n1, mu[1], sqrt(sigma2[1]))
X_n2 <- rnorm(n2, mu[2], sqrt(sigma2[2]))
# Sample means
mu_X_n1 <- mean(X_n1)
mu_X_n2 <- mean(X_n2)
# Corrected sample variances
s2_X_n1 <- (mean(X_n1^2) - mu_X_n1^2) * n1 / (n1 - 1)
s2_X_n2 <- (mean(X_n2^2) - mu_X_n2^2) * n2 / (n2 - 1)
# Variance of the difference of the sample means
s2_n1_n2 <- s2_X_n1 / n1 + s2_X_n2 / n2
# Degrees of freedom 
nu <- (s2_X_n1 / n1 + s2_X_n2 / n2)^2 / (s2_X_n1^2 / (n1^2 * (n1-1)) + s2_X_n2^2 / (n2^2 * (n2-1)))
# Critical value
q_alpha_2 <- abs(qt(alpha / 2, df = nu))
# Test 1 
T_1 <- (mu_X_n1 - mu_X_n2 - mu_delta[1]) / sqrt(s2_n1_n2)
# Test 2
T_2 <- (mu_X_n1 - mu_X_n2 - mu_delta[2]) / sqrt(s2_n1_n2)
# Test 3
T_3 <- (mu_X_n1 - mu_X_n2 - mu_delta[3]) / sqrt(s2_n1_n2)
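As a cross-check (reusing X_n1, X_n2 and mu_delta defined above), t.test performs Welch's test by default (var.equal = FALSE); its statistic and non-integer degrees of freedom coincide with T_1 and nu:

# Built-in Welch test: statistic matches T_1, df matches nu
welch <- t.test(X_n1, X_n2, mu = mu_delta[1])
welch$statistic # matches T_1
welch$parameter # Welch-Satterthwaite degrees of freedom, matches nu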
| $\alpha$ | $\mu_\Delta$ | $q_{\alpha/2}$ | $T(x_{n_1}, x_{n_2})$ | $q_{1-\alpha/2}$ | $H_0$ |
|----------|--------------|----------------|-----------------------|------------------|--------------|
| 0.1      | 0.50         | -1.65          | 2.634                 | 1.65             | Rejected     |
| 0.1      | 0.75         | -1.65          | 1.732                 | 1.65             | Rejected     |
| 0.1      | 1.00         | -1.65          | 0.830                 | 1.65             | Not rejected |
Table 12.3: Tests on the difference between the means of two Normal populations with unequal variances.
Figure 12.3: Tests on the difference between the means of two Normal populations with unequal variances.

12.2 Tests for the variances

12.2.1 F-test for two variances

Proposition 12.4 Let’s consider two IID Gaussian samples with unknown means $\mu_1$ and $\mu_2$ and unknown variances $\sigma_1^2$ and $\sigma_2^2$, i.e. $$X_{n_1} \sim N(\mu_1, \sigma_1^2), \quad X_{n_2} \sim N(\mu_2, \sigma_2^2),$$ where $n_1$ and $n_2$ are the numbers of observations in each sample, and the set of hypotheses $H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$. Then, given the corrected sample variances (), the following test statistic under the null hypothesis $H_0$ has an F distribution () with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, i.e. $$T(X_{n_1}, X_{n_2}) = \frac{\hat s^2(X_{n_1})}{\hat s^2(X_{n_2})} \overset{H_0}{\sim} F(n_1 - 1, n_2 - 1).$$ This means that the null hypothesis of equal variances is rejected when the statistic is as extreme or more extreme than the critical values obtained from the F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom at significance level $\alpha$, i.e. $$H_0 \text{ is not rejected} \iff q_\alpha < T(x_{n_1}, x_{n_2}) \le q_{1-\alpha}.$$

Proof. Using the fact that the sample variance of a Normal IID population is $\chi^2$-distributed (), let’s define the statistics $$T_1(X_{n_1}) = \frac{(n_1 - 1)\,\hat s^2(X_{n_1})}{\sigma_1^2} \sim \chi^2(n_1 - 1), \quad T_2(X_{n_2}) = \frac{(n_2 - 1)\,\hat s^2(X_{n_2})}{\sigma_2^2} \sim \chi^2(n_2 - 1),$$ where $\hat s^2$ reads as in . Hence, the statistic given by their ratio reads $$T(X_{n_1}, X_{n_2}) = \frac{T_1(X_{n_1})/(n_1 - 1)}{T_2(X_{n_2})/(n_2 - 1)} = \frac{\hat s^2(X_{n_1})/\sigma_1^2}{\hat s^2(X_{n_2})/\sigma_2^2} = \frac{\hat s^2(X_{n_1})\,\sigma_2^2}{\hat s^2(X_{n_2})\,\sigma_1^2}.$$ Thus, using the fact that the ratio of two independent $\chi^2$ random variables, each divided by its degrees of freedom, follows an F distribution () with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, under $H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2$ one obtains $$T(X_{n_1}, X_{n_2}) = \frac{\hat s^2(X_{n_1})}{\hat s^2(X_{n_2})} \overset{H_0}{\sim} F(n_1 - 1, n_2 - 1).$$

Exercise 12.4 Let’s consider two samples extracted from Normal distributions with $n_1 = 100$ and $n_2 = 300$ and $$X_{100} \sim N(0, 4), \quad X_{300} \sim N(0, 9).$$ Then, let’s evaluate with an appropriate test, at significance level $\alpha = 10\%$, the following set of hypotheses, i.e. $$(1)\; H_0: \sigma_1^2 = \sigma_2^2, \quad H_1: \sigma_1^2 \neq \sigma_2^2.$$

Solution 12.4. Since both populations are Normal, we can apply the F-test (): the ratio of the corrected sample variances is compared with the critical values $q_\alpha$ and $q_{1-\alpha}$ of the $F(n_1 - 1, n_2 - 1)$ distribution.

Solution
set.seed(1)
# ============================
#           Inputs
# ============================
n1 <- 100
n2 <- 300
# Significance level 
alpha <- 0.10
# True means 
mu <- c(X_n1 = 0, X_n2 = 0)
# True variances 
sigma2 <- c(X_n1 = 4, X_n2 = 9)
# ============================
# Simulated populations
X_n1 <- rnorm(n1, mu[1], sqrt(sigma2[1]))
X_n2 <- rnorm(n2, mu[2], sqrt(sigma2[2]))
# Sample means
mu_X_n1 <- mean(X_n1)
mu_X_n2 <- mean(X_n2)
# Corrected sample variances
s2_X_n1 <- (mean(X_n1^2) - mu_X_n1^2) * n1 / (n1 - 1)
s2_X_n2 <- (mean(X_n2^2) - mu_X_n2^2) * n2 / (n2 - 1)
# Degrees of freedom 
nu_1 <- n1 - 1
nu_2 <- n2 - 1
# Critical values (alpha and 1 - alpha quantiles)
q_alpha_2 <- c(qf(alpha, nu_1, nu_2), qf(1-alpha, nu_1, nu_2))
# Test
T_ <- s2_X_n1 / s2_X_n2
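As a cross-check (reusing X_n1 and X_n2 defined above), the same F statistic is implemented in R's built-in var.test:

# Built-in F-test for equality of variances: statistic matches T_
var.test(X_n1, X_n2)$statistic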
| $\alpha$ | $q_\alpha$ | $T(x_{n_1}, x_{n_2})$ | $q_{1-\alpha}$ | $H_0$ |
|----------|------------|-----------------------|----------------|----------|
| 0.1      | 0.803      | 0.364                 | 1.225          | Rejected |
Table 12.4: Tests on the difference between the variances of two Normal populations.
Figure 12.4: Tests on the difference between the variances of two Normal populations.

12.3 Left- and right-tailed tests

Let’s consider the three kinds of tests that can be performed: two-tailed, left-tailed and right-tailed. Starting with the first, it is generally used to evaluate whether an estimate equals a certain value or not.

For example, consider an observed sample $x_n$ drawn from an IID population, and let’s say we would like to investigate whether it has a certain mean $\mu_0 = E\{X_1\}$ in population. In practice, in this setting we are considering the following set of hypotheses, i.e. $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$, where in general $H_0$ is what the researcher expects to be true, while $H_1$ is the complementary alternative. Given a statistic $T$ that has a known distribution $F_T$ under $H_0$, the critical values for a significance level $\alpha$ are defined by $$\alpha = P\big(T(X_n) < q_{\alpha/2} \,\cup\, T(X_n) > q_{1-\alpha/2}\big) \implies q_{\alpha/2} = F_T^{-1}(\alpha/2), \quad q_{1-\alpha/2} = F_T^{-1}(1-\alpha/2),$$ where $F_T^{-1}$ is the quantile function. Therefore, if the test statistic lies in the rejection region, i.e. $$T_n(x_n) < q_{\alpha/2} \quad \text{or} \quad T_n(x_n) > q_{1-\alpha/2},$$ then we reject $H_0$ at significance level $\alpha$ and conclude that the mean is significantly different from $\mu_0$.

Let’s now instead consider one-tailed tests, appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both. Let’s consider the set of hypotheses $H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$. In this case we need a left-tailed test. The test statistic $T$ can be left the same as in the two-tailed test, but the critical value must be re-computed. In fact, in this case we search for a $q_\alpha$ such that $P(T \le q_\alpha) = \alpha$. Applying the quantile function we obtain $$\alpha = P(T_n(X_n) < q_\alpha) \implies q_\alpha = F_T^{-1}(\alpha).$$ Therefore, if the test statistic lies in the rejection region, i.e. $$T(x_n) < q_\alpha,$$ then we reject $H_0$ at significance level $\alpha$ and conclude that the mean is significantly lower than $\mu_0$.

Exercise 12.5 Continuing from , evaluate with a left-tailed test at significance level $\alpha = 10\%$ the following sets of hypotheses, i.e. $$(1)\; H_0: \mu = 2.3,\; H_1: \mu < 2.3, \quad (2)\; H_0: \mu = 2.2,\; H_1: \mu < 2.2, \quad (3)\; H_0: \mu = 2.1,\; H_1: \mu < 2.1.$$

Solution 12.5. In this case we need a left-tailed test. The test statistic $T$ can be left the same as in , but the critical value must be re-computed: we search for a $q_\alpha$ such that $P(T \le q_\alpha) = \alpha$. Applying the quantile function we obtain $q_\alpha = F_T^{-1}(\alpha)$. In this case, with $\alpha = 0.10$, the critical value of a Student-t with 499 degrees of freedom is $q_\alpha = -1.28325$. Therefore, if $T_n(x_n) \ge -1.28325$ we do not reject the null hypothesis, i.e. $$H_0 \text{ is not rejected} \iff T_n(x_n) \ge q_\alpha.$$

Solution
library(dplyr)
set.seed(1) # random seed 
# ================== Setups ==================
# Dimension of the sample 
n <- 500 # sample size 
# Means for the tests
mu_0 <- c(2.3, 2.2, 2.1) 
# true mean
mu <- 2 
# true variance
sigma2 <- 4
alpha <- 0.1 # significance level
# ============================================
# Simulated random variable 
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))
# Sample mean
mu_hat <- mean(x)
# Corrected sample variance
s2_hat <- (mean(x^2) - mu_hat^2) * n / (n - 1)
# Statistic T (1)
T_1 <- sqrt(n) * (mu_hat - mu_0[1]) / sqrt(s2_hat)
# Statistic T (2)
T_2 <- sqrt(n) * (mu_hat - mu_0[2]) / sqrt(s2_hat)
# Statistic T (3)
T_3 <- sqrt(n) * (mu_hat - mu_0[3]) / sqrt(s2_hat)
# Critical value
q_alpha <- qt(alpha, df = n - 1)
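As a cross-check (reusing x and mu_0 defined above), the left-tailed test is also available through t.test with alternative = "less"; the resulting p-values can be compared directly with alpha:

# Built-in left-tailed t-test: reject H0 when the p-value is <= alpha
t.test(x, mu = mu_0[1], alternative = "less")$p.value
t.test(x, mu = mu_0[2], alternative = "less")$p.value
t.test(x, mu = mu_0[3], alternative = "less")$p.value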
| $\alpha$ | $\mu_0$ | $q_\alpha$ | $T_n(x_n)$ | $H_0$ |
|----------|---------|------------|------------|--------------|
| 0.1      | 2.3     | -1.283     | -2.814     | Rejected     |
| 0.1      | 2.2     | -1.283     | -1.709     | Rejected     |
| 0.1      | 2.1     | -1.283     | -0.604     | Not rejected |
Table 12.5: Left-tailed t-tests on the mean of a Normal sample.
Figure 12.5: Left-tailed test on the mean.

Lastly, let’s consider the right-tailed test for the set of hypotheses $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$. It is again a one-sided test, but in this case the critical value $q_{1-\alpha}$ is defined as $$\alpha = 1 - P\big(T(X_n) \le q_{1-\alpha}\big) \implies q_{1-\alpha} = F_T^{-1}(1-\alpha).$$ Therefore, the test statistic lies outside the rejection region when $$H_0 \text{ is not rejected} \iff T(x_n) \le q_{1-\alpha}.$$

Exercise 12.6 Continuing from , evaluate with a right-tailed test at significance level $\alpha = 10\%$ the following sets of hypotheses, i.e. $$(1)\; H_0: \mu = 2.3,\; H_1: \mu > 2.3, \quad (2)\; H_0: \mu = 2.2,\; H_1: \mu > 2.2, \quad (3)\; H_0: \mu = 2.1,\; H_1: \mu > 2.1.$$

Solution 12.6. It is again a one-sided test, but in this case the critical value $q_{1-\alpha}$ is defined as $q_{1-\alpha} = F_T^{-1}(1-\alpha)$. In this case, with $\alpha = 10\%$, the critical value of a Student-t with 499 degrees of freedom is $q_{1-\alpha} = 1.28325$.

Therefore, if $T \le 1.28325$ we do not reject the null hypothesis and conclude that the sample mean is not significantly greater than $\mu_0$; otherwise we reject it and conclude that the sample mean is significantly greater than $\mu_0$. Consistently with the tests performed in , in this right-tailed test $H_0$ is never rejected.

Solution
library(dplyr)
set.seed(1) # random seed 
# ================== Setups ==================
# Dimension of the sample 
n <- 500 # sample size 
# Means for the tests
mu_0 <- c(2.3, 2.2, 2.1) 
# true mean
mu <- 2 
# true variance
sigma2 <- 4
alpha <- 0.1 # significance level
# ============================================
# Simulated random variable 
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))
# Sample mean
mu_hat <- mean(x)
# Corrected sample variance
s2_hat <- (mean(x^2) - mu_hat^2) * n / (n - 1)
# Statistic T (1)
T_1 <- sqrt(n) * (mu_hat - mu_0[1]) / sqrt(s2_hat)
# Statistic T (2)
T_2 <- sqrt(n) * (mu_hat - mu_0[2]) / sqrt(s2_hat)
# Statistic T (3)
T_3 <- sqrt(n) * (mu_hat - mu_0[3]) / sqrt(s2_hat)
# Critical value (right tail, 1 - alpha quantile)
q_alpha <- qt(1-alpha, df = n - 1)
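Analogously (reusing x and mu_0 defined above), the right-tailed test corresponds to alternative = "greater" in t.test:

# Built-in right-tailed t-test: reject H0 when the p-value is <= alpha
t.test(x, mu = mu_0[1], alternative = "greater")$p.value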
| $\alpha$ | $\mu_0$ | $T_n(x_n)$ | $q_{1-\alpha}$ | $H_0$ |
|----------|---------|------------|----------------|--------------|
| 0.1      | 2.3     | -2.814     | 1.283          | Not rejected |
| 0.1      | 2.2     | -1.709     | 1.283          | Not rejected |
| 0.1      | 2.1     | -0.604     | 1.283          | Not rejected |
Table 12.6: Right-tailed t-tests on the mean of a Normal sample.
Figure 12.6: Right-tailed test on the mean.