A statistical hypothesis is a claim about the value of a parameter or population characteristic. In any hypothesis-testing problem, there are always two competing hypotheses under consideration:
The null hypothesis $H_0$, representing the status quo.
The alternative hypothesis $H_1$, representing the research claim.
The objective of hypothesis testing is to decide, based on sample information, whether the alternative hypothesis is actually supported by the data; one usually does new research to challenge existing beliefs.
Is there strong evidence for the alternative?
Let's consider that you want to establish whether the null hypothesis is supported by the data. One usually assumes to work under $H_0$; then, if the sample does not strongly contradict $H_0$, we continue to believe in the plausibility of the null hypothesis. There are only two possible conclusions: reject $H_0$ or fail to reject $H_0$.
Definition 12.1 The test statistic is a function of the sample and is used to make a decision about whether the null hypothesis should be rejected or not. In theory, there is an infinite number of possible tests that could be devised. The choice of a particular test procedure must be based on the probability that the test will produce incorrect results. In general, two kinds of errors are related to test statistics, i.e.
A type I error occurs when the null hypothesis is rejected even though it is true.
A type II error occurs when the null hypothesis is not rejected even though it is false.
The p-value is in general related to the probability of a type I error: the smaller the p-value, the more evidence there is in the sample data against the null hypothesis and for the alternative hypothesis.
In general, before performing a test one establishes a significance level $\alpha$ (the desired type I error probability), which defines the rejection region. Then the decision rule is: reject $H_0$ if the p-value is smaller than $\alpha$, and do not reject it otherwise. The p-value can be thought of as the smallest significance level at which $H_0$ can be rejected, and the calculation of the p-value depends on whether the test is upper-, lower-, or two-tailed.
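As a minimal sketch of this decision rule in R (the statistic value and significance level below are illustrative assumptions, not taken from the text):
p_value_rule <- function() {
  alpha <- 0.05                       # chosen significance level (assumed)
  z <- 2.1                            # observed test statistic (assumed)
  p_value <- 2 * (1 - pnorm(abs(z)))  # two-tailed p-value under a standard normal
  p_value < alpha                     # TRUE: reject H0 at level alpha
}
p_value_rule()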
For example, let’s consider a sample of data. Then, the general procedure for a statistical hypothesis test can be summarized as follows:
an assumption about the distribution of the data, often expressed in terms of a statistical model;
a null hypothesis and an alternative hypothesis which make specific statements about the data;
a test statistic which is a function of the data and whose distribution under the null hypothesis is known;
a significance level $\alpha$ which imposes an upper bound on the probability of rejecting $H_0$, given that $H_0$ is true.
Given that under $H_0$ the test statistic $T$ has a known distribution function $F$, the critical value $t_\alpha$ is computed with the quantile function $F^{-1}$, that is, $t_\alpha$ is such that $F(t_\alpha) = \alpha$. Mathematically, the p-value is related to the test performed. In general, two kinds of tests are available:
A two-tailed test is appropriate if the estimated value may be greater or less than a certain range of values, for example, whether a test taker may score above or below a specific range of scores. In this case the p-value represents the probability $p = 2 \min\{F(t),\, 1 - F(t)\}$, where $t$ is the observed value of the statistic and $F$ its distribution function under $H_0$.
A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both. For a left-tailed test the p-value is $p = F(t)$, while for a right-tailed test it is $p = 1 - F(t)$. If the distribution function is symmetric, then for a two-tailed test the p-value simplifies to $p = 2\left(1 - F(|t|)\right)$.
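In R, these p-values can be obtained directly from the distribution function; a minimal sketch for a Student-t statistic (the observed value and degrees of freedom below are assumptions chosen for illustration):
t_obs <- -1.709  # observed statistic (assumed value)
nu <- 499        # degrees of freedom (assumed value)
p_left <- pt(t_obs, df = nu)                # left-tailed:  P(T <= t)
p_right <- 1 - pt(t_obs, df = nu)           # right-tailed: P(T >= t)
p_two <- 2 * (1 - pt(abs(t_obs), df = nu))  # two-tailed, symmetric case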
12.1 Tests for the means
Proposition 12.1 Let's consider an IID, normally distributed sample, i.e. $X_1, \dots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$ with $\sigma^2$ unknown, and let's consider the set of hypotheses $$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0\,.$$ Then, given the sample mean $\hat{\mu}_n$ (Equation 10.1) and the corrected sample variance $s_n^2$ (Equation 10.6), under the null hypothesis the test statistic $$T_n = \sqrt{n}\,\frac{\hat{\mu}_n - \mu_0}{s_n}$$ is Student-t distributed with $\nu = n - 1$ degrees of freedom. Moreover, as $n \to \infty$ the distribution of $T_n$ converges (Definition 8.5) to the distribution of a Standard Normal, i.e. $$T_n \overset{d}{\longrightarrow} \mathcal{N}(0, 1)\,,$$ where $\mathcal{N}(0, 1)$ denotes the distribution of a standard normal.
Proof. If the sample is normally distributed, the sample mean is also normally distributed, i.e. $\hat{\mu}_n \sim \mathcal{N}(\mu, \sigma^2 / n)$, so that under $H_0$ $$Z = \sqrt{n}\,\frac{\hat{\mu}_n - \mu_0}{\sigma} \sim \mathcal{N}(0, 1)\,.$$ Under normality the corrected sample variance, being a sum of squares of independent and normally distributed random variables, follows a $\chi^2$ distribution with $n - 1$ degrees of freedom, i.e. $$Q = \frac{(n - 1)\,s_n^2}{\sigma^2} \sim \chi^2_{n-1}\,.$$ Notably, the ratio of a standard normal and the square root of a $\chi^2$ random variable divided by its degrees of freedom is exactly the definition of a Student-t random variable as in Equation 32.2. Hence, the ratio between the statistics $Z$ and $Q$ divided by its degrees of freedom reads $$T_n = \frac{Z}{\sqrt{Q / (n - 1)}} = \sqrt{n}\,\frac{\hat{\mu}_n - \mu_0}{s_n}\,.$$ The test statistic under $H_0$ therefore follows a Student-t distribution with $n - 1$ degrees of freedom.
Exercise 12.1 Let's consider a sample of $n = 500$ IID random variables, where each observation is drawn from a normal distribution with mean $\mu = 2$ and variance $\sigma^2 = 4$. Then, evaluate, with an appropriate test, whether the following three sets of hypotheses are statistically significant at significance level $\alpha = 0.1$, i.e. $$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0\,, \qquad \mu_0 \in \{2.3,\, 2.2,\, 2.1\}\,.$$
Solution 12.1. Since we are dealing with a unique Normal population we can apply a t-test (Proposition 12.1). More precisely, the statistic $T_n$ will be Student-t distributed and the critical value with probability $\alpha$ is such that $$t_{\alpha/2} = \left| F^{-1}_{t_{n-1}}(\alpha/2) \right|\,,$$ where $F^{-1}_{t_{n-1}}$ is the quantile function of a Student-t. Therefore, if the statistic computed on the sample falls in the rejection region, i.e. $|T_n| > t_{\alpha/2}$, then we reject $H_0$ with type I error probability $\alpha$ and we can conclude that the mean is significantly different from $\mu_0$.
Solution
library(dplyr)
set.seed(1) # random seed
# ============================================
# Inputs
# ============================================
# Dimension of the sample
n <- 500
# Means for the tests
mu_0 <- c(2.3, 2.2, 2.1)
# True mean
mu <- 2
# True variance
sigma2 <- 4
# Significance level
alpha <- 0.1
# ============================================
# Simulated random variable
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))
# Sample mean
mu_hat <- mean(x)
# Corrected sample variance
s2_hat <- (mean(x^2) - mu_hat^2) * n / (n - 1)
# Statistic T (1)
T_1 <- sqrt(n) * (mu_hat - mu_0[1]) / sqrt(s2_hat)
# Statistic T (2)
T_2 <- sqrt(n) * (mu_hat - mu_0[2]) / sqrt(s2_hat)
# Statistic T (3)
T_3 <- sqrt(n) * (mu_hat - mu_0[3]) / sqrt(s2_hat)
# Degrees of freedom
nu <- n - 1
# Critical value (two-tailed)
q_alpha_2 <- abs(qt(alpha / 2, df = nu))
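As a sanity check, the same statistic can be reproduced with R's built-in t.test, reusing the simulated x above; the manual T_1 and the reported t should coincide up to rounding:
# Built-in two-sided t-test against mu_0[1]
t.test(x, mu = mu_0[1], conf.level = 1 - alpha)$statistic  # matches T_1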
mu_0   alpha   Lower critical value   Statistic   Upper critical value   Decision
2.3    0.1     -1.648                 -2.814      1.648                  Rejected
2.2    0.1     -1.648                 -1.709      1.648                  Rejected
2.1    0.1     -1.648                 -0.604      1.648                  Not-rejected
Table 12.1: t-tests on the mean of a Normal sample.
Figure 12.1: t-tests on the mean of a Normal sample.
12.1.1 Test for two means and equal variances
Proposition 12.2 Let's consider two IID Gaussian samples with unknown means $\mu_X$ and $\mu_Y$ and unknown equal variances $\sigma_X^2 = \sigma_Y^2 = \sigma^2$, i.e. $$X_1, \dots, X_{n_X} \overset{iid}{\sim} \mathcal{N}(\mu_X, \sigma^2)\,, \qquad Y_1, \dots, Y_{n_Y} \overset{iid}{\sim} \mathcal{N}(\mu_Y, \sigma^2)\,,$$ where $n_X$ and $n_Y$ are the numbers of observations in each sample, and let's consider the set of hypotheses $$H_0: \mu_X - \mu_Y = \delta_0 \quad \text{versus} \quad H_1: \mu_X - \mu_Y \neq \delta_0\,.$$ Then, given the sample means (Equation 10.1), under the null hypothesis the test statistic $$T = \frac{(\hat{\mu}_X - \hat{\mu}_Y) - \delta_0}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} \tag{12.1}$$ is Student-t distributed with $n_X + n_Y - 2$ degrees of freedom, where $$s_p^2 = \frac{(n_X - 1)\,s_X^2 + (n_Y - 1)\,s_Y^2}{n_X + n_Y - 2} \tag{12.2}$$ and $s_X^2$, $s_Y^2$ denote the corrected sample variance (Equation 10.6) computed on the two samples.
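A minimal sketch of this pooled test in R (the sample sizes and distribution parameters below are assumptions chosen only for illustration):
set.seed(1)
x <- rnorm(100, mean = 1, sd = 2)  # first sample (assumed parameters)
y <- rnorm(120, mean = 1, sd = 2)  # second sample (assumed parameters)
n_x <- length(x)
n_y <- length(y)
# Pooled variance as in Equation 12.2
s2_p <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)
# Test statistic as in Equation 12.1, with delta_0 = 0
T <- (mean(x) - mean(y)) / sqrt(s2_p * (1 / n_x + 1 / n_y))
# Built-in equivalent: pooled two-sample t-test
t.test(x, y, var.equal = TRUE)$statistic  # matches T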
Exercise 12.2 Let's consider two samples extracted from Normal populations with equal variances. Then, let's evaluate, with an appropriate test and significance level $\alpha$, the following set of hypotheses, i.e. $$H_0: \mu_X - \mu_Y = \delta_0 \quad \text{versus} \quad H_1: \mu_X - \mu_Y \neq \delta_0\,.$$
Solution 12.2. Since the samples are normally distributed in populations with equal variances, we can consider the test statistic in Equation 12.1. More precisely, the test statistic $T$ is Student-t distributed with $n_X + n_Y - 2$ degrees of freedom, where $s_p^2$ is computed as in Equation 12.2. Since it is a two-tailed test, the critical value for a significance level $\alpha$ is defined as $$t_{\alpha/2} = \left| F^{-1}_{t_{n_X + n_Y - 2}}(\alpha/2) \right|\,,$$ where $F^{-1}$ is the quantile function of a Student-t, which is symmetric. Therefore, if $T$ falls in the rejection region, i.e. $|T| > t_{\alpha/2}$, then $H_0$ is rejected with type I error probability $\alpha$ and one can conclude that the difference between the sample means is significantly different from $\delta_0$. Otherwise, when $H_0$ is not rejected, one can conclude that the difference between the sample means is not statistically different from $\delta_0$.
Table 12.2: Tests on the difference between the means of two Normal populations with equal variances.
Figure 12.2: Tests on the difference between the means of two Normal populations with equal variances.
12.1.2 Test for two means and unequal variances
Proposition 12.3 Let's consider two IID Gaussian samples with unknown means $\mu_X$ and $\mu_Y$ and unknown variances $\sigma_X^2 \neq \sigma_Y^2$, i.e. $$X_1, \dots, X_{n_X} \overset{iid}{\sim} \mathcal{N}(\mu_X, \sigma_X^2)\,, \qquad Y_1, \dots, Y_{n_Y} \overset{iid}{\sim} \mathcal{N}(\mu_Y, \sigma_Y^2)\,,$$ where $n_X$ and $n_Y$ are the numbers of observations in each sample, and let's consider the set of hypotheses $$H_0: \mu_X - \mu_Y = \delta_0 \quad \text{versus} \quad H_1: \mu_X - \mu_Y \neq \delta_0\,.$$ Then, given the sample means (Equation 10.1) and corrected sample variances (Equation 10.6), Welch (1938) - Welch (1947) proposes a test statistic $$T = \frac{(\hat{\mu}_X - \hat{\mu}_Y) - \delta_0}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$$ that under the null hypothesis is approximately Student-t distributed with $\nu$ degrees of freedom, where $\nu$ is not necessarily an integer and is computed using the Welch–Satterthwaite approximation. More precisely, it is defined as a weighted combination of the degrees of freedom of each group, reflecting the uncertainty due to unequal variances, i.e. $$\nu = \frac{\left( \frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y} \right)^2}{\frac{(s_X^2 / n_X)^2}{n_X - 1} + \frac{(s_Y^2 / n_Y)^2}{n_Y - 1}}\,.$$
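A minimal sketch of the Welch statistic and its degrees of freedom in R (the samples and parameters below are illustrative assumptions); note that R's t.test performs the Welch test by default:
set.seed(1)
x <- rnorm(100, mean = 1, sd = 1)  # first sample (assumed parameters)
y <- rnorm(120, mean = 1, sd = 3)  # second sample (assumed parameters)
v_x <- var(x) / length(x)
v_y <- var(y) / length(y)
# Welch statistic with delta_0 = 0
T_w <- (mean(x) - mean(y)) / sqrt(v_x + v_y)
# Welch-Satterthwaite degrees of freedom
nu <- (v_x + v_y)^2 / (v_x^2 / (length(x) - 1) + v_y^2 / (length(y) - 1))
# Built-in equivalent (var.equal = FALSE is the default)
t.test(x, y)$parameter  # reported df matches nu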
Exercise 12.3 Let's consider two samples extracted from Normal populations with unequal variances. Then, let's evaluate, with an appropriate test and significance level $\alpha$, the sets of hypotheses in Exercise 12.2.
Table 12.3: Tests on the difference between the means of two Normal populations with unequal variances.
Figure 12.3: Tests on the difference between the means of two Normal populations with unequal variances.
12.2 Tests for the variances
12.2.1 F-test for two variances
Proposition 12.4 Let's consider two IID Gaussian samples with unknown means $\mu_X$ and $\mu_Y$ and unknown variances $\sigma_X^2$ and $\sigma_Y^2$, i.e. $$X_1, \dots, X_{n_X} \overset{iid}{\sim} \mathcal{N}(\mu_X, \sigma_X^2)\,, \qquad Y_1, \dots, Y_{n_Y} \overset{iid}{\sim} \mathcal{N}(\mu_Y, \sigma_Y^2)\,,$$ where $n_X$ and $n_Y$ are the numbers of observations in each sample, and let's consider the set of hypotheses $$H_0: \sigma_X^2 = \sigma_Y^2 \quad \text{versus} \quad H_1: \sigma_X^2 \neq \sigma_Y^2\,.$$ Then, given the corrected sample variances (Equation 10.6), the test statistic $$F = \frac{s_X^2}{s_Y^2}$$ under the null hypothesis has an F (Fisher) distribution (Equation 32.3) with $n_X - 1$ and $n_Y - 1$ degrees of freedom. This means that the null hypothesis of equal variances can be rejected when the statistic is as extreme or more extreme than the critical values obtained from the $F$-distribution with $n_X - 1$ and $n_Y - 1$ degrees of freedom at significance level $\alpha$, i.e. when $F < f_{\alpha/2}$ or $F > f_{1 - \alpha/2}$.
Proof. Using the fact that the sample variance of a Normal IID population is $\chi^2$-distributed (Equation 10.10), let's define the statistics $$Q_X = \frac{(n_X - 1)\,s_X^2}{\sigma_X^2} \sim \chi^2_{n_X - 1}\,, \qquad Q_Y = \frac{(n_Y - 1)\,s_Y^2}{\sigma_Y^2} \sim \chi^2_{n_Y - 1}\,,$$ where $s^2$ reads as in Equation 10.6. Hence, the statistic given by their ratio, each divided by its degrees of freedom, reads $$\frac{Q_X / (n_X - 1)}{Q_Y / (n_Y - 1)} = \frac{s_X^2 / \sigma_X^2}{s_Y^2 / \sigma_Y^2}\,.$$ Thus, using the fact that the ratio of two independent $\chi^2$ random variables, each divided by its respective degrees of freedom, follows an $F$-distribution (Equation 32.3) with $n_X - 1$ and $n_Y - 1$ degrees of freedom, and that under $H_0$ $\sigma_X^2 = \sigma_Y^2$, one obtains $$F = \frac{s_X^2}{s_Y^2} \sim F_{n_X - 1,\, n_Y - 1}\,.$$
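A minimal sketch of the F-test in R (the samples and significance level below are illustrative assumptions); var.test is the built-in counterpart:
set.seed(1)
x <- rnorm(100, sd = 2)  # first sample (assumed parameters)
y <- rnorm(120, sd = 2)  # second sample (assumed parameters)
alpha <- 0.05            # significance level (assumed)
F_stat <- var(x) / var(y)
# Two-tailed critical values of the F distribution
f_lo <- qf(alpha / 2, df1 = length(x) - 1, df2 = length(y) - 1)
f_hi <- qf(1 - alpha / 2, df1 = length(x) - 1, df2 = length(y) - 1)
(F_stat < f_lo) | (F_stat > f_hi)  # TRUE: reject H0 of equal variances
# Built-in equivalent
var.test(x, y)$statistic  # matches F_stat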
Exercise 12.4 Let's consider two samples extracted from two Normal populations. Then, let's evaluate, with an appropriate test and significance level $\alpha$, the following set of hypotheses, i.e. $$H_0: \sigma_X^2 = \sigma_Y^2 \quad \text{versus} \quad H_1: \sigma_X^2 \neq \sigma_Y^2\,.$$
Table 12.4: Tests on the difference between the variances of two Normal populations.
Figure 12.4: Tests on the difference between the variances of two Normal populations.
12.3 Left and right tailed tests
Let's consider the three kinds of tests that can be performed: two-tailed, left-tailed, and right-tailed. Starting with the first one, in general it is used to evaluate whether an estimate is equal or not to a certain reference value.
For example, consider an observed sample $x_1, \dots, x_n$ drawn from an IID population, and let's say we would like to investigate whether it has a certain mean $\mu_0$ in population. In practice, in this setting we are considering the following set of hypotheses, i.e. $$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0\,,$$ where in general $H_0$ specifies the reference value of interest, while $H_1$ is the complementary alternative. Given a statistic $T$ that has a known distribution under $H_0$, the critical value for a significance level $\alpha$ is defined as $$t_{\alpha/2} = \left| F^{-1}(\alpha/2) \right|\,,$$ where $F^{-1}$ is the quantile function. Therefore, if the test statistic falls in the rejection region, i.e. $|T| > t_{\alpha/2}$, then we reject $H_0$ with type I error probability $\alpha$ and we can conclude that the mean is significantly different from $\mu_0$.
Let's now instead consider a one-tailed test, appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both. Let's consider the set of hypotheses $$H_0: \mu \geq \mu_0 \quad \text{versus} \quad H_1: \mu < \mu_0\,.$$ In this case, we need a left-tailed test. The test statistic can be left the same as in the two-tailed test, but the critical value must now be recomputed. In fact, in this case we search for a $t_\alpha$ such that $P(T \leq t_\alpha) = \alpha$. Applying the quantile function we obtain $$t_\alpha = F^{-1}(\alpha)\,.$$ Therefore, if the test statistic falls in the rejection region, i.e. $T < t_\alpha$, then we reject $H_0$ with type I error probability $\alpha$ and we can conclude that the mean is significantly lower than $\mu_0$.
Exercise 12.5 Continuing from Exercise 12.1, evaluate with a left-tailed test at significance level $\alpha = 0.1$ the following sets of hypotheses, i.e. $$H_0: \mu \geq \mu_0 \quad \text{versus} \quad H_1: \mu < \mu_0\,, \qquad \mu_0 \in \{2.3,\, 2.2,\, 2.1\}\,.$$
Solution 12.5. In this case, we need a left-tailed test. The test statistic can be left the same as in Exercise 12.1, but the critical value must now be recomputed. In fact, in this case we search for a $t_\alpha$ such that $P(T \leq t_\alpha) = \alpha$. Applying the quantile function we obtain $t_\alpha = F^{-1}_{t_{n-1}}(\alpha)$. In this case, with $\alpha = 0.1$, the critical value of a Student-t with 499 degrees of freedom is $t_\alpha = -1.283$. Therefore, if $T_n \geq t_\alpha$ we do not reject the null hypothesis; otherwise we reject it.
Solution
library(dplyr)
set.seed(1) # random seed
# ================== Setups ==================
# Dimension of the sample
n <- 500
# Means for the tests
mu_0 <- c(2.3, 2.2, 2.1)
# True mean
mu <- 2
# True variance
sigma2 <- 4
# Significance level
alpha <- 0.1
# ============================================
# Simulated random variable
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))
# Sample mean
mu_hat <- mean(x)
# Corrected sample variance
s2_hat <- (mean(x^2) - mu_hat^2) * n / (n - 1)
# Statistic T (1)
T_1 <- sqrt(n) * (mu_hat - mu_0[1]) / sqrt(s2_hat)
# Statistic T (2)
T_2 <- sqrt(n) * (mu_hat - mu_0[2]) / sqrt(s2_hat)
# Statistic T (3)
T_3 <- sqrt(n) * (mu_hat - mu_0[3]) / sqrt(s2_hat)
# Critical value (left-tailed)
q_alpha <- qt(alpha, df = n - 1)
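The left-tailed test can be cross-checked with the built-in t.test by setting alternative = "less" (reusing x, mu_0, and alpha from the code above):
# Built-in left-tailed t-test against mu_0[1]
t.test(x, mu = mu_0[1], alternative = "less", conf.level = 1 - alpha)$statistic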
alpha   mu_0   Critical value   Statistic   Decision
0.1     2.3    -1.283           -2.814      Rejected
0.1     2.2    -1.283           -1.709      Rejected
0.1     2.1    -1.283           -0.604      Not-rejected
Table 12.5: Left-tailed t-tests on the mean of a Normal sample.
Figure 12.5: Left-tailed test on the mean.
Lastly, let's consider the right-tailed test for the set of hypotheses $$H_0: \mu \leq \mu_0 \quad \text{versus} \quad H_1: \mu > \mu_0\,.$$ It is again a one-sided test, but in this case the critical value is defined as $$t_{1-\alpha} = F^{-1}(1 - \alpha)\,.$$ Therefore, if the test statistic falls in the rejection region, i.e. $T > t_{1-\alpha}$, we reject $H_0$; otherwise we do not.
Exercise 12.6 Continuing from Exercise 12.1, evaluate with a right-tailed test at significance level $\alpha = 0.1$ the following sets of hypotheses, i.e. $$H_0: \mu \leq \mu_0 \quad \text{versus} \quad H_1: \mu > \mu_0\,, \qquad \mu_0 \in \{2.3,\, 2.2,\, 2.1\}\,.$$
Solution 12.6. It is again a one-sided test, but in this case the critical value is defined as $t_{1-\alpha} = F^{-1}_{t_{n-1}}(1 - \alpha)$. In this case, with $\alpha = 0.1$, the critical value of a Student-t with 499 degrees of freedom is $t_{1-\alpha} = 1.283$.
Therefore, if $T_n \leq t_{1-\alpha}$ we do not reject the null hypothesis and cannot conclude that the mean is greater than $\mu_0$; otherwise we reject it and conclude that the mean is significantly greater than $\mu_0$. Coherently with the previous tests performed in Figure 12.5, the right-tailed test is never rejected.
Solution
library(dplyr)
set.seed(1) # random seed
# ================== Setups ==================
# Dimension of the sample
n <- 500
# Means for the tests
mu_0 <- c(2.3, 2.2, 2.1)
# True mean
mu <- 2
# True variance
sigma2 <- 4
# Significance level
alpha <- 0.1
# ============================================
# Simulated random variable
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))
# Sample mean
mu_hat <- mean(x)
# Corrected sample variance
s2_hat <- (mean(x^2) - mu_hat^2) * n / (n - 1)
# Statistic T (1)
T_1 <- sqrt(n) * (mu_hat - mu_0[1]) / sqrt(s2_hat)
# Statistic T (2)
T_2 <- sqrt(n) * (mu_hat - mu_0[2]) / sqrt(s2_hat)
# Statistic T (3)
T_3 <- sqrt(n) * (mu_hat - mu_0[3]) / sqrt(s2_hat)
# Critical value (right-tailed)
q_alpha <- qt(1 - alpha, df = n - 1)
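Analogously, the right-tailed test corresponds to alternative = "greater" in the built-in t.test (reusing x, mu_0, and alpha from the code above):
# Built-in right-tailed t-test against mu_0[1]
t.test(x, mu = mu_0[1], alternative = "greater", conf.level = 1 - alpha)$statistic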
alpha   mu_0   Statistic   Critical value   Decision
0.1     2.3    -2.814      1.283            Not-rejected
0.1     2.2    -1.709      1.283            Not-rejected
0.1     2.1    -0.604      1.283            Not-rejected
Table 12.6: Right-tailed t-tests on the mean of a Normal sample.
Figure 12.6: Right-tailed test on the mean.
Welch, B. L. 1938. "The Significance of the Difference Between Two Means When the Population Variances Are Unequal." Biometrika 29 (3/4): 350–62. https://doi.org/10.1093/biomet/29.3-4.350.
———. 1947. "The Generalization of 'Student's' Problem When Several Different Population Variances Are Involved." Biometrika 34 (1–2): 28–35. https://doi.org/10.1093/biomet/34.1-2.28.