```r
library(dplyr)
library(ggplot2)

# ================== Setup ===================
n       <- 500   # sample size
set.seed(1)      # random seed
mu_0    <- 2.4   # H0 mean
mu_true <- 2     # true mean
alpha   <- 0.05  # significance level
# ============================================

# Simulated random variable
x <- rnorm(n, mean = mu_true, sd = 4)

# Grid of points for the pdf
x_limits <- c(-4, 4)
x_grid   <- seq(x_limits[1], x_limits[2], by = 0.01)
x_breaks <- seq(x_limits[1], x_limits[2], by = 1)
```
A statistical hypothesis is a claim about the value of a parameter or population characteristic. In any hypothesis-testing problem, there are always two competing hypotheses under consideration:
The null hypothesis $H_0$, representing the status quo.
The alternative hypothesis $H_1$, representing the research claim.
The objective of hypothesis testing is to decide, based on sample information, whether the alternative hypothesis is actually supported by the data. One usually does new research to challenge existing beliefs.
Is there strong evidence for the alternative?
Let’s consider that you want to establish whether the null hypothesis is not supported by the data. One usually assumes to work under $H_0$; then, if the sample does not strongly contradict $H_0$, we continue to believe in the plausibility of the null hypothesis. There are only two possible conclusions: reject $H_0$ or fail to reject $H_0$.
Definition 23.1 The test statistic is a function of the sample and is used to make a decision about whether the null hypothesis should be rejected or not. In theory, there are an infinite number of possible tests that could be devised. The choice of a particular test procedure must be based on the probability that the test will produce incorrect results. In general, two kinds of errors are associated with a test statistic:
A type I error consists in rejecting the null hypothesis when it is true.
A type II error consists in not rejecting the null hypothesis when it is false.
The p-value is, in general, related to the probability of a type I error. The smaller the p-value, the more evidence there is in the sample data against the null hypothesis and in favor of the alternative hypothesis.
In general, before performing a test one establishes a significance level $\alpha$ (the desired type I error probability), which defines the rejection region. Then the decision rule is: reject $H_0$ if the observed test statistic falls in the rejection region, or equivalently if the p-value is smaller than $\alpha$. The p-value can be thought of as the smallest significance level at which $H_0$ can be rejected, and its calculation depends on whether the test is upper-, lower-, or two-tailed.
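As a minimal sketch of this decision rule, using the simulated sample `x`, the hypothesized mean `mu_0`, and the level `alpha` defined in the setup chunk (the manual computation here is illustrative, not part of the original analysis):

```r
# Two-sided p-value for H0: mu = mu_0, computed by hand on the sample x
t_stat  <- (mean(x) - mu_0) / (sd(x) / sqrt(n))  # observed test statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)      # two-sided p-value
p_value < alpha  # TRUE means: reject H0 at significance level alpha
```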
For example, let’s consider a sample of data $x_1, \dots, x_n$. Then, a statistical test consists of the following:
an assumption about the distribution of the data, often expressed in terms of a statistical model;
a null hypothesis $H_0$ and an alternative hypothesis $H_1$, which make specific statements about the data;
a test statistic $T$, which is a function of the data and whose distribution under the null hypothesis is known;
a significance level $\alpha$, which imposes an upper bound on the probability of rejecting $H_0$, given that $H_0$ is true.
The general procedure for a statistical hypothesis test can be summarized as follows:
Inputs: consider a null hypothesis $H_0$, an alternative hypothesis $H_1$, and the significance level $\alpha$.
Critical value: compute the critical value $z_\alpha$ that partitions the set of possible values of the test statistic into rejection and non-rejection regions.
Output: compare the observed test statistic $t$ computed on the sample with the critical value $z_\alpha$. If it falls in the rejection region, $H_0$ is rejected in favor of $H_1$. Otherwise, the test fails to reject $H_0$.
| Step           | Description                                             |
|----------------|---------------------------------------------------------|
| Inputs         | $H_0$, $\alpha$.                                        |
| Critical value | Critical level $z_\alpha$.                              |
| Output         | Rejection or not, depending on $t$ versus $z_\alpha$.   |
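The three steps translate almost literally into R. The snippet below is a minimal sketch of the procedure for a two-tailed t-test on the mean, reusing the objects from the setup chunk:

```r
# Inputs: H0 (mu = mu_0), significance level alpha
# Critical value: quantile of the Student-t with n-1 degrees of freedom
z_crit <- qt(1 - alpha / 2, df = n - 1)

# Output: compare the observed statistic with the critical value
t_stat <- (mean(x) - mu_0) / (sd(x) / sqrt(n))
if (abs(t_stat) > z_crit) "Reject H0" else "Fail to reject H0"
```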
In general, two kinds of tests are available:
A two-tailed test is appropriate if the estimated value is greater or less than a certain range of values, for example, whether a test taker may score above or below a specific range of scores.
A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both.
23.1 Left- and right-tailed tests
For example, let’s simulate a sample of $n = 500$ observations from a normal distribution (i.e. $x_i \sim N(2, 4^2)$) and consider the following set of hypotheses:
$$H_0: \mu = 2.4 \quad \text{against} \quad H_1: \mu \neq 2.4.$$
The test statistic is defined as
$$T = \frac{\bar{x}_n - \mu_0}{s_n / \sqrt{n}}.$$
Since it is a two-tailed test, the critical value for a significance level $\alpha$, denoted as $z_{\alpha/2}$, is such that:
$$\mathbb{P}(|T| > z_{\alpha/2}) = \alpha \iff z_{\alpha/2} = F^{-1}_{t_{n-1}}(1 - \alpha/2),$$
where $F^{-1}$ and $F$ are respectively the quantile and distribution functions of a Student-t. If the test statistic satisfies $|T| > z_{\alpha/2}$, then we reject $H_0$, and so the mean of the sample is significantly different from 2.4. More precisely, with $\alpha = 0.05$, the critical value of a Student-t with 499 degrees of freedom is $z_{\alpha/2} \approx 1.9647$.
```r
# Observed test statistic
z <- (mean(x) - mu_0) / (sd(x) / sqrt(n))

# Density of the Student-t with n-1 degrees of freedom on the grid
pdf <- dt(x_grid, df = n - 1)

# Critical values (two-tailed): position and density height
z_left  <- c(qt(alpha / 2, df = n - 1), dt(qt(alpha / 2, df = n - 1), df = n - 1))
z_right <- c(qt(1 - alpha / 2, df = n - 1), dt(qt(1 - alpha / 2, df = n - 1), df = n - 1))

# Area left tail
x_left <- x_grid[x_grid < z_left[1]]
y_left <- dt(x_left, df = n - 1)
# Area right tail
x_right <- x_grid[x_grid > z_right[1]]
y_right <- dt(x_right, df = n - 1)
# Central area
x_centre <- x_grid[x_grid > z_left[1] & x_grid < z_right[1]]
y_centre <- dt(x_centre, df = n - 1)

ggplot() +
  geom_segment(aes(x = z_left[1], xend = z_left[1], y = 0, yend = z_left[2]), color = "red") +
  geom_segment(aes(x = z_right[1], xend = z_right[1], y = 0, yend = z_right[2]), color = "red") +
  geom_ribbon(aes(x = x_centre, ymin = 0, ymax = y_centre, fill = "norej"), alpha = 0.3) +
  geom_ribbon(aes(x = x_left, ymin = 0, ymax = y_left, fill = "rej"), alpha = 0.3) +
  geom_ribbon(aes(x = x_right, ymin = 0, ymax = y_right, fill = "rej"), alpha = 0.3) +
  geom_line(aes(x_grid, pdf)) +
  geom_point(aes(z, 0), color = "black") +
  scale_fill_manual(values = c(rej = "red", norej = "green"),
                    labels = c(rej = "Rejection", norej = "No rejection")) +
  scale_x_continuous(breaks = x_breaks) +
  labs(y = "", x = "x", fill = NULL) +
  theme_bw() +
  theme(legend.position = c(.95, .95),
        legend.justification = c("right", "top"),
        legend.box.just = "right",
        legend.margin = margin(6, 6, 6, 6),
        panel.grid = element_blank())
```
Figure 23.1: Two-tailed test on the mean.
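The decision in Figure 23.1 can also be verified with base R’s `t.test`, which reports the same statistic together with the two-sided p-value:

```r
# Two-tailed test of H0: mu = 2.4 via the built-in t-test
t.test(x, mu = mu_0, alternative = "two.sided", conf.level = 1 - alpha)
```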
Let’s consider another kind of hypothesis:
$$H_0: \mu \geq 2.4 \quad \text{against} \quad H_1: \mu < 2.4.$$
The test statistic does not change; however, the null hypothesis implies a left-tailed test. Hence, the critical value $z_\alpha$ is such that $\mathbb{P}(T < z_\alpha) = \alpha$. Applying the quantile function of a Student-t we obtain:
$$z_\alpha = F^{-1}_{t_{n-1}}(\alpha),$$
where $F^{-1}$ and $F$ are respectively the quantile and distribution functions of a Student-t. In this case, with $\alpha = 0.05$, the critical value of a Student-t with 499 degrees of freedom is $z_\alpha \approx -1.6479$. Therefore, if $T \geq z_\alpha$ we do not reject the null hypothesis, i.e. $\mu$ is greater than $\mu_0$; otherwise we reject it and $\mu$ is lower than $\mu_0$.
Left-tailed test
```r
# Critical value (left-tailed): position and density height
z_left <- c(qt(alpha, df = n - 1), dt(qt(alpha, df = n - 1), df = n - 1))

# Rejection area (left tail)
x_left <- x_grid[x_grid < z_left[1]]
y_left <- dt(x_left, df = n - 1)
# Non-rejection area (right of the critical value)
x_right <- x_grid[x_grid > z_left[1]]
y_right <- dt(x_right, df = n - 1)

ggplot() +
  geom_segment(aes(x = z_left[1], xend = z_left[1], y = 0, yend = z_left[2]), color = "red") +
  geom_ribbon(aes(x = x_left, ymin = 0, ymax = y_left, fill = "rej"), alpha = 0.3) +
  geom_ribbon(aes(x = x_right, ymin = 0, ymax = y_right, fill = "norej"), alpha = 0.3) +
  geom_line(aes(x_grid, pdf)) +
  geom_point(aes(z, 0), color = "black") +
  scale_fill_manual(values = c(rej = "red", norej = "green"),
                    labels = c(rej = "Rejection", norej = "No rejection")) +
  scale_x_continuous(breaks = x_breaks) +
  labs(y = "", x = "x", fill = NULL) +
  theme_bw() +
  theme(legend.position = c(.95, .95),
        legend.justification = c("right", "top"),
        legend.box.just = "right",
        legend.margin = margin(6, 6, 6, 6),
        panel.grid = element_blank())
```
Figure 23.2: Left-tailed test on the mean.
In this case we reject the null hypothesis, hence $\mu$ is significantly lower than $\mu_0 = 2.4$.
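The same conclusion can be obtained with `t.test`, specifying a left-tailed alternative:

```r
# Left-tailed test of H0: mu >= 2.4 against H1: mu < 2.4
t.test(x, mu = mu_0, alternative = "less", conf.level = 1 - alpha)
```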
Lastly, let’s consider the right-tailed case, i.e.
$$H_0: \mu \leq 2.4 \quad \text{against} \quad H_1: \mu > 2.4.$$
It is again a one-sided test, but in this case right-tailed. Hence, the critical value $z_{1-\alpha}$ is such that
$$\mathbb{P}(T > z_{1-\alpha}) = \alpha \iff z_{1-\alpha} = F^{-1}_{t_{n-1}}(1 - \alpha),$$
where $F^{-1}$ and $F$ are respectively the quantile and distribution functions of a Student-t. In this case, with $\alpha = 0.05$, the critical value of a Student-t with 499 degrees of freedom is $z_{1-\alpha} \approx 1.6479$. Therefore, if $T \leq z_{1-\alpha}$ we do not reject the null hypothesis, i.e. $\mu$ is lower than $\mu_0$; otherwise we reject it and $\mu$ is greater than $\mu_0$. Coherently with the previous test performed in Figure 23.2, the right-tailed test is not rejected in Figure 23.3, hence $\mu$ is lower than $\mu_0$.
Right-tailed test
```r
# Critical value (right-tailed): position and density height
z_right <- c(qt(1 - alpha, df = n - 1), dt(qt(1 - alpha, df = n - 1), df = n - 1))

# Rejection area (right tail)
x_rej <- x_grid[x_grid > z_right[1]]
y_rej <- dt(x_rej, df = n - 1)
# Non-rejection area (left of the critical value)
x_norej <- x_grid[x_grid < z_right[1]]
y_norej <- dt(x_norej, df = n - 1)

ggplot() +
  geom_segment(aes(x = z_right[1], xend = z_right[1], y = 0, yend = z_right[2]), color = "red") +
  geom_ribbon(aes(x = x_rej, ymin = 0, ymax = y_rej, fill = "rej"), alpha = 0.3) +
  geom_ribbon(aes(x = x_norej, ymin = 0, ymax = y_norej, fill = "norej"), alpha = 0.3) +
  geom_line(aes(x_grid, pdf)) +
  geom_point(aes(z, 0), color = "black") +
  scale_fill_manual(values = c(rej = "red", norej = "green"),
                    labels = c(rej = "Rejection", norej = "No rejection")) +
  scale_x_continuous(breaks = x_breaks) +
  labs(y = "", x = "x", fill = NULL) +
  theme_bw() +
  theme(legend.position = c(.95, .95),
        legend.justification = c("right", "top"),
        legend.box.just = "right",
        legend.margin = margin(6, 6, 6, 6),
        panel.grid = element_blank())
```
Figure 23.3: Right-tailed test on the mean.
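As before, `t.test` with a right-tailed alternative confirms the decision:

```r
# Right-tailed test of H0: mu <= 2.4 against H1: mu > 2.4
t.test(x, mu = mu_0, alternative = "greater", conf.level = 1 - alpha)
```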
23.2 Tests for the means
Proposition 23.1 Let’s consider the t-test for the mean of a sample of $n$ independent, identically and normally distributed random variables $x_1, \dots, x_n \sim N(\mu, \sigma^2)$. Then, under $H_0: \mu = \mu_0$, the test statistic is Student-t distributed with $n - 1$ degrees of freedom, i.e.
$$T = \frac{\bar{x}_n - \mu_0}{s_n / \sqrt{n}} \sim t_{n-1},$$
where $\bar{x}_n$ is the sample mean and $s_n^2$ the corrected sample variance. Moreover, for $n \to \infty$:
$$T \overset{d}{\longrightarrow} N(0, 1).$$
Proof. If the sample is normally distributed, the sample mean is also normally distributed, i.e.
$$\bar{x}_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \iff Z = \frac{\bar{x}_n - \mu}{\sigma / \sqrt{n}} \sim N(0, 1).$$
Under normality the scaled sample variance, being a sum of squares of independent and normally distributed random variables, follows a $\chi^2$ distribution with $n - 1$ degrees of freedom, i.e.
$$V = \frac{(n-1) s_n^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Notably, the ratio of a standard normal and the square root of an independent $\chi^2$ random variable (each one divided by its respective degrees of freedom) is exactly the definition of a Student-t random variable as in Equation 33.2. Hence, the ratio between the statistics $Z$ and $V$, divided by their degrees of freedom, reads
$$T = \frac{Z}{\sqrt{V / (n-1)}} = \frac{\bar{x}_n - \mu}{s_n / \sqrt{n}} \sim t_{n-1}.$$
The test statistic under $H_0$ therefore follows a Student-t distribution with $n - 1$ degrees of freedom. Notably, for large IID samples the statistic converges to a normal random variable independently of the distribution of $x_i$.
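As a quick sanity check of Proposition 23.1, the hand-computed statistic can be compared with the one reported by `t.test`; the two should coincide:

```r
# Hand-computed t statistic vs. the one reported by t.test
t_manual  <- (mean(x) - mu_0) / (sd(x) / sqrt(n))
t_builtin <- t.test(x, mu = mu_0)$statistic
all.equal(unname(t_builtin), t_manual)  # should be TRUE
```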
23.2.1 Test for two means and equal variances
Let’s consider two independent Gaussian populations with equal variance, i.e. $x_i \sim N(\mu_x, \sigma^2)$ and $y_j \sim N(\mu_y, \sigma^2)$. Then, let’s consider two samples of unequal size, $n_x$ and $n_y$, with unknown means $\mu_x$ and $\mu_y$ and an equal unknown variance $\sigma^2$. Then, given the null hypothesis $H_0: \mu_x = \mu_y$, the test statistic
$$T = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \sim t_{n_x + n_y - 2}$$
is Student-t distributed with $n_x + n_y - 2$ degrees of freedom, where the pooled variance is
$$s_p^2 = \frac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2},$$
and $s_x^2$ and $s_y^2$ are the sample corrected variances (Equation 9.11) of the two samples.
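A minimal sketch in R, assuming a hypothetical second sample `y` with the same standard deviation as `x` (not part of the original setup); the pooled test corresponds to `t.test` with `var.equal = TRUE`:

```r
# Hypothetical second sample with the same variance as x
y <- rnorm(300, mean = 2.2, sd = 4)

# Pooled two-sample t-test of H0: mu_x = mu_y
t.test(x, y, var.equal = TRUE)

# Pooled variance computed by hand
sp2 <- ((n - 1) * var(x) + (length(y) - 1) * var(y)) / (n + length(y) - 2)
sp2
```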
23.2.2 Test for two means and unequal variances
Let’s consider two independent Gaussian populations with different variances, i.e. $x_i \sim N(\mu_x, \sigma_x^2)$ and $y_j \sim N(\mu_y, \sigma_y^2)$. Then, let’s consider two samples of unequal size, $n_x$ and $n_y$, with unknown means $\mu_x$ and $\mu_y$ and unequal unknown variances $\sigma_x^2 \neq \sigma_y^2$. Then, given the null hypothesis $H_0: \mu_x = \mu_y$, Welch (1938) and Welch (1947) propose the test statistic
$$T = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}},$$
which follows approximately a Student t-distribution under the null hypothesis, but with fractional degrees of freedom computed using the Welch–Satterthwaite approximation. This is a weighted average of the degrees of freedom from each group, reflecting the uncertainty due to unequal variances, i.e.
$$\nu = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{(s_x^2 / n_x)^2}{n_x - 1} + \frac{(s_y^2 / n_y)^2}{n_y - 1}},$$
where $\nu$ is not necessarily an integer.
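In R, `t.test` performs the Welch test by default (`var.equal = FALSE`); a sketch using a hypothetical second sample with a different variance:

```r
# Hypothetical second sample with a different variance from x
y <- rnorm(300, mean = 2.2, sd = 2)

# Welch test (default): note the fractional degrees of freedom in the output
t.test(x, y)
```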
23.3 Tests for the variances
23.3.1 F-test for two variances
Consider two independent normal samples, i.e. $x_i \sim N(\mu_x, \sigma_x^2)$ for $i = 1, \dots, n_x$ and $y_j \sim N(\mu_y, \sigma_y^2)$ for $j = 1, \dots, n_y$, where $n_x$ and $n_y$ are the number of observations in each sample. Knowing that the sample variance is $\chi^2$ distributed (Equation 9.15), let’s define the variables:
$$V_x = \frac{(n_x - 1) s_x^2}{\sigma_x^2} \sim \chi^2_{n_x - 1}, \qquad V_y = \frac{(n_y - 1) s_y^2}{\sigma_y^2} \sim \chi^2_{n_y - 1}.$$
Then, since the ratio of two independent $\chi^2$ random variables, each divided by its respective degrees of freedom, is $F$-distributed (Equation 33.3), the statistic is defined as:
$$F = \frac{V_x / (n_x - 1)}{V_y / (n_y - 1)} = \frac{s_x^2 / \sigma_x^2}{s_y^2 / \sigma_y^2} \sim F_{n_x - 1, n_y - 1}.$$
Under $H_0$ the two variances are assumed to be equal, i.e. $\sigma_x^2 = \sigma_y^2$, thus the statistic simplifies to:
$$F = \frac{s_x^2}{s_y^2} \sim F_{n_x - 1, n_y - 1}.$$
This means that the null hypothesis of equal variances can be rejected when $F$ is as extreme or more extreme than the critical value obtained from the $F$-distribution with $(n_x - 1, n_y - 1)$ degrees of freedom, using a significance level $\alpha$.
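In R the F-test for equal variances is implemented in `var.test`; a sketch with a hypothetical second sample (its statistic is simply `var(x) / var(y)`):

```r
# Hypothetical second sample with a smaller variance
y <- rnorm(300, mean = 2, sd = 2)

# F-test of H0: sigma_x^2 = sigma_y^2
var.test(x, y, conf.level = 1 - alpha)
```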
Welch, B. L. 1938. “The Significance of the Difference Between Two Means When the Population Variances Are Unequal.” Biometrika 29 (3/4): 350–62. https://doi.org/10.1093/biomet/29.3-4.350.
———. 1947. “The Generalization of ‘Student’s’ Problem When Several Different Population Variances Are Involved.” Biometrika 34 (1-2): 28–35. https://doi.org/10.1093/biomet/34.1-2.28.