10.1 Expectation

The expectation of a random variable X is its first moment, also called the statistical average. In general, it is denoted as $E\{X\}$. Let's consider a discrete random variable $X$ with distribution function $P(X = x_j) = p_j$. Then the expectation of $X$ is the weighted average of all the possible $m$ states that the random variable can assume, each weighted by its respective probability of occurrence, i.e.
$$E\{X\} = \sum_{j=1}^{m} x_j\, p_j.$$
In the continuous case, i.e. when $X$ takes values in $\mathbb{R}$ and admits a density function, the expectation is computed as an integral, i.e.
$$E\{X\} = \int x\, dF_X(x) = \int x\, f_X(x)\, dx.$$
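As a minimal numerical illustration (the states, probabilities and density below are illustrative assumptions, not taken from the text), the discrete expectation is a probability-weighted sum, while the continuous one can be approximated by numerical integration:

# Discrete case: weighted average of the states by their probabilities
x_j <- c(0, 1, 2, 3)                 # possible states (hypothetical)
p_j <- c(0.1, 0.2, 0.3, 0.4)         # probabilities summing to one
e_x_discrete <- sum(x_j * p_j)

# Continuous case: E{X} as the integral of x f(x) dx, here with a N(1, 2) density
e_x_continuous <- integrate(function(x) x * dnorm(x, mean = 1, sd = sqrt(2)),
                            lower = -Inf, upper = Inf)$value
c(e_x_discrete, e_x_continuous)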

10.1.1 Sample statistic

Let's consider a sample of IID observations, i.e. $X_n = (x_1, \ldots, x_i, \ldots, x_n)$. Then the sample expectation is computed as:
$$\hat{\mu}(X_n) = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Population vs sample

In general, the notation $X_n$ refers to a finite sample, e.g. $\hat{\mu}(X_n)$ is the sample mean. Instead, the notation without $n$, i.e. $X$, stands for the random variable in the population, e.g. $E\{X\}$ is the population mean. A population can be finite or infinite. In the case of a finite population with $N$ elements it is useful to distinguish between:

  • Extraction with replacement of n elements for the sample gives $N^n$ possible samples.
  • Extraction without replacement of n elements for the sample gives $\binom{N}{n}$ possible combinations.
Table 10.1: Expectation in a discrete and continuous population and in a sample $X_n$.

  Population (continuous): $\int x\, f(x)\, dx$
  Population (discrete):   $\sum_{j=1}^{m} x_j\, p_j$
  Sample:                  $\frac{1}{n}\sum_{i=1}^{n} x_i$
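A small R sketch of the finite-population remark above (N and n are arbitrary illustrative values): it counts the possible samples with and without replacement and compares a sample mean with the population mean.

# Finite population (illustrative values)
N <- 10; n <- 4
x_pop <- 1:N
# Number of possible samples with replacement: N^n
N^n
# Number of possible combinations without replacement: choose(N, n)
choose(N, n)
# Population mean vs the sample mean of one extraction without replacement
mean(x_pop)
set.seed(1)
mean(sample(x_pop, size = n, replace = FALSE))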

10.1.2 Sample moments

Let's consider the moments of the sample mean of an IID sample. Since all the variables have the same expected value, i.e. $E\{x_i\} = E\{X\}$, the expected value of the sample mean is computed as:
$$E\{\hat{\mu}(X_n)\} = \frac{1}{n}\sum_{i=1}^{n} E\{x_i\} = E\{X\}. \tag{10.1}$$
The variance of the sample mean, using the independence of the observations, is computed as:
$$V\{\hat{\mu}(X_n)\} = \frac{1}{n^2} V\left\{\sum_{i=1}^{n} x_i\right\} = \frac{1}{n^2}\sum_{i=1}^{n} V\{x_i\} = \frac{V\{X\}}{n}. \tag{10.2}$$
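A quick Monte Carlo check of (10.1) and (10.2), a sketch assuming a normal population with the same illustrative moments used later in the chapter (E{X} = 1, V{X} = 2):

# Moments of the sample mean via simulation
e_x <- 1; v_x <- 2; n <- 100; n_sim <- 5000
set.seed(1)
sample_means <- replicate(n_sim, mean(rnorm(n, mean = e_x, sd = sqrt(v_x))))
# Empirical moments vs the theoretical values E{X} and V{X}/n
c(mean(sample_means), e_x)
c(var(sample_means), v_x / n)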

10.1.3 Sample distribution

Proposition 10.1 Let's consider a sample $X_n$ of $n$ IID random variables. If $n$ is sufficiently large, independently of the distribution of $X$, by the central limit theorem (CLT) the distribution of the sample expectation converges to the distribution of a normal random variable, i.e.
$$\hat{\mu}(X_n) \xrightarrow[n \to \infty]{d} N\!\left(E\{X\}, \frac{V\{X\}}{n}\right).$$

Proof. In order to prove it, it is useful to compute the expectation and the variance of the following random variable, i.e. $S_n = \sum_{i=1}^{n} x_i$. The expectation and the variance of $S_n$ can be easily obtained from (10.1) and (10.2) respectively and read:
$$E\{S_n\} = n\,E\{X\}, \qquad V\{S_n\} = n\,V\{X\}.$$
Applying the central limit theorem one obtains:
$$\frac{S_n - n\,E\{X\}}{\sqrt{n}\,Sd\{X\}} = \frac{\frac{S_n}{n} - E\{X\}}{\frac{Sd\{X\}}{\sqrt{n}}} \xrightarrow{d} N(0,1).$$
Hence the sample mean $\hat{\mu}(X_n) = \frac{S_n}{n}$ in large samples is distributed as a normal random variable, i.e.
$$\hat{\mu}(X_n) = \frac{S_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \xrightarrow[n \to \infty]{d} N\!\left(E\{X\}, \frac{V\{X\}}{n}\right).$$
Note that in small samples this result holds true if and only if $X$ is normally distributed also in the population. Under normality in the population we have that, independently of the sample size,
$$x_i \sim N(E\{X\}, V\{X\})\ \forall i \quad \Longrightarrow \quad \hat{\mu}(X_n) \sim N\!\left(E\{X\}, \frac{V\{X\}}{n}\right).$$

Distribution of sample mean
# True population moments  
true <- c(e_x = 1, v_x = 2)
# Number of elements for large samples
n <- 5000
# Number of elements for small samples
n_small <- trunc(n/30)
# Number of samples to simulate 
n_sample <- 2000

# Simulation of sample means
stat_sample_small <- c()
stat_sample_large <- c()
for(i in 1:n_sample){
  set.seed(i)
  # Large sample 
  x_n <- true[1] +  sqrt(true[2])*rnorm(n)
  # Statistic 
  stat_sample_large[i] <- mean(x_n)
  # Small sample 
  x_n <- x_n[1:n_small]
  # Statistic 
  stat_sample_small[i] <- mean(x_n)
}
Figure 10.1: Distribution of the sample mean. (a) Small sample (n = 166); (b) large sample (n = 5000).

10.2 Variance and covariance

In general, the variance of a random variable in the population is defined as:
$$V\{X\} = E\{(X - E\{X\})^2\}.$$
Let's consider a discrete random variable $X$ with distribution function $P(X = x_j) = p_j$. Then the variance of $X$ is the weighted average of all the possible $m$ centered and squared states that the random variable can assume, each weighted by its respective probability of occurrence, i.e.
$$V\{X\} = \sum_{j=1}^{m} (x_j - E\{X\})^2\, p_j.$$
In the continuous case, i.e. when $X$ admits a density function and takes values in $\mathbb{R}$, the variance is computed as:
$$V\{X\} = \int (x - E\{X\})^2 f_X(x)\, dx.$$
Let's consider two random variables $X$ and $Y$. Then, in general, their covariance is defined as:
$$Cv\{X,Y\} = E\{(X - E\{X\})(Y - E\{Y\})\}.$$
In the discrete case, where $X$ and $Y$ have a joint distribution $P(X = x_i, Y = y_j) = p_{ij}$, their covariance is defined as:
$$Cv\{X,Y\} = \sum_{i=1}^{m}\sum_{j=1}^{s} (x_i - E\{X\})(y_j - E\{Y\})\, p_{ij}.$$
In the continuous case, if the joint distribution of $X$ and $Y$ admits a density function, the covariance is computed as:
$$Cv\{X,Y\} = \iint (x - E\{X\})(y - E\{Y\})\, f_{X,Y}(x,y)\, dx\, dy.$$
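A minimal sketch of the discrete definitions above, using a small illustrative joint distribution (the states and probabilities are assumptions made only for the example):

# Joint distribution of (X, Y) on a 2 x 2 grid (hypothetical probabilities)
x_i <- c(0, 1); y_j <- c(0, 2)
p_ij <- matrix(c(0.3, 0.2,
                 0.1, 0.4), nrow = 2, byrow = TRUE)   # rows: X, columns: Y
# Marginal expectations
e_x <- sum(x_i * rowSums(p_ij))
e_y <- sum(y_j * colSums(p_ij))
# Variance of X and covariance between X and Y from the definitions
v_x   <- sum((x_i - e_x)^2 * rowSums(p_ij))
cv_xy <- sum(outer(x_i - e_x, y_j - e_y) * p_ij)
c(e_x = e_x, e_y = e_y, v_x = v_x, cv_xy = cv_xy)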

10.2.1 Properties

There are several useful properties connected to the variance and the covariance.

  1. The variance can be computed as:
$$V\{X\} = E\{X^2\} - E\{X\}^2. \tag{10.3}$$
  2. The variance is invariant with respect to the addition of a constant $a$, i.e.
$$V\{a + X\} = V\{X\}. \tag{10.4}$$
  3. The variance scales upon multiplication by a constant $a$, i.e.
$$V\{aX\} = a^2\, V\{X\}. \tag{10.5}$$
  4. The variance of a sum is computed as:
$$V\{X + Y\} = V\{X\} + V\{Y\} + 2\,Cv\{X,Y\}. \tag{10.6}$$
  5. The covariance can be expressed as:
$$Cv\{X,Y\} = E\{XY\} - E\{X\}E\{Y\}. \tag{10.7}$$
  6. The covariance scales upon multiplication by constants $a$ and $b$, i.e.
$$Cv\{aX, bY\} = ab\, Cv\{X,Y\}. \tag{10.8}$$

Proof. Property 1 (10.3) follows easily by developing the definition of variance, i.e.
$$V\{X\} = E\{(X - E\{X\})^2\} = E\{X^2\} + E\{X\}^2 - 2E\{X\}^2 = E\{X^2\} - E\{X\}^2.$$
Property 2 (10.4) follows from the definition, i.e.
$$V\{a + X\} = E\{(a + X - E\{a + X\})^2\} = E\{(X - E\{X\})^2\} = V\{X\}.$$
Property 3 (10.5) follows using the expression of the variance in (10.3), i.e.
$$V\{aX\} = E\{(aX)^2\} - E\{aX\}^2 = a^2 E\{X^2\} - a^2 E\{X\}^2 = a^2\left(E\{X^2\} - E\{X\}^2\right) = a^2\, V\{X\}.$$
Property 4 (10.6), i.e. the variance of the sum of two random variables, is:
$$\begin{aligned}
V\{X + Y\} &= E\{(X + Y - E\{X + Y\})^2\} \\
&= E\{([X - E\{X\}] + [Y - E\{Y\}])^2\} \\
&= E\{(X - E\{X\})^2\} + E\{(Y - E\{Y\})^2\} + 2E\{(X - E\{X\})(Y - E\{Y\})\} \\
&= V\{X\} + V\{Y\} + 2\,Cv\{X,Y\},
\end{aligned}$$
where, in the case in which there is no linear dependence between $X$ and $Y$, the covariance is zero, i.e. $Cv\{X,Y\} = 0$. Developing the computation of the covariance it is possible to prove property 5 (10.7), i.e.
$$\begin{aligned}
Cv\{X,Y\} &= E\{(X - E\{X\})(Y - E\{Y\})\} \\
&= E\{XY - X E\{Y\} - Y E\{X\} + E\{X\}E\{Y\}\} \\
&= E\{XY\} - 2E\{X\}E\{Y\} + E\{X\}E\{Y\} \\
&= E\{XY\} - E\{X\}E\{Y\}.
\end{aligned}$$
Finally, using the result in property 5 (10.7), the result in property 6 (10.8) follows easily:
$$Cv\{aX, bY\} = E\{aX\,bY\} - E\{aX\}E\{bY\} = ab\,E\{XY\} - ab\,E\{X\}E\{Y\} = ab\,Cv\{X,Y\}.$$
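A Monte Carlo sanity check of properties (10.3) to (10.8), a sketch with arbitrary illustrative constants and a simulated bivariate sample (simulation error makes each pair of numbers approximately, not exactly, equal):

# Simulated correlated pair (X, Y) and two constants
set.seed(1)
n <- 1e5; a <- 2; b <- -3
x <- rnorm(n, mean = 1, sd = sqrt(2))
y <- 0.5 * x + rnorm(n)
# (10.3) V{X} = E{X^2} - E{X}^2
c(var(x), mean(x^2) - mean(x)^2)
# (10.4) and (10.5): translation invariance and scaling
c(var(a + x), var(x), var(a * x), a^2 * var(x))
# (10.6) variance of the sum
c(var(x + y), var(x) + var(y) + 2 * cov(x, y))
# (10.7) covariance as E{XY} - E{X}E{Y}
c(cov(x, y), mean(x * y) - mean(x) * mean(y))
# (10.8) bilinear scaling of the covariance
c(cov(a * x, b * y), a * b * cov(x, y))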

10.2.2 Conditional variance

Proposition 10.2 (Conditional variance)
Let's consider two random variables $X$ and $Y$ with finite second moments. Then, the total variance can be decomposed as:
$$V\{X\} = E\{V\{X \mid Y\}\} + V\{E\{X \mid Y\}\}, \qquad V\{Y\} = E\{V\{Y \mid X\}\} + V\{E\{Y \mid X\}\}.$$

Proof. By definition, the variance of a random variable reads $V\{X\} = E\{X^2\} - E\{X\}^2$. Applying the tower property we can write
$$V\{X\} = E\{E\{X^2 \mid Y\}\} - E\{E\{X \mid Y\}\}^2.$$
Then, add and subtract $E\{E\{X \mid Y\}^2\}$:
$$V\{X\} = E\{E\{X^2 \mid Y\}\} - E\{E\{X \mid Y\}\}^2 + E\{E\{X \mid Y\}^2\} - E\{E\{X \mid Y\}^2\}.$$
Grouping the first and fourth terms and the second and third terms one obtains
$$V\{X\} = E\{E\{X^2 \mid Y\} - E\{X \mid Y\}^2\} + E\{E\{X \mid Y\}^2\} - E\{E\{X \mid Y\}\}^2 = E\{V\{X \mid Y\}\} + V\{E\{X \mid Y\}\}.$$
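A short numerical illustration of the variance decomposition, a sketch assuming a two-group mixture where Y is the group label (groups and moments are arbitrary for the example); the two quantities agree up to simulation error:

# Law of total variance on a two-group mixture: Y is the group label
set.seed(1)
n <- 1e5
y <- rbinom(n, size = 1, prob = 0.3)             # group indicator
x <- rnorm(n, mean = ifelse(y == 1, 2, 0),       # group-specific mean
           sd = ifelse(y == 1, 1, 2))            # group-specific sd
# Within-group variances and means (conditional on Y)
v_within <- tapply(x, y, var)
m_within <- tapply(x, y, mean)
p_y <- table(y) / n
# E{V{X|Y}} + V{E{X|Y}} vs the total variance V{X}
e_v <- sum(v_within * p_y)
v_e <- sum((m_within - sum(m_within * p_y))^2 * p_y)
c(total = var(x), decomposition = e_v + v_e)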

10.2.3 Sample statistic

The sample variance on $X_n = (x_1, \ldots, x_i, \ldots, x_n)$ is computed as:
$$V\{X_n\} = \hat{\sigma}^2(X_n) = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \hat{\mu}(X_n)\right)^2. \tag{10.9}$$
Equivalently, in terms of the first and second sample moments:
$$\hat{\sigma}^2(X_n) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2. \tag{10.10}$$
In general, the variance computed as in (10.9) is not an unbiased estimator of the population value. Hence, to correct the estimator, let's define the corrected (unbiased) sample variance:
$$\hat{s}^2(X_n) = \frac{n}{n-1}\,\hat{\sigma}^2(X_n). \tag{10.11}$$
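In R the built-in var() already applies the n/(n-1) correction of (10.11); a minimal sketch with simulated data (illustrative moments) comparing the two estimators:

# Biased (1/n) and corrected (1/(n-1)) sample variances
set.seed(1)
x_n <- rnorm(20, mean = 1, sd = sqrt(2))
n <- length(x_n)
sigma2_hat <- mean((x_n - mean(x_n))^2)   # biased estimator, equation (10.9)
s2_hat <- n / (n - 1) * sigma2_hat        # corrected estimator, equation (10.11)
c(sigma2_hat = sigma2_hat, s2_hat = s2_hat, var = var(x_n))  # var() matches s2_hat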

10.2.4 Sample moments

Let's consider the moments of the sample variance on an IID sample. The expected value of the corrected sample variance is:
$$E\{\hat{s}^2(X_n)\} = \sigma^2. \tag{10.12}$$
The variance of the corrected sample variance is:
$$V\{\hat{s}^2(X_n)\} = \frac{\sigma^4}{n}\left(\left(\frac{\mu_4}{\sigma^4} - 3\right) + \frac{2n}{n-1}\right), \tag{10.13}$$
where $\frac{\mu_4}{\sigma^4}$ is the kurtosis of $X_n$. If the population is normal, $\frac{\mu_4}{\sigma^4} = 3$ and the variance simplifies to:
$$V\{\hat{s}^2(X_n)\} = \frac{2\sigma^4}{n-1}. \tag{10.14}$$
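A quick simulation check of (10.12) and (10.14) under a normal population, a sketch reusing the illustrative moments E{X} = 1 and σ² = 2:

# Moments of the corrected sample variance under normality
set.seed(1)
sigma2 <- 2; n <- 50; n_sim <- 5000
s2 <- replicate(n_sim, var(rnorm(n, mean = 1, sd = sqrt(sigma2))))
# Empirical mean vs sigma^2 and empirical variance vs 2*sigma^4/(n-1)
c(mean(s2), sigma2)
c(var(s2), 2 * sigma2^2 / (n - 1))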

10.2.5 Sample distribution

The distribution of the sample variance is available when the population is normal, since sums of squared IID standard normal random variables follow a $\chi^2$ distribution. Notably, from Cochran's theorem:
$$T_n = \frac{(n-1)\,\hat{s}^2(X_n)}{\sigma^2} \sim \chi^2(n-1). \tag{10.15}$$
In the limit as $\nu \to \infty$ a $\chi^2_{\nu}$ random variable converges to a standard normal random variable, i.e.
$$\frac{\chi^2(n) - n}{\sqrt{2n}} \xrightarrow[n \to \infty]{d} N(0,1);$$
therefore, in large samples the statistic $T_n$ converges to a normal random variable, i.e.
$$T_n \xrightarrow[n \to \infty]{d} N(n, 2n) \quad \Longleftrightarrow \quad \frac{T_n - n}{\sqrt{2n}} \approx N(0,1). \tag{10.16}$$

If the population is normal, then the distribution of $\hat{s}^2(X_n)$ is proportional to the distribution of a $\chi^2_{n-1}$. In fact, from (10.15) the expectation of $\hat{s}^2(X_n)$ is:
$$E\{T_n\} = \frac{(n-1)\,E\{\hat{s}^2(X_n)\}}{\sigma^2} \quad \Longrightarrow \quad E\{\hat{s}^2(X_n)\} = \frac{\sigma^2\, E\{T_n\}}{n-1} = \frac{\sigma^2 (n-1)}{n-1} = \sigma^2.$$
Similarly, computing the variance of (10.15) and knowing that $V\{T_n\} = 2(n-1)$, one obtains:
$$V\{T_n\} = \frac{(n-1)^2\, V\{\hat{s}^2(X_n)\}}{\sigma^4} \quad \Longrightarrow \quad V\{\hat{s}^2(X_n)\} = \frac{\sigma^4\, V\{T_n\}}{(n-1)^2} = \frac{\sigma^4\, 2(n-1)}{(n-1)^2} = \frac{2\sigma^4}{n-1}.$$

Distribution of Tn under normality
# True population moments  
true <- c(e_x = 1, v_x = 2)
# Number of elements for large samples
n <- 5000
# Number of elements for small samples
n_small <- trunc(n/30)
# Number of samples to simulate 
n_sample <- 2000

# Simulation of the statistic Tn
stat_sample_small <- c()
stat_sample_large <- c()
for(i in 1:n_sample){
  set.seed(i)
  # Large sample 
  x_n <- true[1] + sqrt(true[2])*rnorm(n)
  # Statistic: Tn = (n - 1) * s^2 / sigma^2, equation (10.15)
  stat_sample_large[i] <- sum((x_n - mean(x_n))^2)/true[2]
  # Small sample 
  x_n <- x_n[1:n_small]
  # Statistic: Tn on the small sample
  stat_sample_small[i] <- sum((x_n - mean(x_n))^2)/true[2]
}
Figure 10.2: Distribution of the statistic Tn under normality. (a) Small sample (n = 166); (b) large sample (n = 5000).

10.3 Skewness

The skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal distribution, negative skew commonly indicates that the tail is on the left side of the distribution, and positive skew indicates that the tail is on the right.

Figure 10.3: Skewness of a random variable.

Following the same notation as in Ralph B. D'Agostino and Jr. (), let's define and denote the population skewness of a random variable $X$ as:
$$Sk\{X\} = \beta_1(X) = E\left\{\left(\frac{X - E\{X\}}{\sqrt{V\{X\}}}\right)^3\right\}.$$

10.3.1 Sample statistic

Let's consider an IID sample $X_n = (x_1, \ldots, x_i, \ldots, x_n)$; then the sample skewness is estimated as:
$$Sk\{X_n\} = b_1(X_n) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \hat{\mu}(X_n)}{\hat{\sigma}(X_n)}\right)^3. \tag{10.17}$$
The estimator in (10.17) is biased. Hence, let's define the corrected sample estimator of the skewness as:
$$g_1(X_n) = \frac{\sqrt{n(n-1)}}{n-2}\, b_1(X_n).$$
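A minimal R sketch of (10.17) and of its correction, written directly from the formulas above on simulated data (a skewed exponential sample is used purely as an illustration):

# Sample skewness b1 and its small-sample correction g1
set.seed(1)
x_n <- rexp(100, rate = 1)                      # skewed illustrative sample
n <- length(x_n)
sigma_hat <- sqrt(mean((x_n - mean(x_n))^2))    # 1/n standard deviation
b1 <- mean(((x_n - mean(x_n)) / sigma_hat)^3)
g1 <- sqrt(n * (n - 1)) / (n - 2) * b1
c(b1 = b1, g1 = g1)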

10.3.2 Sample moments

Under normality, the asymptotic moments of the sample skewness are:
$$E\{b_1(X_n)\} = 0, \qquad V\{b_1(X_n)\} = \frac{6}{n}.$$
In Urzúa () the exact moments of the estimator in (10.17) for small normal samples are also reported, i.e. the mean
$$E\{b_1(X_n)\} = 0,$$
and the variance
$$V\{b_1(X_n)\} = \frac{6(n-2)}{(n+1)(n+3)}. \tag{10.18}$$

10.3.3 Sample distribution

Under normality, the asymptotic distribution of the sample skewness is normal, i.e.
$$b_1(X_n) \xrightarrow[n \to \infty]{d} N\!\left(0, \frac{6}{n}\right). \tag{10.19}$$
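A small simulation, in the spirit of the earlier chunks, sketching how the sample skewness of normal samples matches the asymptotic variance 6/n and the exact small-sample variance (10.18):

# Sample skewness of simulated normal samples
set.seed(1)
n <- 166; n_sim <- 2000
b1 <- replicate(n_sim, {
  x <- rnorm(n, mean = 1, sd = sqrt(2))
  s <- sqrt(mean((x - mean(x))^2))
  mean(((x - mean(x)) / s)^3)
})
# Empirical variance vs asymptotic 6/n and exact 6(n-2)/((n+1)(n+3))
c(var(b1), 6 / n, 6 * (n - 2) / ((n + 1) * (n + 3)))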

10.4 Kurtosis

The kurtosis is a measure of the tailedness of the probability distribution of a real-valued random variable. The standard measure of a distribution's kurtosis, originating with Karl Pearson, is a scaled version of the fourth moment of the distribution. This number is related to the tails of the distribution. For this measure, higher kurtosis corresponds to greater extremity of deviations from the mean (or outliers). In general, it is common to compare the excess kurtosis of a distribution with respect to the normal distribution (which has kurtosis equal to 3). It is possible to distinguish 3 cases:

  1. Negative excess kurtosis, or platykurtic: distributions that produce fewer outliers than the normal distribution.
  2. Zero excess kurtosis, or mesokurtic: distributions that produce the same amount of outliers as the normal distribution.
  3. Positive excess kurtosis, or leptokurtic: distributions that produce more outliers than the normal distribution.
Figure 10.4: Kurtosis of different leptokurtic distributions.

Let's define and denote the population kurtosis of a random variable $X$ as:
$$Kt\{X\} = \beta_2(X) = E\left\{\left(\frac{X - E\{X\}}{\sqrt{V\{X\}}}\right)^4\right\},$$
or equivalently the excess kurtosis as $Kt\{X\} - 3$.

10.4.1 Sample statistic

Let's consider an IID sample $X_n = (x_1, \ldots, x_i, \ldots, x_n)$; then the sample kurtosis is denoted as:
$$Kt\{X_n\} = b_2(X_n) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \hat{\mu}(X_n)}{\hat{\sigma}(X_n)}\right)^4. \tag{10.20}$$
From Pearson (), we have a corrected version of $b_2(X_n)$ defined as:
$$g_2(X_n) = \left[b_2(X_n) - \frac{3(n-1)}{n+1}\right]\frac{(n+1)(n-1)}{(n-2)(n-3)}.$$
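A minimal R sketch of (10.20) and of the corrected estimator, written directly from the formulas above on an illustrative simulated sample:

# Sample kurtosis b2 and its small-sample correction g2
set.seed(1)
x_n <- rt(200, df = 6)                          # heavy-tailed illustrative sample
n <- length(x_n)
sigma_hat <- sqrt(mean((x_n - mean(x_n))^2))
b2 <- mean(((x_n - mean(x_n)) / sigma_hat)^4)
g2 <- (b2 - 3 * (n - 1) / (n + 1)) * (n + 1) * (n - 1) / ((n - 2) * (n - 3))
c(b2 = b2, excess_g2 = g2)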

10.4.2 Sample moments

Under normality, the asymptotic moments of the sample kurtosis are:
$$E\{b_2(X_n)\} = 3, \qquad V\{b_2(X_n)\} = \frac{24}{n}.$$
Notably, in Urzúa () the exact mean and variance for a small normal sample are also reported, i.e. the mean
$$E\{b_2(X_n)\} = \frac{3(n-1)}{n+1}, \tag{10.21}$$
and the variance:
$$V\{b_2(X_n)\} = \frac{24\,n(n-2)(n-3)}{(n+1)^2(n+3)(n+5)}. \tag{10.22}$$

10.4.3 Sample distribution

Under normality, the asymptotic distribution of the sample kurtosis is normal, i.e.
$$b_2(X_n) \xrightarrow[n \to \infty]{d} N\!\left(3, \frac{24}{n}\right). \tag{10.23}$$
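As with the skewness, a short simulation sketch comparing the sample kurtosis of normal samples with the asymptotic moments in (10.23) and the exact small-sample formulas (10.21) and (10.22):

# Sample kurtosis of simulated normal samples
set.seed(1)
n <- 166; n_sim <- 2000
b2 <- replicate(n_sim, {
  x <- rnorm(n, mean = 1, sd = sqrt(2))
  s <- sqrt(mean((x - mean(x))^2))
  mean(((x - mean(x)) / s)^4)
})
# Empirical mean vs 3 and 3(n-1)/(n+1); empirical variance vs 24/n and the exact value
c(mean(b2), 3, 3 * (n - 1) / (n + 1))
c(var(b2), 24 / n, 24 * n * (n - 2) * (n - 3) / ((n + 1)^2 * (n + 3) * (n + 5)))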