16 Generalized least squares

References: Chapter 3. Gardini A. (2007).

16.1 Working hypothesis

The assumptions of the generalized least squares estimator are:

The linear model approximates the conditional expectation, i.e. $E {Y_{i} ∣ x_{i}} = x_{i}^{⊤} b$ .
The conditional variance of the response variable $Y$ depends on the observation $i$ , i.e. $V {Y_{i} ∣ x_{i}} = σ_{i}^{2}$ with $0 < σ_{i}^{2} < \infty$ for all $i$ with $i = 1, \dots, n$ .
The response variables $Y$ are correlated, i.e. $C v {Y_{i}, Y_{j} ∣ x_{i}} = σ_{i j}$ for all $i \neq j$ and $i, j = 1, \dots, n$ .

Equivalently, the assumptions can be formulated in terms of the stochastic component $u$:

The residuals have mean zero, i.e. $E {u_{i} ∣ x_{i}} = 0$ for all $i$ with $i = 1, \dots, n$ .
The conditional variance of the residuals depends on the observation $i$ , i.e. $V {u_{i} ∣ x_{i}} = σ_{i}^{2}$ with $0 < σ_{i}^{2} < \infty$ .
The residuals are correlated, i.e. $C v {u_{i}, u_{j} ∣ x_{i}} = σ_{i j}$ for all $i \neq j$ and $i, j = 1, \dots, n$ .

In this case the variance-covariance matrix $Σ$ is defined as in Equation 14.15 and contains the variances of the observations on its diagonal and the covariances between observations off the diagonal.
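As an illustration of such a matrix, the following minimal sketch builds a $Σ$ with observation-specific variances and non-zero covariances. The standard deviations `sigma` and the correlation pattern $ρ^{|i - j|}$ are assumptions chosen only for the example; they do not come from the text.

```python
import numpy as np

n = 5
# Hypothetical standard deviations sigma_i, one per observation (heteroskedasticity).
sigma = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
# Hypothetical correlation pattern: corr(u_i, u_j) = rho ** |i - j|.
rho = 0.6
idx = np.arange(n)
corr = rho ** np.abs(idx[:, None] - idx[None, :])

# Sigma_ij = sigma_i * sigma_j * corr_ij, so the diagonal holds sigma_i^2.
Sigma = np.outer(sigma, sigma) * corr

print(np.diag(Sigma))   # the variances sigma_i^2
print(Sigma[0, 1])      # the covariance sigma_12
```

The diagonal of `Sigma` holds the variances $σ_{i}^{2}$ and the off-diagonal entries hold the covariances $σ_{i j}$.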

16.2 Generalized least squares estimator

Proposition 16.1 ( $Generalized Least Squares (GLS)$ )
The generalized least squares (GLS) estimator $b^{GLS}$ is the value of $b$ that minimizes the weighted sum of the squared residuals $Q^{GLS}$, and it is used as an estimate of the true parameter $b$, where $\begin{matrix} (16.1) & Q^{GLS} (b) = \hat{u} (b)^{⊤} Σ^{- 1} \hat{u} (b) . \end{matrix}$ Formally, the GLS estimator is the solution of the following minimization problem, i.e. $\begin{matrix} (16.2) & b^{GLS} = \underset{b \in Θ_{b}}{argmin} {Q^{GLS} (b)} . \end{matrix}$ Notably, if $Σ$ is non-singular and $X$ has full column rank, one obtains an analytic expression, i.e.
$\begin{matrix} (16.3) & b^{GLS} = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} y . \end{matrix}$
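Equation 16.3 can be sketched numerically as follows. The helper names `gls` and `q_gls`, the simulated design matrix and the covariance structure are all assumptions made for illustration. The sketch uses linear solves instead of explicit inverses, which is a common numerical choice rather than something the text prescribes.

```python
import numpy as np

def gls(X, y, Sigma):
    """GLS estimate (X' Sigma^-1 X)^-1 X' Sigma^-1 y, computed with linear solves."""
    Si_X = np.linalg.solve(Sigma, X)   # Sigma^-1 X
    Si_y = np.linalg.solve(Sigma, y)   # Sigma^-1 y
    return np.linalg.solve(X.T @ Si_X, X.T @ Si_y)

def q_gls(b, X, y, Sigma):
    """Weighted sum of squared residuals Q(b) = u(b)' Sigma^-1 u(b)."""
    r = y - X @ b
    return r @ np.linalg.solve(Sigma, r)

# Simulated data (hypothetical): intercept plus two regressors.
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
b_true = np.array([1.0, -2.0, 0.5])

# Hypothetical heteroskedastic and correlated error covariance.
idx = np.arange(n)
sd = np.linspace(0.5, 2.0, n)
Sigma = np.outer(sd, sd) * 0.5 ** np.abs(idx[:, None] - idx[None, :])
u = np.linalg.cholesky(Sigma) @ rng.normal(size=n)
y = X @ b_true + u

b_gls = gls(X, y, Sigma)
print(b_gls, q_gls(b_gls, X, y, Sigma))
```

Any other candidate value of $b$, including the true parameter used in the simulation, gives a value of $Q^{GLS}$ at least as large as the one attained at `b_gls`.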

Singularity of $X$ and $Σ$

The closed-form solution is available if and only if $Σ$ is non-singular and $X$ has full column rank. In practice the conditions are:

1. $rank (Σ) = n$ for the inversion of $Σ$ .
2. $rank (X) = k$ , together with condition 1, for the inversion of $X^{⊤} Σ^{- 1} X$ .
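The two rank conditions can be checked numerically before attempting the inversion. In the sketch below, the function name `gls_is_identified` and the small example matrices are hypothetical; `np.linalg.matrix_rank` computes a numerical rank via the singular value decomposition.

```python
import numpy as np

def gls_is_identified(X, Sigma):
    """Check rank(Sigma) = n and rank(X) = k for an n x k design matrix X."""
    n, k = X.shape
    rank_sigma_ok = np.linalg.matrix_rank(Sigma) == n
    rank_x_ok = np.linalg.matrix_rank(X) == k
    return bool(rank_sigma_ok and rank_x_ok)

X_ok = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
# Second column is a multiple of the first: perfect collinearity.
X_collinear = np.column_stack([X_ok[:, 0], 2.0 * X_ok[:, 0]])

Sigma_ok = np.diag([1.0, 2.0, 3.0, 4.0])
# A matrix of ones has rank 1, so it cannot be inverted.
Sigma_singular = np.ones((4, 4))

print(gls_is_identified(X_ok, Sigma_ok))
print(gls_is_identified(X_collinear, Sigma_ok))
print(gls_is_identified(X_ok, Sigma_singular))
```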

Proof: Proposition 16.1

Proof. Let’s prove the optimal solution in Proposition 16.1. Expanding the objective function in Equation 16.1 and using the symmetry of $Σ^{- 1}$ : $\begin{aligned} Q^{GLS} (b) & = \hat{u} (b)^{⊤} Σ^{- 1} \hat{u} (b) \\ & = (y - X b)^{⊤} Σ^{- 1} (y - X b) \\ & = y^{⊤} Σ^{- 1} y - 2 b^{⊤} X^{⊤} Σ^{- 1} y + b^{⊤} X^{⊤} Σ^{- 1} X b \end{aligned}$ In order to minimize the above expression, let’s compute the first derivative of $Q^{GLS} (b)$ with respect to $b$ : $\frac{\partial Q^{GLS} (b)}{\partial b} = - 2 X^{⊤} Σ^{- 1} y + 2 X^{⊤} Σ^{- 1} X b .$ Then, setting the above expression equal to zero and solving for $b = b^{GLS}$ gives the solution, i.e. $b^{GLS} = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} y .$ Since the second derivative $2 X^{⊤} Σ^{- 1} X$ is positive definite when $rank (X) = k$ and $Σ$ is positive definite, this stationary point is the unique minimum.
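The first-order condition used in the proof can be checked numerically. The sketch below compares the analytic gradient $- 2 X^{⊤} Σ^{- 1} y + 2 X^{⊤} Σ^{- 1} X b$ with a central finite-difference approximation at an arbitrary point, and then evaluates the gradient at $b^{GLS}$. The data, the covariance matrix and the names `Q` and `grad_Q` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 2
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

# Hypothetical positive definite covariance matrix.
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)
Sigma_inv = np.linalg.inv(Sigma)

def Q(b):
    """Objective Q(b) = (y - Xb)' Sigma^-1 (y - Xb)."""
    r = y - X @ b
    return r @ Sigma_inv @ r

def grad_Q(b):
    """Analytic gradient from the proof."""
    return -2 * X.T @ Sigma_inv @ y + 2 * X.T @ Sigma_inv @ X @ b

# Central finite differences at an arbitrary point b0.
b0 = np.array([0.3, -0.7])
eps = 1e-6
num_grad = np.array([(Q(b0 + eps * e) - Q(b0 - eps * e)) / (2 * eps)
                     for e in np.eye(k)])

# The gradient vanishes at the GLS solution.
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(num_grad, grad_Q(b0))
print(grad_Q(b_gls))
```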

Proposition 16.2 ( $Two-stage derivation of GLS estimator$ )
The GLS estimator in Equation 16.3 can be equivalently recovered as $b^{GLS} = (X^{⊤} T^{⊤} T X)^{- 1} X^{⊤} T^{⊤} T y,$ where $T = Λ^{- 1 / 2} e^{⊤}$ with $Σ = e Λ e^{⊤}$ and

$Λ$ is the diagonal matrix containing the eigenvalues of $Σ$ .
$e$ is the matrix with the eigenvectors of $Σ$ , which is orthogonal, i.e. $e^{⊤} e = e e^{⊤} = I_{n}$ .

Moreover, the matrix $T = Λ^{- 1 / 2} e^{⊤}$ satisfies: $\begin{matrix} (16.4) & T^{⊤} T = e Λ^{- 1 / 2} Λ^{- 1 / 2} e^{⊤} = e Λ^{- 1} e^{⊤} = Σ^{- 1} \end{matrix}$
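The construction of $T$ and the identity in Equation 16.4 can be verified numerically. The sketch below uses `np.linalg.eigh`, which returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix. The $4 \times 4$ matrix `Sigma` is a hypothetical example.

```python
import numpy as np

# Hypothetical covariance matrix: unequal variances, decaying correlation.
idx = np.arange(4)
sd = np.array([1.0, 2.0, 0.5, 1.5])
Sigma = np.outer(sd, sd) * 0.4 ** np.abs(idx[:, None] - idx[None, :])

# Spectral decomposition Sigma = e diag(lam) e'.
lam, e = np.linalg.eigh(Sigma)

# Transformation matrix T = Lambda^(-1/2) e'.
T = np.diag(lam ** -0.5) @ e.T

print(np.round(e.T @ e, 10))                       # identity: e is orthogonal
print(np.round(T.T @ T - np.linalg.inv(Sigma), 10))  # zeros: T'T = Sigma^-1
```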

Proof: Proposition 16.2

Proof. Let’s consider a linear model of the form $y = X b + u,$ and let’s premultiply both sides by some (yet unknown) transformation matrix $T_{n \times n}$ , i.e. $T y = T X b + T u,$ which, writing $\tilde{y} = T y$ , $\tilde{X} = T X$ and $\tilde{u} = T u$ , reads $\tilde{y} = \tilde{X} b + \tilde{u} .$ In this context, the conditional expectation of $\tilde{y}$ reads $E {\tilde{y} ∣ \tilde{X}} = \tilde{X} b,$ while its conditional variance is $V {\tilde{y} ∣ \tilde{X}} = V {\tilde{u} ∣ \tilde{X}} = T Σ T^{⊤} .$ The next step is to identify a suitable transformation matrix $T$ such that the conditional variance becomes equal to the identity matrix (Equation 31.3), i.e. $V {\tilde{u} ∣ \tilde{X}} = I_{n} .$ In this way it is possible to work under the Gauss-Markov assumptions (Theorem 15.1), obtaining an estimator with minimum variance.

A possible way to identify $T$ is to decompose the variance-covariance matrix (Equation 14.15) as follows: $Σ = e Λ e^{⊤}, \quad Σ^{- 1} = e Λ^{- 1} e^{⊤},$ where $Λ$ is the diagonal matrix containing the eigenvalues of $Σ$ and $e$ is the matrix with the eigenvectors, which satisfies the following relation, i.e. $e^{⊤} e = e e^{⊤} = I_{n}$ .

Thus, for the particular choice of $T = Λ^{- 1 / 2} e^{⊤}$ , one obtains a conditional variance equal to the identity matrix, i.e. unit variance for every transformed observation and zero covariance between them:
$\begin{aligned} V {\tilde{u} ∣ \tilde{X}} & = T Σ T^{⊤} \\ & = (Λ^{- 1 / 2} e^{⊤}) e Λ e^{⊤} (e Λ^{- 1 / 2}) \\ & = Λ^{- 1 / 2} Λ Λ^{- 1 / 2} = I_{n} \end{aligned}$ where $I_{n}$ is the identity matrix (Equation 31.3). Finally, substituting $\tilde{X} = T X$ and $\tilde{y} = T y$ in the OLS formula (Equation 15.3) and using the result in Equation 16.4, one obtains exactly the GLS estimator in Equation 16.3, i.e. $\begin{aligned} b^{GLS} & = ({\tilde{X}}^{⊤} \tilde{X})^{- 1} {\tilde{X}}^{⊤} \tilde{y} \\ & = (X^{⊤} T^{⊤} T X)^{- 1} X^{⊤} T^{⊤} T y \\ & = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} y \end{aligned}$
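The two-stage argument can be sketched in code: transform the data with $T$, run ordinary least squares on the transformed variables, and compare with the direct formula in Equation 16.3. The simulated data and covariance structure are assumptions for illustration. `np.linalg.lstsq` is used here as the OLS solver.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Hypothetical heteroskedastic and correlated error covariance.
idx = np.arange(n)
sd = np.linspace(0.5, 2.0, n)
Sigma = np.outer(sd, sd) * 0.7 ** np.abs(idx[:, None] - idx[None, :])
y = X @ np.array([2.0, -1.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

# Stage 1: build T = Lambda^(-1/2) e' and transform the model.
lam, e = np.linalg.eigh(Sigma)
T = np.diag(lam ** -0.5) @ e.T
X_t, y_t = T @ X, T @ y

# Stage 2: OLS on the transformed data.
b_two_stage, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)

# Direct GLS formula for comparison.
Sigma_inv = np.linalg.inv(Sigma)
b_direct = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

print(b_two_stage, b_direct)
```

Both routes give the same estimate up to floating-point error. Any other matrix $T$ with $T^{⊤} T = Σ^{- 1}$ , for example one built from a Cholesky factor, would work equally well in this sketch.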

16.3 Properties of the GLS estimator

Theorem 16.1 ( $Aitken theorem$ )
Under the following working hypotheses, also called Aitken hypotheses, i.e.

$y = X b + u$ .
$E {u} = 0$ .
$E {u u^{⊤}} = Σ$ , i.e. heteroskedastic and correlated errors.
$X$ is non-stochastic and independent of the errors $u$ for every sample size $n$ .

the Generalized Least Squares (GLS) estimator is $BLUE$ (Best Linear Unbiased Estimator), where “best” stands for the estimator with minimum variance in the class of linear unbiased estimators of $b$ .
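To illustrate what "best" means here, the sketch below compares GLS with OLS, which is also linear and unbiased under these hypotheses. When $V {u} = Σ$ the OLS estimator has the standard sandwich variance $(X^{⊤} X)^{- 1} X^{⊤} Σ X (X^{⊤} X)^{- 1}$ , and the theorem implies that this matrix minus $(X^{⊤} Σ^{- 1} X)^{- 1}$ is positive semi-definite. The design matrix and $Σ$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Hypothetical heteroskedastic and correlated error covariance.
idx = np.arange(n)
sd = np.linspace(0.5, 3.0, n)
Sigma = np.outer(sd, sd) * 0.6 ** np.abs(idx[:, None] - idx[None, :])
Sigma_inv = np.linalg.inv(Sigma)

# Variance of OLS when the errors have covariance Sigma (sandwich form).
XtX_inv = np.linalg.inv(X.T @ X)
V_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv

# Variance of GLS.
V_gls = np.linalg.inv(X.T @ Sigma_inv @ X)

# The difference should have no negative eigenvalues (up to rounding).
diff = V_ols - V_gls
eigs = np.linalg.eigvalsh((diff + diff.T) / 2)
print(np.diag(V_ols), np.diag(V_gls))
print(eigs)
```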

Proposition 16.3 ( $Properties GLS estimator$ )
1. Unbiased: the conditional expectation of $b^{GLS}$ is equal to the true parameter in the population, i.e. $\begin{matrix} (16.5) & E {b^{GLS} ∣ X} = b . \end{matrix}$
2. Linear: it can be written as a linear function of $y$ , i.e. $b^{GLS} = A_{x} y$ , where $A_{x}$ depends on $X$ and $Σ$ but not on $y$ , i.e. $\begin{matrix} (16.6) & b^{GLS} = A_{x} y, \quad A_{x} = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} . \end{matrix}$
3. Efficient: under the Aitken hypotheses (Theorem 16.1) it has minimum variance in the class of the linear unbiased estimators, and its variance reads: $\begin{matrix} (16.7) & V {b^{GLS} ∣ X} = (X^{⊤} Σ^{- 1} X)^{- 1} . \end{matrix}$

Proof: Proposition 16.3

Proof. The GLS estimator is unbiased. Its expected value, computed from Equation 16.3 by substituting Equation 14.11, is equal to the true parameter in the population, i.e.
$\begin{matrix} (16.8) & \begin{aligned} E {b^{GLS} ∣ X} & = E {(X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} y ∣ X} \\ & = E {(X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} (X b + u) ∣ X} \\ & = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} X b + (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} E {u ∣ X} \\ & = b \end{aligned} \end{matrix}$

Under the assumption of heteroskedastic and correlated observations the conditional variance of $b^{GLS}$ follows similarly to the OLS case (Equation 15.12) but with $V {u ∣ X} = Σ$ , i.e. $\begin{matrix} (16.9) & \begin{aligned} V {b^{GLS} ∣ X} & = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} V {u ∣ X} Σ^{- 1} X (X^{⊤} Σ^{- 1} X)^{- 1} \\ & = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} Σ Σ^{- 1} X (X^{⊤} Σ^{- 1} X)^{- 1} \\ & = (X^{⊤} Σ^{- 1} X)^{- 1} X^{⊤} Σ^{- 1} X (X^{⊤} Σ^{- 1} X)^{- 1} \\ & = (X^{⊤} Σ^{- 1} X)^{- 1} \end{aligned} \end{matrix}$ where Equation 15.10 becomes a special case of Equation 16.9 with $Σ = σ_{u}^{2} I_{n}$ .
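The algebraic steps of this proof can be checked numerically. The sketch below verifies that $A_{x} X = I_{k}$ (the step that gives unbiasedness), that $A_{x} Σ A_{x}^{⊤}$ collapses to $(X^{⊤} Σ^{- 1} X)^{- 1}$ , and that the homoskedastic choice $Σ = σ_{u}^{2} I_{n}$ reduces the variance to $σ_{u}^{2} (X^{⊤} X)^{- 1}$ . The design matrix, the covariance matrix and the value of `sigma2_u` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hypothetical heteroskedastic and correlated error covariance.
idx = np.arange(n)
sd = np.linspace(1.0, 2.0, n)
Sigma = np.outer(sd, sd) * 0.5 ** np.abs(idx[:, None] - idx[None, :])
Sigma_inv = np.linalg.inv(Sigma)

# A_x = (X' Sigma^-1 X)^-1 X' Sigma^-1, so that b_gls = A_x y.
V_gls = np.linalg.inv(X.T @ Sigma_inv @ X)
A_x = V_gls @ X.T @ Sigma_inv

# Unbiasedness step: A_x X = I_k, hence E[b_gls | X] = b when E[u | X] = 0.
unbiased_step = A_x @ X

# Variance step: A_x Sigma A_x' simplifies to (X' Sigma^-1 X)^-1.
sandwich = A_x @ Sigma @ A_x.T

# Special case Sigma = sigma2_u * I_n recovers the OLS variance.
sigma2_u = 2.5
Sigma_h = sigma2_u * np.eye(n)
V_homoskedastic = np.linalg.inv(X.T @ np.linalg.inv(Sigma_h) @ X)
V_ols = sigma2_u * np.linalg.inv(X.T @ X)

print(np.round(unbiased_step, 10))
print(np.round(sandwich - V_gls, 10))
print(np.round(V_homoskedastic - V_ols, 10))
```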