16  Generalized least squares

References: Gardini A. (2007), Chapter 3.

16.1 Working hypothesis

The assumptions of the generalized least squares estimator are:

  1. The linear model approximates the conditional expectation, i.e. \(\mathbb{E}\{Y_i \mid \mathbf{x}_i\} = \mathbf{x}_i^{\top} \mathbf{b}\).
  2. The conditional variance of the response variable \(Y\) depends on the observation \(i\), i.e. \(\mathbb{V}\{Y_i \mid \mathbf{x}_i\} = \sigma_i^2\) with \(0 < \sigma^2_i < \infty\) for all \(i = 1, \dots, n\).
  3. The response variables \(Y\) are correlated, i.e. \(\mathbb{C}v\{Y_i, Y_j \mid \mathbf{x}_i, \mathbf{x}_j\} = \sigma_{ij}\) for all \(i \neq j\) and \(i,j = 1, \dots, n\).

Equivalently, the assumptions can be formulated in terms of the stochastic component \(\mathbf{u}\):

  1. The residuals have mean zero, i.e. \(\mathbb{E}\{u_i \mid \mathbf{x}_i\} = 0\) for all \(i\) with \(i = 1, \dots, n\).
  2. The conditional variance of the residuals depends on the observation \(i\), i.e. \(\mathbb{V}\{u_i \mid \mathbf{x}_i\} = \sigma_i^2\) with \(0 < \sigma^2_i < \infty\).
  3. The residuals are correlated, i.e. \(\mathbb{C}v\{u_i, u_j \mid \mathbf{x}_i, \mathbf{x}_j\} = \sigma_{ij}\) for all \(i \neq j\) and \(i,j = 1, \dots, n\).

In this case the variance-covariance matrix \(\boldsymbol{\Sigma}\) is defined as in Equation 14.15 and contains the variances of the observations and the covariances between them.
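
As a concrete illustration of assumptions 2 and 3, such a matrix \(\boldsymbol{\Sigma}\) can be constructed numerically. The sketch below (NumPy; the \(\sigma_i\) values and the AR(1)-style correlation pattern are invented choices, not prescribed by the text) builds a \(\boldsymbol{\Sigma}\) that is heteroskedastic on the diagonal and correlated off the diagonal:

```python
import numpy as np

# Hypothetical Sigma: heteroskedastic variances sigma_i^2 on the diagonal,
# AR(1)-style correlation rho^{|i-j|} off the diagonal (illustrative choice).
n = 5
rho = 0.6
sigma = np.array([1.0, 1.5, 0.8, 2.0, 1.2])                  # one sigma_i per observation
corr = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = np.outer(sigma, sigma) * corr                        # Sigma_ij = sigma_i sigma_j rho^{|i-j|}

# Sigma is a valid variance-covariance matrix: symmetric, positive definite,
# with the variances sigma_i^2 on the diagonal
assert np.allclose(Sigma, Sigma.T)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)
assert np.allclose(np.diag(Sigma), sigma ** 2)
```

Any symmetric positive definite matrix would serve equally well; the AR(1) pattern is only a compact way to generate non-zero \(\sigma_{ij}\).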

16.2 Generalized least squares estimator

Proposition 16.1 (\(\color{magenta}{\textbf{Generalized Least Squares (GLS)}}\))
The generalized least squares (GLS) estimator minimizes the objective function \(\text{Q}^{\tiny\text{GLS}}\), the weighted sum of the squared residuals, and returns an estimate of the true parameter \(\mathbf{b}\), i.e.  \[ \text{Q}^{\tiny\text{GLS}}(\mathbf{b}) = \hat{\mathbf{u}}(\mathbf{b})^{\top} \boldsymbol{\Sigma}^{-1} \hat{\mathbf{u}}(\mathbf{b}) \text{.} \tag{16.1}\] Formally, the GLS estimator is the solution of the following minimization problem, i.e.  \[ \mathbf{b}^{\tiny\text{GLS}} = \underset{\mathbf{b} \in \Theta_{\mathbf{b}}}{\text{argmin}} \left\{\text{Q}^{\tiny\text{GLS}}(\mathbf{b})\right\} \text{.} \tag{16.2}\] Notably, if \(\mathbf{X}\) has full column rank and \(\boldsymbol{\Sigma}\) is non-singular one obtains an analytic expression, i.e.
\[ \mathbf{b}^{\tiny\text{GLS}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} \text{.} \tag{16.3}\]

Singularity of \(\mathbf{X}\) or \(\boldsymbol{\Sigma}\)

The analytic solution is available if and only if \(\boldsymbol{\Sigma}\) and \(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}\) are invertible. In practice the conditions are:

  1. \(\text{rank}(\boldsymbol{\Sigma}) = n\) for the inversion of \(\boldsymbol{\Sigma}\).
  2. \(\text{rank}(\mathbf{X}) = k\) and condition 1. for the inversion of \(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}\).
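
Both rank conditions and the closed form in Equation 16.3 are easy to verify numerically. In the sketch below (NumPy; the design matrix, the covariance structure and the parameter values are invented for illustration), the ranks are checked and \(\mathbf{b}^{\tiny\text{GLS}}\) is computed on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Illustrative Sigma: heteroskedastic diagonal, AR(1)-style correlation
sigma = rng.uniform(0.5, 2.0, size=n)
corr = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = np.outer(sigma, sigma) * corr

# Rank conditions 1. and 2. for the analytic solution
assert np.linalg.matrix_rank(Sigma) == n
assert np.linalg.matrix_rank(X) == k

# Simulate y = X b + u with errors of covariance Sigma (via Cholesky factor)
b_true = np.array([1.0, -2.0, 0.5])
u = np.linalg.cholesky(Sigma) @ rng.normal(size=n)
y = X @ b_true + u

# GLS estimate, Equation 16.3 (solve instead of explicit inversion of X' Si X)
Si = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
```

With \(n = 50\) observations the estimate lands close to `b_true`, though it varies from sample to sample.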

Proof. Let’s prove the optimal solution in Proposition 16.1. Expanding the objective function in Equation 16.1: \[ \begin{aligned} \text{Q}^{\tiny\text{GLS}}(\mathbf{b}) {} & = \hat{\mathbf{u}}(\mathbf{b})^{\top} \boldsymbol{\Sigma}^{-1} \hat{\mathbf{u}}(\mathbf{b}) = \\ & = (\mathbf{y} - \mathbf{X} \mathbf{b})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \mathbf{X} \mathbf{b}) = \\ & = \mathbf{y}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} - 2 \mathbf{b}^{\top} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} + \mathbf{b}^{\top} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1}\mathbf{X} \mathbf{b} \text{.} \end{aligned} \] In order to minimize the above expression, let’s compute the first derivative of \(\text{Q}^{\tiny\text{GLS}}(\mathbf{b})\) with respect to \(\mathbf{b}\), i.e. \[ \frac{d\text{Q}^{\tiny\text{GLS}}(\mathbf{b})}{d\mathbf{b}} = - 2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} + 2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} \mathbf{b} \text{.} \] Then, setting the above expression equal to zero and solving for \(\mathbf{b} = \mathbf{b}^{\tiny\text{GLS}}\) gives the solution, i.e. \[ \mathbf{b}^{\tiny\text{GLS}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} \text{.} \] Since the second derivative \(2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}\) is positive definite, the stationary point is indeed a minimum.
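
The first-order condition in the proof can also be checked numerically. In this sketch (NumPy, with randomly generated data that are purely illustrative) the gradient vanishes at \(\mathbf{b}^{\tiny\text{GLS}}\) and no perturbation lowers \(\text{Q}^{\tiny\text{GLS}}\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 25, 3
X = rng.normal(size=(n, k))
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)        # symmetric positive definite Sigma
Si = np.linalg.inv(Sigma)
y = rng.normal(size=n)

b_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

def Q(b):
    u = y - X @ b                      # residual vector u(b)
    return u @ Si @ u                  # objective, Equation 16.1

# First-order condition: the derivative vanishes at b_gls
grad = -2 * X.T @ Si @ y + 2 * X.T @ Si @ X @ b_gls
assert np.allclose(grad, 0)

# b_gls is a minimizer: random perturbations never decrease Q
for _ in range(100):
    assert Q(b_gls) <= Q(b_gls + rng.normal(size=k)) + 1e-9
```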

Proposition 16.2 (\(\color{magenta}{\textbf{Two-stage derivation of GLS estimator}}\))
The GLS estimator in Equation 16.3 can be equivalently recovered as \[ \mathbf{b}^{\tiny \text{GLS}} = (\mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{y} \text{,} \] where \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\) with \(\boldsymbol{\Sigma} = \mathbf{e} \boldsymbol{\Lambda} \mathbf{e}^{\top}\) and

  • \(\boldsymbol{\Lambda}\) is the diagonal matrix containing the eigenvalues of \(\boldsymbol{\Sigma}\).
  • \(\mathbf{e}\) is the matrix with the eigenvectors of \(\boldsymbol{\Sigma}\) that satisfies the following relation, i.e. \(\mathbf{e}^{\top} \mathbf{e} = \mathbf{e} \mathbf{e}^{\top} = \textbf{I}_{n}\).

Moreover, the matrix \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\) satisfies \[ \mathbf{T}^{\top} \mathbf{T} = \mathbf{e} \, \boldsymbol{\Lambda}^{-1/2} \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top} = \mathbf{e} \, \boldsymbol{\Lambda}^{-1} \mathbf{e}^{\top} = \boldsymbol{\Sigma}^{-1} \text{.} \tag{16.4}\]
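
Equation 16.4 can be verified directly from the eigendecomposition. In the sketch below (NumPy; the positive definite \(\boldsymbol{\Sigma}\) is randomly generated for illustration), `np.linalg.eigh` returns \(\boldsymbol{\Lambda}\) and \(\mathbf{e}\), and \(\mathbf{T}^{\top}\mathbf{T} = \boldsymbol{\Sigma}^{-1}\) holds up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)       # symmetric positive definite Sigma

lam, e = np.linalg.eigh(Sigma)        # Sigma = e diag(lam) e^T
T = np.diag(lam ** -0.5) @ e.T        # T = Lambda^{-1/2} e^T

# Equation 16.4: T^T T = Sigma^{-1}; T also whitens Sigma to the identity
assert np.allclose(T.T @ T, np.linalg.inv(Sigma))
assert np.allclose(T @ Sigma @ T.T, np.eye(n))
```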

Proof. Let’s consider a linear model of the form \[ \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{u} \text{,} \] and let’s apply some (unknown) transformation matrix \(\mathbf{T}_{n\times n}\) by multiplying on both sides, i.e. \[ \begin{aligned} & {} {\color{red}{\mathbf{T} \, \mathbf{y}}} = {\color{blue}{\mathbf{T} \mathbf{X}}} \, \mathbf{b} + {\color{orange}{\mathbf{T} \mathbf{u}}} \\ & \Downarrow \quad\quad\ \Downarrow \quad\quad\;\;\Downarrow \\ & \; {\color{red}{\tilde{\mathbf{y}}}} \;\;\,= \; \;{\color{blue}{\tilde{\mathbf{X}}}} \, \mathbf{b} \; + \; {\color{orange}{\tilde{\mathbf{u}}}} \end{aligned} \] In this context, the conditional expectation of \(\tilde{\mathbf{y}}\) reads \[ \mathbb{E}\{\tilde{\mathbf{y}}\mid \tilde{\mathbf{X}}\} = \tilde{\mathbf{X}} \mathbf{b} \text{,} \] while its conditional variance reads \[ \mathbb{V}\{\tilde{\mathbf{y}}\mid \tilde{\mathbf{X}}\} = \mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} = \mathbf{T}\, \boldsymbol{\Sigma} \, \mathbf{T}^{\top} \text{.} \] The next step is to identify a suitable transformation matrix \(\mathbf{T}\) such that the conditional variance becomes equal to the identity matrix (Equation 31.3), i.e.  \[ \mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} = \textbf{I}_n \text{.} \] In this way it is possible to work under the Gauss-Markov assumptions (Theorem 15.1), obtaining an estimator with minimum variance.

A possible way to identify \(\mathbf{T}\) is to decompose the variance-covariance matrix (Equation 14.15) as follows \[ \boldsymbol{\Sigma} = \mathbf{e} \boldsymbol{\Lambda} \mathbf{e}^{\top} \quad \boldsymbol{\Sigma}^{-1} = \mathbf{e} \boldsymbol{\Lambda}^{-1} \mathbf{e}^{\top} \] where \(\boldsymbol{\Lambda}\) is the diagonal matrix containing the eigenvalues and \(\mathbf{e}\) is the matrix with the eigenvectors that satisfies the following relation, i.e. \(\mathbf{e}^{\top} \mathbf{e} = \mathbf{e} \mathbf{e}^{\top} = \textbf{I}_{n}\).

Thus, for the particular choice of \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\), one obtains a conditional variance equal to the identity matrix, i.e.
\[ \begin{aligned} \mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} & {} = \mathbf{T} \, \boldsymbol{\Sigma} \, \mathbf{T}^{\top} = \\ & = (\boldsymbol{\Lambda}^{-1/2} \, \mathbf{e}^{\top}) \, \mathbf{e} \, \boldsymbol{\Lambda} \, \mathbf{e}^{\top} \, (\mathbf{e} \, \boldsymbol{\Lambda}^{-1/2}) = \\ & = \boldsymbol{\Lambda}^{-1/2} \boldsymbol{\Lambda} \, \boldsymbol{\Lambda}^{-1/2} = \textbf{I}_{n} \end{aligned} \] where \(\mathbf{I}_{n}\) is the identity matrix (Equation 31.3). Finally, substituting \(\tilde{\mathbf{X}} = \mathbf{T} \mathbf{X}\) in the OLS formula (Equation 15.3) and using the result Equation 16.4 one obtains exactly the GLS estimator in Equation 16.3, i.e.  \[ \begin{aligned} \mathbf{b}^{\tiny \text{GLS}} & {} = (\tilde{\mathbf{X}}^{\top} \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^{\top} \tilde{\mathbf{y}} = \\ & = (\mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{y} = \\ & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} \text{.} \end{aligned} \]
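
The two-stage route (whiten the data with \(\mathbf{T}\), then apply OLS to the transformed model) can be checked against the direct formula in Equation 16.3; the data in this NumPy sketch are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)       # symmetric positive definite Sigma
y = X @ np.array([1.0, 2.0, -1.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

# Stage 1: whiten with T = Lambda^{-1/2} e^T
lam, e = np.linalg.eigh(Sigma)
T = np.diag(lam ** -0.5) @ e.T
Xt, yt = T @ X, T @ y

# Stage 2: OLS on the transformed data
b_two_stage = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)

# Direct GLS (Equation 16.3) for comparison
Si = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
assert np.allclose(b_two_stage, b_gls)
```

The two estimates agree up to floating-point error, as the proof predicts.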

16.3 Properties of the GLS estimator

Theorem 16.1 (\(\color{magenta}{\textbf{Aitken theorem}}\))
Under the following working hypothesis, also called the Aitken hypothesis, i.e. 

  1. \(\mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{u}\).
  2. \(\mathbb{E}\{\mathbf{u}\} = 0\).
  3. \(\mathbb{E}\{\mathbf{u} \mathbf{u}^{\top}\} = \boldsymbol{\Sigma}\), i.e. heteroskedastic and correlated errors.
  4. \(\mathbf{X}\) is non-stochastic and independent of the errors \(\mathbf{u}\) for all \(n\)’s.

The Generalized Least Squares (GLS) estimator is \({\color{blue}{\textbf{BLUE}}}\) (Best Linear Unbiased Estimator), where “best” stands for the estimator with minimum variance in the class of linear unbiased estimators of \(\mathbf{b}\).

Proposition 16.3 (\(\color{magenta}{\textbf{Properties GLS estimator}}\))
  1. Unbiased: \(\mathbf{b}^{\tiny\text{GLS}}\) is correct and its conditional expectation is equal to the true parameter in population, i.e.  \[ \mathbb{E}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} = \mathbf{b} \text{.} \tag{16.5}\]
  2. Linear, in the sense that it can be written as a linear combination of \(\mathbf{y}\), i.e. \(\mathbf{b}^{\tiny\text{GLS}} = \mathbf{A}_{\text{x}} \mathbf{y}\), where \(\mathbf{A}_{\text{x}}\) depends on \(\mathbf{X}\) but not on \(\mathbf{y}\), i.e. \[ \mathbf{b}^{\tiny\text{GLS}} = \mathbf{A}_{\text{x}} \mathbf{y} \quad \mathbf{A}_{\text{x}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \text{.} \tag{16.6}\]
  3. Under the Aitken hypothesis (Theorem 16.1) it has minimum variance in the class of the unbiased linear estimators and its variance reads: \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \text{.} \tag{16.7}\]

Proof. The GLS estimator is unbiased: its expected value, computed from Equation 16.3 by substituting Equation 14.11, is equal to the true parameter in population, i.e.
\[ \begin{aligned} \mathbb{E}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} {} & = \mathbb{E}\{(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} \} = \\ & = \mathbb{E}\{(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{X} \mathbf{b} + \mathbf{u}) \} = \\ & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} \mathbf{b} + (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbb{E}\{\mathbf{u}\mid \mathbf{X}\} = \\ & = \mathbf{b} \end{aligned} \tag{16.8}\]
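
Unbiasedness can also be illustrated by Monte Carlo: averaging \(\mathbf{b}^{\tiny\text{GLS}}\) over many simulated samples should recover the true parameter. All numbers in this NumPy sketch are invented, and the tolerance only reflects Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, reps = 30, 2, 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Illustrative heteroskedastic, correlated Sigma (held fixed across replications)
sigma = rng.uniform(0.5, 2.0, size=n)
corr = 0.4 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = np.outer(sigma, sigma) * corr
L = np.linalg.cholesky(Sigma)
Si = np.linalg.inv(Sigma)
b_true = np.array([1.0, -0.5])

draws = np.empty((reps, k))
for r in range(reps):
    y = X @ b_true + L @ rng.normal(size=n)          # errors with covariance Sigma
    draws[r] = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Unbiasedness (Equation 16.5): the sample mean is close to b_true
assert np.allclose(draws.mean(axis=0), b_true, atol=0.05)
```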

Under the assumption of heteroskedastic and correlated observations, the conditional variance of \(\mathbf{b}^{\tiny\text{GLS}}\) follows similarly as for the OLS case (Equation 15.12) but with \(\mathbb{V}\{\mathbf{u} \mid \mathbf{X} \} = \boldsymbol{\Sigma}\), i.e. \[ \begin{aligned} \mathbb{V}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} {} & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} {\color{red}{\mathbb{V}\{\mathbf{u}\mid \mathbf{X} \}}} \, \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\ & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} {\color{red}{\boldsymbol{\Sigma}}} \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\ & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\ & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \end{aligned} \tag{16.9}\] where Equation 15.10 is recovered as the special case of Equation 16.9 in which \(\boldsymbol{\Sigma} = \sigma_{\text{u}}^2 \textbf{I}_n\).
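
Similarly, the variance formula can be illustrated by Monte Carlo: the empirical covariance of the simulated \(\mathbf{b}^{\tiny\text{GLS}}\) draws should approach \((\mathbf{X}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\). Again, all numbers in this NumPy sketch are invented and the tolerances account for Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 30, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Illustrative heteroskedastic, correlated Sigma (held fixed across replications)
sigma = rng.uniform(0.5, 2.0, size=n)
corr = 0.4 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = np.outer(sigma, sigma) * corr
L = np.linalg.cholesky(Sigma)
Si = np.linalg.inv(Sigma)
V_theory = np.linalg.inv(X.T @ Si @ X)               # Equation 16.7 / 16.9

b_true = np.array([1.0, -0.5])
draws = np.array([
    np.linalg.solve(X.T @ Si @ X, X.T @ Si @ (X @ b_true + L @ rng.normal(size=n)))
    for _ in range(reps)
])

V_mc = np.cov(draws, rowvar=False)                   # empirical covariance of the draws
assert np.allclose(V_mc, V_theory, rtol=0.2, atol=0.01)
```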