References: Chapter 3. Gardini A. (2007).
Working hypotheses
The assumptions of the generalized least squares estimator are:
- The linear model approximates the conditional expectation, i.e. \(\mathbb{E}\{Y_i \mid \mathbf{x}_i\} = \mathbf{x}_i^{\top} \mathbf{b}\).
- The conditional variance of the response variable \(Y\) depends on the observation \(i\), i.e. \(\mathbb{V}\{Y_i \mid \mathbf{x}_i\} = \sigma_i^2\) with \(0 < \sigma^2_i < \infty\) for all \(i\) with \(i = 1, \dots, n\).
- The response variables \(Y\) are correlated, i.e. \(\mathbb{C}v\{Y_i, Y_j \mid \mathbf{x}_i\} = \sigma_{ij}\) for all \(i \neq j\) and \(i,j = 1, \dots, n\).
Equivalently, the formulation of the assumptions in terms of the stochastic component \(\mathbf{u}\) is
- The errors have mean zero, i.e. \(\mathbb{E}\{u_i \mid \mathbf{x}_i\} = 0\) for all \(i\) with \(i = 1, \dots, n\).
- The conditional variance of the errors depends on the observation \(i\), i.e. \(\mathbb{V}\{u_i \mid \mathbf{x}_i\} = \sigma_i^2\) with \(0 < \sigma^2_i < \infty\).
- The errors are correlated, i.e. \(\mathbb{C}v\{u_i, u_j \mid \mathbf{x}_i\} = \sigma_{ij}\) for all \(i \neq j\) and \(i,j = 1, \dots, n\).
In this case, the variance-covariance matrix \(\boldsymbol{\Sigma}\) is defined as in Equation 14.15 and contains the variances and the covariances between the observations.
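Equation 14.15 is not reproduced in this section. As a reminder, and assuming it follows the usual convention, the assumptions listed above would give a matrix with the variances on the main diagonal and the covariances elsewhere, i.e. \[
\boldsymbol{\Sigma} = \mathbb{V}\{\mathbf{u} \mid \mathbf{X}\} =
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \dots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \dots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \dots & \sigma_n^2
\end{pmatrix}
\text{,}
\] where \(\sigma_{ij} = \sigma_{ji}\), so that \(\boldsymbol{\Sigma}\) is symmetric.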
Generalized least squares estimator
Proposition 16.1 (Generalized Least Squares (GLS)) The generalized least squares (GLS) estimator returns an estimate of the true parameter \(\mathbf{b}\) by minimizing the objective function \(\text{Q}^{\tiny\text{GLS}}\), i.e. the sum of the squared residuals weighted by the inverse of the variance-covariance matrix: \[
\text{Q}^{\tiny\text{GLS}}(\mathbf{b}) =
\hat{\mathbf{u}}(\mathbf{b})^{\top}
\boldsymbol{\Sigma}^{-1}
\hat{\mathbf{u}}(\mathbf{b})
\text{.}
\tag{16.1}\] Formally, the GLS estimator is the solution of the following minimization problem, i.e. \[
\mathbf{b}^{\tiny\text{GLS}} =
\underset{\mathbf{b} \in \Theta_{\mathbf{b}}}{\text{argmin}} \left\{\text{Q}^{\tiny\text{GLS}}(\mathbf{b})\right\}
\text{.}
\tag{16.2}\] Notably, if \(\mathbf{X}\) has full column rank and \(\boldsymbol{\Sigma}\) is non-singular, one obtains an analytic expression, i.e. \[
\mathbf{b}^{\tiny\text{GLS}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y}
\text{.}
\tag{16.3}\]
The solution is available if and only if \(\mathbf{X}\) has full column rank and \(\boldsymbol{\Sigma}\) is non-singular. In practice, the conditions are:
- \(\text{rank}(\boldsymbol{\Sigma}) = n\) for the inversion of \(\boldsymbol{\Sigma}\).
- \(\text{rank}(\mathbf{X}) = k\), together with the previous condition, for the inversion of \(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}\).
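The closed form in Equation 16.3 and the two rank conditions can be illustrated numerically. The following is a minimal sketch, not a reference implementation: the design matrix, the covariance structure (heteroskedastic with an AR(1)-style correlation), the true parameter and the random seed are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# Hypothetical design matrix: an intercept and one regressor.
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])

# Hypothetical covariance matrix: different standard deviations per
# observation and correlation rho^|i-j| between observations i and j.
sd = np.linspace(0.5, 2.0, n)
rho = 0.6
corr = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = np.outer(sd, sd) * corr

# Rank conditions required by the closed-form solution.
assert np.linalg.matrix_rank(Sigma) == n
assert np.linalg.matrix_rank(X) == k

# Simulate y = X b + u with V{u | X} = Sigma (assumed true parameter).
b_true = np.array([1.0, 0.5])
u = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
y = X @ b_true + u

# Equation 16.3, solving linear systems instead of forming inverses.
Si_X = np.linalg.solve(Sigma, X)   # Sigma^{-1} X
Si_y = np.linalg.solve(Sigma, y)   # Sigma^{-1} y
b_gls = np.linalg.solve(X.T @ Si_X, X.T @ Si_y)
```

Solving the linear systems is numerically preferable to computing \(\boldsymbol{\Sigma}^{-1}\) explicitly, but both routes give the same estimate up to rounding.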
Proof. Let’s prove the optimal solution in Proposition 16.1. Expanding the objective function in Equation 16.1: \[
\begin{aligned}
\text{Q}^{\tiny\text{GLS}}(\mathbf{b}) {} & =
\hat{\mathbf{u}}(\mathbf{b})^{\top}
\boldsymbol{\Sigma}^{-1}
\hat{\mathbf{u}}(\mathbf{b}) = \\
& = (\mathbf{y} - \mathbf{X} \mathbf{b})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \mathbf{X} \mathbf{b}) = \\
& = \mathbf{y}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} - 2 \mathbf{b}^{\top} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} + \mathbf{b}^{\top} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1}\mathbf{X} \mathbf{b}
\end{aligned}
\] In order to minimize the above expression, let’s compute the first derivative of \(\text{Q}^{\tiny\text{GLS}}(\mathbf{b})\) with respect to \(\mathbf{b}\) \[
\frac{d\text{Q}^{\tiny\text{GLS}}(\mathbf{b})}{d\mathbf{b}} = - 2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} + 2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} \mathbf{b}
\text{.}
\] Then, setting the above expression equal to zero and solving for \(\mathbf{b} = \mathbf{b}^{\tiny\text{GLS}}\) gives the solution, i.e. \[
\mathbf{b}^{\tiny\text{GLS}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y}
\text{.}
\]
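The first-order condition only identifies a stationary point. That this point is a minimum follows from the second derivative, a step left implicit in the proof above: \[
\frac{d^2\text{Q}^{\tiny\text{GLS}}(\mathbf{b})}{d\mathbf{b} \, d\mathbf{b}^{\top}} = 2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}
\text{,}
\] which is positive definite whenever \(\boldsymbol{\Sigma}\) is a non-singular variance-covariance matrix (hence positive definite) and \(\text{rank}(\mathbf{X}) = k\).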
Proposition 16.2 (Two-stage derivation of GLS estimator) The GLS estimator in Equation 16.3 can be equivalently recovered as \[
\mathbf{b}^{\tiny \text{GLS}} = (\mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{y}
\text{,}
\] where \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\) with \(\boldsymbol{\Sigma} = \mathbf{e} \boldsymbol{\Lambda} \mathbf{e}^{\top}\) and
- \(\boldsymbol{\Lambda}\) is the diagonal matrix containing the eigenvalues of \(\boldsymbol{\Sigma}\).
- \(\mathbf{e}\) is the orthogonal matrix whose columns are the eigenvectors of \(\boldsymbol{\Sigma}\), so that \(\mathbf{e}^{\top} \mathbf{e} = \mathbf{e} \mathbf{e}^{\top} = \textbf{I}_{n}\).
Moreover, the matrix \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\) satisfies the product: \[
\mathbf{T}^{\top} \mathbf{T} = \mathbf{e} \, \boldsymbol{\Lambda}^{-1/2} \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top} = \mathbf{e} \, \boldsymbol{\Lambda}^{-1} \mathbf{e}^{\top} = \boldsymbol{\Sigma}^{-1}
\tag{16.4}\]
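The construction of \(\mathbf{T}\) from the spectral decomposition can be checked numerically. This is a small sketch under stated assumptions: the \(3 \times 3\) matrix `Sigma` is a hypothetical symmetric positive definite example, and `numpy.linalg.eigh` is used because it is designed for symmetric matrices and returns orthonormal eigenvectors.

```python
import numpy as np

# Hypothetical covariance matrix (symmetric, strictly diagonally dominant,
# hence positive definite).
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])

# Spectral decomposition Sigma = e Lambda e^T:
# lam holds the eigenvalues, the columns of e are the eigenvectors.
lam, e = np.linalg.eigh(Sigma)

# Transformation matrix T = Lambda^{-1/2} e^T.
T = np.diag(lam ** -0.5) @ e.T

# Equation 16.4: T^T T should reproduce Sigma^{-1}.
matches_inverse = np.allclose(T.T @ T, np.linalg.inv(Sigma))

# T Sigma T^T should be the identity matrix.
whitens = np.allclose(T @ Sigma @ T.T, np.eye(3))
```

The choice of \(\mathbf{T}\) is not unique: any matrix satisfying \(\mathbf{T}^{\top} \mathbf{T} = \boldsymbol{\Sigma}^{-1}\) would serve, and the eigenvector-based one is simply the choice made in the proposition.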
Proof. Let’s consider a linear model of the form \[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{u}
\text{,}
\] and let’s apply some (unknown) transformation matrix \(\mathbf{T}_{n\times n}\) by multiplying on both sides, i.e. \[
\begin{aligned}
& {} {\color{red}{\mathbf{T} \, \mathbf{y}}} = {\color{blue}{\mathbf{T} \mathbf{X}}} \, \mathbf{b} + {\color{orange}{\mathbf{T} \mathbf{u}}} \\
& \Downarrow \quad\quad\ \Downarrow \quad\quad\;\;\Downarrow \\
& \; {\color{red}{\tilde{\mathbf{y}}}} \;\;\,= \; \;{\color{blue}{\tilde{\mathbf{X}}}} \, \mathbf{b} \; + \; {\color{orange}{\tilde{\mathbf{u}}}}
\end{aligned}
\] In this context, the conditional expectation of \(\tilde{\mathbf{y}}\) reads \[
\mathbb{E}\{\tilde{\mathbf{y}}\mid \tilde{\mathbf{X}}\} = \tilde{\mathbf{X}} \mathbf{b}
\text{,}
\] while its conditional variance reads \[
\mathbb{V}\{\tilde{\mathbf{y}}\mid \tilde{\mathbf{X}}\} = \mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} = \mathbf{T}\, \boldsymbol{\Sigma} \, \mathbf{T}^{\top}
\text{.}
\] The next step is to identify a suitable transformation matrix \(\mathbf{T}\) such that the conditional variance becomes equal to the identity matrix (Equation 31.3), i.e. \[
\mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} = \textbf{I}_n
\text{.}
\] In this way, it is possible to work under the Gauss-Markov assumptions (Theorem 15.1), obtaining an estimator with minimum variance.
A possible way to identify \(\mathbf{T}\) is to decompose the variance-covariance matrix (Equation 14.15) as follows \[
\boldsymbol{\Sigma} = \mathbf{e} \boldsymbol{\Lambda} \mathbf{e}^{\top}
\quad
\boldsymbol{\Sigma}^{-1} = \mathbf{e} \boldsymbol{\Lambda}^{-1} \mathbf{e}^{\top}
\] where \(\boldsymbol{\Lambda}\) is the diagonal matrix containing the eigenvalues and \(\mathbf{e}\) is the orthogonal matrix of the eigenvectors, which satisfies \(\mathbf{e}^{\top} \mathbf{e} = \mathbf{e} \mathbf{e}^{\top} = \textbf{I}_{n}\).
Thus, for the particular choice \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\), the conditional variance of the transformed errors is the identity matrix, i.e. unit variances and zero covariances for all the observations: \[
\begin{aligned}
\mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} & {} = \mathbf{T} \, \boldsymbol{\Sigma} \, \mathbf{T}^{\top} = \\
& = (\boldsymbol{\Lambda}^{-1/2} \, \mathbf{e}^{\top}) \, \mathbf{e} \, \boldsymbol{\Lambda} \, \mathbf{e}^{\top} \, (\mathbf{e} \, \boldsymbol{\Lambda}^{-1/2}) = \\
& = \boldsymbol{\Lambda}^{-1/2} \boldsymbol{\Lambda} \, \boldsymbol{\Lambda}^{-1/2} = \textbf{I}_{n}
\end{aligned}
\] where \(\textbf{I}_{n}\) is the identity matrix (Equation 31.3). Finally, substituting \(\tilde{\mathbf{X}} = \mathbf{T} \mathbf{X}\) and \(\tilde{\mathbf{y}} = \mathbf{T} \mathbf{y}\) in the OLS formula (Equation 15.3) and using the result in Equation 16.4, one obtains exactly the GLS estimator in Equation 16.3, i.e. \[
\begin{aligned}
\mathbf{b}^{\tiny \text{GLS}} & {} = (\tilde{\mathbf{X}}^{\top} \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^{\top} \tilde{\mathbf{y}} = \\
& = (\mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{y} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y}
\end{aligned}
\]
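The two-stage argument can be verified on simulated data: transform the model with \(\mathbf{T}\), run ordinary least squares on the transformed variables, and compare with the closed form in Equation 16.3. The sketch below assumes hypothetical random data and a hypothetical positive definite `Sigma`; only the equality of the two estimates is of interest.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Hypothetical data: intercept plus one random regressor.
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = rng.standard_normal(n)

# Hypothetical positive definite covariance matrix.
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)

# Stage 1: build T = Lambda^{-1/2} e^T and transform the model.
lam, e = np.linalg.eigh(Sigma)
T = np.diag(lam ** -0.5) @ e.T
X_t, y_t = T @ X, T @ y

# Stage 2: ordinary least squares on the transformed variables.
b_two_stage, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)

# Direct GLS formula (Equation 16.3) for comparison.
Sinv = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)

same_estimate = np.allclose(b_two_stage, b_gls)
```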
Properties of the GLS estimator
Theorem 16.1 (Aitken theorem) Under the following working hypotheses, also called Aitken hypotheses, i.e.
- \(\mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{u}\).
- \(\mathbb{E}\{\mathbf{u}\} = 0\).
- \(\mathbb{E}\{\mathbf{u} \mathbf{u}^{\top}\} = \boldsymbol{\Sigma}\), i.e. heteroskedastic and correlated errors.
- \(\mathbf{X}\) is non-stochastic and independent from the errors \(\mathbf{u}\) for all \(n\)’s.
The Generalized Least Squares (GLS) estimator is \({\color{blue}{\textbf{BLUE}}}\) (Best Linear Unbiased Estimator), where “best” stands for the estimator with minimum variance in the class of linear unbiased estimators of \(\mathbf{b}\).
Proposition 16.3 (Properties of the GLS estimator)
Unbiased: the conditional expectation of \(\mathbf{b}^{\tiny\text{GLS}}\) is equal to the true parameter in the population, i.e. \[
\mathbb{E}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} = \mathbf{b}
\text{.}
\tag{16.5}\]
Linear, in the sense that it can be written as a linear function of \(\mathbf{y}\) through a matrix \(\mathbf{A}_{\text{x}}\) that depends on \(\mathbf{X}\) and \(\boldsymbol{\Sigma}\) but not on \(\mathbf{y}\), i.e. \[
\mathbf{b}^{\tiny\text{GLS}} = \mathbf{A}_{\text{x}} \mathbf{y} \quad \mathbf{A}_{\text{x}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1}
\text{.}
\tag{16.6}\]
Minimum variance: under the Aitken hypotheses (Theorem 16.1) it has minimum variance in the class of linear unbiased estimators, and its conditional variance reads: \[
\mathbb{V}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}
\text{.}
\tag{16.7}\]
Proof. The GLS estimator is unbiased. Its expected value is computed from Equation 16.3 and, substituting Equation 14.11, is equal to the true parameter in the population, i.e. \[
\begin{aligned}
\mathbb{E}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} {} & = \mathbb{E}\{(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} \mid \mathbf{X}\} = \\
& = \mathbb{E}\{(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{X} \mathbf{b} + \mathbf{u}) \mid \mathbf{X}\} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} \mathbf{b} + (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbb{E}\{\mathbf{u}\mid \mathbf{X}\} = \\
& = \mathbf{b}
\end{aligned}
\]
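The unbiasedness argument rests on the identity \(\mathbf{A}_{\text{x}} \mathbf{X} = \textbf{I}_k\), which makes the estimation error equal to \(\mathbf{A}_{\text{x}} \mathbf{u}\), a term with zero conditional mean. A minimal numerical sketch follows; the data, the covariance matrix and the parameter vector are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3

# Hypothetical regressors and positive definite covariance matrix.
X = rng.standard_normal((n, k))
A = rng.standard_normal((n, n))
Sigma = A @ A.T + np.eye(n)
Sinv = np.linalg.inv(Sigma)

# A_x = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}, a k x n matrix (Equation 16.6).
A_x = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv)

# Key identity behind unbiasedness: A_x X = I_k.
identity_holds = np.allclose(A_x @ X, np.eye(k))

# For any error draw u, the estimate is b plus the linear term A_x u.
b = np.array([1.0, -2.0, 0.5])   # assumed true parameter
u = rng.standard_normal(n)       # one arbitrary error draw
b_hat = A_x @ (X @ b + u)
estimation_error = b_hat - b     # equals A_x @ u up to rounding
```

Since the error of the estimate is linear in \(\mathbf{u}\), taking the conditional expectation with \(\mathbb{E}\{\mathbf{u} \mid \mathbf{X}\} = 0\) removes it, which is exactly the last step of the derivation above.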
Under the assumption of heteroskedastic and correlated observations, the conditional variance of \(\mathbf{b}^{\tiny\text{GLS}}\) is derived as in the OLS case (Equation 15.12), but with \(\mathbb{V}\{\mathbf{u} \mid \mathbf{X} \} = \boldsymbol{\Sigma}\), i.e. \[
\begin{aligned}
\mathbb{V}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} {} & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} {\color{red}{\mathbb{V}\{\mathbf{u}\mid \mathbf{X} \}}} \, \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} {\color{red}{\boldsymbol{\Sigma}}} \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}
\end{aligned}
\] so that Equation 15.10 is recovered as the special case of Equation 16.7 with \(\boldsymbol{\Sigma} = \sigma_{\text{u}}^2 \textbf{I}_n\).
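The minimum-variance property and the homoskedastic special case can both be illustrated numerically. The sketch below compares the GLS variance in Equation 16.7 with the variance of the OLS estimator under the same \(\boldsymbol{\Sigma}\). The OLS expression \((\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\boldsymbol{\Sigma}\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\) is the standard sandwich form and is not stated in this section, so it is an assumption of the example, as are the simulated data and the value of `sigma2`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 2

# Hypothetical design matrix and positive definite covariance matrix.
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
A = rng.standard_normal((n, n))
Sigma = A @ A.T + np.eye(n)
Sinv = np.linalg.inv(Sigma)

# GLS variance (Equation 16.7).
V_gls = np.linalg.inv(X.T @ Sinv @ X)

# OLS variance under the same Sigma (sandwich form, assumed here).
XtX_inv = np.linalg.inv(X.T @ X)
V_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv

# Aitken theorem: V_ols - V_gls should be positive semi-definite,
# i.e. all its eigenvalues are non-negative up to rounding.
gap_eigs = np.linalg.eigvalsh(V_ols - V_gls)

# Special case Sigma = sigma2 * I_n: Equation 16.7 reduces to
# sigma2 * (X^T X)^{-1}.
sigma2 = 1.7
V_special = np.linalg.inv(X.T @ np.linalg.inv(sigma2 * np.eye(n)) @ X)
```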
Gardini A., Costa M., Cavaliere G. 2007.
Econometria, Volume Primo. FrancoAngeli.
https://cris.unibo.it/handle/11585/119378.