References: Chapter 3 of Gardini A., Costa M., Cavaliere G. (2007).
Working hypothesis
The assumptions of the generalized least squares estimator are:
- The linear model approximates the conditional expectation, i.e. \(\mathbb{E}\{Y_i \mid \mathbf{x}_i\} = \mathbf{x}_i^{\top} \mathbf{b}\).
- The conditional variance of the response variable \(Y\) depends on the observation \(i\), i.e. \(\mathbb{V}\{Y_i \mid \mathbf{x}_i\} = \sigma_i^2\) with \(0 < \sigma^2_i < \infty\) for all \(i = 1, \dots, n\).
- The response variables are correlated, i.e. \(\mathbb{C}v\{Y_i, Y_j \mid \mathbf{x}_i, \mathbf{x}_j\} = \sigma_{ij}\) for all \(i \neq j\) with \(i,j = 1, \dots, n\).
Equivalently, the assumptions can be formulated in terms of the stochastic component \(\mathbf{u}\):
- The errors have mean zero, i.e. \(\mathbb{E}\{u_i \mid \mathbf{x}_i\} = 0\) for all \(i = 1, \dots, n\).
- The conditional variance of the errors depends on the observation \(i\), i.e. \(\mathbb{V}\{u_i \mid \mathbf{x}_i\} = \sigma_i^2\) with \(0 < \sigma^2_i < \infty\).
- The errors are correlated, i.e. \(\mathbb{C}v\{u_i, u_j \mid \mathbf{x}_i, \mathbf{x}_j\} = \sigma_{ij}\) for all \(i \neq j\) with \(i,j = 1, \dots, n\).
In this case the variance-covariance matrix \(\boldsymbol{\Sigma}\), defined as in Equation 14.15, contains the variances and the covariances between the observations; a small sketch of its structure follows.
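As an illustration, here is a minimal numpy sketch, with hypothetical numbers, of how \(\boldsymbol{\Sigma}\) collects the variances \(\sigma_i^2\) on the diagonal and the covariances \(\sigma_{ij}\) off the diagonal:

```python
# Minimal sketch of the structure of Sigma (hypothetical values):
# variances sigma_i^2 on the diagonal, covariances sigma_ij off it.
import numpy as np

sigma2 = np.array([1.0, 2.0, 0.5])   # sigma_i^2, one per observation
Sigma = np.diag(sigma2)
Sigma[0, 1] = Sigma[1, 0] = 0.3      # sigma_12 = sigma_21
Sigma[1, 2] = Sigma[2, 1] = -0.2     # sigma_23 = sigma_32
print(Sigma)                         # symmetric variance-covariance matrix
```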
Generalized least squares estimator
Proposition 16.1 (\(\color{magenta}{\textbf{Generalized Least Squares (GLS)}}\))
The generalized least squares (GLS) estimator minimizes the function \(\text{Q}^{\tiny\text{GLS}}\), the weighted sum of the squared residuals, and returns an estimate of the true parameter \(\mathbf{b}\), i.e. \[
\text{Q}^{\tiny\text{GLS}}(\mathbf{b}) =
\hat{\mathbf{u}}(\mathbf{b})^{\top}
\boldsymbol{\Sigma}^{-1}
\hat{\mathbf{u}}(\mathbf{b})
\text{.}
\tag{16.1}\] Formally, the GLS estimator is the solution of the following minimization problem, i.e. \[
\mathbf{b}^{\tiny\text{GLS}} =
\underset{\mathbf{b} \in \Theta_{\mathbf{b}}}{\text{argmin}} \left\{\text{Q}^{\tiny\text{GLS}}(\mathbf{b})\right\}
\text{.}
\tag{16.2}\] Notably, if \(\mathbf{X}\) has full column rank and \(\boldsymbol{\Sigma}\) is non-singular, one obtains an analytic expression, i.e.
\[
\mathbf{b}^{\tiny\text{GLS}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y}
\text{.}
\tag{16.3}\]
The closed-form solution is available if and only if \(\boldsymbol{\Sigma}\) and \(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}\) are invertible. In practice the conditions are (a numerical sketch follows the list):
- \(\text{rank}(\boldsymbol{\Sigma}) = n\) for the inversion of \(\boldsymbol{\Sigma}\).
- \(\text{rank}(\mathbf{X}) = k\) and condition 1. for the inversion of \(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}\).
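As a concrete illustration, the sketch below builds synthetic data satisfying both rank conditions and evaluates Equation 16.3 directly; all names (X, Sigma, y, b_true) are hypothetical.

```python
# Closed-form GLS estimate (Equation 16.3) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))               # design matrix, rank(X) = k
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)           # positive definite, rank(Sigma) = n
b_true = np.array([1.0, -2.0, 0.5])
u = rng.multivariate_normal(np.zeros(n), Sigma)   # correlated errors
y = X @ b_true + u

Sigma_inv = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(b_gls)                              # close to b_true
```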
Proof. Let’s derive the optimal solution in Proposition 16.1 by developing the objective in Equation 16.1: \[
\begin{aligned}
\text{Q}^{\tiny\text{GLS}}(\mathbf{b}) {} & =
\hat{\mathbf{u}}(\mathbf{b})^{\top}
\boldsymbol{\Sigma}^{-1}
\hat{\mathbf{u}}(\mathbf{b}) = \\
& = (\mathbf{y} - \mathbf{X} \mathbf{b})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \mathbf{X} \mathbf{b}) = \\
& = \mathbf{y}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} - 2 \mathbf{b}^{\top} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} + \mathbf{b}^{\top} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1}\mathbf{X} \mathbf{b}
\end{aligned}
\] In order to minimize the above expression, let’s compute the first derivative of \(\text{Q}^{\tiny\text{GLS}}(\mathbf{b})\) with respect to \(\mathbf{b}\) \[
\frac{d\text{Q}^{\tiny\text{GLS}}(\mathbf{b})}{d\mathbf{b}} = - 2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} + 2 \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} \mathbf{b}
\text{.}
\] Then, setting the above expression equal to zero and solving for \(\mathbf{b} = \mathbf{b}^{\tiny\text{GLS}}\) gives the solution, i.e. \[
\mathbf{b}^{\tiny\text{GLS}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y}
\text{.}
\] Since the second derivative \(2\,\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X}\) is positive definite, this stationary point is indeed a minimum.
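The first-order condition can also be checked numerically: the gradient vanishes at \(\mathbf{b}^{\tiny\text{GLS}}\) and any perturbation increases the objective. A minimal self-contained sketch with hypothetical data:

```python
# Numerical check of the first-order condition for Q^GLS.
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 2
X = rng.normal(size=(n, k))
A = rng.normal(size=(n, n))
Sigma_inv = np.linalg.inv(A @ A.T + n * np.eye(n))
y = rng.normal(size=n)

b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
grad = -2 * X.T @ Sigma_inv @ y + 2 * X.T @ Sigma_inv @ X @ b_gls

Q = lambda b: (y - X @ b) @ Sigma_inv @ (y - X @ b)   # Equation 16.1
print(np.allclose(grad, 0.0))                     # gradient vanishes at b_gls
print(Q(b_gls) <= Q(b_gls + rng.normal(size=k)))  # perturbations raise Q
```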
Proposition 16.2 (\(\color{magenta}{\textbf{Two-stage derivation of GLS estimator}}\))
The GLS estimator in Equation 16.3 can be equivalently recovered as \[
\mathbf{b}^{\tiny \text{GLS}} = (\mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{y}
\text{,}
\] where \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\) with \(\boldsymbol{\Sigma} = \mathbf{e} \boldsymbol{\Lambda} \mathbf{e}^{\top}\) and
- \(\boldsymbol{\Lambda}\) is the diagonal matrix containing the eigenvalues of \(\boldsymbol{\Sigma}\).
- \(\mathbf{e}\) is the matrix whose columns are the eigenvectors of \(\boldsymbol{\Sigma}\), which satisfies \(\mathbf{e}^{\top} \mathbf{e} = \mathbf{e} \mathbf{e}^{\top} = \textbf{I}_{n}\).
Moreover, the matrix \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\) satisfies: \[
\mathbf{T}^{\top} \mathbf{T} = \mathbf{e} \, \boldsymbol{\Lambda}^{-1/2} \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top} = \mathbf{e} \, \boldsymbol{\Lambda}^{-1} \mathbf{e}^{\top} = \boldsymbol{\Sigma}^{-1}
\tag{16.4}\]
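A minimal sketch of Equation 16.4, assuming a hypothetical positive-definite \(\boldsymbol{\Sigma}\): build \(\mathbf{T}\) from the eigendecomposition and verify that \(\mathbf{T}^{\top} \mathbf{T} = \boldsymbol{\Sigma}^{-1}\) and \(\mathbf{T} \boldsymbol{\Sigma} \mathbf{T}^{\top} = \textbf{I}_n\).

```python
# Build T = Lambda^{-1/2} e' from the eigendecomposition of Sigma.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 5 * np.eye(5)        # symmetric positive definite

lam, e = np.linalg.eigh(Sigma)         # Sigma = e diag(lam) e'
T = np.diag(lam ** -0.5) @ e.T         # T = Lambda^{-1/2} e'

print(np.allclose(T.T @ T, np.linalg.inv(Sigma)))   # Equation 16.4
print(np.allclose(T @ Sigma @ T.T, np.eye(5)))      # whitening: T Sigma T' = I
```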
Proof. Let’s consider a linear model of the form \[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{u}
\text{,}
\] and let’s apply an (as yet unspecified) transformation matrix \(\mathbf{T}_{n\times n}\) by pre-multiplying both sides, i.e. \[
\begin{aligned}
& {} {\color{red}{\mathbf{T} \, \mathbf{y}}} = {\color{blue}{\mathbf{T} \mathbf{X}}} \, \mathbf{b} + {\color{orange}{\mathbf{T} \mathbf{u}}} \\
& \Downarrow \quad\quad\ \Downarrow \quad\quad\;\;\Downarrow \\
& \; {\color{red}{\tilde{\mathbf{y}}}} \;\;\,= \; \;{\color{blue}{\tilde{\mathbf{X}}}} \, \mathbf{b} \; + \; {\color{orange}{\tilde{\mathbf{u}}}}
\end{aligned}
\] In this context, the conditional expectation of \(\tilde{\mathbf{y}}\) reads \[
\mathbb{E}\{\tilde{\mathbf{y}}\mid \tilde{\mathbf{X}}\} = \tilde{\mathbf{X}} \mathbf{b}
\text{,}
\] while its conditional variance is \[
\mathbb{V}\{\tilde{\mathbf{y}}\mid \tilde{\mathbf{X}}\} = \mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} = \mathbf{T}\, \boldsymbol{\Sigma} \, \mathbf{T}^{\top}
\text{.}
\] The next step is to identify a suitable transformation matrix \(\mathbf{T}\) such that the conditional variance becomes equal to the identity matrix (Equation 31.3), i.e. \[
\mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} = \textbf{I}_n
\text{.}
\] In this way it is possible to work under the Gauss-Markov assumptions (Theorem 15.1), obtaining an estimator with minimum variance.
A possible way to identify \(\mathbf{T}\) is to decompose the variance-covariance matrix (Equation 14.15) as follows \[
\boldsymbol{\Sigma} = \mathbf{e} \boldsymbol{\Lambda} \mathbf{e}^{\top}
\quad
\boldsymbol{\Sigma}^{-1} = \mathbf{e} \boldsymbol{\Lambda}^{-1} \mathbf{e}^{\top}
\] where \(\boldsymbol{\Lambda}\) is the diagonal matrix containing the eigenvalues and \(\mathbf{e}\) is the matrix of eigenvectors, which satisfies \(\mathbf{e}^{\top} \mathbf{e} = \mathbf{e} \mathbf{e}^{\top} = \textbf{I}_{n}\).
Thus, for the particular choice of \(\mathbf{T} = \boldsymbol{\Lambda}^{-1/2} \mathbf{e}^{\top}\), one obtains a conditional variance equal to the identity matrix, i.e.
\[
\begin{aligned}
\mathbb{V}\{\tilde{\mathbf{u}}\mid \tilde{\mathbf{X}}\} & {} = \mathbf{T} \, \boldsymbol{\Sigma} \, \mathbf{T}^{\top} = \\
& = (\boldsymbol{\Lambda}^{-1/2} \, \mathbf{e}^{\top}) \, \mathbf{e} \, \boldsymbol{\Lambda} \, \mathbf{e}^{\top} \, (\mathbf{e} \, \boldsymbol{\Lambda}^{-1/2}) = \\
& = \boldsymbol{\Lambda}^{-1/2} \boldsymbol{\Lambda} \, \boldsymbol{\Lambda}^{-1/2} = \textbf{I}_{n}
\end{aligned}
\] where \(\textbf{I}_{n}\) is the identity matrix (Equation 31.3). Finally, substituting \(\tilde{\mathbf{X}} = \mathbf{T} \mathbf{X}\) and \(\tilde{\mathbf{y}} = \mathbf{T} \mathbf{y}\) in the OLS formula (Equation 15.3) and using the result in Equation 16.4, one obtains exactly the GLS estimator in Equation 16.3, i.e. \[
\begin{aligned}
\mathbf{b}^{\tiny \text{GLS}} & {} = (\tilde{\mathbf{X}}^{\top} \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^{\top} \tilde{\mathbf{y}} = \\
& = (\mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{T}^{\top} \mathbf{T} \mathbf{y} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y}
\end{aligned}
\]
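The two-stage route can be verified numerically: whiten the model with \(\mathbf{T}\), run plain OLS on \((\tilde{\mathbf{X}}, \tilde{\mathbf{y}})\), and compare with the direct formula in Equation 16.3. A sketch with synthetic (hypothetical) data:

```python
# Two-stage GLS: OLS on the whitened model reproduces Equation 16.3.
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 3
X = rng.normal(size=(n, k))
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)
y = rng.normal(size=n)

lam, e = np.linalg.eigh(Sigma)
T = np.diag(lam ** -0.5) @ e.T                     # whitening matrix

X_t, y_t = T @ X, T @ y                            # stage 1: transform
b_two_stage = np.linalg.lstsq(X_t, y_t, rcond=None)[0]   # stage 2: OLS

Sigma_inv = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(np.allclose(b_two_stage, b_gls))             # the two routes coincide
```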
Properties of the GLS estimator
Theorem 16.1 (\(\color{magenta}{\textbf{Aitken theorem}}\))
Under the following working hypothesis, also called the Aitken hypothesis, i.e.
- \(\mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{u}\).
- \(\mathbb{E}\{\mathbf{u}\} = 0\).
- \(\mathbb{E}\{\mathbf{u} \mathbf{u}^{\top}\} = \boldsymbol{\Sigma}\), i.e. heteroskedastic and correlated errors.
- \(\mathbf{X}\) is non-stochastic and independent of the errors \(\mathbf{u}\) for all \(n\).
the Generalized Least Squares (GLS) estimator is \({\color{blue}{\textbf{BLUE}}}\) (Best Linear Unbiased Estimator), where “best” stands for the estimator with minimum variance in the class of linear unbiased estimators of \(\mathbf{b}\).
Proposition 16.3 (\(\color{magenta}{\textbf{Properties GLS estimator}}\))
1. Unbiased: \(\mathbf{b}^{\tiny\text{GLS}}\) is correct, i.e. its conditional expectation is equal to the true parameter in the population: \[
\mathbb{E}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} = \mathbf{b}
\text{.}
\tag{16.5}\]
2. Linear: \(\mathbf{b}^{\tiny\text{GLS}}\) can be written as a linear transformation of \(\mathbf{y}\) through a matrix \(\mathbf{A}_{\text{x}}\) that does not depend on \(\mathbf{y}\), i.e. \[
\mathbf{b}^{\tiny\text{GLS}} = \mathbf{A}_{\text{x}} \mathbf{y} \quad \mathbf{A}_{\text{x}} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1}
\text{.}
\tag{16.6}\]
3. Efficient: under the Aitken hypothesis (Theorem 16.1) it has minimum variance in the class of linear unbiased estimators, and its conditional variance reads (a Monte Carlo sketch of these properties follows the list): \[
\mathbb{V}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}
\text{.}
\tag{16.7}\]
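These properties can be illustrated by simulation: across repeated samples the GLS estimates should average to \(\mathbf{b}\) (Equation 16.5) and their empirical variance-covariance should approach \((\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\) (Equation 16.7). A Monte Carlo sketch with hypothetical values:

```python
# Monte Carlo check of unbiasedness (Eq. 16.5) and variance (Eq. 16.7).
import numpy as np

rng = np.random.default_rng(4)
n, k, reps = 60, 2, 5000
X = rng.normal(size=(n, k))
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)
Sigma_inv = np.linalg.inv(Sigma)
b_true = np.array([2.0, -1.0])

H = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv)   # A_x in Eq. 16.6
draws = np.array([H @ (X @ b_true +
                       rng.multivariate_normal(np.zeros(n), Sigma))
                  for _ in range(reps)])

print(draws.mean(axis=0))                    # approx b_true (Eq. 16.5)
print(np.cov(draws.T))                       # approx the next matrix:
print(np.linalg.inv(X.T @ Sigma_inv @ X))    # (X' Sigma^{-1} X)^{-1} (Eq. 16.7)
```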
Proof. The GLS estimator is correct: its expected value, computed from Equation 16.3 after substituting the model in Equation 14.11, is equal to the true parameter in the population, i.e.
\[
\begin{aligned}
\mathbb{E}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} {} & = \mathbb{E}\{(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{y} \mid \mathbf{X}\} = \\
& = \mathbb{E}\{(\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{X} \mathbf{b} + \mathbf{u}) \mid \mathbf{X}\} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} \mathbf{b} + (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbb{E}\{\mathbf{u}\mid \mathbf{X}\} = \\
& = \mathbf{b}
\end{aligned}
\tag{16.8}\]
Under the assumption of heteroskedastic and correlated observations, the conditional variance of \(\mathbf{b}^{\tiny\text{GLS}}\) is derived as in the OLS case (Equation 15.12), but with \(\mathbb{V}\{\mathbf{u} \mid \mathbf{X} \} = \boldsymbol{\Sigma}\), i.e. \[
\begin{aligned}
\mathbb{V}\{\mathbf{b}^{\tiny\text{GLS}}\mid \mathbf{X}\} {} & = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} {\color{red}{\mathbb{V}\{\mathbf{u}\mid \mathbf{X} \}}} \, \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} {\color{red}{\boldsymbol{\Sigma}}} \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X} (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \\
& = (\mathbf{X}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1}
\end{aligned}
\tag{16.9}\] where Equation 15.10 is recovered as the special case of Equation 16.9 with \(\boldsymbol{\Sigma} = \sigma_{\text{u}}^2 \textbf{I}_n\).
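A quick numerical sketch of this special case, with hypothetical data: when \(\boldsymbol{\Sigma} = \sigma_{\text{u}}^2 \textbf{I}_n\) the GLS estimate and its variance collapse to their OLS counterparts.

```python
# With Sigma = sigma_u^2 I the GLS formulas reduce to OLS.
import numpy as np

rng = np.random.default_rng(5)
n, k, sigma2 = 20, 3, 2.5
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
Sigma_inv = np.linalg.inv(sigma2 * np.eye(n))

b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(b_gls, b_ols))             # same point estimate

V_gls = np.linalg.inv(X.T @ Sigma_inv @ X)   # Equation 16.7
V_ols = sigma2 * np.linalg.inv(X.T @ X)      # Equation 15.10
print(np.allclose(V_gls, V_ols))             # same variance
```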
Gardini A., Costa M., Cavaliere G. (2007).
Econometria, Volume Primo. FrancoAngeli.
https://cris.unibo.it/handle/11585/119378.