17 Restricted least squares
References: Gardini A. (2007), Chapters 3.7, 3.8, and 4.2.
Let’s consider a generic univariate linear model with \(k\) regressors, namely \[ \mathbf{y} =b_1 \mathbf{X}_1 + \dots + b_j \mathbf{X}_j + \dots + b_k \mathbf{X}_k + \mathbf{u} = \mathbf{X} \mathbf{b} + \mathbf{u} \text{,} \] and suppose that we are interested in testing whether the coefficient \(b_j\) is statistically different from a certain value \(r\) known a priori. In this case the null hypothesis can be equivalently represented using a more flexible matrix notation, i.e. \[ \mathcal{H}_0: b_j = r \iff \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,} \tag{17.1}\] where \[ \underset{k \times 1}{\mathbf{R}}^{\top} = \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \text{.} \] Hence, the linear restriction in Equation 17.1 can be written in matrix form as \[ \mathcal{H}_0: \underset{k \times 1}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{1 \times 1}{\mathbf{r}} = \underset{1 \times 1}{\mathbf{0}} \iff \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \begin{pmatrix} b_1 \\ \vdots\\ b_j \\ \vdots \\ b_k \end{pmatrix} - \begin{pmatrix} r \end{pmatrix} = \begin{pmatrix} 0 \end{pmatrix} \text{.} \]
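As a minimal numerical sketch of this notation (the dimensions, position \(j\), and coefficient values below are illustrative, not taken from the text), the selection vector \(\mathbf{R}^{\top}\) for a single restriction \(b_j = r\) can be built and checked as follows:

```python
import numpy as np

k, j, r = 5, 2, 0.7  # hypothetical: 5 regressors, test b_3 = 0.7 (0-based index j = 2)

# R^T is a 1 x k row vector with a single 1 in the j-th position
Rt = np.zeros((1, k))
Rt[0, j] = 1.0
r_vec = np.array([r])

# a coefficient vector that satisfies H0 by construction (b[j] = r)
b = np.array([0.1, -0.3, 0.7, 1.2, 0.0])

print(Rt @ b - r_vec)  # -> [0.] : the restriction R^T b - r = 0 holds
```

The product \(\mathbf{R}^{\top}\mathbf{b}\) simply extracts \(b_j\), which is exactly what Equation 17.1 expresses.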
17.1 Multiple restrictions
Let’s consider multiple restrictions, i.e. \[ \begin{aligned} \mathcal{H}_0: \quad & {} (1) \quad b_1 - b_2 = 0 && {} b_1 \text{ and } b_2 \text{ have the same effect} \\ & (2) \quad b_3 + b_4 = 1 && b_3 \text{ and } b_4 \text{ sum to one} \\ \end{aligned} \] Let’s construct the vector for (1) (first row of \(\mathbf{R}^{\top}\)) and (2) (second row of \(\mathbf{R}^{\top}\)), i.e. \[ \underset{2 \times 4 }{\mathbf{R}}^{\top} \underset{4 \times 1}{\mathbf{b}} - \underset{2 \times 1}{\mathbf{r}} = \underset{2 \times 1}{\mathbf{0}} \iff \underset{\mathbf{R}^{\top}}{\underbrace{ \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \end{pmatrix}}} \underset{\mathbf{b}}{\underbrace{\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}}} - \underset{\mathbf{r}}{\underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \text{.} \]
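The two restrictions above stack naturally into the matrix system; a small sketch (the coefficient vector is an illustrative example that satisfies both restrictions):

```python
import numpy as np

# R^T stacks one row per restriction: (1) b1 - b2 = 0, (2) b3 + b4 = 1
Rt = np.array([[1., -1., 0., 0.],
               [0.,  0., 1., 1.]])
r = np.array([0., 1.])

b = np.array([0.5, 0.5, 0.3, 0.7])  # b1 = b2 and b3 + b4 = 1 by construction

print(Rt @ b - r)  # -> [0. 0.] : both restrictions hold
```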
17.2 Restricted least squares
Proposition 17.1 (\(\color{magenta}{\textbf{Restricted Least Squares (RLS) estimator}}\))
Let’s consider a linear model under the OLS assumptions and a set of \(m\) linear hypotheses on the parameters of the model, taking the form \[
\mathcal{H}_0: \underset{m \times k}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{m \times 1}{\mathbf{r}} = \underset{m \times 1}{\mathbf{0}}
\text{.}
\] Therefore, the optimization problem becomes restricted to the space of parameters that satisfy the constraints. More precisely, the space \(\tilde{\Theta}_{\mathbf{b}}\), that is the subset of the parameter space \(\tilde{\Theta}_{\mathbf{b}} \subset \Theta_{\mathbf{b}}\) where the linear constraint holds true, is defined as \[
\tilde{\Theta}_{\mathbf{b}} = \left\{\mathbf{b} \in \mathbb{R}^k : \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \right\}
\text{.}
\] Hence, the optimization problem in Equation 15.2 is restricted to only the parameters that satisfy the constraint.
Formally, the RLS estimator is the solution of the following minimization problem, i.e. \[ \mathbf{b}^{\tiny\text{RLS}} = \underset{\mathbf{b} \in \tilde{\Theta}_{\mathbf{b}}}{\text{argmin}} \left\{\text{Q}^{\tiny\text{OLS}}(\mathbf{b})\right\} \text{,} \tag{17.2}\] where \(\text{Q}^{\tiny\text{OLS}}\) reads as in the OLS case (Equation 15.1). Notably, the analytic solution for \(\mathbf{b}^{\tiny\text{RLS}}\) reads \[ \mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top}(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \tag{17.3}\]
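Equation 17.3 translates directly into a few lines of linear algebra. A minimal sketch, using simulated data and the two restrictions of the previous section (the sample size, seed, and true coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 4
X = rng.normal(size=(n, k))
b_true = np.array([1.0, 1.0, 0.4, 0.6])   # satisfies b1 = b2 and b3 + b4 = 1
y = X @ b_true + rng.normal(scale=0.5, size=n)

Rt = np.array([[1., -1., 0., 0.],         # restriction (1): b1 - b2 = 0
               [0.,  0., 1., 1.]])        # restriction (2): b3 + b4 = 1
r = np.array([0., 1.])

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y

# Equation 17.3: correct the OLS estimate towards the constraint set
A = XtX_inv @ Rt.T @ np.linalg.inv(Rt @ XtX_inv @ Rt.T)
b_rls = b_ols - A @ (Rt @ b_ols - r)

print(Rt @ b_rls - r)  # the restrictions hold exactly (up to rounding)
```

By construction \(\mathbf{R}^{\top}\mathbf{b}^{\tiny\text{RLS}} - \mathbf{r} = \mathbf{0}\) holds exactly, whatever the data: the correction term projects the OLS estimate onto \(\tilde{\Theta}_{\mathbf{b}}\).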
Proposition 17.2 (\(\color{magenta}{\textbf{Expectation RLS estimator}}\))
The RLS estimator (Equation 17.3) is unbiased for the true parameter in population \(\mathbf{b}\) if and only if the restrictions imposed by \(\mathcal{H}_0\) are true in population. In fact, the conditional expected value is \[
\mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r})
\text{,}
\tag{17.6}\] where \(\mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b}\) only if the second term vanishes, which happens only when \(\mathcal{H}_0\) holds true and so \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\).
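A small Monte Carlo sketch of Proposition 17.2 (design, seed, and coefficients are illustrative): with a single hypothetical restriction \(b_1 = 0\), the average RLS estimate recovers the true coefficients when \(\mathcal{H}_0\) is true in population, and is biased when it is false.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, reps = 80, 3, 2000
X = rng.normal(size=(n, k))
Rt = np.array([[1., 0., 0.]])   # restriction: b1 = 0
r = np.array([0.])

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ Rt.T @ np.linalg.inv(Rt @ XtX_inv @ Rt.T)

def avg_rls(b_true):
    """Average the RLS estimate (Equation 17.3) over repeated samples."""
    est = np.zeros(k)
    for _ in range(reps):
        y = X @ b_true + rng.normal(size=n)
        b_ols = XtX_inv @ X.T @ y
        est += b_ols - A @ (Rt @ b_ols - r)
    return est / reps

e_true = avg_rls(np.array([0.0, 1.0, -1.0]))   # H0 true in population
e_false = avg_rls(np.array([0.5, 1.0, -1.0]))  # H0 false: b1 = 0.5

print(e_true)   # close to the true (0, 1, -1)
print(e_false)  # first entry forced to 0, hence biased for b1 = 0.5
```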
Proposition 17.3 (\(\color{magenta}{\textbf{Variance RLS estimator}}\))
The variance of the RLS estimator (Equation 17.3) reads \[
\mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} -
\sigma_{\text{u}}^2 \cdot
(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1}
\mathbf{R}^{\top}
(\mathbf{X}^{\top} \mathbf{X})^{-1}
\text{.}
\] It is interesting to note that the variance of the RLS estimator is always lower than or equal to the variance of the OLS estimator (in the sense that the difference is positive semidefinite), in fact \[
\mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} \le \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\}
\text{.}
\]
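The variance reduction can be verified numerically: the subtracted term is a positive semidefinite matrix, so every linear combination of the RLS coefficients has variance no larger than the OLS one. A sketch with hypothetical design and error variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 4
X = rng.normal(size=(n, k))
sigma2 = 0.25  # hypothetical error variance sigma_u^2

Rt = np.array([[1., -1., 0., 0.],
               [0.,  0., 1., 1.]])

XtX_inv = np.linalg.inv(X.T @ X)
V_ols = sigma2 * XtX_inv
# the term subtracted from V{b^OLS} in Proposition 17.3
reduction = sigma2 * XtX_inv @ Rt.T @ np.linalg.inv(Rt @ XtX_inv @ Rt.T) @ Rt @ XtX_inv
V_rls = V_ols - reduction

# positive semidefinite: all eigenvalues >= 0 (up to rounding)
eig = np.linalg.eigvalsh(reduction)
print(eig.min() >= -1e-12)  # -> True
```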
17.3 A test for linear restrictions
Under the assumption of normality of the error terms, it is possible to derive a statistic to test the significance of the linear restrictions imposed by \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\). Let’s test the validity of the null hypothesis \(\mathcal{H}_0\) against its alternative hypothesis \(\mathcal{H}_1\), i.e. \[ \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,}\quad \mathcal{H}_1: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} \neq \mathbf{0} \text{.} \] Under normality, the OLS estimates are multivariate normal; thus, applying the scaling property, one obtains that the restriction vector is normally distributed, i.e. \[ \mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r} \sim \mathcal{N}(\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}) \text{.} \tag{17.7}\] Thus, we can write the statistic \[ \text{W}_m = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r}) \text{.} \tag{17.8}\]
If we work under \(\mathcal{H}_0\), then the mean in Equation 17.7 is zero, i.e.
\[
\mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r}
\underset{\mathcal{H}_0}{\sim}
\mathcal{N}(\mathbf{0}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R})
\text{.}
\] Recalling the relation (Section 32.1.1) between the distribution of the quadratic form of a multivariate normal and the \(\chi^2\) distribution, then the test statistic \[
\text{W}_m
\overset{\text{d}}{\underset{\mathcal{H}_0}{\sim}}
\chi^2(m)
\text{,}
\tag{17.9}\] has a \(\chi^2(m)\) distribution, with \(m\) the number of restrictions.
Instead, under \(\mathcal{H}_1\) the distribution of the linear restriction vector is exactly that of Equation 17.7. Thus, applying property 4. in Section 32.1.1, one obtains that the test statistic is distributed as a non-central \(\chi^2(m, \delta)\), i.e. \[ \text{W}_m \overset{\text{d}}{\underset{\mathcal{H}_1}{\sim}} \chi^2(m, \delta) \text{,} \tag{17.10}\] where the non-centrality parameter \(\delta\) reads \[ \delta = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}) > 0 \text{.} \] As a general decision rule, \(\mathcal{H}_0\) is rejected if the statistic in Equation 17.8 is greater than the upper-\(\alpha\) quantile of a \(\chi^2(m)\) random variable. Such critical value, denoted by \(q_{\alpha}\), represents the value for which the probability that a \(\chi^2(m)\) exceeds \(q_{\alpha}\) is exactly equal to \(\alpha\), i.e. \[ \mathbb{P}(\text{W}_m > q_{\alpha}) = \alpha \text{.} \]
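The whole testing procedure can be sketched end to end. In this illustrative example (simulated data, known error variance, hypothetical seed and dimensions) the statistic of Equation 17.8 is compared with the \(\chi^2(m)\) critical value at \(\alpha = 0.05\):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, k, sigma = 500, 4, 0.5
X = rng.normal(size=(n, k))
b_true = np.array([1.0, 1.0, 0.4, 0.6])   # H0 holds in population
y = X @ b_true + rng.normal(scale=sigma, size=n)

Rt = np.array([[1., -1., 0., 0.],
               [0.,  0., 1., 1.]])
r = np.array([0., 1.])
m = Rt.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
gap = Rt @ b_ols - r

# Equation 17.8, with the (here known) error variance sigma^2
W = gap @ np.linalg.inv(Rt @ XtX_inv @ Rt.T) @ gap / sigma**2

q_alpha = chi2.ppf(1 - 0.05, df=m)  # upper-alpha critical value of chi^2(m)
print(W, q_alpha, W > q_alpha)      # reject H0 when the last entry is True
```

In practice \(\sigma_{\text{u}}^2\) is unknown and replaced by an estimate, which leads to the F-type version of the test; the sketch above keeps the known-variance case of Equation 17.8.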