17  Restricted least squares

References: Chapters 3.7, 3.8 and 4.2 of Gardini A. (2007).

Let’s consider a generic univariate linear model with \(k\) regressors, namely \[ \mathbf{y} = b_1 \mathbf{X}_1 + \dots + b_j \mathbf{X}_j + \dots + b_k \mathbf{X}_k + \mathbf{u} = \mathbf{X} \mathbf{b} + \mathbf{u} \text{,} \] and suppose that we are interested in testing whether the coefficient \(b_j\) is statistically different from a certain value \(r\) known a priori. In this case the null hypothesis can be equivalently represented using a more flexible matrix notation, i.e.  \[ \mathcal{H}_0: b_j = r \iff \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,} \tag{17.1}\] where \[ \underset{k \times 1}{\mathbf{R}}^{\top} = \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \text{.} \] Hence, the linear restriction in Equation 17.1 can be written in matrix form as \[ \mathcal{H}_0: \underset{k \times 1}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{1 \times 1}{\mathbf{r}} = \underset{1 \times 1}{\mathbf{0}} \iff \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \begin{pmatrix} b_1 \\ \vdots\\ b_j \\ \vdots \\ b_k \end{pmatrix} - \begin{pmatrix} r \end{pmatrix} = \begin{pmatrix} 0 \end{pmatrix} \text{.} \]
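As a concrete illustration, the selection vector \(\mathbf{R}^{\top}\) can be built numerically; in the following numpy sketch the values of \(k\), \(j\), \(r\) and the candidate coefficient vector are illustrative assumptions:

```python
import numpy as np

# Hypothetical example: k = 4 regressors, test H0: b_3 = 1 (j = 3, r = 1).
k, j, r = 4, 3, 1.0

# R^T is a 1 x k row vector with a single 1 in the j-th position.
R_T = np.zeros((1, k))
R_T[0, j - 1] = 1.0

# A candidate coefficient vector for which H0 holds (b_3 = 1).
b = np.array([0.5, -0.2, 1.0, 2.0])

print(R_T @ b - r)  # -> [0.]
```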

17.1 Multiple restrictions

Let’s consider multiple restrictions, i.e.  \[ \begin{aligned} \mathcal{H}_0: \quad & {} (1) \quad b_1 - b_2 = 0 && {} b_1 \text{ and } b_2 \text{ have the same effect} \\ & (2) \quad b_3 + b_4 = 1 && b_3 \text{ and } b_4 \text{ sum to one} \\ \end{aligned} \] Each restriction corresponds to a column of \(\mathbf{R}\), i.e. a row of \(\mathbf{R}^{\top}\): the first row encodes (1) and the second row encodes (2), so that \[ \underset{2 \times 4 }{\mathbf{R}}^{\top} \underset{4 \times 1}{\mathbf{b}} - \underset{2 \times 1}{\mathbf{r}} = \underset{2 \times 1}{\mathbf{0}} \iff \underset{\mathbf{R}^{\top}}{\underbrace{ \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \end{pmatrix}}} \underset{\mathbf{b}}{\underbrace{\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}}} - \underset{\mathbf{r}}{\underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \text{.} \]
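The same two restrictions can be checked numerically; a minimal numpy sketch (the candidate vector \(\mathbf{b}\) is an illustrative assumption chosen to satisfy both restrictions):

```python
import numpy as np

# Restrictions on b = (b1, b2, b3, b4):
#   (1) b1 - b2 = 0
#   (2) b3 + b4 = 1
R_T = np.array([[1.0, -1.0, 0.0, 0.0],
                [0.0,  0.0, 1.0, 1.0]])   # 2 x 4, one row per restriction
r = np.array([0.0, 1.0])

# A coefficient vector satisfying both restrictions:
b = np.array([0.7, 0.7, 0.4, 0.6])
print(R_T @ b - r)   # -> [0. 0.]
```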

17.2 Restricted least squares

Proposition 17.1 (\(\color{magenta}{\textbf{Restricted Least Squares (RLS) estimator}}\))
Let’s consider a linear model under the OLS assumptions and a set of \(m\) linear hypotheses on the parameters of the model taking the form \[ \mathcal{H}_0: \underset{m \times k}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{m \times 1}{\mathbf{r}} = \underset{m \times 1}{\mathbf{0}} \text{.} \] Therefore, the optimization problem becomes restricted to the space of parameters that satisfy the constraints. More precisely, the subset \(\tilde{\Theta}_{\mathbf{b}} \subset \Theta_{\mathbf{b}}\) of the parameter space where the linear constraints hold true is defined as \[ \tilde{\Theta}_{\mathbf{b}} = \left\{\mathbf{b} \in \mathbb{R}^k : \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \right\} \text{.} \] Hence, the optimization problem in Equation 15.2 is restricted to only the parameters that satisfy the constraint.

Formally, the RLS estimator is the solution of the following minimization problem, i.e.  \[ \mathbf{b}^{\tiny\text{RLS}} = \underset{\mathbf{b} \in \tilde{\Theta}_{\mathbf{b}}}{\text{argmin}} \left\{\text{Q}^{\tiny\text{OLS}}(\mathbf{b})\right\} \text{,} \tag{17.2}\] where \(\text{Q}^{\tiny\text{OLS}}\) reads as in the OLS case (Equation 15.1). Notably, the analytic solution for \(\mathbf{b}^{\tiny\text{RLS}}\) reads \[ \mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top}(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \tag{17.3}\]

Proof. In order to solve the minimization problem in Equation 17.2, let’s construct the Lagrangian (Equation 31.11) as \[ \text{Q}^{\tiny\text{RLS}}(\mathbf{b}, \boldsymbol{\lambda}) = \text{Q}^{\tiny\text{OLS}}(\mathbf{b}) - {\color{red}{2}} \boldsymbol{\lambda}^{\top} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) \text{,} \] where the factor \({\color{red}{2}}\) is only a convenient rescaling of the multipliers. Then, one obtains the following system of equations, i.e.  \[ \begin{cases} \partial_{\mathbf{b}} \text{Q}^{\tiny\text{RLS}} (\mathbf{b}, \boldsymbol{\lambda}) = -2\mathbf{X}^{\top} \mathbf{y} + 2 \mathbf{X}^{\top} \mathbf{X} \mathbf{b} - {\color{red}{2}} \mathbf{R} \boldsymbol{\lambda} = \mathbf{0} & (A)\\ \partial_{\boldsymbol{\lambda}} \text{Q}^{\tiny\text{RLS}} (\mathbf{b}, \boldsymbol{\lambda}) = - {\color{red}{2}} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) = \mathbf{0} & (B)\\ \end{cases} \] Let’s first solve (A) explicitly for \(\mathbf{b} = \mathbf{b}^{\tiny\text{RLS}}\), i.e.  \[ \begin{aligned} \mathbf{b}^{\tiny\text{RLS}} & {} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} = \\ & = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} \end{aligned} \tag{17.4}\] and substitute the result in (B), i.e.  
\[ \begin{aligned} & {} \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{RLS}} - \mathbf{r} = \mathbf{0} \\ & \implies \mathbf{R}^{\top} \left[ \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} \right] - \mathbf{r} = \mathbf{0} \\ & \implies \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} - \mathbf{r} = \mathbf{0} \\ & \implies \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r} = \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right] \boldsymbol{\lambda} \end{aligned} \] Hence, it is possible to solve explicitly for the Lagrange multipliers \(\boldsymbol{\lambda}\): \[ \boldsymbol{\lambda} = \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \tag{17.5}\] Finally, substituting \(\boldsymbol{\lambda}\) (Equation 17.5) in Equation 17.4 gives the optimal solution, i.e.  \[ \mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \] Note that if the constraints already hold at the OLS estimate, i.e. \(\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}= \mathbf{0}\), the correction term vanishes and the RLS and OLS estimates coincide, i.e. \(\mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}}\).
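The derivation above can be sketched numerically; in the following numpy example the design, sample size and true coefficients are illustrative assumptions, \(\boldsymbol{\lambda}\) follows Equation 17.5 and the estimator Equation 17.4:

```python
import numpy as np

# Simulated data whose true coefficients satisfy both restrictions below.
rng = np.random.default_rng(0)
n, k = 200, 4
X = rng.normal(size=(n, k))
b_true = np.array([0.7, 0.7, 0.4, 0.6])
y = X @ b_true + rng.normal(scale=0.5, size=n)

# Columns of R encode (1) b1 - b2 = 0 and (2) b3 + b4 = 1.
R = np.array([[1.0, 0.0],
              [-1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
r = np.array([0.0, 1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
lam = np.linalg.inv(R.T @ XtX_inv @ R) @ (R.T @ b_ols - r)   # Equation 17.5
b_rls = b_ols - XtX_inv @ R @ lam                            # Equation 17.4

print(R.T @ b_rls - r)   # the constraints hold exactly (up to rounding)

# If the OLS estimate already satisfies the constraints (here forced by
# setting r := R^T b_ols), RLS collapses to OLS:
r_hat = R.T @ b_ols
b_rls_same = b_ols - XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ (R.T @ b_ols - r_hat)
print(np.allclose(b_rls_same, b_ols))   # -> True
```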

Proposition 17.2 (\(\color{magenta}{\textbf{Expectation RLS estimator}}\))
The RLS estimator (Equation 17.3) is unbiased for the true population parameter \(\mathbf{b}\) if and only if the restrictions imposed by \(\mathcal{H}_0\) are true in population. Indeed, its expected value reads \[ \mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) \text{,} \tag{17.6}\] where \(\mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b}\) only if the second term is zero, which happens exactly when \(\mathcal{H}_0\) holds true and so \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\).

Proof. Let’s apply the expected value to Equation 17.3, remembering that \(\mathbf{X}\), \(\mathbf{R}\) and \(\mathbf{r}\) are non-stochastic and that \(\mathbf{b}^{\tiny\text{OLS}}\) is unbiased (Equation 15.8). Carrying out the computation gives: \[ \begin{aligned} \mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} & {} = \mathbb{E}\{\mathbf{b}^{\tiny\text{OLS}} \mid \mathbf{X}\} - \mathbb{E}\left\{(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \mid \mathbf{X}\right\} = \\ & = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbb{E}\{\mathbf{b}^{\tiny\text{OLS}}\mid \mathbf{X}\} - \mathbf{r}) = \\ & = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) \end{aligned} \] Hence \(\mathbf{b}^{\tiny\text{RLS}}\) is unbiased if and only if the restriction holds true in population, i.e. \[ \mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b} \iff \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{.} \]
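A Monte Carlo sketch of Equation 17.6 (all numerical choices here are illustrative assumptions): under a true restriction the RLS estimator is approximately unbiased, while under a false one its empirical bias matches minus the correction term of Equation 17.6.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 2000
X = rng.normal(size=(n, 2))            # fixed (non-stochastic) design
b_true = np.array([1.0, 1.0])
R = np.array([[1.0], [-1.0]])          # restriction: b1 - b2 = r
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R)

def rls(y, r):
    b_ols = XtX_inv @ X.T @ y
    return b_ols - A @ (R.T @ b_ols - r)

results = {}
for r_val in (0.0, 0.5):               # r = 0 makes H0 true, r = 0.5 false
    r_vec = np.array([r_val])
    mean_est = np.mean([rls(X @ b_true + rng.normal(size=n), r_vec)
                        for _ in range(reps)], axis=0)
    bias_theory = -A @ (R.T @ b_true - r_vec)     # from Equation 17.6
    results[r_val] = (mean_est - b_true, bias_theory)
    print(r_val, mean_est - b_true, bias_theory)
```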

Proposition 17.3 (\(\color{magenta}{\textbf{Variance RLS estimator}}\))
The variance of the RLS estimator (Equation 17.3) reads \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \sigma_{\text{u}}^2 \cdot (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \text{.} \] It is interesting to note that the variance of the RLS estimator is always less than or equal to the variance of the OLS estimator, in the sense that their difference is positive semi-definite, i.e. \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} \le \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} \text{.} \]

Proof. In order to compute the variance of the RLS estimator, let’s apply the variance operator to Equation 17.3; the term involving \(\mathbf{r}\) is non-stochastic, while the covariance between \(\mathbf{b}^{\tiny\text{OLS}}\) and the subtracted term equals the variance of the latter, so the cross terms cancel against it and \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \mathbb{V}\left\{(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} \right\} \text{.} \] Let’s denote with \(\mathbf{R}_{\mathbf{x}}\) the matrix \[ \mathbf{R}_{\mathbf{x}} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} \text{,} \] and take it outside the variance operator, i.e.  \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \mathbf{R}_{\mathbf{x}} \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} \mathbf{R}_{\mathbf{x}}^{\top} \text{.} \] Moreover, substituting the expression of the variance of \(\mathbf{b}^{\tiny\text{OLS}}\) (Equation 15.10) one obtains \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \sigma_{\text{u}}^2 \cdot \mathbf{R}_{\mathbf{x}} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}_{\mathbf{x}}^{\top} \text{.} \] Developing the matrix multiplication gives \[ \begin{aligned} \mathbf{R}_{\mathbf{x}} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}_{\mathbf{x}}^{\top} {} & = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} = \\ & = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \end{aligned} \] which, multiplied by \(\sigma_{\text{u}}^2\), is exactly the reduction term in the statement. Since this matrix is positive semi-definite, the inequality \(\mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} \le \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\}\) follows.
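The variance reduction can be checked numerically; in this sketch the design and \(\sigma_{\text{u}}^2\) are illustrative assumptions, and positive semi-definiteness is verified through the eigenvalues of the reduction matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 80, 4
X = rng.normal(size=(n, k))
sigma2 = 0.9                                   # assumed error variance
R = np.array([[1.0], [-1.0], [0.0], [0.0]])    # one restriction: b1 - b2 = r

XtX_inv = np.linalg.inv(X.T @ X)
V_ols = sigma2 * XtX_inv
# Reduction term: sigma^2 * (X'X)^{-1} R [R'(X'X)^{-1}R]^{-1} R' (X'X)^{-1}
D = sigma2 * XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ R.T @ XtX_inv
V_rls = V_ols - D

eigs = np.linalg.eigvalsh(D)                   # all >= 0 up to rounding: D is PSD
print(eigs.min())
# In particular every coefficient variance (diagonal entry) shrinks or stays equal:
print(np.all(np.diag(V_rls) <= np.diag(V_ols) + 1e-12))
```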

17.3 A test for linear restrictions

Under the assumption of normality of the error terms, it is possible to derive a statistic to test the significance of the linear restrictions imposed by \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\). Let’s test the validity of the null hypothesis \(\mathcal{H}_0\) against its alternative hypothesis \(\mathcal{H}_1\), i.e.  \[ \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,}\quad \mathcal{H}_1: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} \neq \mathbf{0} \text{.} \] Under normality, the OLS estimates are multivariate normal, thus applying the scaling property one obtains that the estimated restriction is normally distributed, i.e.  \[ \mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r} \sim \mathcal{N}(\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}) \text{.} \tag{17.7}\] Thus, we can write the statistic \[ \text{W}_m = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r}) \text{.} \tag{17.8}\]

If we work under \(\mathcal{H}_0\), then the mean in Equation 17.7 is zero, i.e.
\[ \mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r} \underset{\mathcal{H}_0}{\sim} \mathcal{N}(\mathbf{0}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}) \text{.} \] Recalling the relation (Section 32.1.1) between the distribution of the quadratic form of a multivariate normal and the \(\chi^2\) distribution, the test statistic \[ \text{W}_m \overset{\text{d}}{\underset{\mathcal{H}_0}{\sim}} \chi^2(m) \text{,} \tag{17.9}\] has a \(\chi^2(m)\) distribution, with \(m\) the number of restrictions.
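A Monte Carlo sketch of Equation 17.9 (setup is illustrative and \(\sigma_{\text{u}}^2\) is treated as known): simulating under a true restriction, the statistic \(\text{W}_m\) should have mean close to \(m\) and variance close to \(2m\), as a \(\chi^2(m)\) requires.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, reps = 60, 1, 4000
X = rng.normal(size=(n, 3))
b_true = np.array([1.0, 1.0, 0.5])
R = np.array([[1.0], [-1.0], [0.0]])   # H0: b1 - b2 = 0, true for b_true
r = np.array([0.0])
sigma2 = 1.0                           # assumed known error variance
XtX_inv = np.linalg.inv(X.T @ X)
M = np.linalg.inv(R.T @ XtX_inv @ R)

W = np.empty(reps)
for i in range(reps):
    y = X @ b_true + rng.normal(size=n)
    b_ols = XtX_inv @ X.T @ y
    d = R.T @ b_ols - r
    W[i] = float(d @ M @ d) / sigma2   # Equation 17.8

print(W.mean(), W.var())               # close to m = 1 and 2m = 2
```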

Instead, under \(\mathcal{H}_1\) the distribution of the estimated restriction is exactly Equation 17.7. Thus, applying property 4. in Section 32.1.1 one obtains that the test statistic is distributed as a non-central \(\chi^2(m, \delta)\), i.e. \[ \text{W}_m \overset{\text{d}}{\underset{\mathcal{H}_1}{\sim}} \chi^2(m, \delta) \text{,} \tag{17.10}\] where the non-centrality parameter \(\delta\) reads \[ \delta = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}) > 0 \text{.} \] As a general decision rule, \(\mathcal{H}_0\) is rejected if the statistic in Equation 17.9 is greater than the quantile of a \(\chi^2(m)\) random variable at confidence level \(\alpha\). Such critical value, denoted by \(q_{\alpha}\), represents the value for which the probability that a \(\chi^2(m)\) exceeds \(q_{\alpha}\) is exactly equal to \(\alpha\), i.e.  \[ \mathbb{P}(\text{W}_m > q_{\alpha}) = \alpha \text{.} \]
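The decision rule can be sketched numerically. In this hypothetical setup \(\sigma_{\text{u}}^2 = 1\) is treated as known, the critical value \(q_{\alpha}\) is approximated by simulating \(\text{W}_m\) under \(\mathcal{H}_0\), and the rejection frequency under a violated restriction illustrates the extra mass above \(q_{\alpha}\) implied by the non-central \(\chi^2(m, \delta)\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, alpha = 60, 3000, 0.05
X = rng.normal(size=(n, 3))
R = np.array([[1.0], [-1.0], [0.0]])   # restriction: b1 - b2 = 0
r = np.array([0.0])
XtX_inv = np.linalg.inv(X.T @ X)
M = np.linalg.inv(R.T @ XtX_inv @ R)

def wald(b_true):
    # W_m of Equation 17.8 with sigma_u^2 = 1 assumed known.
    y = X @ b_true + rng.normal(size=n)
    d = R.T @ (XtX_inv @ X.T @ y) - r
    return float(d @ M @ d)

W0 = np.array([wald(np.array([1.0, 1.0, 0.5])) for _ in range(reps)])  # H0 true
q_alpha = np.quantile(W0, 1 - alpha)   # empirical critical value
W1 = np.array([wald(np.array([1.3, 1.0, 0.5])) for _ in range(reps)])  # H0 false
power = np.mean(W1 > q_alpha)
print(q_alpha, power)                  # rejection rate well above alpha
```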