17  Restricted least squares

References: Chapters 3.7, 3.8 and 4.2 of Gardini A. (2007).

Let’s consider a generic univariate linear model with \(k\) regressors, namely \[ \mathbf{y} = b_1 \mathbf{X}_1 + \dots + b_j \mathbf{X}_j + \dots + b_k \mathbf{X}_k + \mathbf{u} = \mathbf{X} \mathbf{b} + \mathbf{u} \text{,} \] and suppose that we are interested in testing whether the coefficient \(b_j\) is statistically different from a certain value \(r\) known a priori. In this case the null hypothesis can be equivalently represented using a more flexible matrix notation, i.e.  \[ \mathcal{H}_0: b_j = r \iff \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,} \tag{17.1}\] where \[ \underset{k \times 1}{\mathbf{R}}^{\top} = \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \text{.} \] Hence, the linear restriction in Equation 17.1 can be written in matrix form as \[ \mathcal{H}_0: \underset{k \times 1}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{1 \times 1}{\mathbf{r}} = \underset{1 \times 1}{\mathbf{0}} \iff \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \begin{pmatrix} b_1 \\ \vdots\\ b_j \\ \vdots \\ b_k \end{pmatrix} - \begin{pmatrix} r \end{pmatrix} = \begin{pmatrix} 0 \end{pmatrix} \text{.} \]
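As a concrete illustration, the selection vector \(\mathbf{R}^{\top}\) can be built numerically; in the following numpy sketch the values of \(k\), \(j\), \(r\) and the candidate coefficient vector are illustrative assumptions:

```python
import numpy as np

# Hypothetical example: k = 4 regressors, test H0: b_3 = 1 (j = 3, r = 1).
k, j, r = 4, 3, 1.0

# R^T is a 1 x k row vector with a single 1 in the j-th position.
R_T = np.zeros((1, k))
R_T[0, j - 1] = 1.0

# A candidate coefficient vector for which H0 holds (b_3 = 1).
b = np.array([0.5, -0.2, 1.0, 2.0])

print(R_T @ b - r)  # -> [0.]
```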

17.1 Multiple restrictions

Let’s consider multiple restrictions, i.e.  \[ \begin{aligned} \mathcal{H}_0: \quad & {} (1) \quad b_1 - b_2 = 0 && {} b_1 \text{ and } b_2 \text{ have the same effect} \\ & (2) \quad b_3 + b_4 = 1 && b_3 \text{ and } b_4 \text{ sum to one} \\ \end{aligned} \] Each restriction corresponds to a column of \(\mathbf{R}\), i.e. a row of \(\mathbf{R}^{\top}\): the first row encodes (1) and the second row encodes (2), so that \[ \underset{2 \times 4 }{\mathbf{R}}^{\top} \underset{4 \times 1}{\mathbf{b}} - \underset{2 \times 1}{\mathbf{r}} = \underset{2 \times 1}{\mathbf{0}} \iff \underset{\mathbf{R}^{\top}}{\underbrace{ \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \end{pmatrix}}} \underset{\mathbf{b}}{\underbrace{\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}}} - \underset{\mathbf{r}}{\underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \text{.} \]
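The same two restrictions can be checked numerically; a minimal numpy sketch (the candidate vector \(\mathbf{b}\) is an illustrative assumption chosen to satisfy both restrictions):

```python
import numpy as np

# Restrictions on b = (b1, b2, b3, b4):
#   (1) b1 - b2 = 0
#   (2) b3 + b4 = 1
R_T = np.array([[1.0, -1.0, 0.0, 0.0],
                [0.0,  0.0, 1.0, 1.0]])   # 2 x 4, one row per restriction
r = np.array([0.0, 1.0])

# A coefficient vector satisfying both restrictions:
b = np.array([0.7, 0.7, 0.4, 0.6])
print(R_T @ b - r)   # -> [0. 0.]
```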

17.2 Restricted least squares

Proposition 17.1 (\(\color{magenta}{\textbf{Restricted Least Squares (RLS) estimator}}\))
Let’s consider a linear model under the OLS assumptions and a set of \(m\) linear hypotheses on the parameters of the model taking the form \[ \mathcal{H}_0: \underset{m \times k}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{m \times 1}{\mathbf{r}} = \underset{m \times 1}{\mathbf{0}} \text{.} \] Therefore, the optimization problem becomes restricted to the space of parameters that satisfy the constraints. More precisely, the subset \(\tilde{\Theta}_{\mathbf{b}} \subset \Theta_{\mathbf{b}}\) of the parameter space where the linear constraints hold true is defined as \[ \tilde{\Theta}_{\mathbf{b}} = \left\{\mathbf{b} \in \mathbb{R}^k : \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \right\} \text{.} \] Hence, the optimization problem in Equation 15.2 is restricted to only the parameters that satisfy the constraint.

Formally, the RLS estimator is the solution of the following minimization problem, i.e.  \[ \mathbf{b}^{\tiny\text{RLS}} = \underset{\mathbf{b} \in \tilde{\Theta}_{\mathbf{b}}}{\text{argmin}} \left\{\text{Q}^{\tiny\text{OLS}}(\mathbf{b})\right\} \text{,} \tag{17.2}\] where \(\text{Q}^{\tiny\text{OLS}}\) reads as in the OLS case (Equation 15.1). Notably, the analytic solution for \(\mathbf{b}^{\tiny\text{RLS}}\) reads \[ \mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top}(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \tag{17.3}\]

Proof. In order to solve the minimization problem in Equation 17.2, let’s construct the Lagrangian (Equation 31.11) as \[ \text{Q}^{\tiny\text{RLS}}(\mathbf{b}, \boldsymbol{\lambda}) = \text{Q}^{\tiny\text{OLS}}(\mathbf{b}) - {\color{red}{2}} \boldsymbol{\lambda}^{\top} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) \text{,} \] where the factor \({\color{red}{2}}\) is only a convenient rescaling of the multipliers. Then, one obtains the following system of equations, i.e.  \[ \begin{cases} \partial_{\mathbf{b}} \text{Q}^{\tiny\text{RLS}} (\mathbf{b}, \boldsymbol{\lambda}) = -2\mathbf{X}^{\top} \mathbf{y} + 2 \mathbf{X}^{\top} \mathbf{X} \mathbf{b} - {\color{red}{2}} \mathbf{R} \boldsymbol{\lambda} = \mathbf{0} & (A)\\ \partial_{\boldsymbol{\lambda}} \text{Q}^{\tiny\text{RLS}} (\mathbf{b}, \boldsymbol{\lambda}) = - {\color{red}{2}} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) = \mathbf{0} & (B)\\ \end{cases} \] Let’s first solve (A) explicitly for \(\mathbf{b} = \mathbf{b}^{\tiny\text{RLS}}\), i.e.  \[ \begin{aligned} \mathbf{b}^{\tiny\text{RLS}} & {} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} = \\ & = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} \end{aligned} \tag{17.4}\] and substitute the result in (B), i.e.  
\[ \begin{aligned} & {} \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{RLS}} - \mathbf{r} = \mathbf{0} \\ & \implies \mathbf{R}^{\top} \left[ \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} \right] - \mathbf{r} = \mathbf{0} \\ & \implies \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \boldsymbol{\lambda} - \mathbf{r} = \mathbf{0} \\ & \implies \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r} = \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right] \boldsymbol{\lambda} \end{aligned} \] Hence, it is possible to solve explicitly for the Lagrange multipliers \(\boldsymbol{\lambda}\): \[ \boldsymbol{\lambda} = \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \tag{17.5}\] Finally, substituting \(\boldsymbol{\lambda}\) (Equation 17.5) in Equation 17.4 gives the optimal solution, i.e.  \[ \mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \] Note that if the constraints already hold at the OLS estimate, i.e. \(\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}= \mathbf{0}\), the correction term vanishes and the RLS and OLS estimates coincide, i.e. \(\mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}}\).
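The derivation above can be sketched numerically; in the following numpy example the design, sample size and true coefficients are illustrative assumptions, \(\boldsymbol{\lambda}\) follows Equation 17.5 and the estimator Equation 17.4:

```python
import numpy as np

# Simulated data whose true coefficients satisfy both restrictions below.
rng = np.random.default_rng(0)
n, k = 200, 4
X = rng.normal(size=(n, k))
b_true = np.array([0.7, 0.7, 0.4, 0.6])
y = X @ b_true + rng.normal(scale=0.5, size=n)

# Columns of R encode (1) b1 - b2 = 0 and (2) b3 + b4 = 1.
R = np.array([[1.0, 0.0],
              [-1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
r = np.array([0.0, 1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
lam = np.linalg.inv(R.T @ XtX_inv @ R) @ (R.T @ b_ols - r)   # Equation 17.5
b_rls = b_ols - XtX_inv @ R @ lam                            # Equation 17.4

print(R.T @ b_rls - r)   # the constraints hold exactly (up to rounding)

# If the OLS estimate already satisfies the constraints (here forced by
# setting r := R^T b_ols), RLS collapses to OLS:
r_hat = R.T @ b_ols
b_rls_same = b_ols - XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ (R.T @ b_ols - r_hat)
print(np.allclose(b_rls_same, b_ols))   # -> True
```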

Proposition 17.2 (\(\color{magenta}{\textbf{Expectation RLS estimator}}\))
The RLS estimator (Equation 17.3) is unbiased for the true population parameter \(\mathbf{b}\) if and only if the restrictions imposed by \(\mathcal{H}_0\) are true in population. Indeed, its expected value reads \[ \mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) \text{,} \tag{17.6}\] where \(\mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b}\) only if the second term is zero, which happens exactly when \(\mathcal{H}_0\) holds true and so \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\).

Proof. Let’s apply the expected value to Equation 17.3, remembering that \(\mathbf{X}\), \(\mathbf{R}\) and \(\mathbf{r}\) are non-stochastic and that \(\mathbf{b}^{\tiny\text{OLS}}\) is unbiased (Equation 15.8). Carrying out the computation gives: \[ \begin{aligned} \mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} & {} = \mathbb{E}\{\mathbf{b}^{\tiny\text{OLS}} \mid \mathbf{X}\} - \mathbb{E}\left\{(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \mid \mathbf{X}\right\} = \\ & = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbb{E}\{\mathbf{b}^{\tiny\text{OLS}}\mid \mathbf{X}\} - \mathbf{r}) = \\ & = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r}) \end{aligned} \] Hence \(\mathbf{b}^{\tiny\text{RLS}}\) is unbiased if and only if the restriction holds true in population, i.e. \[ \mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b} \iff \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{.} \]
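A Monte Carlo sketch of Equation 17.6 (all numerical choices here are illustrative assumptions): under a true restriction the RLS estimator is approximately unbiased, while under a false one its empirical bias matches minus the correction term of Equation 17.6.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 2000
X = rng.normal(size=(n, 2))            # fixed (non-stochastic) design
b_true = np.array([1.0, 1.0])
R = np.array([[1.0], [-1.0]])          # restriction: b1 - b2 = r
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R)

def rls(y, r):
    b_ols = XtX_inv @ X.T @ y
    return b_ols - A @ (R.T @ b_ols - r)

results = {}
for r_val in (0.0, 0.5):               # r = 0 makes H0 true, r = 0.5 false
    r_vec = np.array([r_val])
    mean_est = np.mean([rls(X @ b_true + rng.normal(size=n), r_vec)
                        for _ in range(reps)], axis=0)
    bias_theory = -A @ (R.T @ b_true - r_vec)     # from Equation 17.6
    results[r_val] = (mean_est - b_true, bias_theory)
    print(r_val, mean_est - b_true, bias_theory)
```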

Proposition 17.3 (\(\color{magenta}{\textbf{Variance RLS estimator}}\))
The variance of the RLS estimator (Equation 17.3) reads \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \sigma_{\text{u}}^2 \cdot (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \text{.} \] It is interesting to note that the variance of the RLS estimator is always less than or equal to the variance of the OLS estimator, in the sense that their difference is positive semi-definite, i.e. \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} \le \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} \text{.} \]

Proof. In order to compute the variance of the RLS estimator, let’s apply the variance operator to Equation 17.3; the term involving \(\mathbf{r}\) is non-stochastic, while the covariance between \(\mathbf{b}^{\tiny\text{OLS}}\) and the subtracted term equals the variance of the latter, so the cross terms cancel against it and \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \mathbb{V}\left\{(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} \right\} \text{.} \] Let’s denote with \(\mathbf{R}_{\mathbf{x}}\) the matrix \[ \mathbf{R}_{\mathbf{x}} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} \text{,} \] and take it outside the variance operator, i.e.  \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \mathbf{R}_{\mathbf{x}} \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} \mathbf{R}_{\mathbf{x}}^{\top} \text{.} \] Moreover, substituting the expression of the variance of \(\mathbf{b}^{\tiny\text{OLS}}\) (Equation 15.10) one obtains \[ \mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} - \sigma_{\text{u}}^2 \cdot \mathbf{R}_{\mathbf{x}} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}_{\mathbf{x}}^{\top} \text{.} \] Developing the matrix multiplication gives \[ \begin{aligned} \mathbf{R}_{\mathbf{x}} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}_{\mathbf{x}}^{\top} {} & = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} = \\ & = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \end{aligned} \] which, multiplied by \(\sigma_{\text{u}}^2\), is exactly the reduction term in the statement. Since this matrix is positive semi-definite, the inequality \(\mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} \le \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\}\) follows.
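The variance reduction can be checked numerically; in this sketch the design and \(\sigma_{\text{u}}^2\) are illustrative assumptions, and positive semi-definiteness is verified through the eigenvalues of the reduction matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 80, 4
X = rng.normal(size=(n, k))
sigma2 = 0.9                                   # assumed error variance
R = np.array([[1.0], [-1.0], [0.0], [0.0]])    # one restriction: b1 - b2 = r

XtX_inv = np.linalg.inv(X.T @ X)
V_ols = sigma2 * XtX_inv
# Reduction term: sigma^2 * (X'X)^{-1} R [R'(X'X)^{-1}R]^{-1} R' (X'X)^{-1}
D = sigma2 * XtX_inv @ R @ np.linalg.inv(R.T @ XtX_inv @ R) @ R.T @ XtX_inv
V_rls = V_ols - D

eigs = np.linalg.eigvalsh(D)                   # all >= 0 up to rounding: D is PSD
print(eigs.min())
# In particular every coefficient variance (diagonal entry) shrinks or stays equal:
print(np.all(np.diag(V_rls) <= np.diag(V_ols) + 1e-12))
```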

17.3 A test for linear restrictions

Under the assumption of normality of the error terms, it is possible to derive a statistic to test the significance of the linear restrictions imposed by \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\). Let’s test the validity of the null hypothesis \(\mathcal{H}_0\) against its alternative hypothesis \(\mathcal{H}_1\), i.e.  \[ \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,}\quad \mathcal{H}_1: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} \neq \mathbf{0} \text{.} \] Under normality, the OLS estimates are multivariate normal, thus applying the scaling property one obtains that the estimated restriction is normally distributed, i.e.  \[ \mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r} \sim \mathcal{N}(\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}) \text{.} \tag{17.7}\] Thus, we can write the statistic \[ \text{W}_m = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r}) \text{.} \tag{17.8}\]

If we work under \(\mathcal{H}_0\), then the mean in Equation 17.7 is zero, i.e.
\[ \mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r} \underset{\mathcal{H}_0}{\sim} \mathcal{N}(\mathbf{0}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}) \text{.} \] Recalling the relation (Section 32.1.1) between the distribution of the quadratic form of a multivariate normal and the \(\chi^2\) distribution, the test statistic \[ \text{W}_m \overset{\text{d}}{\underset{\mathcal{H}_0}{\sim}} \chi^2(m) \text{,} \tag{17.9}\] has a \(\chi^2(m)\) distribution, with \(m\) the number of restrictions.
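A Monte Carlo sketch of Equation 17.9 (setup is illustrative and \(\sigma_{\text{u}}^2\) is treated as known): simulating under a true restriction, the statistic \(\text{W}_m\) should have mean close to \(m\) and variance close to \(2m\), as a \(\chi^2(m)\) requires.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, reps = 60, 1, 4000
X = rng.normal(size=(n, 3))
b_true = np.array([1.0, 1.0, 0.5])
R = np.array([[1.0], [-1.0], [0.0]])   # H0: b1 - b2 = 0, true for b_true
r = np.array([0.0])
sigma2 = 1.0                           # assumed known error variance
XtX_inv = np.linalg.inv(X.T @ X)
M = np.linalg.inv(R.T @ XtX_inv @ R)

W = np.empty(reps)
for i in range(reps):
    y = X @ b_true + rng.normal(size=n)
    b_ols = XtX_inv @ X.T @ y
    d = R.T @ b_ols - r
    W[i] = float(d @ M @ d) / sigma2   # Equation 17.8

print(W.mean(), W.var())               # close to m = 1 and 2m = 2
```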

Instead, under \(\mathcal{H}_1\) the distribution of the estimated restriction is exactly Equation 17.7. Thus, applying property 4. in Section 32.1.1 one obtains that the test statistic is distributed as a non-central \(\chi^2(m, \delta)\), i.e. \[ \text{W}_m \overset{\text{d}}{\underset{\mathcal{H}_1}{\sim}} \chi^2(m, \delta) \text{,} \tag{17.10}\] where the non-centrality parameter \(\delta\) reads \[ \delta = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}) > 0 \text{.} \] As a general decision rule, \(\mathcal{H}_0\) is rejected if the statistic in Equation 17.9 is greater than the quantile of a \(\chi^2(m)\) random variable at confidence level \(\alpha\). Such critical value, denoted by \(q_{\alpha}\), represents the value for which the probability that a \(\chi^2(m)\) exceeds \(q_{\alpha}\) is exactly equal to \(\alpha\), i.e.  \[ \mathbb{P}(\text{W}_m > q_{\alpha}) = \alpha \text{.} \]
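The decision rule can be sketched numerically. In this hypothetical setup \(\sigma_{\text{u}}^2 = 1\) is treated as known, the critical value \(q_{\alpha}\) is approximated by simulating \(\text{W}_m\) under \(\mathcal{H}_0\), and the rejection frequency under a violated restriction illustrates the extra mass above \(q_{\alpha}\) implied by the non-central \(\chi^2(m, \delta)\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, alpha = 60, 3000, 0.05
X = rng.normal(size=(n, 3))
R = np.array([[1.0], [-1.0], [0.0]])   # restriction: b1 - b2 = 0
r = np.array([0.0])
XtX_inv = np.linalg.inv(X.T @ X)
M = np.linalg.inv(R.T @ XtX_inv @ R)

def wald(b_true):
    # W_m of Equation 17.8 with sigma_u^2 = 1 assumed known.
    y = X @ b_true + rng.normal(size=n)
    d = R.T @ (XtX_inv @ X.T @ y) - r
    return float(d @ M @ d)

W0 = np.array([wald(np.array([1.0, 1.0, 0.5])) for _ in range(reps)])  # H0 true
q_alpha = np.quantile(W0, 1 - alpha)   # empirical critical value
W1 = np.array([wald(np.array([1.3, 1.0, 0.5])) for _ in range(reps)])  # H0 false
power = np.mean(W1 > q_alpha)
print(q_alpha, power)                  # rejection rate well above alpha
```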