17 Restricted least squares
References: Gardini A. (2007), Chapters 3.7, 3.8, and 4.2.
Let’s consider a generic univariate linear model with \(k\) regressors, namely \[ \mathbf{y} =b_1 \mathbf{X}_1 + \dots + b_j \mathbf{X}_j + \dots + b_k \mathbf{X}_k + \mathbf{u} = \mathbf{X} \mathbf{b} + \mathbf{u} \text{,} \] and suppose that we are interested in testing whether the coefficient \(b_j\) is statistically different from a certain value \(r\) known a priori. In this case the null hypothesis can be equivalently represented using a more flexible matrix notation, i.e. \[ \mathcal{H}_0: b_j = r \iff \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,} \tag{17.1}\] where \[ \underset{k \times 1}{\mathbf{R}}^{\top} = \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \text{.} \] Hence, the linear restriction in Equation 17.1 can be written in matrix form as \[ \mathcal{H}_0: \underset{k \times 1}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{1 \times 1}{\mathbf{r}} = \underset{1 \times 1}{\mathbf{0}} \iff \underset{j\text{-th position}}{\begin{pmatrix} 0 \, \dots \, 1 \, \dots \, 0 \end{pmatrix}} \begin{pmatrix} b_1 \\ \vdots\\ b_j \\ \vdots \\ b_k \end{pmatrix} - \begin{pmatrix} r \end{pmatrix} = \begin{pmatrix} 0 \end{pmatrix} \text{.} \]
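As a minimal numerical sketch of this notation (the dimensions, position \(j\), and coefficient values below are illustrative, not taken from the text), the selection vector \(\mathbf{R}^{\top}\) for a single restriction \(b_j = r\) can be built and checked as follows:

```python
import numpy as np

k, j, r = 5, 2, 0.7  # hypothetical: 5 regressors, test b_3 = 0.7 (0-based index j = 2)

# R^T is a 1 x k row vector with a single 1 in the j-th position
Rt = np.zeros((1, k))
Rt[0, j] = 1.0
r_vec = np.array([r])

# a coefficient vector that satisfies H0 by construction (b[j] = r)
b = np.array([0.1, -0.3, 0.7, 1.2, 0.0])

print(Rt @ b - r_vec)  # -> [0.] : the restriction R^T b - r = 0 holds
```

The product \(\mathbf{R}^{\top}\mathbf{b}\) simply extracts \(b_j\), which is exactly what Equation 17.1 expresses.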
17.1 Multiple restrictions
Let’s consider multiple restrictions, i.e. \[ \begin{aligned} \mathcal{H}_0: \quad & {} (1) \quad b_1 - b_2 = 0 && {} b_1 \text{ and } b_2 \text{ have the same effect} \\ & (2) \quad b_3 + b_4 = 1 && b_3 \text{ and } b_4 \text{ sum to one} \\ \end{aligned} \] Let’s construct the vector for (1) (first row of \(\mathbf{R}^{\top}\)) and (2) (second row of \(\mathbf{R}^{\top}\)), i.e. \[ \underset{2 \times 4 }{\mathbf{R}}^{\top} \underset{4 \times 1}{\mathbf{b}} - \underset{2 \times 1}{\mathbf{r}} = \underset{2 \times 1}{\mathbf{0}} \iff \underset{\mathbf{R}^{\top}}{\underbrace{ \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \end{pmatrix}}} \underset{\mathbf{b}}{\underbrace{\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}}} - \underset{\mathbf{r}}{\underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \text{.} \]
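The two restrictions above stack naturally into the matrix system; a small sketch (the coefficient vector is an illustrative example that satisfies both restrictions):

```python
import numpy as np

# R^T stacks one row per restriction: (1) b1 - b2 = 0, (2) b3 + b4 = 1
Rt = np.array([[1., -1., 0., 0.],
               [0.,  0., 1., 1.]])
r = np.array([0., 1.])

b = np.array([0.5, 0.5, 0.3, 0.7])  # b1 = b2 and b3 + b4 = 1 by construction

print(Rt @ b - r)  # -> [0. 0.] : both restrictions hold
```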
17.2 Restricted least squares
Proposition 17.1 (\(\color{magenta}{\textbf{Restricted Least Squares (RLS) estimator}}\))
Let’s consider a linear model under the OLS assumptions and a set of \(m\) linear hypotheses on the parameters of the model, taking the form \[
\mathcal{H}_0: \underset{m \times k}{\mathbf{R}}^{\top} \underset{k \times 1}{\mathbf{b}} - \underset{m \times 1}{\mathbf{r}} = \underset{m \times 1}{\mathbf{0}}
\text{.}
\] Therefore, the optimization problem becomes restricted to the space of parameters that satisfy the constraints. More precisely, the space \(\tilde{\Theta}_{\mathbf{b}}\), that is the subset of the parameter space \(\tilde{\Theta}_{\mathbf{b}} \subset \Theta_{\mathbf{b}}\) where the linear constraint holds true, is defined as \[
\tilde{\Theta}_{\mathbf{b}} = \left\{\mathbf{b} \in \mathbb{R}^k : \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \right\}
\text{.}
\] Hence, the optimization problem in Equation 15.2 is restricted to only the parameters that satisfy the constraint.
Formally, the RLS estimator is the solution of the following minimization problem, i.e. \[ \mathbf{b}^{\tiny\text{RLS}} = \underset{\mathbf{b} \in \tilde{\Theta}_{\mathbf{b}}}{\text{argmin}} \left\{\text{Q}^{\tiny\text{OLS}}(\mathbf{b})\right\} \text{,} \tag{17.2}\] where \(\text{Q}^{\tiny\text{OLS}}\) reads as in the OLS case (Equation 15.1). Notably, the analytic solution for \(\mathbf{b}^{\tiny\text{RLS}}\) reads \[ \mathbf{b}^{\tiny\text{RLS}} = \mathbf{b}^{\tiny\text{OLS}} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top}(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b}^{\tiny\text{OLS}} - \mathbf{r}) \text{.} \tag{17.3}\]
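Equation 17.3 translates directly into a few lines of linear algebra. A minimal sketch, using simulated data and the two restrictions of the previous section (the sample size, seed, and true coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 4
X = rng.normal(size=(n, k))
b_true = np.array([1.0, 1.0, 0.4, 0.6])   # satisfies b1 = b2 and b3 + b4 = 1
y = X @ b_true + rng.normal(scale=0.5, size=n)

Rt = np.array([[1., -1., 0., 0.],         # restriction (1): b1 - b2 = 0
               [0.,  0., 1., 1.]])        # restriction (2): b3 + b4 = 1
r = np.array([0., 1.])

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y

# Equation 17.3: correct the OLS estimate towards the constraint set
A = XtX_inv @ Rt.T @ np.linalg.inv(Rt @ XtX_inv @ Rt.T)
b_rls = b_ols - A @ (Rt @ b_ols - r)

print(Rt @ b_rls - r)  # the restrictions hold exactly (up to rounding)
```

By construction \(\mathbf{R}^{\top}\mathbf{b}^{\tiny\text{RLS}} - \mathbf{r} = \mathbf{0}\) holds exactly, whatever the data: the correction term projects the OLS estimate onto \(\tilde{\Theta}_{\mathbf{b}}\).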
Proposition 17.2 (\(\color{magenta}{\textbf{Expectation RLS estimator}}\))
The RLS estimator (Equation 17.3) is unbiased for the true parameter in population \(\mathbf{b}\) if and only if the restrictions imposed by \(\mathcal{H}_0\) are true in population. In fact, the conditional expected value is \[
\mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b} - (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1} (\mathbf{R}^{\top} \mathbf{b} - \mathbf{r})
\text{,}
\tag{17.6}\] where \(\mathbb{E}\{\mathbf{b}^{\tiny\text{RLS}} \mid \mathbf{X}\} = \mathbf{b}\) only if the second term vanishes, which happens only when \(\mathcal{H}_0\) holds true and so \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\).
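A small Monte Carlo sketch of Proposition 17.2 (design, seed, and coefficients are illustrative): with a single hypothetical restriction \(b_1 = 0\), the average RLS estimate recovers the true coefficients when \(\mathcal{H}_0\) is true in population, and is biased when it is false.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, reps = 80, 3, 2000
X = rng.normal(size=(n, k))
Rt = np.array([[1., 0., 0.]])   # restriction: b1 = 0
r = np.array([0.])

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ Rt.T @ np.linalg.inv(Rt @ XtX_inv @ Rt.T)

def avg_rls(b_true):
    """Average the RLS estimate (Equation 17.3) over repeated samples."""
    est = np.zeros(k)
    for _ in range(reps):
        y = X @ b_true + rng.normal(size=n)
        b_ols = XtX_inv @ X.T @ y
        est += b_ols - A @ (Rt @ b_ols - r)
    return est / reps

e_true = avg_rls(np.array([0.0, 1.0, -1.0]))   # H0 true in population
e_false = avg_rls(np.array([0.5, 1.0, -1.0]))  # H0 false: b1 = 0.5

print(e_true)   # close to the true (0, 1, -1)
print(e_false)  # first entry forced to 0, hence biased for b1 = 0.5
```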
Proposition 17.3 (\(\color{magenta}{\textbf{Variance RLS estimator}}\))
The variance of the RLS estimator (Equation 17.3) reads \[
\mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} = \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\} -
\sigma_{\text{u}}^2 \cdot
(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{R}\right]^{-1}
\mathbf{R}^{\top}
(\mathbf{X}^{\top} \mathbf{X})^{-1}
\text{.}
\] It is interesting to note that the variance of the RLS estimator is always lower than or equal to the variance of the OLS estimator (in the sense that the difference is positive semidefinite), in fact \[
\mathbb{V}\{\mathbf{b}^{\tiny\text{RLS}}\} \le \mathbb{V}\{\mathbf{b}^{\tiny\text{OLS}}\}
\text{.}
\]
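The variance reduction can be verified numerically: the subtracted term is a positive semidefinite matrix, so every linear combination of the RLS coefficients has variance no larger than the OLS one. A sketch with hypothetical design and error variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 4
X = rng.normal(size=(n, k))
sigma2 = 0.25  # hypothetical error variance sigma_u^2

Rt = np.array([[1., -1., 0., 0.],
               [0.,  0., 1., 1.]])

XtX_inv = np.linalg.inv(X.T @ X)
V_ols = sigma2 * XtX_inv
# the term subtracted from V{b^OLS} in Proposition 17.3
reduction = sigma2 * XtX_inv @ Rt.T @ np.linalg.inv(Rt @ XtX_inv @ Rt.T) @ Rt @ XtX_inv
V_rls = V_ols - reduction

# positive semidefinite: all eigenvalues >= 0 (up to rounding)
eig = np.linalg.eigvalsh(reduction)
print(eig.min() >= -1e-12)  # -> True
```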
17.3 A test for linear restrictions
Under the assumption of normality of the error terms, it is possible to derive a statistic to test the significance of the linear restrictions imposed by \(\mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0}\). Let’s test the validity of the null hypothesis \(\mathcal{H}_0\) against its alternative hypothesis \(\mathcal{H}_1\), i.e. \[ \mathcal{H}_0: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} = \mathbf{0} \text{,}\quad \mathcal{H}_1: \mathbf{R}^{\top} \mathbf{b} - \mathbf{r} \neq \mathbf{0} \text{.} \] Under normality, the OLS estimates are multivariate normal; thus, applying the scaling property, one obtains that the restriction vector is normally distributed, i.e. \[ \mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r} \sim \mathcal{N}(\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}) \text{.} \tag{17.7}\] Thus, we can write the statistic \[ \text{W}_m = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b}^{\tiny \text{OLS}} - \mathbf{r}) \text{.} \tag{17.8}\]
If we work under \(\mathcal{H}_0\), then the mean in Equation 17.7 is zero, i.e.
\[
\mathbf{R}^{\top} \mathbf{b}^{\tiny \text{OLS}} - \mathbf{r}
\underset{\mathcal{H}_0}{\sim}
\mathcal{N}(\mathbf{0}, \; \sigma_{\text{u}}^2 \cdot \mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R})
\text{.}
\] Recalling the relation (Section 32.1.1) between the distribution of the quadratic form of a multivariate normal and the \(\chi^2\) distribution, then the test statistic \[
\text{W}_m
\overset{\text{d}}{\underset{\mathcal{H}_0}{\sim}}
\chi^2(m)
\text{,}
\tag{17.9}\] has a \(\chi^2(m)\) distribution, with \(m\) the number of restrictions.
Instead, under \(\mathcal{H}_1\) the distribution of the linear restriction vector is exactly that of Equation 17.7. Thus, applying property 4. in Section 32.1.1, one obtains that the test statistic is distributed as a non-central \(\chi^2(m, \delta)\), i.e. \[ \text{W}_m \overset{\text{d}}{\underset{\mathcal{H}_1}{\sim}} \chi^2(m, \delta) \text{,} \tag{17.10}\] where the non-centrality parameter \(\delta\) reads \[ \delta = \frac{1}{\sigma_{\text{u}}^2} \cdot (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r})^{\top} \left[\mathbf{R}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{R}\right]^{-1} (\mathbf{R}^{\top}\mathbf{b} - \mathbf{r}) > 0 \text{.} \] As a general decision rule, \(\mathcal{H}_0\) is rejected if the statistic in Equation 17.8 is greater than the upper-\(\alpha\) quantile of a \(\chi^2(m)\) random variable. Such critical value, denoted by \(q_{\alpha}\), represents the value for which the probability that a \(\chi^2(m)\) exceeds \(q_{\alpha}\) is exactly equal to \(\alpha\), i.e. \[ \mathbb{P}(\text{W}_m > q_{\alpha}) = \alpha \text{.} \]
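The whole testing procedure can be sketched end to end. In this illustrative example (simulated data, known error variance, hypothetical seed and dimensions) the statistic of Equation 17.8 is compared with the \(\chi^2(m)\) critical value at \(\alpha = 0.05\):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, k, sigma = 500, 4, 0.5
X = rng.normal(size=(n, k))
b_true = np.array([1.0, 1.0, 0.4, 0.6])   # H0 holds in population
y = X @ b_true + rng.normal(scale=sigma, size=n)

Rt = np.array([[1., -1., 0., 0.],
               [0.,  0., 1., 1.]])
r = np.array([0., 1.])
m = Rt.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
gap = Rt @ b_ols - r

# Equation 17.8, with the (here known) error variance sigma^2
W = gap @ np.linalg.inv(Rt @ XtX_inv @ Rt.T) @ gap / sigma**2

q_alpha = chi2.ppf(1 - 0.05, df=m)  # upper-alpha critical value of chi^2(m)
print(W, q_alpha, W > q_alpha)      # reject H0 when the last entry is True
```

In practice \(\sigma_{\text{u}}^2\) is unknown and replaced by an estimate, which leads to the F-type version of the test; the sketch above keeps the known-variance case of Equation 17.8.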