17  Restricted least squares

References: Gardini A., Chapter 4.

17.1 A general framework for linear restrictions

Let's consider a generic univariate linear model with $k$ regressors, namely
$$ y = b_1 X_1 + \dots + b_j X_j + \dots + b_k X_k + e = Xb + e, $$
and suppose that we are interested in testing whether the coefficient $b_j$ is statistically different from a certain known value $r$. In this case the null hypothesis, that is $H_0 : b_j = r$, can be equivalently represented using a more flexible matrix notation, i.e.
$$ H_0 : b_j = r \iff H_0 : Rb - r = 0, $$
where $R_{1 \times k} = (0 \;\cdots\; 1 \;\cdots\; 0)$ with the $1$ in the $j$-th position. Hence, the linear restriction in matrix form reads explicitly as
$$ H_0 : \underset{1 \times k}{R}\,\underset{k \times 1}{b} - \underset{1 \times 1}{r} = \underset{1 \times 1}{0} \iff \big(0 \;\cdots\; \underset{j\text{-th}}{1} \;\cdots\; 0\big) \begin{pmatrix} b_1 \\ \vdots \\ b_j \\ \vdots \\ b_k \end{pmatrix} - (r) = (0). $$
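To make the construction concrete, here is a minimal numpy sketch; the values of $k$, $j$, $r$ and the candidate $b$ are arbitrary illustrative choices, not from the text.

```python
import numpy as np

# Illustrative sketch: build R for the single restriction H0: b_j = r
k, j = 5, 3           # k regressors, restriction on the 3rd coefficient (our choice)
r_val = 2.0           # hypothesised value r (our choice)

R = np.zeros((1, k))  # R is a 1 x k row vector ...
R[0, j - 1] = 1.0     # ... with a 1 in the j-th position
r = np.array([r_val])

# For any candidate b, the restriction reads R @ b - r = 0
b = np.array([0.5, 1.0, 2.0, -0.3, 0.8])
print(R @ b - r)      # [0.] -> this b satisfies H0: b_3 = 2
```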

17.2 Multiple restrictions

Let's consider a linear model of the form $y = b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 + e$, and suppose that the aim is to test at the same time the following null hypotheses, i.e.
$$ H_0 : \begin{cases} (1)\;\; b_1 - b_2 = 0 & \quad b_1 \text{ and } b_2 \text{ have the same effect} \\ (2)\;\; b_3 + b_4 = 1 & \quad b_3 \text{ plus } b_4 \text{ sum to one} \end{cases} $$
Let's construct the row of $R$ corresponding to (1) (first row of $R$) and the one corresponding to (2) (second row of $R$), i.e.
$$ \underset{2 \times 4}{R}\,\underset{4 \times 1}{b} - \underset{2 \times 1}{r} = \underset{2 \times 1}{0} \iff \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. $$
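The same construction in code, a minimal sketch; the candidate vector $b$ below is an arbitrary choice that happens to satisfy both restrictions.

```python
import numpy as np

# The two restrictions above:
# (1) b1 - b2 = 0  and  (2) b3 + b4 = 1
R = np.array([[1.0, -1.0, 0.0, 0.0],   # first row:  b1 - b2
              [0.0,  0.0, 1.0, 1.0]])  # second row: b3 + b4
r = np.array([0.0, 1.0])

b = np.array([0.7, 0.7, 0.4, 0.6])     # satisfies both restrictions
print(R @ b - r)                       # [0. 0.]
```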

17.3 Restricted least squares

Proposition 17.1 Let's consider a linear model under the OLS assumptions and let's consider a set of $m$ linear hypotheses on the parameters of the model taking the form
$$ H_0 : \underset{m \times k}{R}\,\underset{k \times 1}{b} - \underset{m \times 1}{r} = \underset{m \times 1}{0}. $$
The optimization problem then becomes restricted to the space of parameters that satisfy the conditions. More precisely, such parameters will lie in a subset of the parameter space, i.e. $\tilde{\Theta}_b \subset \Theta_b$, where the linear constraint holds true. Formally, the space $\tilde{\Theta}_b$ is defined as
$$ \tilde{\Theta}_b = \{ b \in \mathbb{R}^k : Rb - r = 0 \}. $$
Hence, the optimization is restricted to only the parameters that satisfy the constraint. Formally, the RLS estimator is the solution of the following minimization problem, i.e.
$$ b_{RLS} = \arg\min_{b \in \tilde{\Theta}_b} \{ Q_{OLS}(b) \}, \tag{17.1} $$
where $Q_{OLS}$ reads as in the OLS case. Notably, the analytic solution for $b_{RLS}$ reads
$$ b_{RLS} = b_{OLS} - (X'X)^{-1} R' \left[ R (X'X)^{-1} R' \right]^{-1} (R\,b_{OLS} - r). \tag{17.2} $$
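A minimal numpy sketch of the closed form (17.2) on simulated data; the helper name `rls` and all data-generating choices are ours, purely for illustration.

```python
import numpy as np

def rls(X, y, R, r):
    """Restricted least squares via equation (17.2); illustrative helper."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ols = XtX_inv @ X.T @ y
    A = R @ XtX_inv @ R.T                 # m x m matrix R (X'X)^{-1} R'
    correction = XtX_inv @ R.T @ np.linalg.solve(A, R @ b_ols - r)
    return b_ols - correction

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, 1.0, 0.3, 0.7]) + rng.normal(size=200)
R = np.array([[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
r = np.array([0.0, 1.0])

b_rls = rls(X, y, R, r)
print(b_rls)
print(R @ b_rls - r)   # ~[0, 0]: the restrictions hold exactly (up to rounding)
```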

Proof. In order to solve the minimization problem in (17.1), let's construct the Lagrangian $\mathcal{L}(x, \lambda)$, i.e.
$$ \mathcal{L}(x, \lambda) = f(x) + \lambda' g(x), $$
where $\lambda$ is the vector of the Lagrange multipliers. Minimizing $\mathcal{L}(x, \lambda)$ is equivalent to finding the value of $x$ that minimizes $f(x)$ under the constraint $g(x) = 0$. In fact, it is possible to prove that the minimum is found as:
$$ \arg\min_{x \in \chi} \mathcal{L}(x, \lambda) \iff \begin{cases} (A)\;\; \nabla_x \mathcal{L}(x, \lambda) = 0 \\ (B)\;\; \nabla_\lambda \mathcal{L}(x, \lambda) = 0 \end{cases} \tag{17.3} $$
In the case of the RLS estimate the Lagrangian reads:
$$ \mathcal{L}(b, \lambda) = Q_{OLS}(b) + 2\lambda'(Rb - r). $$
Then, from (17.3) one obtains the following system of equations, i.e.
$$ \begin{cases} (A)\;\; \nabla_b \mathcal{L}(b, \lambda) = -2X'y + 2X'Xb + 2R'\lambda = 0 \\ (B)\;\; \nabla_\lambda \mathcal{L}(b, \lambda) = 2(Rb - r) = 0 \end{cases} $$
Let's solve for $b = b_{RLS}$ from (A), i.e.
$$ b_{RLS} = (X'X)^{-1}X'y - (X'X)^{-1}R'\lambda = b_{OLS} - (X'X)^{-1}R'\lambda. \tag{17.4} $$
Let's now substitute (17.4) into (B), i.e.
$$ R\,b_{RLS} - r = 0 \iff R\left[ b_{OLS} - (X'X)^{-1}R'\lambda \right] - r = 0 \iff R\,b_{OLS} - r = \left[ R(X'X)^{-1}R' \right]\lambda. $$
Hence, it is possible to solve for the Lagrange multipliers $\lambda$ as:
$$ \lambda = \left[ R(X'X)^{-1}R' \right]^{-1}(R\,b_{OLS} - r). \tag{17.5} $$
Finally, substituting (17.5) into (17.4) gives the optimal solution, i.e.
$$ b_{RLS} = b_{OLS} - (X'X)^{-1}R'\left[ R(X'X)^{-1}R' \right]^{-1}(R\,b_{OLS} - r). $$
Note that if the constraints already hold in the OLS estimate, then $R\,b_{OLS} - r = 0$ and the correction term vanishes. Hence the RLS and OLS estimates coincide, i.e. $b_{RLS} = b_{OLS}$.
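As a sanity check on the derivation, one can compare the closed form (17.2) with a direct numerical minimization of $Q_{OLS}$ under the constraint $Rb = r$. A sketch using `scipy.optimize` on simulated data (all data-generating choices are illustrative):

```python
import numpy as np
from scipy.optimize import LinearConstraint, minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 1.0, 0.3, 0.7]) + rng.normal(size=100)
R = np.array([[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
r = np.array([0.0, 1.0])

# Closed form (17.2)
XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
b_rls = b_ols - XtX_inv @ R.T @ np.linalg.solve(
    R @ XtX_inv @ R.T, R @ b_ols - r)

# Numerical minimisation of Q_OLS(b) = ||y - Xb||^2 subject to Rb = r
res = minimize(lambda b: np.sum((y - X @ b) ** 2), np.zeros(4),
               method="trust-constr",
               constraints=[LinearConstraint(R, r, r)])  # lb = ub = r, i.e. Rb = r

print(np.allclose(res.x, b_rls, atol=1e-5))  # True (up to solver tolerance)
```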

17.4 Properties of the RLS estimator

  1. The RLS estimator is unbiased if and only if the restriction imposed by $H_0$ is true in the population. In fact, its expected value is computed as:
$$ \mathbb{E}\{ b_{RLS} \mid X \} = b - (X'X)^{-1}R'\left[ R(X'X)^{-1}R' \right]^{-1}(Rb - r), \tag{17.6} $$
and it is unbiased if and only if the second component is zero, i.e. if $H_0$ holds true. A small simulation illustrating this is sketched after the proof below.

Proof. Let's take the expected value of (17.2), remembering that $X$, $R$ and $r$ are non-stochastic and that $b_{OLS}$ is unbiased. Developing the computations gives:
$$ \begin{aligned} \mathbb{E}\{ b_{RLS} \mid X \} &= \mathbb{E}\{ b_{OLS} \mid X \} - \mathbb{E}\left\{ (X'X)^{-1}R'\left[ R(X'X)^{-1}R' \right]^{-1}(R\,b_{OLS} - r) \,\middle|\, X \right\} \\ &= b - (X'X)^{-1}R'\left[ R(X'X)^{-1}R' \right]^{-1}\left( R\,\mathbb{E}\{ b_{OLS} \mid X \} - r \right) \\ &= b - (X'X)^{-1}R'\left[ R(X'X)^{-1}R' \right]^{-1}(Rb - r). \end{aligned} $$
Hence $b_{RLS}$ is unbiased if and only if the restriction holds in the population:
$$ \mathbb{E}\{ b_{RLS} \mid X \} = b \iff Rb - r = 0. $$
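A small Monte Carlo sketch of this property (sample size, number of replications and the true $b$ are arbitrary choices of ours): when the true $b$ satisfies $Rb = r$, the average of the RLS estimates is close to $b$.

```python
import numpy as np

# Monte Carlo illustration of (17.6): with Rb = r true in the population,
# b_RLS is unbiased.
rng = np.random.default_rng(2)
n, reps = 100, 2000
b_true = np.array([1.0, 1.0, 0.3, 0.7])   # satisfies both restrictions
R = np.array([[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
r = np.array([0.0, 1.0])

X = rng.normal(size=(n, 4))                # fixed design across replications
XtX_inv = np.linalg.inv(X.T @ X)

est = np.empty((reps, 4))
for i in range(reps):
    y = X @ b_true + rng.normal(size=n)
    b_ols = XtX_inv @ X.T @ y
    est[i] = b_ols - XtX_inv @ R.T @ np.linalg.solve(
        R @ XtX_inv @ R.T, R @ b_ols - r)

print(est.mean(axis=0) - b_true)           # ~0: unbiased since H0 is true
```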

17.5 A test for linear restrictions

Under the assumption of normality of the error terms, it is possible to derive a statistic to test the significance of the linear restrictions imposed by $Rb - r = 0$. Let's test the validity of the null hypothesis $H_0$ against the alternative hypothesis $H_1$, i.e.
$$ H_0 : Rb - r = 0, \qquad H_1 : Rb - r \neq 0. $$
Under normality, the OLS estimator is multivariate normal, thus applying the scaling property one obtains
$$ R\,b_{OLS} - r \sim N\left( Rb - r,\; \sigma_e^2\, R(X'X)^{-1}R' \right), $$
which has mean zero under $H_0$. Applying the relation between the distribution of the quadratic form of a multivariate normal and the $\chi^2$ distribution, one obtains the statistic:
$$ T_m = (R\,b_{OLS} - r)' \left( \sigma_e^2\, R(X'X)^{-1}R' \right)^{-1} (R\,b_{OLS} - r). \tag{17.7} $$
Under $H_0$, $T_m$ is distributed as a $\chi^2(m)$, where $m$ is the number of linear restrictions, i.e.
$$ T_m \overset{H_0}{\sim} \chi^2(m). \tag{17.8} $$

As a general decision rule, $H_0$ is rejected if the statistic in (17.7) is greater than the upper quantile at level $\alpha$ of a $\chi^2(m)$ random variable. Such critical value, denoted by $\chi^2_\alpha(m)$, represents the value for which the probability that a $\chi^2(m)$ exceeds $\chi^2_\alpha(m)$ is exactly $\alpha$, i.e.
$$ P\left( \chi^2(m) > \chi^2_\alpha(m) \right) = \alpha. $$
In this case the probability of a type I error, i.e. rejecting $H_0$ when $H_0$ is true, is exactly $\alpha$.
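A sketch of the test (17.7)-(17.8) in code, assuming $\sigma_e^2$ known as in the formula above; the data-generating choices are illustrative and $H_0$ is true by construction.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, m, sigma2 = 200, 2, 1.0                 # m restrictions, known error variance
b_true = np.array([1.0, 1.0, 0.3, 0.7])    # H0 true here
R = np.array([[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
r = np.array([0.0, 1.0])

X = rng.normal(size=(n, 4))
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Statistic (17.7)
XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
d = R @ b_ols - r
T_m = d @ np.linalg.solve(sigma2 * R @ XtX_inv @ R.T, d)

# Decision rule: reject H0 if T_m exceeds the critical value chi2_alpha(m)
alpha = 0.05
crit = chi2.ppf(1 - alpha, df=m)
print(T_m, crit, T_m > crit)
print("p-value:", chi2.sf(T_m, df=m))
```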

Instead, under $H_1$ the statistic $T_m$ is distributed as a non-central $\chi^2(m, \delta)$, i.e.
$$ T_m \overset{H_1}{\sim} \chi^2(m, \delta), \tag{17.9} $$
where the non-centrality parameter $\delta$ is computed as:
$$ \delta = (Rb - r)' \left( \sigma_e^2\, R(X'X)^{-1}R' \right)^{-1} (Rb - r). $$
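Given (17.9), the power of the test, i.e. the probability of rejecting $H_0$ when $H_1$ is true, can be computed from the non-central distribution as $P\big(\chi^2(m, \delta) > \chi^2_\alpha(m)\big)$. A sketch using `scipy.stats.ncx2`; the alternative $b$ below, which violates $b_1 = b_2$, is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import chi2, ncx2

rng = np.random.default_rng(4)
n, m, alpha, sigma2 = 200, 2, 0.05, 1.0
b_alt = np.array([1.0, 0.6, 0.3, 0.7])     # violates restriction (1): b1 = b2
R = np.array([[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
r = np.array([0.0, 1.0])

X = rng.normal(size=(n, 4))
XtX_inv = np.linalg.inv(X.T @ X)

# Non-centrality parameter delta under this alternative
d = R @ b_alt - r
delta = d @ np.linalg.solve(sigma2 * R @ XtX_inv @ R.T, d)

# Power = P(chi2(m, delta) > chi2_alpha(m))
crit = chi2.ppf(1 - alpha, df=m)
print("power:", ncx2.sf(crit, df=m, nc=delta))
```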