14  Introduction

Statistical modeling applies statistical methods to real-world data in order to give empirical content to relationships between variables. It aims to quantify phenomena, develop models, and test hypotheses, which makes it a crucial tool for economic research, policy analysis, and decision-making. The goal of statistical modeling is to study the (unknown) mechanism that generates the data, i.e., the Data Generating Process (DGP); a statistical model is a function that approximates the DGP.

14.1 The matrix of data

Let's consider $n$ realizations defining a sample, indexed by $i = 1, 2, \ldots, n$. Suppose we have $p$ dependent variables and $k$ explanatory variables (also known as predictors). The data matrix of the exogenous variables (regressors) is $X_{n \times k}$, with generic element $x_{i,j}$, while the matrix of the endogenous (dependent) variables reads

$$Y_{n \times p} = \begin{pmatrix}
y_{1,1} & y_{1,2} & \cdots & y_{1,j} & \cdots & y_{1,p} \\
y_{2,1} & y_{2,2} & \cdots & y_{2,j} & \cdots & y_{2,p} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
y_{n,1} & y_{n,2} & \cdots & y_{n,j} & \cdots & y_{n,p}
\end{pmatrix}.$$

Hence, the complete matrix of data reads

$$W_{n \times (k+p)} = \begin{pmatrix} Y & X \end{pmatrix} = \begin{pmatrix}
y_{1,1} & \cdots & y_{1,p} & x_{1,1} & \cdots & x_{1,k} \\
\vdots  &        & \vdots  & \vdots  &        & \vdots  \\
y_{n,1} & \cdots & y_{n,p} & x_{n,1} & \cdots & x_{n,k}
\end{pmatrix}. \tag{14.1}$$

In general, when $p = 1$ the model consists of a single equation holding for $i = 1, \ldots, n$, for example

$$Y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_k x_{i,k} + u_i. \tag{14.2}$$

Otherwise, when $p > 1$ there is more than one dependent variable and the model is composed of $p$ equations for $i = 1, \ldots, n$, i.e. the same linear model with $p$ equations reads:

$$\begin{cases}
Y_{i,1} = b_{0,1} + b_{1,1} x_{i,1} + b_{1,2} x_{i,2} + \cdots + b_{1,k} x_{i,k} + u_{i,1} \\
Y_{i,2} = b_{0,2} + b_{2,1} x_{i,1} + b_{2,2} x_{i,2} + \cdots + b_{2,k} x_{i,k} + u_{i,2} \\
\quad \vdots \\
Y_{i,p} = b_{0,p} + b_{p,1} x_{i,1} + b_{p,2} x_{i,2} + \cdots + b_{p,k} x_{i,k} + u_{i,p}
\end{cases} \tag{14.3}$$

Thus the matrix of the residual components reads

$$U_{n \times p} = \begin{pmatrix}
u_{1,1} & u_{1,2} & \cdots & u_{1,j} & \cdots & u_{1,p} \\
u_{2,1} & u_{2,2} & \cdots & u_{2,j} & \cdots & u_{2,p} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
u_{n,1} & u_{n,2} & \cdots & u_{n,j} & \cdots & u_{n,p}
\end{pmatrix}.$$
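As an illustration, the block structure of $W$ in (14.1) can be assembled directly by placing the dependent block next to the regressor block; the dimensions and the simulated data in the following sketch are purely illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the data layout: place the dependent block Y (n x p) next to
# the regressor block X (n x k) to form W (n x (p + k)), as in (14.1).
rng = np.random.default_rng(0)
n, p, k = 5, 2, 3                      # illustrative dimensions

Y = rng.normal(size=(n, p))            # endogenous (dependent) variables
X = rng.normal(size=(n, k))            # exogenous variables (regressors)

W = np.hstack([Y, X])                  # complete matrix of data
print(W.shape)                         # (5, 5): n rows, p + k columns
```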

14.2 Joint, conditional and marginal distributions

Let's consider the random vector $(Y, X)$ and write the joint distribution of $Y$ and $X$, i.e.

$$\underbrace{P(Y_1 \le y_1, \ldots, Y_p \le y_p, \; X_1 \le x_1, \ldots, X_k \le x_k)}_{\text{joint probability}} = \underbrace{F_{Y,X}(y_1, \ldots, y_p, x_1, \ldots, x_k)}_{\text{distribution function}}. \tag{14.4}$$

In the continuous case, there exists a joint density $f_{Y,X}$ such that:

$$F_{Y,X}(y, x) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{Y,X}(t, s) \, ds \, dt. \tag{14.5}$$

Moreover, from the joint distribution it is possible to recover the marginal distributions, i.e.

$$f_Y(y) = \frac{\partial}{\partial y} F_{Y,X}(y, +\infty) = \int_{-\infty}^{+\infty} f_{Y,X}(y, x) \, dx, \qquad
f_X(x) = \frac{\partial}{\partial x} F_{Y,X}(+\infty, x) = \int_{-\infty}^{+\infty} f_{Y,X}(y, x) \, dy. \tag{14.6}$$

Then, given the marginals in (14.6), it is possible to compute the unconditional moments, for example:

  • First moment: $E\{Y\} = \int_{-\infty}^{+\infty} y \, f_Y(y) \, dy$.
  • Second moment: $E\{Y^2\} = \int_{-\infty}^{+\infty} y^2 \, f_Y(y) \, dy$.
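As a quick numerical check, these integrals can be evaluated with a quadrature routine; the standard-normal marginal used below is an illustrative assumption.

```python
import numpy as np
from scipy import integrate, stats

# Illustrative assumption: take the marginal f_Y to be a standard normal density.
f_Y = stats.norm(loc=0.0, scale=1.0).pdf

# Unconditional moments as integrals against the marginal density.
first_moment, _ = integrate.quad(lambda y: y * f_Y(y), -np.inf, np.inf)
second_moment, _ = integrate.quad(lambda y: y**2 * f_Y(y), -np.inf, np.inf)

print(round(first_moment, 6), round(second_moment, 6))   # approx. 0.0 and 1.0
```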

Applying Bayes' theorem, from the joint density it is possible to recover the conditional distribution, i.e.
$$f_{Y \mid X}(y \mid x) = \frac{f_{Y,X}(y, x)}{f_X(x)} \quad \Longleftrightarrow \quad \underbrace{f_{Y,X}(y, x)}_{\text{joint}} = \underbrace{f_{Y \mid X}(y \mid x)}_{\text{conditional}} \, \underbrace{f_X(x)}_{\text{marginal}}. \tag{14.7}$$

Given the conditional distribution, the conditional moments read

$$E\{Y \mid X\} = \int_{-\infty}^{+\infty} y \, f_{Y \mid X}(y \mid x) \, dy, \qquad E\{Y^2 \mid X\} = \int_{-\infty}^{+\infty} y^2 \, f_{Y \mid X}(y \mid x) \, dy.$$
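The decomposition in (14.7) can also be used numerically: divide the joint density by the marginal of $X$ and integrate against $y$. The correlated bivariate normal below is an illustrative choice, not part of the text.

```python
import numpy as np
from scipy import integrate, stats

# Sketch: recover f_{Y|X}(y | x0) and E{Y | X = x0} from a joint density via (14.7).
rho = 0.6                                      # illustrative correlation
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x0 = 1.0                                       # conditioning value X = x0
f_X = stats.norm(0.0, 1.0).pdf(x0)             # marginal density of X at x0

def f_cond(y):
    """Conditional density f_{Y|X}(y | x0) = f_{Y,X}(y, x0) / f_X(x0)."""
    return joint.pdf([y, x0]) / f_X

cond_mean, _ = integrate.quad(lambda y: y * f_cond(y), -np.inf, np.inf)
print(round(cond_mean, 3))                     # approx. rho * x0 = 0.6
```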

Example 14.1 Let's consider a multivariate Gaussian setup, i.e.

$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \mu_Y \\ \mu_X \end{pmatrix}, \begin{pmatrix} \Sigma_{YY} & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_{XX} \end{pmatrix} \right).$$

If $(Y, X)$ are jointly normal, then the marginals are multivariate normal, i.e.

$$Y \sim \mathrm{MVN}(\mu_Y, \Sigma_{YY}), \qquad X \sim \mathrm{MVN}(\mu_X, \Sigma_{XX}),$$

and so are the conditional distributions, i.e.

$$Y \mid X \sim \mathrm{MVN}(\mu_{Y \mid X}, \Sigma_{YY \mid X}), \qquad X \mid Y \sim \mathrm{MVN}(\mu_{X \mid Y}, \Sigma_{XX \mid Y}).$$

In such a setup the conditional expectation of $Y$ given $X$ reads

$$\begin{aligned}
E\{Y \mid X\} = \mu_{Y \mid X} &= \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (X - \mu_X) \\
&= \mu_Y - \Sigma_{YX} \Sigma_{XX}^{-1} \mu_X + \Sigma_{YX} \Sigma_{XX}^{-1} X \\
&= \mu_Y - b_{Y \mid X} \, \mu_X + b_{Y \mid X} X \\
&= a_{Y \mid X} + b_{Y \mid X} X,
\end{aligned}$$

and the conditional variance reads

$$V\{Y \mid X\} = \Sigma_{YY \mid X} = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}.$$

In this setup the parameters are:

  • Joint distribution: $\theta = \{\mu_Y, \mu_X, \Sigma_{XX}, \Sigma_{XY}, \Sigma_{YY}\}$.
  • Conditional distribution: $\lambda_1 = \{a_{Y \mid X}, b_{Y \mid X}, \Sigma_{YY \mid X}\}$.
  • Marginal distribution: $\lambda_2 = \{\mu_X, \Sigma_{XX}\}$.

Note that $\lambda_1$ is a function of $\theta$, and denote the parameters of interest by $\tau = f(\lambda_1)$. In the Gaussian case it is possible to prove that $\lambda_1$ and $\lambda_2$ are free to vary. Hence, imposing restrictions on $\lambda_1$ does not impose restrictions on $\lambda_2$. In general, if the parameters of interest are a function of the conditional distribution and $\lambda_1$ and $\lambda_2$ are free to vary, then inference can be carried out without loss of information by considering the conditional model only. In this case we say that $X$ is weakly exogenous for $\tau = f(\lambda_1)$.
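A minimal numerical sketch of the conditioning formulas in Example 14.1 follows; the mean vector, covariance blocks and conditioning value are illustrative assumptions chosen only for this example.

```python
import numpy as np

# Gaussian conditioning: E{Y|X} = a_{Y|X} + b_{Y|X} X and Sigma_{YY|X}.
# All numerical values below are illustrative assumptions.
mu_Y, mu_X = np.array([1.0]), np.array([0.5, -0.5])
S_YY = np.array([[2.0]])
S_YX = np.array([[0.8, 0.3]])
S_XX = np.array([[1.0, 0.2],
                 [0.2, 1.5]])

b_YX = S_YX @ np.linalg.inv(S_XX)      # regression coefficients b_{Y|X}
a_YX = mu_Y - b_YX @ mu_X              # intercept a_{Y|X}
S_YY_X = S_YY - b_YX @ S_YX.T          # conditional variance Sigma_{YY|X}

x = np.array([1.2, 0.0])               # a conditioning value for X
cond_mean = a_YX + b_YX @ x            # E{Y | X = x}
print(cond_mean, S_YY_X)
```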

14.3 Conditional expectation model

Let's consider a very general conditional expectation model with $p = 1$, of which the linear models are a special case. In matrix notation it can be written as

$$y = E\{y \mid X\} + u, \tag{14.8}$$

where the conditional expectation errors are defined as

$$u = y - E\{y \mid X\}. \tag{14.9}$$

Proposition 14.1 In a conditional expectation model as in (14.8), the residuals $u$, defined as in (14.9), have unconditional expectation and covariance with the regressors $X$ equal to zero, i.e.

$$E\{u\} = 0, \qquad E\{uX\} = 0.$$

Moreover, the conditional expectation error is orthogonal to any transformation $g(\cdot)$ of the conditioning variables, i.e.

$$y = E\{y \mid X\} + u \quad \Longrightarrow \quad E\{u \, g(X)\} = 0. \tag{14.10}$$

Proof. Let's start with the unconditional expectation of the residuals defined in (14.9), i.e.

$$\begin{aligned}
E\{u\} &= E\{y - E\{y \mid X\}\} \\
&= E\{y\} - E\{E\{y \mid X\}\} \\
&= E\{y\} - E\{y\} = 0.
\end{aligned}$$

Then, let's compute the expected value of the product between the residuals and the regressors; since $E\{u\} = 0$, it coincides with their covariance, i.e.

$$E\{u\} = 0 \quad \Longrightarrow \quad Cv\{u, X\} = E\{uX\}.$$

For simplicity, let's assume that $X$ can take only values in $\{0, 1\}$. Applying the tower property of conditional expectation one obtains:

$$\begin{aligned}
E\{uX\} &= E\{E\{uX \mid X\}\} \\
&= E\{uX \mid X = 0\} \, P(X = 0) + E\{uX \mid X = 1\} \, P(X = 1) \\
&= E\{uX \mid X = 1\} \, P(X = 1).
\end{aligned}$$

Then, substituting $u$ from (14.9) and $X$ with 1,

$$\begin{aligned}
E\{uX\} &= E\{(y - E\{y \mid X\}) \, X \mid X = 1\} \, P(X = 1) \\
&= E\{y \mid X = 1\} \, P(X = 1) - E\{E\{y \mid X\} \mid X = 1\} \, P(X = 1) \\
&= E\{y \mid X = 1\} \, P(X = 1) - E\{y \mid X = 1\} \, P(X = 1) = 0.
\end{aligned}$$

For a general transformation of the regressors as in (14.10), the expectation is computed as:

$$\begin{aligned}
E\{u \, g(X)\} &= E\{E\{u \, g(X) \mid X\}\} \\
&= E\{g(X) \, E\{u \mid X\}\} \\
&= E\{g(X) \, E\{y - E\{y \mid X\} \mid X\}\} \\
&= E\{g(X) \, [E\{y \mid X\} - E\{y \mid X\}]\} = 0.
\end{aligned}$$
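Proposition 14.1 can also be checked by simulation. The nonlinear DGP below, with $E\{y \mid X\} = \exp(X)$ and heteroskedastic noise, is an illustrative assumption.

```python
import numpy as np

# Monte Carlo check of Proposition 14.1: u = y - E{y|X} is mean zero and
# orthogonal to X and to transformations g(X).
rng = np.random.default_rng(1)
n = 1_000_000

x = rng.normal(size=n)
noise = rng.normal(size=n) * (0.5 + np.abs(x))   # conditional mean zero, heteroskedastic
y = np.exp(x) + noise                            # so that E{y|X} = exp(X)
u = y - np.exp(x)                                # conditional expectation error

print(u.mean())                                  # approx. 0  (E{u} = 0)
print((u * x).mean())                            # approx. 0  (E{uX} = 0)
print((u * np.sin(x)).mean())                    # approx. 0  (E{u g(X)} = 0, g = sin)
```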

14.4 Uniequational linear models

Let's consider a uni-equational linear model, i.e. with $p = 1$ as in (14.2), expressed in compact matrix notation as

$$y_{n \times 1} = X_{n \times k} \, b_{k \times 1} + u_{n \times 1}, \tag{14.11}$$

where $b$ and $u$ represent the true parameters and residuals in the population. Given a sample of $n$ observations extracted from the population, the matrix of the regressors $X$ is as defined in Section 14.1, while the vector of the dependent variable and the vector of the residuals read

$$y_{n \times 1} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad u_{n \times 1} = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}.$$

Hence, the matrix of data $W$ is composed of

$$W = \begin{pmatrix} y & X \end{pmatrix} = \begin{pmatrix}
y_1 & x_{1,1} & \cdots & x_{1,k} \\
\vdots & \vdots & & \vdots \\
y_n & x_{n,1} & \cdots & x_{n,k}
\end{pmatrix}. \tag{14.12}$$

14.4.1 Estimators of b

In general, the true population parameter $b$ is unknown and lives in the parameter space, i.e. $b \in \Theta_b \subseteq \mathbb{R}^k$. In the following we will denote by $Q$ the estimator function, while $\hat{b}$ will denote one of its possible estimates.

Depending on the specification of the model, the function $Q$ takes as input the matrix of data and returns as output a vector of estimates, i.e.

$$Q: W \to \Theta_b, \quad \text{such that} \quad Q(W) = \hat{b} \in \Theta_b.$$

In this context, the fitted values $\hat{y}$ can be seen as a function of the estimate $\hat{b}$, i.e.

$$\hat{y}(\hat{b}) = X \hat{b}. \tag{14.13}$$

Consequently, the fitted residuals $\hat{u}$, which measure the discrepancies between the observed and the fitted values, are also a function of $\hat{b}$, i.e.

$$\hat{u}(\hat{b}) = y - \hat{y}(\hat{b}) = y - X \hat{b}. \tag{14.14}$$
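As an illustration, one such function $Q$ is the Ordinary Least Squares estimator discussed below; the simulated DGP (true $b$, Gaussian errors) is an assumption made only for this sketch.

```python
import numpy as np

# Sketch of an estimator function Q mapping the data (y, X) to an estimate b_hat,
# here taken to be OLS. The DGP below is an illustrative assumption.
rng = np.random.default_rng(2)
n, k = 200, 3

X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # includes an intercept
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + rng.normal(size=n)

def Q_ols(y, X):
    """OLS estimate b_hat = (X'X)^{-1} X'y, via a numerically stable least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_hat = Q_ols(y, X)
y_fit = X @ b_hat                 # fitted values, as in (14.13)
u_hat = y - y_fit                 # fitted residuals, as in (14.14)
print(b_hat)
```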

Different optimal estimators of b

As we will see, the assumptions on the variance of the residuals determine the optimal estimator of $b$. In general, the residuals can be:

  1. homoskedastic: the residuals are uncorrelated and their variance is the same for every observation.
  2. heteroskedastic: the residuals are uncorrelated but their variance differs across observations.
  3. autocorrelated: the residuals are correlated across observations, and their variance may be constant or differ across observations.

As shown in Figure 14.1, depending on which assumption holds (1, 2 or 3), the optimal estimator of $b$ is obtained with Ordinary Least Squares (OLS) in case 1, and with Generalized Least Squares (GLS) in cases 2 and 3.

Figure 14.1: Different classes of estimators for linear models.
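The following sketch contrasts the two estimator classes under a heteroskedastic error structure (case 2); the DGP and the assumption that the error covariance is known are illustrative.

```python
import numpy as np

# OLS vs. GLS with a known (diagonal) error covariance, as in case 2 above.
# The heteroskedastic DGP below is an illustrative assumption.
rng = np.random.default_rng(3)
n = 500

X = np.column_stack([np.ones(n), rng.normal(size=n)])
b_true = np.array([1.0, 2.0])
sigma2 = np.exp(X[:, 1])                       # variance differs by observation
y = X @ b_true + rng.normal(size=n) * np.sqrt(sigma2)

Omega_inv = np.diag(1.0 / sigma2)              # inverse error covariance

b_ols = np.linalg.solve(X.T @ X, X.T @ y)                          # OLS
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)  # GLS
print(b_ols, b_gls)
```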

For example, if the residuals are correlated, then their conditional variance-covariance matrix reads $\Sigma = V\{u \mid X\} = E\{u u' \mid X\}$, or more explicitly,

$$\Sigma_{n \times n} = E\left\{ \begin{pmatrix}
u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\
u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\
\vdots & \vdots & \ddots & \vdots \\
u_n u_1 & u_n u_2 & \cdots & u_n^2
\end{pmatrix} \,\middle|\, X \right\} = \begin{pmatrix}
\sigma_1^2 & \sigma_{1,2} & \cdots & \sigma_{1,n} \\
\sigma_{2,1} & \sigma_2^2 & \cdots & \sigma_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma_n^2
\end{pmatrix}. \tag{14.15}$$

Since the matrix $\Sigma$ is symmetric, the number of distinct elements above (or below) the diagonal is $\binom{n}{2} = \frac{n(n-1)}{2}$. Hence, although $\Sigma$ has $n \times n$ entries, the number of unique values (free elements) is given by the $n$ variances plus the $\frac{n(n-1)}{2}$ covariances, i.e.

$$\text{free elements} = n + \frac{n(n-1)}{2} = \frac{n(n+1)}{2} > n.$$
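A quick check of this count, using the upper triangle (including the diagonal) of a symmetric matrix; the value of $n$ is an arbitrary illustrative choice.

```python
import numpy as np

# Free elements of a symmetric n x n covariance matrix: n variances plus
# n(n-1)/2 distinct covariances, i.e. n(n+1)/2, which exceeds n.
n = 4
upper_with_diag = np.triu_indices(n)               # upper triangle incl. diagonal
print(len(upper_with_diag[0]), n * (n + 1) // 2)   # both equal 10 for n = 4
```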