14  Introduction

Statistical modeling applies statistical methods to real-world data in order to give empirical content to relationships between variables. It aims to quantify phenomena, develop models, and test hypotheses, which makes it a crucial tool for economic research, policy analysis, and decision-making. The goal of statistical modeling is to study the (unknown) mechanism that generates the data, i.e., the Data Generating Process (DGP). A statistical model is a function that approximates the DGP.

14.1 The matrix of data

Let’s consider $n$ realizations defining a sample, indexed by $i = 1, 2, \ldots, n$. Suppose we have $p$ dependent variables and $k$ explanatory variables (also known as predictors). The data matrix $X$ of the exogenous variables (regressors) is then composed as:
$$X_{n \times k} = \begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,j} & \cdots & x_{1,k} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,j} & \cdots & x_{2,k} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
x_{n,1} & x_{n,2} & \cdots & x_{n,j} & \cdots & x_{n,k}
\end{pmatrix} = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_k \end{pmatrix},$$
where

  • The $i$-th row contains the variables related to the $i$-th statistical unit (e.g., an individual, a firm, or a country).
  • The $j$-th column contains all the observations related to the $j$-th variable.

The matrix $Y$ collects the endogenous (dependent) variables, i.e.
$$Y_{n \times p} = \begin{pmatrix}
y_{1,1} & y_{1,2} & \cdots & y_{1,j} & \cdots & y_{1,p} \\
y_{2,1} & y_{2,2} & \cdots & y_{2,j} & \cdots & y_{2,p} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
y_{n,1} & y_{n,2} & \cdots & y_{n,j} & \cdots & y_{n,p}
\end{pmatrix} = \begin{pmatrix} \mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_p \end{pmatrix}.$$
Hence, the complete matrix of data $W$ reads
$$W_{n \times (k+p)} = \begin{pmatrix} Y & X \end{pmatrix} = \begin{pmatrix}
y_{1,1} & \cdots & y_{1,p} & x_{1,1} & \cdots & x_{1,k} \\
\vdots  &        & \vdots  & \vdots  &        & \vdots  \\
y_{n,1} & \cdots & y_{n,p} & x_{n,1} & \cdots & x_{n,k}
\end{pmatrix}. \tag{14.1}$$
In general, when $p = 1$ the model consists of a single equation for $i = 1, \ldots, n$, for example
$$y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_k x_{i,k} + e_i. \tag{14.2}$$
Otherwise, when $p > 1$ there is more than one dependent variable and the model is composed of $p$ equations for $i = 1, \ldots, n$, i.e. the same linear model with $p$ equations reads:
$$\begin{cases}
y_{i,1} = b_{0,1} + b_{1,1} x_{i,1} + b_{1,2} x_{i,2} + \cdots + b_{1,k} x_{i,k} + e_{i,1} \\
y_{i,2} = b_{0,2} + b_{2,1} x_{i,1} + b_{2,2} x_{i,2} + \cdots + b_{2,k} x_{i,k} + e_{i,2} \\
\quad \vdots \\
y_{i,p} = b_{0,p} + b_{p,1} x_{i,1} + b_{p,2} x_{i,2} + \cdots + b_{p,k} x_{i,k} + e_{i,p}
\end{cases} \tag{14.3}$$
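To make the notation concrete, the following minimal Python sketch (using numpy; the dimensions and coefficient values are arbitrary assumptions chosen for illustration) builds a matrix of regressors $X$, simulates the single-equation model (14.2), and stacks the result into the complete data matrix $W$ of (14.1) with $p = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 3                            # sample size and number of regressors (illustrative values)
b0 = 1.0
b = np.array([0.5, -2.0, 0.3])           # assumed "true" slope parameters

X = rng.normal(size=(n, k))              # regressors X_{n x k}
e = rng.normal(scale=0.5, size=n)        # residuals
y = b0 + X @ b + e                       # single-equation model (14.2)

# Complete data matrix W_{n x (k+p)} = (y  X) with p = 1, as in (14.1)
W = np.column_stack([y, X])
print(W.shape)                           # (100, 4)
```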

14.2 Joint, conditional and marginals

Let’s consider the bi-dimensional random vector $W = (Y, X)'$ and let’s write the joint distribution of $X$ and $Y$, i.e.
$$\underbrace{P(Y \le y, X \le x)}_{\text{joint probability}} = \underbrace{F_{Y,X}(y,x)}_{\text{distribution function}}. \tag{14.4}$$
In the continuous case, there exists a joint density $f_{Y,X}(y,x)$ such that:
$$F_{Y,X}(y,x) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{Y,X}(t,s) \, dt \, ds. \tag{14.5}$$
Moreover, from the joint distribution (14.5) it is possible to recover the marginal distributions, i.e.
$$f_Y(y) = \frac{\partial}{\partial y} F_{Y,X}(y, +\infty) = \int_{-\infty}^{+\infty} f_{Y,X}(y,x) \, dx, \qquad
f_X(x) = \frac{\partial}{\partial x} F_{Y,X}(+\infty, x) = \int_{-\infty}^{+\infty} f_{Y,X}(y,x) \, dy. \tag{14.6}$$

Then, given the marginals in (14.6), it is possible to compute the unconditional moments (illustrated numerically in the sketch after this list), i.e.

  1. First moment: $E\{Y\} = \int_{-\infty}^{+\infty} y \, f_Y(y) \, dy$.
  2. Second moment: $E\{Y^2\} = \int_{-\infty}^{+\infty} y^2 \, f_Y(y) \, dy$.
  3. Variance: $V\{Y\} = E\{Y^2\} - E\{Y\}^2$.
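As a quick numerical illustration of these unconditional moments, the following sketch approximates the integrals by Riemann sums on a finite grid, under the assumption (made purely for illustration) that $Y$ is standard normal:

```python
import numpy as np

# Grid approximation of the marginal density f_Y(y), assuming Y ~ N(0, 1) for illustration
y = np.linspace(-8, 8, 20001)
dy = y[1] - y[0]
f_Y = np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)

EY  = np.sum(y * f_Y) * dy       # first moment  E{Y}   ~ 0
EY2 = np.sum(y**2 * f_Y) * dy    # second moment E{Y^2} ~ 1
VY  = EY2 - EY**2                # variance      V{Y}   ~ 1
print(EY, EY2, VY)
```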

Using Bayes’ theorem, from the joint distribution (14.5) it is possible to recover the conditional distribution, i.e.
$$f_{Y \mid X}(y \mid x) = \frac{f_{Y,X}(y,x)}{f_X(x)}. \tag{14.7}$$
Given the conditional distribution, it is possible to compute the conditional moments, i.e.

  1. First moment: $E\{Y \mid X\} = \int_{-\infty}^{+\infty} y \, f_{Y \mid X}(y \mid x) \, dy$.
  2. Second moment: $E\{Y^2 \mid X\} = \int_{-\infty}^{+\infty} y^2 \, f_{Y \mid X}(y \mid x) \, dy$.

Hence, the joint density can be represented as the product of the conditional and the marginal, i.e.
$$\underbrace{f_{Y,X}(y,x)}_{\text{joint}} = \underbrace{f_{Y \mid X}(y \mid x)}_{\text{conditional}} \; \underbrace{f_X(x)}_{\text{marginal}}. \tag{14.8}$$
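The factorization (14.8) can be checked numerically on a discretized joint distribution. The sketch below uses a toy $3 \times 3$ joint probability table (an arbitrary choice for illustration), recovers the marginal of $X$ by summing over $y$, obtains the conditional by division as in (14.7), and verifies that conditional times marginal returns the joint.

```python
import numpy as np

# Toy discrete joint distribution P(Y = y_i, X = x_j): rows index y, columns index x
p_joint = np.array([[0.10, 0.05, 0.05],
                    [0.10, 0.20, 0.10],
                    [0.05, 0.15, 0.20]])
assert np.isclose(p_joint.sum(), 1.0)

p_X = p_joint.sum(axis=0)        # marginal of X: sum the joint over y, as in (14.6)
p_Y = p_joint.sum(axis=1)        # marginal of Y: sum the joint over x
p_Y_given_X = p_joint / p_X      # conditional P(Y | X): divide each column by the marginal, as in (14.7)

# Factorization (14.8): joint = conditional * marginal
assert np.allclose(p_Y_given_X * p_X, p_joint)
```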

Example 14.1 Let’s consider a Gaussian setup, i.e.
$$W = \begin{pmatrix} Y \\ X \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_Y \\ \mu_X \end{pmatrix}, \begin{pmatrix} \Sigma_{YY} & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_{XX} \end{pmatrix} \right).$$
If $(Y, X)$ are jointly normal, then the marginals are normal, i.e.
$$Y \sim N(\mu_Y, \Sigma_{YY}), \qquad X \sim N(\mu_X, \Sigma_{XX}),$$
and the conditional distributions are also normal, i.e.
$$Y \mid X \sim N(\mu_{Y \mid X}, \Sigma_{YY \mid X}), \qquad X \mid Y \sim N(\mu_{X \mid Y}, \Sigma_{XX \mid Y}),$$
and the conditional moments read explicitly as:
$$\begin{aligned}
E\{Y \mid X\} = \mu_{Y \mid X} &= \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (X - \mu_X) \\
&= \mu_Y - \Sigma_{YX} \Sigma_{XX}^{-1} \mu_X + \Sigma_{YX} \Sigma_{XX}^{-1} X \\
&= \mu_Y - b_{Y \mid X} \mu_X + b_{Y \mid X} X \\
&= a_{Y \mid X} + b_{Y \mid X} X
\end{aligned}$$
and
$$V\{Y \mid X\} = \Sigma_{YY \mid X} = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}.$$
In this setup the parameters are:

  • Joint distribution: $\theta = \{\mu_Y, \mu_X, \Sigma_{XX}, \Sigma_{XY}, \Sigma_{YY}\}$.
  • Conditional distribution: $\lambda_1 = \{a_{Y \mid X}, b_{Y \mid X}, \Sigma_{YY \mid X}\}$.
  • Marginal distribution: $\lambda_2 = \{\mu_X, \Sigma_{XX}\}$.

Note that $\lambda_1$ is a function of $\theta$. Let the parameters of interest be $\tau = f(\lambda_1)$; in the Gaussian case it is possible to prove that $\lambda_1$ and $\lambda_2$ are free to vary. Hence, imposing restrictions on $\lambda_1$ does not impose restrictions on $\lambda_2$. In general, if the parameters of interest are a function of the conditional distribution only and $\lambda_1$ and $\lambda_2$ are free to vary, then inference can be carried out without loss of information by considering only the conditional model. In this case we say that $X$ is weakly exogenous for $\tau = f(\lambda_1)$.
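In the Gaussian example, the mapping from the joint parameters $\theta$ to the conditional parameters $\lambda_1$ is plain linear algebra. The following sketch (with arbitrary illustrative values for the means and covariances, and a scalar $Y$ for simplicity) computes $a_{Y \mid X}$, $b_{Y \mid X}$ and $\Sigma_{YY \mid X}$ from $\theta$:

```python
import numpy as np

# Assumed joint parameters theta (illustrative values): scalar Y, two regressors X
mu_Y = np.array([1.0])
mu_X = np.array([0.5, -1.0])
S_YY = np.array([[2.0]])
S_YX = np.array([[0.8, 0.3]])
S_XX = np.array([[1.0, 0.2],
                 [0.2, 1.5]])

# Conditional parameters lambda_1 = {a_{Y|X}, b_{Y|X}, Sigma_{YY|X}}
b_YX = S_YX @ np.linalg.inv(S_XX)     # b_{Y|X} = Sigma_YX Sigma_XX^{-1}
a_YX = mu_Y - b_YX @ mu_X             # a_{Y|X} = mu_Y - b_{Y|X} mu_X
S_YY_X = S_YY - b_YX @ S_YX.T         # Sigma_{YY|X} = Sigma_YY - Sigma_YX Sigma_XX^{-1} Sigma_XY

# Marginal parameters lambda_2 = {mu_X, Sigma_XX} are read off theta directly
x = np.array([1.0, 0.0])              # an arbitrary conditioning value
E_Y_given_x = a_YX + b_YX @ x         # E{Y | X = x}
print(a_YX, b_YX, S_YY_X, E_Y_given_x)
```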

14.3 Conditional expectation model

Let’s consider a very general conditional expectation model with $p = 1$, of which the linear models are a special case. In matrix notation it can be written as:
$$\mathbf{y} = E\{\mathbf{y} \mid X\} + \mathbf{e}, \tag{14.9}$$
where the conditional expectation errors are defined as:
$$\mathbf{e} = \mathbf{y} - E\{\mathbf{y} \mid X\}. \tag{14.10}$$
Then, in general, the unconditional expectation of the residuals $\mathbf{e}$ and the covariance between the residuals and the regressors are zero, i.e.
$$E\{\mathbf{e}\} = 0, \qquad E\{\mathbf{e} X\} = 0.$$
Moreover, the conditional expectation error is orthogonal to any transformation of the conditioning variables. Consider the more general setup
$$\mathbf{y} = E\{\mathbf{y} \mid X\} + \mathbf{e}, \qquad E\{\mathbf{y} \mid X\} = g(X); \tag{14.11}$$
then we have that
$$E\{\mathbf{e} \, g(X)\} = 0.$$

Proof. Let’s start from the unconditional expectation of the residuals defined in (14.10), i.e.
$$\begin{aligned}
E\{\mathbf{e}\} &= E\{\mathbf{y} - E\{\mathbf{y} \mid X\}\} \\
&= E\{\mathbf{y}\} - E\{E\{\mathbf{y} \mid X\}\} \\
&= E\{\mathbf{y}\} - E\{\mathbf{y}\} = 0.
\end{aligned}$$
Then, let’s compute the covariance between the residuals and the regressors; since $E\{\mathbf{e}\} = 0$, it reduces to
$$Cv\{\mathbf{e}, X\} = E\{\mathbf{e} X\}.$$
For simplicity, let’s assume that $X$ can take only values in $\{0, 1\}$. Applying the tower property of conditional expectation one obtains:
$$\begin{aligned}
E\{\mathbf{e} X\} &= E\{E\{\mathbf{e} X \mid X\}\} \\
&= E\{\mathbf{e} X \mid X = 0\} P(X = 0) + E\{\mathbf{e} X \mid X = 1\} P(X = 1) \\
&= E\{\mathbf{e} X \mid X = 1\} P(X = 1).
\end{aligned}$$
Then, substituting $\mathbf{e}$ from (14.10) and $X$ with 1, i.e.
$$\begin{aligned}
E\{\mathbf{e} X\} &= E\{(\mathbf{y} - E\{\mathbf{y} \mid X\}) X \mid X = 1\} P(X = 1) \\
&= E\{\mathbf{y} \mid X = 1\} P(X = 1) - E\{E\{\mathbf{y} \mid X\} \mid X = 1\} P(X = 1) \\
&= E\{\mathbf{y} \mid X = 1\} P(X = 1) - E\{\mathbf{y} \mid X = 1\} P(X = 1) = 0.
\end{aligned}$$
For a general transformation of the regressors as in (14.11), the covariance is computed as:
$$\begin{aligned}
E\{\mathbf{e} \, g(X)\} &= E\{E\{\mathbf{e} \, g(X) \mid X\}\} \\
&= E\{g(X) \, E\{\mathbf{e} \mid X\}\} \\
&= E\{g(X) \, E\{\mathbf{y} - E\{\mathbf{y} \mid X\} \mid X\}\} \\
&= E\{g(X) \, [E\{\mathbf{y} \mid X\} - E\{\mathbf{y} \mid X\}]\} = 0.
\end{aligned}$$
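A simple Monte Carlo check of these orthogonality properties is the following sketch, which assumes (purely for illustration) a nonlinear DGP where the conditional expectation $g(X) = X^2$ is known exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1_000_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)   # assumed DGP with E{y | X} = g(X) = X^2

e = y - x**2                    # conditional expectation error e = y - E{y | X}, as in (14.10)

print(np.mean(e))               # ~ 0 : E{e} = 0
print(np.mean(e * x))           # ~ 0 : E{e X} = 0
print(np.mean(e * np.sin(x)))   # ~ 0 : orthogonality to any transformation of X
```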

14.4 Uniequational linear models

Let’s consider a uni-equational linear model, i.e. the case $p = 1$, expressed in compact matrix notation as:
$$\mathbf{y} = X\mathbf{b} + \mathbf{e}, \tag{14.12}$$
where $\mathbf{b}$ and $\mathbf{e}$ represent the true parameters and residuals in the population. Let’s consider a sample of $n$ observations extracted from a population; then the matrix of the regressors $X$ reads
$$X_{n \times k} = \begin{pmatrix}
x_{1,1} & \cdots & x_{1,k} \\
\vdots  &        & \vdots  \\
x_{n,1} & \cdots & x_{n,k}
\end{pmatrix} = \begin{pmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_k \end{pmatrix},$$
while the vectors of the dependent variable and of the residuals read
$$\mathbf{y}_{n \times 1} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\mathbf{e}_{n \times 1} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}.$$
Hence, the matrix of data $W$ is composed by:
$$W_{n \times (k+1)} = \begin{pmatrix} \mathbf{y} & X \end{pmatrix} = \begin{pmatrix}
y_1    & x_{1,1} & \cdots & x_{1,k} \\
\vdots & \vdots  &        & \vdots  \\
y_n    & x_{n,1} & \cdots & x_{n,k}
\end{pmatrix}. \tag{14.13}$$

14.4.1 Estimators of b

Let’s denote with $\Theta_b$ the parameter space, i.e. $\Theta_b \subseteq \mathbb{R}^k$, and with $Q$ an estimator function of the unknown true parameter $\mathbf{b} \in \Theta_b$. Then, the function $Q$ defines an estimator of $\mathbf{b}$, meaning it is a function that takes the matrix of data as input and returns a vector of parameters within $\Theta_b$ as output:
$$Q: W \to \Theta_b, \quad \text{such that} \quad Q(W) = \hat{\mathbf{b}} \in \Theta_b,$$
where $\hat{\mathbf{b}}$ is an estimate of the true population parameter $\mathbf{b}$. Then, the fitted values $\hat{\mathbf{y}}$ are a function of the estimate and are defined as:
$$\hat{\mathbf{y}} = X\hat{\mathbf{b}}. \tag{14.14}$$
Consequently, the fitted residuals, which measure the discrepancies between the observed and the fitted values, are also a function of $\hat{\mathbf{b}}$, i.e.
$$\mathbf{e}(\hat{\mathbf{b}}) = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - X\hat{\mathbf{b}}. \tag{14.15}$$
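As a concrete choice of the estimator function $Q$, the following sketch computes the Ordinary Least Squares estimate $\hat{\mathbf{b}} = (X'X)^{-1}X'\mathbf{y}$ on simulated data (the sample size and the "true" coefficients are arbitrary assumptions for illustration; OLS is discussed as an optimal estimator in the next subsection), together with the fitted values (14.14) and the fitted residuals (14.15):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # regressors, including a constant
b_true = np.array([1.0, 0.5, -2.0])                          # assumed population parameters
y = X @ b_true + rng.normal(scale=0.3, size=n)               # uni-equational model (14.12)

# Q(W) = b_hat: here the OLS estimator b_hat = (X'X)^{-1} X'y
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b_hat     # fitted values (14.14)
e_hat = y - y_hat     # fitted residuals (14.15)
print(b_hat)
```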

Different optimal estimators of b

As we will see, the assumptions on the variance of the residuals determine the optimal estimator of $\mathbf{b}$. In general, the residuals could be:

  1. homoskedastic: the residuals are uncorrelated and their variance is the same for each observation.
  2. heteroskedastic: the residuals are uncorrelated and their variance is different for each observation.
  3. autocorrelated: the residuals are correlated, and their variance can be either the same or different for each observation.

As shown in Figure 14.1, depending on the assumption (1, 2 or 3), the optimal estimator of $\mathbf{b}$ is obtained with Ordinary Least Squares (OLS) in case 1 and with Generalized Least Squares (GLS) in cases 2 and 3.

Figure 14.1: Different classes of estimator for linear models.

For example, if the residuals are autocorrelated, then their conditional variance-covariance matrix in matrix notation reads
$$\Sigma_{n \times n} = V\{\mathbf{e} \mid X\} = E\{\mathbf{e}\mathbf{e}' \mid X\}.$$
More explicitly,
$$\Sigma_{n \times n} = E\!\left\{ \begin{pmatrix}
e_1^2   & e_1 e_2 & \cdots & e_1 e_n \\
e_2 e_1 & e_2^2   & \cdots & e_2 e_n \\
\vdots  & \vdots  & \ddots & \vdots  \\
e_n e_1 & e_n e_2 & \cdots & e_n^2
\end{pmatrix} \,\middle|\, X \right\} = \begin{pmatrix}
\sigma_1^2   & \sigma_{1,2} & \cdots & \sigma_{1,n} \\
\sigma_{2,1} & \sigma_2^2   & \cdots & \sigma_{2,n} \\
\vdots       & \vdots       & \ddots & \vdots       \\
\sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma_n^2
\end{pmatrix}. \tag{14.16}$$
Since the matrix $\Sigma$ is symmetric, the number of unique values (free elements) is given by $n$ variances and $\frac{n(n-1)}{2}$ covariances, i.e.
$$n + \frac{n(n-1)}{2} = \frac{2n + n^2 - n}{2} = \frac{n + n^2}{2} = \frac{n(n+1)}{2}.$$
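As a closing illustration, the sketch below builds a toy autocorrelated covariance matrix $\Sigma$ with an AR(1)-type correlation pattern (an arbitrary choice, purely for illustration), verifies the $n(n+1)/2$ count of free elements, and compares the OLS estimate with the textbook GLS estimate $\hat{\mathbf{b}}_{GLS} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\mathbf{y}$, anticipating the estimators discussed in the following chapters:

```python
import numpy as np

rng = np.random.default_rng(3)

n, rho = 50, 0.7
i, j = np.indices((n, n))
Sigma = rho ** np.abs(i - j)     # AR(1)-type structure: Sigma_{i,j} = rho^{|i-j|} (illustrative choice)

# Free elements of a symmetric n x n matrix: n variances + n(n-1)/2 covariances = n(n+1)/2
n_free = len(np.triu_indices(n)[0])
assert n_free == n * (n + 1) // 2

# Simulate y = X b + e with autocorrelated residuals e ~ N(0, Sigma)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
b_true = np.array([1.0, 2.0])
e = np.linalg.cholesky(Sigma) @ rng.normal(size=n)
y = X @ b_true + e

S_inv = np.linalg.inv(Sigma)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)                     # OLS: (X'X)^{-1} X'y
b_gls = np.linalg.solve(X.T @ S_inv @ X, X.T @ S_inv @ y)     # GLS: (X'S^{-1}X)^{-1} X'S^{-1}y
print(b_ols, b_gls)
```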