Statistical modeling applies statistical methods to real-world data to give empirical content to relationships. It aims to quantify phenomena, develop models, and test hypotheses, making it a crucial field for economic research, policy analysis, and decision-making. The aim of statistical modeling is to study the (unknown) mechanism that generates the data, i.e., the Data Generating Process (DGP). The statistical model is a function that approximates the DGP.
The matrix of data
Let’s consider $n$ realizations defining a sample for $i = 1, \dots, n$. Suppose we have $m$ dependent variables and $k$ explanatory variables (also known as predictors). The data matrix $\mathbf{X}$, collecting the exogenous variables (regressors), is then composed as:
$$
\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} \in \mathbb{R}^{n \times k},
$$
where
- The $i$-th row contains the variables related to the $i$-th statistical unit (e.g., an individual, a firm, or a country).
- The $j$-th column contains all the $n$ observations related to the $j$-th variable.
The matrix $\mathbf{Y}$ collects the endogenous (dependent) variables, i.e.
$$
\mathbf{Y} = \begin{bmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nm} \end{bmatrix} \in \mathbb{R}^{n \times m}.
$$
Hence, the complete matrix of data reads
$$
\mathbf{Z} = [\,\mathbf{Y} \;\; \mathbf{X}\,] \in \mathbb{R}^{n \times (m + k)}.
$$
In general, when $m = 1$ the model has only one equation to satisfy for $i = 1, \dots, n$, for example
$$
y_i = b_1 x_{i1} + \dots + b_k x_{ik} + \varepsilon_i.
$$
Otherwise, when $m > 1$ there is more than one dependent variable and the model is composed of $m$ equations for $i = 1, \dots, n$, i.e. the same linear model with $m$ equations reads:
$$
y_{ij} = b_{1j} x_{i1} + \dots + b_{kj} x_{ik} + \varepsilon_{ij}, \qquad j = 1, \dots, m.
$$
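As an illustration, here is a minimal numpy sketch of how such a data matrix can be assembled; the dimensions and variable names are assumptions chosen for the example, not taken from the text.

```python
import numpy as np

# Illustrative sketch: build the data matrix Z = [Y X] for n statistical
# units, k regressors and m dependent variables (values are hypothetical).
rng = np.random.default_rng(0)
n, k, m = 100, 3, 1

X = rng.normal(size=(n, k))   # exogenous variables: one row per unit, one column per regressor
Y = rng.normal(size=(n, m))   # endogenous (dependent) variables

Z = np.hstack([Y, X])         # complete matrix of data, shape (n, m + k)
print(Z.shape)                # (100, 4)
```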
Joint, conditional and marginal distributions
Let’s consider the bi-dimensional random vector $(Y, X)$ in Equation 14.1 and let’s write the joint distribution of $Y$ and $X$, i.e.
$$
F(y, x) = \mathbb{P}(Y \le y, X \le x).
$$
In the continuous case, there exists a joint density $f(y, x)$ such that:
$$
F(y, x) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u, v)\, dv\, du.
$$
Moreover, from the joint distribution (Equation 14.4) it is possible to recover the marginal distributions, i.e.
$$
f_Y(y) = \int_{-\infty}^{+\infty} f(y, x)\, dx, \qquad f_X(x) = \int_{-\infty}^{+\infty} f(y, x)\, dy.
$$
Then, given the marginal distributions (Equation 14.6), it is possible to compute the unconditional moments (a simulation sketch follows the list), i.e.
- First moment: $\mathbb{E}[Y] = \int_{-\infty}^{+\infty} y\, f_Y(y)\, dy$.
- Second moment: $\mathbb{E}[Y^2] = \int_{-\infty}^{+\infty} y^2\, f_Y(y)\, dy$.
- Variance: $\mathbb{V}[Y] = \mathbb{E}[Y^2] - \mathbb{E}[Y]^2$.
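A minimal Monte Carlo sketch of these unconditional moments, assuming an illustrative bivariate Gaussian joint distribution (the parameter values are not from the text):

```python
import numpy as np

# Approximate the unconditional moments of Y from a sample drawn from an
# assumed joint (bivariate normal) distribution.
rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])               # (mu_Y, mu_X), illustrative values
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # joint covariance, illustrative

sample = rng.multivariate_normal(mu, Sigma, size=200_000)
y = sample[:, 0]                        # marginal sample of Y

first_moment = y.mean()                           # E[Y]
second_moment = (y ** 2).mean()                   # E[Y^2]
variance = second_moment - first_moment ** 2      # V[Y] = E[Y^2] - E[Y]^2

print(first_moment, second_moment, variance)      # approx 1.0, 3.0, 2.0
```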
Using Bayes’ theorem (Theorem 6.2), from the joint distribution (Equation 14.4) it is possible to recover the conditional distributions, i.e.
$$
f_{Y \mid X}(y \mid x) = \frac{f(y, x)}{f_X(x)}, \qquad f_{X \mid Y}(x \mid y) = \frac{f(y, x)}{f_Y(y)}.
$$
Given the conditional distributions, it is possible to compute the conditional moments, i.e.
- First moment: $\mathbb{E}[Y \mid X = x] = \int_{-\infty}^{+\infty} y\, f_{Y \mid X}(y \mid x)\, dy$.
- Second moment: $\mathbb{E}[Y^2 \mid X = x] = \int_{-\infty}^{+\infty} y^2\, f_{Y \mid X}(y \mid x)\, dy$.
Hence, from Equation 14.7 the joint density can be represented as the product of the conditional and the marginal, i.e.
$$
f(y, x) = f_{Y \mid X}(y \mid x)\, f_X(x) = f_{X \mid Y}(x \mid y)\, f_Y(y).
$$
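As a quick numerical check, the sketch below uses a small, assumed discrete joint distribution and verifies that the joint equals the conditional times the marginal:

```python
import numpy as np

# Assumed discrete joint distribution: rows index y, columns index x.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

f_x = joint.sum(axis=0)                  # marginal of X: sum over y
f_y = joint.sum(axis=1)                  # marginal of Y: sum over x
f_y_given_x = joint / f_x                # conditional f(y|x), column-wise division
f_x_given_y = joint / f_y[:, None]       # conditional f(x|y), row-wise division

# Reconstruct the joint from conditional * marginal and compare.
print(np.allclose(f_y_given_x * f_x, joint))            # True
print(np.allclose(f_x_given_y * f_y[:, None], joint))   # True
```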
Example 14.1 Let’s consider a Gaussian setup, i.e.
$$
\begin{pmatrix} Y \\ X \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_Y \\ \mu_X \end{pmatrix}, \begin{pmatrix} \sigma_Y^2 & \sigma_{YX} \\ \sigma_{YX} & \sigma_X^2 \end{pmatrix} \right).
$$
In a Gaussian setup, if $(Y, X)$ are jointly normal, then the marginals are normal, i.e.
$$
Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2), \qquad X \sim \mathcal{N}(\mu_X, \sigma_X^2),
$$
and also the conditional distributions are normal, i.e.
$$
Y \mid X = x \sim \mathcal{N}\big( \mathbb{E}[Y \mid X = x],\, \mathbb{V}[Y \mid X = x] \big),
$$
and the conditional moments read explicitly as:
$$
\mathbb{E}[Y \mid X = x] = \mu_Y + \frac{\sigma_{YX}}{\sigma_X^2}\,(x - \mu_X)
$$
and
$$
\mathbb{V}[Y \mid X = x] = \sigma_Y^2 - \frac{\sigma_{YX}^2}{\sigma_X^2}.
$$
In this setup the parameters are:
- Joint distribution: $(\mu_Y, \mu_X, \sigma_Y^2, \sigma_X^2, \sigma_{YX})$.
- Conditional distribution: $(\beta_0, \beta_1, \sigma^2)$, with $\beta_1 = \sigma_{YX} / \sigma_X^2$, $\beta_0 = \mu_Y - \beta_1 \mu_X$ and $\sigma^2 = \sigma_Y^2 - \sigma_{YX}^2 / \sigma_X^2$.
- Marginal distribution: $(\mu_X, \sigma_X^2)$.
Noting that the conditional parameters $(\beta_0, \beta_1, \sigma^2)$ are a function of the joint parameters $(\mu_Y, \mu_X, \sigma_Y^2, \sigma_X^2, \sigma_{YX})$, in the Gaussian case it is possible to prove that the conditional parameters and the marginal parameters $(\mu_X, \sigma_X^2)$ are free to vary. Hence, imposing restrictions on the marginal parameters does not impose restrictions on the conditional ones. In general, if the parameters of interest are a function of the conditional distribution only, and the conditional and marginal parameters are free to vary, then inference can be carried out without loss of information using only the conditional model. In this case we say that $X$ is weakly exogenous for the parameters of interest.
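The following simulation sketch, with assumed parameter values, checks the Gaussian conditional-moment formulas above by conditioning on a thin slice around a point $x_0$:

```python
import numpy as np

# Check E[Y|X=x] = mu_Y + (s_YX/s_X^2)(x - mu_X) and
# V[Y|X=x] = s_Y^2 - s_YX^2/s_X^2 by simulation (parameters are illustrative).
rng = np.random.default_rng(2)
mu_Y, mu_X = 1.0, 2.0
s2_Y, s2_X, s_YX = 2.0, 1.0, 0.8

Sigma = np.array([[s2_Y, s_YX],
                  [s_YX, s2_X]])
draws = rng.multivariate_normal([mu_Y, mu_X], Sigma, size=500_000)
y, x = draws[:, 0], draws[:, 1]

# Condition on a thin slice around x0 as a crude approximation of X = x0.
x0 = 2.5
mask = np.abs(x - x0) < 0.05
print(y[mask].mean(), mu_Y + (s_YX / s2_X) * (x0 - mu_X))   # both close to 1.40
print(y[mask].var(), s2_Y - s_YX**2 / s2_X)                 # both close to 1.36
```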
Conditional expectation model
Let’s consider a very general conditional expectation model with $m = 1$, of which the linear models are a special case. It can be written as:
$$
Y = \mathbb{E}[Y \mid X] + u,
$$
where the conditional expectation errors are defined as:
$$
u = Y - \mathbb{E}[Y \mid X].
$$
Then, in general the unconditional expectation of the residuals and the covariance between the residuals and the regressors are zero, i.e.
$$
\mathbb{E}[u] = 0, \qquad \mathbb{E}[u X] = 0.
$$
Moreover, the conditional expectation error is orthogonal to any transformation of the conditioning variables. Considering a more general setup with a transformation $g(X)$ of the regressors, we have that
$$
\mathbb{E}[u\, g(X)] = 0.
$$
Proof. Let’s start with the unconditional expectation of the residuals defined in Equation 14.9. Applying the law of iterated expectations,
$$
\mathbb{E}[u] = \mathbb{E}\big[\mathbb{E}[u \mid X]\big] = \mathbb{E}\big[\mathbb{E}[Y \mid X] - \mathbb{E}[Y \mid X]\big] = 0.
$$
Then, let’s compute the expected value of the product between the residuals and the regressors, $\mathbb{E}[u X]$. For simplicity, let’s assume that $X$ can take only values in a finite set. Applying the tower property of conditional expectation one obtains:
$$
\mathbb{E}[u X] = \mathbb{E}\big[\mathbb{E}[u X \mid X]\big] = \mathbb{E}\big[X\, \mathbb{E}[u \mid X]\big].
$$
Then, let’s substitute $u$ from Equation 14.9 and $\mathbb{E}[u \mid X]$ with $0$, i.e.
$$
\mathbb{E}[u X] = \mathbb{E}\big[X \cdot 0\big] = 0.
$$
For a general transformation $g(X)$ of the regressors as in Equation 14.11, the covariance is computed as:
$$
\mathbb{E}[u\, g(X)] = \mathbb{E}\big[\mathbb{E}[u\, g(X) \mid X]\big] = \mathbb{E}\big[g(X)\, \mathbb{E}[u \mid X]\big] = 0.
$$
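A simulation sketch of these orthogonality conditions, using an illustrative (assumed) data generating process in which $\mathbb{E}[Y \mid X] = \sin(X)$:

```python
import numpy as np

# With u = Y - E[Y|X], the sample analogues of E[u], E[uX] and E[u g(X)]
# should all be close to zero. The DGP below is purely illustrative.
rng = np.random.default_rng(3)
n = 200_000

x = rng.normal(size=n)
y = np.sin(x) + rng.normal(scale=0.5, size=n)   # so that E[Y|X] = sin(X)
u = y - np.sin(x)                               # conditional expectation error

print(u.mean())              # ~ 0  (E[u] = 0)
print((u * x).mean())        # ~ 0  (E[uX] = 0)
print((u * x**2).mean())     # ~ 0  (E[u g(X)] = 0 for g(X) = X^2)
```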
Uni-equational linear models
Let’s consider a uni-equational linear model, i.e. with $m = 1$ in (Equation 14.2), expressed in compact matrix notation as:
$$
\mathbf{y} = \mathbf{X}\mathbf{b} + \boldsymbol{\varepsilon},
$$
where $\mathbf{b}$ and $\boldsymbol{\varepsilon}$ represent the true parameters and residuals in the population. Let’s consider a sample of $n$ observations extracted from the population; then the matrix of the regressors reads
$$
\mathbf{X} = \begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nk} \end{bmatrix},
$$
while the vector of the dependent variable and the vector of the residuals read
$$
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}.
$$
Hence, the matrix of data is composed by:
$$
\mathbf{Z} = [\,\mathbf{y} \;\; \mathbf{X}\,].
$$
Estimators of b
Let’s denote with $\Theta$ the parameter space, i.e. $\mathbf{b} \in \Theta \subseteq \mathbb{R}^k$, and with $g(\cdot)$ an estimator function of the unknown true parameter $\mathbf{b}$. Then, the function $g$ defines an estimator of $\mathbf{b}$, meaning it is a function that takes the matrix of data as input and returns a vector of parameters within $\Theta$ as output:
$$
\hat{\mathbf{b}} = g(\mathbf{Z}) \in \Theta,
$$
where $\hat{\mathbf{b}}$ is an estimate of the true population parameter $\mathbf{b}$. Then, the fitted values are a function of the estimate and are defined as:
$$
\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}.
$$
Consequently, the fitted residuals, which measure the discrepancies between the observed and the fitted values, are also a function of $\hat{\mathbf{b}}$, i.e.
$$
\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\mathbf{b}}.
$$
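As an illustration, here is a minimal numpy sketch that simulates a uni-equational linear model and computes an estimate, the fitted values and the fitted residuals; the OLS formula used as the estimator anticipates Chapter 15, and the parameter values are assumptions:

```python
import numpy as np

# Simulate y = X b + e and apply one possible estimator (OLS) to obtain
# b_hat, the fitted values y_hat and the fitted residuals e_hat.
rng = np.random.default_rng(4)
n, k = 500, 3

X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # include an intercept
b = np.array([1.0, 2.0, -0.5])                                  # "true" parameters, assumed
eps = rng.normal(scale=1.0, size=n)
y = X @ b + eps

b_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate (X'X)^{-1} X'y
y_hat = X @ b_hat                           # fitted values
e_hat = y - y_hat                           # fitted residuals

print(b_hat)             # close to [1.0, 2.0, -0.5]
print(e_hat.mean())      # ~ 0 when an intercept is included
```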
As we will see in Chapter 15, the assumptions on the variance of the residuals determine the optimal estimator of $\mathbf{b}$. In general, the residuals could be:
- homoskedastic: the residuals are uncorrelated and their variance is the same for each observation.
- heteroskedastic: the residuals are uncorrelated and their variance differs across observations.
- autocorrelated: the residuals are correlated across observations, and their variance may be the same or differ across observations.
As shown in Figure 14.1, depending on the assumption (1, 2 or 3), the optimal estimator of $\mathbf{b}$ is obtained with Ordinary Least Squares (OLS) in case 1 and with Generalized Least Squares (GLS) in cases 2 and 3.
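Below is a sketch contrasting the two estimators under autocorrelated errors; the AR(1)-type covariance structure, the sample size and the parameter values are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Simulate a linear model with autocorrelated errors and compare the OLS
# estimate (which ignores the error covariance) with the GLS estimate
# (which weights by its inverse).
rng = np.random.default_rng(5)
n, rho = 300, 0.8

# AR(1)-type covariance: Omega[i, j] = rho^|i - j| (up to a scale factor).
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Omega)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
b = np.array([1.0, 2.0])
y = X @ b + L @ rng.normal(size=n)                  # errors with covariance Omega

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
Oinv = np.linalg.inv(Omega)
b_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

print(b_ols, b_gls)   # both roughly [1, 2]; GLS is the more efficient estimator here
```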
For example, if the residuals are autocorrelated, then their conditional variance-covariance matrix in matrix notation reads
$$
\mathbb{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}' \mid \mathbf{X}] = \boldsymbol{\Omega}.
$$
More explicitly,
$$
\boldsymbol{\Omega} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix}.
$$
Since the matrix $\boldsymbol{\Omega}$ is symmetric, the number of unique values (free elements) is given by $n$ variances and $n(n-1)/2$ covariances, i.e.
$$
n + \frac{n(n-1)}{2} = \frac{n(n+1)}{2}.
$$
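A small numpy check of this count of free elements, using an arbitrary symmetric matrix as illustration:

```python
import numpy as np

# Count the unique entries of a symmetric n x n matrix:
# n variances plus n(n-1)/2 covariances = n(n+1)/2 free elements.
n = 5
A = np.random.default_rng(6).normal(size=(n, n))
Omega = A @ A.T                                  # symmetric positive definite example

free_elements = len(np.triu_indices(n)[0])       # upper triangle including the diagonal
print(free_elements, n * (n + 1) // 2)           # both 15 for n = 5
```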