18 Multi-equation linear models


library(tidyverse)
library(mvtnorm)
library(backports)
library(latex2exp)

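Consider a multivariate linear model, i.e. with $p > 1$ in Equation 14.3. In matrix notation the model reads: $\underset{n \times p}{Y} = \underset{n \times 1}{J_{n, 1}} \, \underset{1 \times p}{a^{\top}} + \underset{n \times k}{X} \, \underset{k \times p}{b^{\top}} + \underset{n \times p}{e}$ where $J_{n, 1}$ is an $n \times 1$ vector of ones, $a$ is the $p \times 1$ vector of intercepts, $b$ is the $p \times k$ matrix of slopes and $e$ is the matrix of residuals.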

18.1 OLS estimate

As in the univariate case, the optimal parameters are computed as: $\begin{aligned} b^{OLS} &= Cv(Y, X) \, Cv(X)^{-1} \\ a^{OLS} &= E\{Y\} - b^{OLS} E\{X\} \end{aligned}$ The variance-covariance matrix of the residuals is then: $Σ = Cv(e) = Cv(Y) - b^{OLS} Cv(X, Y)$ where $Cv(X, Y) = Cv(Y, X)^{\top}$ is $k \times p$, so that the product $b^{OLS} Cv(X, Y)$ is $p \times p$.
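As a minimal sketch of these moment formulas (base R only; the helper name `ols_moments` and the toy data are assumptions made for illustration, not part of the original text), the estimates can be computed from sample moments and the residual-covariance identity checked numerically:

```r
# Hypothetical helper: OLS estimates from sample moments.
# Y is n x p (responses), X is n x k (regressors).
ols_moments <- function(Y, X) {
  b <- cov(Y, X) %*% solve(cov(X))        # p x k slopes
  a <- colMeans(Y) - b %*% colMeans(X)    # p x 1 intercepts
  sigma <- cov(Y) - b %*% cov(X, Y)       # p x p residual covariance
  list(b = b, a = a, sigma = sigma)
}

# Toy data (assumed for illustration only)
set.seed(123)
n <- 200
X <- matrix(rnorm(n * 3), ncol = 3)
B <- matrix(c(1, -1, 0.5, 0.2, 0.3, -0.7), nrow = 2, byrow = TRUE)
Y <- X %*% t(B) + matrix(rnorm(n * 2), ncol = 2)

fit <- ols_moments(Y, X)

# Residuals implied by the fitted parameters
res <- Y - matrix(1, n, 1) %*% t(fit$a) - X %*% t(fit$b)

# The moment formula agrees with the sample covariance of the residuals
all.equal(fit$sigma, cov(res), check.attributes = FALSE)
```

The check relies on `cov()` using the same denominator for every block, so the identity holds exactly in sample and not only in expectation.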

18.1.1 Example

Consider $n$ simulated observations of the explanatory variables $X$ drawn from a multivariate normal distribution, i.e. $X \sim N (E\{X\}, Cv\{X\})$, with parameters $E\{X\} = \begin{pmatrix} 0.5 \\ 0.5 \\ 0.5 \end{pmatrix}, \quad Cv\{X\} = \begin{pmatrix} 0.5 & 0.2 & 0.1 \\ 0.2 & 1.2 & 0.1 \\ 0.1 & 0.1 & 0.3 \end{pmatrix}$ We consider two dependent variables and three regressors, hence $p = 2$ and $k = 3$. The $p \times k = 6$ slope parameters are drawn from a standard normal, i.e. for $j = 1, \dots, 6$, $b_{j} \sim N (0, 1)$, and the $p$ intercept parameters $a$ are drawn from a uniform distribution on $[0, 1]$. In the multivariate case $a$ and $b$ become matrices, i.e. $\underset{p \times k}{b} = \begin{pmatrix} b_{1, 1} & b_{1, 2} & b_{1, 3} \\ b_{2, 1} & b_{2, 2} & b_{2, 3} \end{pmatrix}, \quad \underset{p \times 1}{a} = \begin{pmatrix} a_{1} \\ a_{2} \end{pmatrix}$

For $i = 1, \dots, n$, consider a model of the form: $\begin{cases} Y_{i, 1} = a_{1} + b_{1, 1} X_{i, 1} + b_{1, 2} X_{i, 2} + b_{1, 3} X_{i, 3} + e_{i, 1} \\ Y_{i, 2} = a_{2} + b_{2, 1} X_{i, 1} + b_{2, 2} X_{i, 2} + b_{2, 3} X_{i, 3} + e_{i, 2} \end{cases}$ where $(e_{i, 1}, e_{i, 2})$ are drawn from a bivariate normal distribution with zero mean and true covariance matrix: $Cv\{e\} = \begin{pmatrix} 0.55 & 0.3 \\ 0.3 & 0.70 \end{pmatrix}$ The slopes $b$ and intercepts $a$ are named beta and alpha in the code and in the output tables. The procedure is structured as follows:

1. Simulate the explanatory variables, the regression parameters and the residuals.
2. Simulate the perturbed $\tilde{Y}$ (the regression with errors).
3. Fit the regression parameters on $\tilde{Y}$.
4. Compute the fitted residuals from the predictions obtained with the parameters of step 3, and compute their variance-covariance matrix.
5. Compare the results with the true parameters.

Setup

######################## Setup ########################
set.seed(1) # random seed 
n <- 500    # number of observations
p <- 2      # number of dependent variables 
k <- 3      # number of regressors 
# True regressor's mean 
true_e_x <- matrix(rep(0.5, k), ncol = 1)
# True regressor's covariance matrix 
true_cv_x <-  matrix(c(v_z1 = 0.5, cv_12 = 0.2, cv_13 = 0.1, 
                       cv_21 = 0.2, v_z2 = 1.2, cv_23 = 0.1, 
                       cv_31 = 0.1, cv_32 = 0.1, v_z3 = 0.3), 
                     nrow = k, byrow = FALSE)
# True covariance of the residuals 
true_cv_e <- matrix(c(0.55, 0.3, 0.3, 0.70), nrow = p)
##########################################################
# Generate a synthetic data set 
## Regressors  
X <- rmvnorm(n, true_e_x, true_cv_x) 
## Slope (Beta)
true_beta <- rnorm(p*k)
true_beta <- matrix(true_beta, ncol = k, byrow = TRUE) 
## Intercept (Alpha)
true_alpha <- runif(p, min = 0, max = 1)
true_alpha <- matrix(true_alpha, ncol = 1) 
## Matrix of 1 for matrix multiplication  
ones <- matrix(rep(1, n), ncol = 1)
## Noise-free response variable (systematic part of the model)
Y <- ones %*% t(true_alpha) + X %*% t(true_beta)
## Simulated error 
eps <- rmvnorm(n, sigma = true_cv_e)
## Perturbed response variable 
Y_tilde <- Y + eps

Parameters fit

# True Beta 
df_beta_true <- as_tibble(true_beta)
colnames(df_beta_true) <- paste0("$\\beta_", 1:ncol(df_beta_true), "$")
# Perturbed Beta (fitted)
fit_beta <- cov(Y_tilde, X) %*% solve(cov(X))
df_beta_pert <- as_tibble(fit_beta)
colnames(df_beta_pert) <- paste0("$\\beta_", 1:ncol(df_beta_pert), "$")
# True Alpha 
df_alpha_true <- as_tibble(t(true_alpha))
colnames(df_alpha_true) <- paste0("$\\alpha_", 1:ncol(df_alpha_true), "$")
# Perturbed Alpha (fitted)
## Perturbed mean  
e_y <- matrix(apply(Y_tilde, 2, mean), ncol = 1)
e_x <- matrix(apply(X, 2, mean), ncol = 1)
## Estimated Alpha (on perturbed data)
fit_alpha <- e_y - cov(Y_tilde, X) %*% solve(cov(X)) %*% e_x
df_alpha_pert <- as_tibble(t(fit_alpha))
colnames(df_alpha_pert) <- paste0("$\\alpha_", 1:ncol(df_alpha_pert), "$")
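Step 4 of the procedure (the covariance matrix of the fitted residuals) could be sketched as follows. This is a minimal illustration, not the author's code: the helper `fitted_residual_cov` and all `*_toy` objects are hypothetical names, and the toy data are generated with base R only so that the block runs on its own.

```r
# Hypothetical helper: covariance matrix of the fitted residuals.
# Y: n x p responses, X: n x k regressors,
# b: p x k slopes, a: p x 1 intercepts.
fitted_residual_cov <- function(Y, X, b, a) {
  ones <- matrix(1, nrow = nrow(Y), ncol = 1)
  Y_hat <- ones %*% t(a) + X %*% t(b)   # fitted values
  cov(Y - Y_hat)                        # sample covariance of the residuals
}

# Standalone toy check (data assumed for illustration only)
set.seed(42)
n <- 1000
X_toy <- matrix(rnorm(n * 3), ncol = 3)
b_toy <- matrix(c(0.5, -1, 2, 1, 0, -0.5), nrow = 2, byrow = TRUE)
a_toy <- matrix(c(0.2, 0.8), ncol = 1)
# Errors with a known covariance, built from its Cholesky factor
cv_e_toy <- matrix(c(0.55, 0.3, 0.3, 0.70), nrow = 2)
e_toy <- matrix(rnorm(n * 2), ncol = 2) %*% chol(cv_e_toy)
Y_toy <- matrix(1, n, 1) %*% t(a_toy) + X_toy %*% t(b_toy) + e_toy

# Fit with the same moment estimator used in this section
b_fit <- cov(Y_toy, X_toy) %*% solve(cov(X_toy))
a_fit <- colMeans(Y_toy) - b_fit %*% colMeans(X_toy)

sigma_fit <- fitted_residual_cov(Y_toy, X_toy, b_fit, a_fit)
sigma_fit   # expected to be close to cv_e_toy for large n
```

With the objects defined in the chunks above, the corresponding call would be `fitted_residual_cov(Y_tilde, X, fit_beta, fit_alpha)`, and the result can be compared with `true_cv_e`.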

Code table output

dplyr::bind_rows(
  dplyr::bind_cols(Type = "True", df_beta_true),
  dplyr::bind_cols(Type = "Fitted", df_beta_pert)
  ) %>%
  dplyr::mutate_if(is.numeric, format, digits = 4, scientific = FALSE) %>%
  knitr::kable(escape = FALSE)

dplyr::bind_rows(
  dplyr::bind_cols(Type = "True", df_alpha_true),
  dplyr::bind_cols(Type = "Fitted", df_alpha_pert)
  ) %>%
  dplyr::mutate_if(is.numeric, format, digits = 4, scientific = FALSE) %>%
  knitr::kable(escape = FALSE)

Table 18.1: Fitted parameters

Type	$β_{1}$	$β_{2}$	$β_{3}$
True	0.8500	-0.9253	0.8936
True	-0.9410	0.5390	-0.1820
Fitted	0.8457	-0.8699	0.9396
Fitted	-0.9532	0.5518	-0.1804

Type	$α_{1}$	$α_{2}$
True	0.8137	0.8068
Fitted	0.7942	0.7423
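As a cross-check on estimates such as those reported above, the moment-based estimator should coincide with ordinary least squares fitted equation by equation. The sketch below uses base R's `lm()`, which accepts a matrix response; the toy data are assumed for illustration and are not the simulated data set of this section.

```r
# Toy data (assumed for illustration only)
set.seed(7)
n <- 300
X <- matrix(rnorm(n * 3), ncol = 3)
Y <- cbind(1 + X %*% c(0.5, -1, 2),
           -0.5 + X %*% c(1, 0, -0.5)) +
  matrix(rnorm(n * 2), ncol = 2)

# Moment-based estimates, as in this section
b_mom <- cov(Y, X) %*% solve(cov(X))            # p x k slopes
a_mom <- colMeans(Y) - b_mom %*% colMeans(X)    # p x 1 intercepts

# lm() with a matrix response fits one equation per column of Y
fit_lm <- lm(Y ~ X)
coef_lm <- unname(coef(fit_lm))   # (k + 1) x p: intercept row, then slopes

# Both comparisons are expected to return TRUE up to numerical tolerance
all.equal(t(coef_lm[-1, ]), b_mom, check.attributes = FALSE)
all.equal(coef_lm[1, ], as.numeric(a_mom), check.attributes = FALSE)
```

Note that `coef()` returns the transpose of the layout used here: regressors in rows and equations in columns.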