13  Multivariate data

Let’s consider a matrix \(\mathbf{X}_{n \times k}\) with \(n\) observations and \(k\) variables, and let’s define some useful operations that can be performed on it. \[ \underset{n \times k }{\mathbf{X}} = \begin{pmatrix} x_{1,1} & \dots & x_{1,j} & \dots & x_{1,k} \\ \vdots & & \vdots & & \vdots \\ x_{i,1} & \dots & x_{i,j} & \dots & x_{i,k} \\ \vdots & & \vdots & & \vdots \\ x_{n,1} & \dots & x_{n,j} & \dots & x_{n,k} \\ \end{pmatrix} = \begin{pmatrix} \underset{n \times 1}{\mathbf{x}_{1}} & \dots & \underset{n \times 1}{\mathbf{x}_{j}} & \dots & \underset{n \times 1}{\mathbf{x}_{k}} \\ \end{pmatrix} \text{.} \tag{13.1}\] where \(\mathbf{x}_{j} = \begin{pmatrix} x_{1,j} & \dots & x_{i,j} & \dots & x_{n,j} \end{pmatrix}^\top\) collects the \(n\) observations of the \(j\)-th variable. As a working example, let’s simulate such a matrix:

Generate the matrix of data
# True variance - covariance matrix 
sigma <- diag(1, 4)
sigma[2,2] <- 1.3
sigma[3,3] <- 0.7
sigma[4,4] <- 1.7
sigma[1,2] <- sigma[2,1] <- 0.3
sigma[1,3] <- sigma[3,1] <- 0.4
sigma[1,4] <- sigma[4,1] <- -0.8
# Simulate data 
set.seed(1)
W <- mvtnorm::rmvnorm(10, sigma = sigma) 
y <- W[,1]
X <- W[,-1]
n <- nrow(X)
k <- ncol(X)

\[ \underset{10 \times 3 }{\mathbf{X}} = \begin{array}{c|rrr} i & \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 \\ \hline 1 & 0.167 & -0.750 & 2.191 \\ 2 & -0.865 & 0.524 & 0.793 \\ 3 & -0.276 & 1.374 & 0.327 \\ 4 & -2.623 & 0.783 & 0.179 \\ 5 & 1.061 & 0.657 & 0.809 \\ 6 & 0.9743 & 0.179 & -2.806 \\ 7 & -0.000647 & -0.0367 & -2.078 \\ 8 & 0.364 & 0.946 & 0.132 \\ 9 & 0.0189 & -1.0173 & -0.733 \\ 10 & -0.132 & 0.8137 & 1.153 \\ \end{array} \]
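Since rmvnorm() requires a valid covariance matrix, a quick base-R sanity check (a standalone sketch that rebuilds the same sigma as above) can confirm that the chosen matrix is symmetric and positive definite:

```r
# Rebuild the true variance-covariance matrix used above
sigma <- diag(c(1, 1.3, 0.7, 1.7))
sigma[1, 2] <- sigma[2, 1] <- 0.3
sigma[1, 3] <- sigma[3, 1] <- 0.4
sigma[1, 4] <- sigma[4, 1] <- -0.8
# A valid covariance matrix is symmetric with positive eigenvalues
stopifnot(isTRUE(all.equal(sigma, t(sigma))))
stopifnot(all(eigen(sigma, symmetric = TRUE)$values > 0))
```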

13.1 Vector of means

Let’s consider the matrix \(\mathbf{X}\) (Equation 13.1); the vector of means of its columns is computed as: \[ \underset{k \times 1}{\bar{\mathbf{x}}} = \begin{pmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_k \\ \end{pmatrix} = \left( \frac{1}{n} \mathbf{J}_{1,n} \mathbf{X} \right)^{\top} = \frac{1}{n} \mathbf{X}^{\top} \mathbf{J}_{n,1} \text{.} \] where \(\mathbf{J}_{n,1}\) is the vector of ones (Equation 31.2).

Vector of means
# Vector of ones 
J_1n <- matrix(1, nrow = 1, ncol = n)
# Vector of means 
x_bar <- t((J_1n %*% X)/n)

# Standard computation
# matrix(apply(X, 2, mean), nrow = k, ncol = 1)

\[ \underset{3 \times 1}{\bar{\mathbf{x}}} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \bar{x}_3 \\ \end{pmatrix} = \begin{pmatrix} - 0.131 \\ 0.347 \\ -0.00296 \\ \end{pmatrix} \]
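As a standalone check (with freshly simulated data; the names X0 and J0 are only illustrative, not the chapter’s objects), the matrix formula for the vector of means agrees with colMeans():

```r
set.seed(42)
n0 <- 8; k0 <- 3
X0 <- matrix(rnorm(n0 * k0), n0, k0)
J0 <- matrix(1, nrow = 1, ncol = n0)   # row vector of ones
x_bar0 <- t((J0 %*% X0) / n0)          # (1/n) J_{1,n} X, transposed
stopifnot(isTRUE(all.equal(as.vector(x_bar0), colMeans(X0))))
```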

13.2 Deviation matrix

Let’s compute the matrix of centered observations, where each element is computed as \(\tilde{x}_{i,j} = x_{i,j} - \bar{x}_{j}\), i.e. \[ \underset{n \times k }{\widetilde{\mathbf{X}}} = \begin{pmatrix} \tilde{x}_{1,1} & \dots & \tilde{x}_{1,j} & \dots & \tilde{x}_{1,k} \\ \vdots & & \vdots & & \vdots \\ \tilde{x}_{i,1} & \dots & \tilde{x}_{i,j} & \dots & \tilde{x}_{i,k} \\ \vdots & & \vdots & & \vdots \\ \tilde{x}_{n,1} & \dots & \tilde{x}_{n,j} & \dots & \tilde{x}_{n,k} \end{pmatrix} \text{.} \] In matrix notation, \(\widetilde{\mathbf{X}}\) can be computed as: \[ \begin{aligned} \widetilde{\mathbf{X}} & {} = \mathbf{X} - \mathbf{J}_{n,1} \bar{\mathbf{x}}^{\top} = \\ & = \mathbf{X} - \mathbf{J}_{n,1} \left( \frac{1}{n} \mathbf{X}^{\top} \mathbf{J}_{n,1} \right)^{\top} = \\ & = \mathbf{X} - \frac{1}{n} \mathbf{J}_{n,1} \mathbf{J}_{1,n} \mathbf{X} = \\ & = \left(\mathbf{I}_{n} - \frac{1}{n} \mathbf{J}_{n} \right) \mathbf{X} = \\ & = \mathbf{A} \mathbf{X} \end{aligned} \] where \(\mathbf{I}_{n}\) is the identity matrix (Equation 31.3) and \(\mathbf{J}_{n}\) is the \(n \times n\) matrix of ones (Equation 31.2). \(\mathbf{A}\) is called the centering matrix and is formally defined as: \[ \underset{n \times n }{\mathbf{A}} = \mathbf{I}_{n} - \frac{1}{n} \mathbf{J}_{n} \text{.} \tag{13.2}\] Note that \(\mathbf{A}\) is symmetric and idempotent, i.e. \(\mathbf{A}^{\top} = \mathbf{A}\) and \(\mathbf{A}\mathbf{A} = \mathbf{A}\).

Deviation matrix
# Identity matrix
I <- diag(1, n, n)
# Centering matrix
A <- I - (t(J_1n) %*% J_1n)/n
# Deviation matrix
X_tilde <- A %*% X

\[ \underset{10 \times 3 }{\widetilde{\mathbf{X}}} = \begin{array}{c|rrr} i & \tilde{\mathbf{x}}_1 & \tilde{\mathbf{x}}_2 & \tilde{\mathbf{x}}_3 \\ \hline 1 & 0.299 & -1.098 & 2.194 \\ 2 & -0.7343 & 0.1767 & 0.7968 \\ 3 & -0.145 & 1.026 & 0.3308 \\ 4 & -2.492 & 0.4359 & 0.1829 \\ 5 & 1.192 & 0.3104 & 0.8122 \\ 6 & 1.105 & -0.1681 & -2.803 \\ 7 & 0.1305 & -0.3842 & -2.075 \\ 8 & 0.4959 & 0.5991 & 0.1355 \\ 9 & 0.15 & -1.364 & -0.7304 \\ 10 & -0.001575 & 0.4662 & 1.156 \\ \end{array} \]
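The construction above relies on two properties of the centering matrix \(\mathbf{A}\): it is symmetric and idempotent, and it removes the column means of any matrix it multiplies. A standalone base-R check (the suffix-0 names are illustrative, not the chapter’s objects):

```r
set.seed(42)
n0 <- 6; k0 <- 2
X0 <- matrix(rnorm(n0 * k0), n0, k0)
A0 <- diag(n0) - matrix(1, n0, n0) / n0   # centering matrix
stopifnot(isTRUE(all.equal(A0, t(A0))))       # symmetric
stopifnot(isTRUE(all.equal(A0 %*% A0, A0)))   # idempotent: A A = A
X0_tilde <- A0 %*% X0
stopifnot(isTRUE(all.equal(colMeans(X0_tilde), rep(0, k0))))  # centered columns
```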

13.3 Variance-covariance matrix

The covariance matrix of \(\mathbf{X}\) is also a \(k \times k\) matrix, of the form: \[ \underset{k \times k }{\boldsymbol{\Sigma}} = \begin{pmatrix} \mathbb{V}\{\mathbf{X}_1\} & \dots & \mathbb{C}v\{\mathbf{X}_1, \mathbf{X}_j\} & \dots & \mathbb{C}v\{\mathbf{X}_1, \mathbf{X}_k\} \\ \vdots & & \vdots & & \vdots \\ \mathbb{C}v\{\mathbf{X}_j, \mathbf{X}_1\} & \dots & \mathbb{V}\{\mathbf{X}_j\} & \dots & \mathbb{C}v\{\mathbf{X}_j, \mathbf{X}_k\} \\ \vdots & & \vdots & & \vdots \\ \mathbb{C}v\{\mathbf{X}_k, \mathbf{X}_1\} & \dots & \mathbb{C}v\{\mathbf{X}_k, \mathbf{X}_j\} & \dots & \mathbb{V}\{\mathbf{X}_k\} \\ \end{pmatrix} \tag{13.3}\] Given a matrix \(\mathbf{X}\), the variance-covariance matrix in matrix notation can be estimated as \[ \begin{aligned} \boldsymbol{\Sigma} & {} = \frac{1}{n}\widetilde{\mathbf{X}}^{\top} \widetilde{\mathbf{X}} = \\ & = \frac{1}{n} \left( \textbf{A} \mathbf{X} \right)^{\top} \textbf{A} \mathbf{X} = \\ & = \frac{1}{n} \mathbf{X}^{\top} \textbf{A}^{\top} \textbf{A} \mathbf{X} = \\ & = \frac{1}{n} \mathbf{X}^{\top} \textbf{A} \mathbf{X} \end{aligned} \tag{13.4}\] where \(\mathbf{A}\) is the centering matrix (Equation 13.2); the last step uses the fact that \(\mathbf{A}\) is symmetric and idempotent, so \(\mathbf{A}^{\top} \textbf{A} = \mathbf{A}\). Equation 13.4 divides by \(n\); the code below instead uses the unbiased sample estimator, which divides by \(n - 1\) (as the R function cov() does). The variance-covariance matrix is:

  1. Square, i.e. \(k \times k\), and symmetric.
  2. Positive semi-definite.
  3. Has the variances on the diagonal; hence its trace (Equation 31.8) is \(\text{trace}(\boldsymbol{\Sigma}) = \sum_{j = 1}^{k} \mathbb{V}\{\mathbf{X}_j\}\).
Covariance matrix
# covariance matrix
cv_x <- (t(A %*% X) %*% A %*% X) / (n-1)
# In terms of deviation matrix 
# t(X_tilde) %*% X_tilde / (n-1)
# Standard computation
# cov(X)

\[ \underset{3 \times 3 }{\boldsymbol{\Sigma}} = \begin{pmatrix} 1.087 & -0.163 & -0.319 \\ -0.163 & 0.576 & 0.143 \\ -0.319 & 0.143 & 2.25 \\ \end{pmatrix} \]
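The three properties listed above can be verified numerically. A standalone sketch with freshly simulated data (the suffix-0 names are illustrative):

```r
set.seed(42)
n0 <- 20; k0 <- 3
X0 <- matrix(rnorm(n0 * k0), n0, k0)
A0 <- diag(n0) - matrix(1, n0, n0) / n0
S0 <- t(A0 %*% X0) %*% (A0 %*% X0) / (n0 - 1)   # sample covariance matrix
stopifnot(isTRUE(all.equal(S0, unname(cov(X0)))))             # matches cov()
stopifnot(isTRUE(all.equal(S0, t(S0))))                       # symmetric
stopifnot(all(eigen(S0, symmetric = TRUE)$values >= -1e-12))  # positive semi-definite
stopifnot(isTRUE(all.equal(sum(diag(S0)), sum(apply(X0, 2, var)))))  # trace = sum of variances
```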

13.4 Standardized variables

In order to remove the effect of the unit of measure of the different variables, it is possible to work with the matrix of standardized variables, i.e.  \[ \underset{n \times k}{\mathbf{Z}} = \begin{pmatrix} z_{1,1} & \dots & z_{1,j} & \dots & z_{1,k} \\ \vdots & & \vdots & & \vdots \\ z_{i,1} & \dots & z_{i,j} & \dots & z_{i,k} \\ \vdots & & \vdots & & \vdots \\ z_{n,1} & \dots & z_{n,j} & \dots & z_{n,k} \\ \end{pmatrix} \text{.} \] where each element \(z_{i,j}\) is defined as: \[ z_{i,j} = \frac{\tilde{x}_{i,j}}{\sqrt{\mathbb{V}\{\mathbf{X}_j\}}} = \frac{x_{i,j} - \bar{x}_j}{\sqrt{\mathbb{V}\{\mathbf{X}_j\}}} \text{.} \] In matrix notation, \(\mathbf{Z}\) can be rewritten as: \[ \mathbf{Z} = \widetilde{\mathbf{X}} \cdot \mathbf{D}^{-\frac{1}{2}} \text{.} \] where the matrix \(\mathbf{D}\) keeps the variances on the diagonal and has zeros elsewhere, i.e. the Hadamard (element-wise) product of \(\boldsymbol{\Sigma}\) and the identity matrix \(\mathbf{I}_{k}\) (Equation 31.3): \[ \underset{k \times k}{\mathbf{D}} = \boldsymbol{\Sigma} \circ \mathbf{I}_{k} = \begin{pmatrix} \mathbb{V}\{\mathbf{X}_1\} & \dots & 0 & \dots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \dots & \mathbb{V}\{\mathbf{X}_j\} & \dots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \dots & 0 & \dots & \mathbb{V}\{\mathbf{X}_k\} \\ \end{pmatrix} \text{.} \tag{13.5}\] The standardized variables have mean equal to zero and unit variance; moreover, the resulting numbers no longer depend on the unit of measure of the variables.

Matrix D
# Variances 
v_x <- c(mean((X[,1] - x_bar[1])^2),
         mean((X[,2] - x_bar[2])^2), 
         mean((X[,3] - x_bar[3])^2)) * n/(n-1)
# Note: this stores D^(-1/2), the inverse square roots of the variances
D <- diag(1/sqrt(v_x))

\[ \underset{3 \times 3 }{\mathbf{D}} = \begin{pmatrix} 1.087 & 0 & 0 \\ 0 & 0.576 & 0 \\ 0 & 0 & 2.25 \\ \end{pmatrix} \]

Standardized variables
# Standardized variables: Z = X_tilde %*% D^(-1/2) (the variable D stores D^(-1/2))
Z <- X_tilde %*% D
# Standard computation
# scale(X)

\[ \underset{10 \times 3}{\mathbf{Z}} = \begin{array}{c|rrr} i & \mathbf{z}_1 & \mathbf{z}_2 & \mathbf{z}_3 \\ \hline 1 & 0.2868 & -1.446 & 1.461 \\ 2 & -0.7042 & 0.2327 & 0.5305 \\ 3 & -0.1391 & 1.352 & 0.2203 \\ 4 & -2.39 & 0.574 & 0.1218 \\ 5 & 1.143 & 0.4087 & 0.5408 \\ 6 & 1.06 & -0.2213 & -1.866 \\ 7 & 0.1251 & -0.506 & -1.382 \\ 8 & 0.4755 & 0.7889 & 0.0902 \\ 9 & 0.1439 & -1.796 & -0.4863 \\ 10 & -0.00151 & 0.6139 & 0.7696 \\ \end{array} \]
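A standalone check that standardization behaves as claimed: zero column means, unit variances, and agreement with R’s scale() (the suffix-0 names are illustrative):

```r
set.seed(42)
n0 <- 15; k0 <- 3
X0 <- matrix(rnorm(n0 * k0), n0, k0)
X0_tilde <- scale(X0, center = TRUE, scale = FALSE)   # deviation matrix
v0 <- apply(X0, 2, var)                               # sample variances
Z0 <- X0_tilde %*% diag(1 / sqrt(v0))                 # X_tilde %*% D^(-1/2)
stopifnot(isTRUE(all.equal(colMeans(Z0), rep(0, k0))))      # zero means
stopifnot(isTRUE(all.equal(apply(Z0, 2, var), rep(1, k0)))) # unit variances
stopifnot(max(abs(Z0 - scale(X0))) < 1e-12)                 # matches scale()
```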

13.5 Correlations matrix

The correlation is a statistic that measures the linear dependence between two variables. The elements of the correlation matrix are: \[ \underset{k \times k}{\boldsymbol{\rho}} = \begin{pmatrix} 1 & \dots & \mathbb{C}r\{\mathbf{X}_1, \mathbf{X}_j\} & \dots & \mathbb{C}r\{\mathbf{X}_1, \mathbf{X}_k\} \\ \vdots & & \vdots & & \vdots \\ \mathbb{C}r\{\mathbf{X}_j, \mathbf{X}_1\} & \dots & 1 & \dots & \mathbb{C}r\{\mathbf{X}_j, \mathbf{X}_k\} \\ \vdots & & \vdots & & \vdots \\ \mathbb{C}r\{\mathbf{X}_k, \mathbf{X}_1\} & \dots & \mathbb{C}r\{\mathbf{X}_k, \mathbf{X}_j\} & \dots & 1 \\ \end{pmatrix} \text{.} \] where each element is the correlation (Equation 5.11) between the random variables \(\mathbf{X}_i\) and \(\mathbf{X}_j\).

In matrix notation, the correlation matrix of \(\mathbf{X}\) can be estimated from the deviation matrix or from the standardized one, i.e. \[ \begin{aligned} \boldsymbol{\rho} & {} = \mathbf{D}^{-\frac{1}{2}} \boldsymbol{\Sigma} \, \mathbf{D}^{-\frac{1}{2}} = \\ & = \frac{1}{n} \mathbf{D}^{-\frac{1}{2}} \widetilde{\mathbf{X}}^{\top} \widetilde{\mathbf{X}} \, \mathbf{D}^{-\frac{1}{2}} = \\ & = \frac{1}{n} \mathbf{Z}^{\top} \mathbf{Z} \end{aligned} \] where \(\mathbf{D}\) is defined as in Equation 13.5. As with the covariance matrix, the code below uses the unbiased version with divisor \(n - 1\). The correlation matrix is:

  1. Square, i.e. \(k \times k\), and symmetric.
  2. Positive semi-definite.
  3. Has ones on the diagonal; hence its trace (Equation 31.8) is \(\text{trace}(\boldsymbol{\rho}) = \sum_{j = 1}^{k} 1 = k\).
Correlation matrix
# Correlation matrix 
cr_x <- t(Z) %*% Z / (n-1)

# In terms of the covariance matrix (recall that D stores D^(-1/2))
# t(D) %*% cv_x %*% D 

# In terms of deviation matrix 
# t(X_tilde %*% D) %*% X_tilde %*% D / (n-1)

# Standard computation
# cor(X)

\[ \underset{3 \times 3}{\boldsymbol{\rho}} = \begin{pmatrix} 1.0 & -0.205 & -0.204 \\ -0.205 & 1.0 & 0.125 \\ -0.204 & 0.125 & 1.0 \\ \end{pmatrix} \]
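Finally, a standalone check that \(\mathbf{D}^{-\frac{1}{2}} \boldsymbol{\Sigma} \, \mathbf{D}^{-\frac{1}{2}}\) reproduces cor(), with unit diagonal and trace \(k\) (the suffix-0 names are illustrative):

```r
set.seed(42)
n0 <- 12; k0 <- 3
X0 <- matrix(rnorm(n0 * k0), n0, k0)
S0 <- cov(X0)                            # sample covariance matrix
D0_inv_sqrt <- diag(1 / sqrt(diag(S0)))  # D^(-1/2)
rho0 <- D0_inv_sqrt %*% S0 %*% D0_inv_sqrt
stopifnot(isTRUE(all.equal(rho0, unname(cor(X0)))))   # matches cor()
stopifnot(isTRUE(all.equal(diag(rho0), rep(1, k0))))  # unit diagonal
stopifnot(isTRUE(all.equal(sum(diag(rho0)), k0)))     # trace equals k
```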