18  Time series

A stochastic process is a collection of random variables $\{X_t\}_{t \in T}$ defined on a probability space $(\Omega, \mathcal{B}, P)$ and assuming values in $\mathbb{R}^k$ with $k \ge 1$. The index set $T$ is usually the half-line $[0, \infty)$, but it can also be an interval $[a, b]$ or a subset of $\mathbb{R}^k$.

While a stochastic process has a crystal-clear mathematical definition, a time series is a less precise notion, and people use the term to refer to two related but different objects. On the one hand, a time series can be seen as a stochastic process indexed by the integers, or by some regular, incremental unit of time that can in a sense be mapped to the integers (e.g. minutely, hourly, daily, monthly data). On the other hand, a time series can be understood as a collection of time–value data points, while a stochastic process is the mathematical description of a distribution of time series. In this view, a time series is a realization of a stochastic process, and one can use a stochastic process as a model to generate time series.

Definition 18.1 (Filtration)
Let $(\Omega, \mathcal{B}, P)$ be a probability space and let $T$ be an index set. If for every $t \in T$ one has that $\mathcal{F}_t$ is a sub-$\sigma$-algebra of $\mathcal{B}$, and for every $k \le t$ it holds that $\mathcal{F}_k \subseteq \mathcal{F}_t$, then the family of sub-$\sigma$-algebras, denoted by $\{\mathcal{F}_t\}_{t \in T}$, is called a filtration.

For example, consider a stochastic process $\{X_n\}_{n \in \mathbb{N}}$ and the sequence of sub-$\sigma$-algebras generated by $X_1, X_2, \dots, X_n$. Then each $\mathcal{F}_n$ is a $\sigma$-algebra and $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$ is a filtration, called the natural filtration with respect to $\{X_n\}_{n \in \mathbb{N}}$. Formally, $\mathcal{F}_n$ is the smallest $\sigma$-algebra containing all events observable up to the index $n$, i.e.
$$
\begin{aligned}
\mathcal{F}_0 &= \sigma(X_0) \\
\mathcal{F}_1 &= \mathcal{F}_0 \vee \sigma(X_1) = \sigma(X_0, X_1) \\
\mathcal{F}_2 &= \mathcal{F}_1 \vee \sigma(X_2) = \sigma(X_0, X_1, X_2) \\
&\;\;\vdots \\
\mathcal{F}_n &= \mathcal{F}_{n-1} \vee \sigma(X_n) = \sigma(X_0, X_1, X_2, \dots, X_n)
\end{aligned}
$$

18.1 Stationarity

Definition 18.2 (Strongly stationary)
A stochastic process $\{X_t\}_{t \in T}$ is said to be strongly stationary if and only if, for every set of indices $\{t_1, t_2, \dots, t_n\} \subseteq T$ and for every $h \ge 0$,
$$
P(X_{t_1}, X_{t_2}, \dots, X_{t_n}) = P(X_{t_1 + h}, X_{t_2 + h}, \dots, X_{t_n + h}),
$$
meaning that the joint distribution of an arbitrary number of random variables $X_{t_1}, X_{t_2}, \dots, X_{t_n}$ does not change when the process is shifted forward or backward by a step $h$.

Definition 18.3 (Weakly stationary)
A stochastic process {Xt}tT is said to be weakly stationary (or covariance stationary) if and only if

  1. $\mathbb{E}\{X_t\} = \mu$ with $|\mu| < \infty$ for every $t \in T$;
  2. $\mathbb{E}\{X_t^2\} < \infty$ for every $t \in T$;
  3. $\mathrm{Cov}\{X_t, X_{t+h}\} = \gamma(h)$ with $|\gamma(h)| < \infty$ for every $t \in T$ and $h \ge 0$.

Hence, the autocovariance $\gamma(h)$ of a weakly stationary process does not depend on the time $t$, but only on the temporal lag $h$ between two observations.
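This can be illustrated numerically (a sketch in base R; the AR(1) model with coefficient 0.5 and the sample sizes are arbitrary choices): the sample autocovariance of a weakly stationary series computed on two disjoint time windows estimates the same function of the lag $h$, regardless of where in time the window sits.

```r
set.seed(42)
# Simulate a weakly stationary AR(1) process: y_t = 0.5 * y_{t-1} + e_t
y <- arima.sim(model = list(ar = 0.5), n = 5000)
# Sample autocovariance gamma(h), h = 0, ..., 3, on two disjoint halves
gamma_1 <- drop(acf(y[1:2500],    lag.max = 3, type = "covariance", plot = FALSE)$acf)
gamma_2 <- drop(acf(y[2501:5000], lag.max = 3, type = "covariance", plot = FALSE)$acf)
# Both windows estimate the same gamma(h): it depends on h, not on t
round(cbind(first_half = gamma_1, second_half = gamma_2), 2)
```

Both columns approximate the theoretical autocovariances $\gamma(h) = 0.5^h/(1 - 0.25)$ of this AR(1), up to sampling error.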

Strong stationarity does not imply weak stationarity, and vice versa

In general, if a process is strongly stationary (Definition 18.2), this does not imply that it is also weakly stationary. For example, an independent and identically distributed Cauchy process is strongly stationary, but since its expectation and variance are not finite, the process is not weakly stationary.
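A quick simulation (a sketch; the sample size and seed are arbitrary) makes the failure tangible: the running mean of an IID Cauchy sample never settles down, since the sample mean of $n$ IID standard Cauchy variables is itself standard Cauchy, while the running mean of an IID Normal sample converges.

```r
set.seed(7)
# Running mean of IID standard Cauchy draws: it does not converge
x <- rcauchy(10000)
running_mean <- cumsum(x) / seq_along(x)
# Compare with IID N(0,1), whose running mean settles near 0
z <- rnorm(10000)
running_mean_z <- cumsum(z) / seq_along(z)
c(cauchy_tail_range = diff(range(running_mean[5001:10000])),
  normal_tail_range = diff(range(running_mean_z[5001:10000])))
```

The Cauchy running mean typically keeps jumping by large amounts even late in the sample, while the Normal one is essentially flat.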

18.2 Notable processes

Definition 18.4 (IID process)
A time series $\{X_t\}_{t \in T}$ where the $X_t$ are mutually independent and share the same distribution for all $t$ is called an independent and identically distributed (IID) process. Such a process, usually denoted as $X_t \sim \mathrm{IID}(0, \sigma^2)$, is strongly stationary (Definition 18.2). Moreover, if the mean and variance are finite, the covariance is zero and the process is also weakly stationary (Definition 18.3), i.e.
$$
\gamma_t(h) = \mathrm{Cov}\{X_t, X_{t+h}\} = \mathbb{E}\{X_t X_{t+h}\} = \mathbb{E}\{X_t\}\,\mathbb{E}\{X_{t+h}\} = 0.
$$

Definition 18.5 (White noise)
A stochastic process $\{X_t\}_{t \in T}$, commonly denoted as
$$
X_t \sim \mathrm{WN}(0, \sigma^2), \tag{18.1}
$$
is called White Noise if it satisfies the following properties:

  1. The expectation is equal to zero, i.e. $\mathbb{E}\{X_t\} = 0$ for all $t \in T$.
  2. The variance is finite and constant, i.e. $\mathbb{V}\{X_t\} = \sigma^2 < \infty$ for all $t \in T$.
  3. The process is uncorrelated over time, i.e. $\mathrm{Cov}\{X_t, X_k\} = 0$ for all $t \ne k$.

A White Noise process is weakly stationary (Definition 18.3). In fact, the autocovariance function of the process depends on the lag but not on time: it is equal to the variance for $t = k$ and is zero otherwise. This process is more general than an IID process (Definition 18.4), since it does not require stochastic independence of the observations across $t$.
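A classical construction of a White Noise process that is not IID (a sketch; the choice $X_t = Z_t Z_{t-1}$ with $Z_t \sim \mathrm{IID}\,N(0,1)$ is one standard textbook example) is serially uncorrelated with mean zero and unit variance, yet the squares $X_t^2$ are correlated over time, revealing the dependence:

```r
set.seed(123)
# X_t = Z_t * Z_{t-1} with Z_t ~ IID N(0,1): white noise but not IID
n <- 20000
z <- rnorm(n + 1)
x <- z[-1] * z[-(n + 1)]
# Sample autocorrelation at lag 1 of the levels (theoretically 0)...
rho_x  <- acf(x,   lag.max = 1, plot = FALSE)$acf[2]
# ...and of the squares (theoretically 0.25, so X_t is not independent)
rho_x2 <- acf(x^2, lag.max = 1, plot = FALSE)$acf[2]
c(levels = rho_x, squares = rho_x2)
```

The theoretical lag-1 correlation of the squares is $\mathrm{Cov}(X_t^2, X_{t+1}^2)/\mathbb{V}(X_t^2) = 2/8 = 0.25$.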

18.2.1 Martingales

Definition 18.6 (Martingale) Let’s consider a probability space $(\Omega, \mathcal{B}, P)$ and a stochastic process $\{M_t\}_{t \in T}$. Then, given a filtration $\{\mathcal{F}_t\}_{t \in T}$, the stochastic process $M_t$ is a martingale with respect to $\{\mathcal{F}_t\}_{t \in T}$ if

  1. $M_t$ is adapted to $\mathcal{F}_t$, in the sense that $M_t$ is included in the information contained in $\mathcal{F}_t$, i.e. $M_t$ is $\mathcal{F}_t$-measurable.
  2. For any $0 \le k < t$,
$$
\mathbb{E}\{M_t \mid \mathcal{F}_k\} = M_k. \tag{18.2}
$$

Definition 18.7 (Martingale difference sequence)
Let’s consider a probability space $(\Omega, \mathcal{B}, P)$ and a stochastic process $\{D_t\}_{t \in T}$. Then, given a filtration $\{\mathcal{F}_t\}_{t \in T}$, the stochastic process $D_t$ is a martingale difference sequence (MDS) with respect to $\{\mathcal{F}_t\}_{t \in T}$ if $D_t$ is $\mathcal{F}_t$-measurable and, for any $0 \le k < t$,
$$
\mathbb{E}\{D_t \mid \mathcal{F}_k\} = 0. \tag{18.3}
$$
This implies that $D_t$ is a mean-zero process uncorrelated with any information contained in $\mathcal{F}_{t-1}$. The definition can be extended to the case where the filtration $\mathcal{F}_{t-1}$ also includes other processes $X_t$. In this case, $D_t$ is said to be an MDS conditionally on $X_t$ if the same condition in (18.3) holds.

Super and sub-martingales

A stochastic process $\{X_t\}_{t \in T}$ is said to be a sub-martingale if, instead of (18.2), we have that for any $0 \le k < t$,
$$
\mathbb{E}\{X_t \mid \mathcal{F}_k\} \ge X_k. \tag{18.4}
$$
On the other hand, it is said to be a super-martingale if
$$
\mathbb{E}\{X_t \mid \mathcal{F}_k\} \le X_k. \tag{18.5}
$$
From the above definitions it follows that, to be a martingale (or an MDS), a stochastic process must be both a super- and a sub-martingale.

Simulate Martingales
set.seed(1)
# Number of observations
N <- 100
# Number of simulations
j_bar <- 20
# Predictable process
A_t <- rep(0.2, N)
X_sub <- X_mar <- X_sup <- list()
for(j in 1:j_bar){
  # Martingale increments (IID, mean zero)
  M_t <- rnorm(N, 0, 1)
  # Sub-martingale: positive drift
  X_sub[[j]] <- dplyr::tibble(j = j, t = 1:N, X = cumsum(A_t + M_t))
  # Martingale: driftless random walk
  X_mar[[j]] <- dplyr::tibble(j = j, t = 1:N, X = cumsum(M_t))
  # Super-martingale: negative drift
  X_sup[[j]] <- dplyr::tibble(j = j, t = 1:N, X = cumsum(-A_t + M_t))
}
Figure 18.1: Simulation of a sub-martingale, martingale and super-martingale with expected value (red).

The concept of martingales is connected to the concept of predictability of a stochastic process.

Definition 18.8 (Predictable process) Let’s consider a probability space $(\Omega, \mathcal{B}, P)$ and a stochastic process $\{A_t\}_{t \in T}$, and let $\{\mathcal{F}_t\}_{t \in T}$ be a sequence of sub-$\sigma$-algebras of $\mathcal{B}$. Then the stochastic process $A_t$ is predictable if

  1. $A_0 \in \mathcal{F}_0$, i.e. $A_0$ is $\mathcal{F}_0$-measurable;
  2. For any $t \ge 0$ we have that $A_{t+1}$ is $\mathcal{F}_t$-measurable.

Then, we call the process predictable and increasing if $0 = A_0 \le A_1 \le A_2 \le \dots$

Theorem 18.1 (Doob decomposition) Any sub-martingale $\{X_t\}_{t \in T}$ can be written in a unique way as the sum of a martingale $\{M_t\}_{t \in T}$ and a predictable increasing process $\{A_t\}_{t \in T}$, i.e.
$$
X_t = M_t + A_t.
$$
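The decomposition can be illustrated numerically (a sketch; the constant drift 0.2 mirrors the predictable process $A_t$ used in the simulation above): for a random walk with positive drift $X_t = \sum_{s \le t} (0.2 + e_s)$, the predictable increasing part is $A_t = 0.2\,t$, and $M_t = X_t - A_t$ is a driftless random walk, i.e. a martingale.

```r
set.seed(1)
n <- 1000
e <- rnorm(n)
X <- cumsum(0.2 + e)   # sub-martingale: random walk with positive drift
A <- 0.2 * seq_len(n)  # predictable, increasing part (deterministic here)
M <- X - A             # martingale part: a driftless random walk
# Martingale increments should have (approximately) mean zero
mean(diff(M))
```

Here $A_t$ is deterministic, hence trivially predictable; in general $A_{t+1}$ only needs to be $\mathcal{F}_t$-measurable.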

18.3 Lag operator

The lag operator $L$ is a function that shifts a time series in time. In general, the lag operator associates with $y_t$ its lagged value $y_{t-1}$, i.e.
$$
L(y_t) = y_{t-1}. \tag{18.6}
$$
More formally, $L$ is an operator that takes one whole time series and produces another; the second time series is the same as the first, but moved backward or forward one point in time. From the definition, we list some properties of the lag operator:

  1. Backward: $L^k(y_t) = y_{t-k}$.
  2. Forward: $L^{-k}(y_t) = y_{t+k}$.
  3. Linearity: $L(a y_t + b x_t) = a y_{t-1} + b x_{t-1}$.

18.3.1 Polynomial of Lag operator

Given a time series $y_t$, it is possible to define polynomials of the lag operator, i.e.
$$
\phi(L) y_t = y_t + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} = \sum_{j=0}^{p} \phi_j y_{t-j},
$$
where, with $\phi_0 = 1$,
$$
\phi(L) = 1 + \phi_1 L + \phi_2 L^2 + \dots + \phi_p L^p = \sum_{j=0}^{p} \phi_j L^j. \tag{18.7}
$$
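Applying a lag polynomial to an observed series is a finite one-sided convolution; a sketch using `stats::filter` (the coefficients 0.5 and −0.2 are arbitrary illustration values):

```r
phi <- c(0.5, -0.2)   # phi_1, phi_2 (arbitrary illustration values)
y <- c(1, 2, 3, 4, 5, 6)
# phi(L) y_t = y_t + phi_1 * y_{t-1} + phi_2 * y_{t-2};
# sides = 1 applies the coefficients to current and past values only
out <- stats::filter(y, filter = c(1, phi), method = "convolution", sides = 1)
out   # the first p = 2 values are NA: no lagged observations available
```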

For the polynomial $\phi(L)$ the factorization
$$
\phi(L) = \left(1 - \tfrac{1}{z_1} L\right)\left(1 - \tfrac{1}{z_2} L\right)\cdots\left(1 - \tfrac{1}{z_p} L\right) = \prod_{i=1}^{p} \left(1 - \tfrac{1}{z_i} L\right)
$$
holds, where $z_1, \dots, z_p$ are the complex solutions of the characteristic equation
$$
\phi(z) = 1 + \phi_1 z + \phi_2 z^2 + \dots + \phi_p z^p = 0.
$$
The factorization allows us to define the inverse of the lag polynomial, i.e.
$$
\phi^{-1}(L) = \prod_{i=1}^{p} \left(1 - \tfrac{1}{z_i} L\right)^{-1},
$$
which exists if and only if
$$
|z_i| > 1 \iff \frac{1}{|z_i|} < 1 \qquad \forall i.
$$
In other words, the modulus of each solution must lie outside the unit circle; otherwise the geometric series below is not convergent and the inverse does not exist. In fact, setting $\varphi_i = 1/z_i$, the inverse of the $i$-th term can be expressed with a Taylor expansion as an infinite sum if and only if $|\varphi_i| < 1$, i.e.
$$
(1 - \varphi_i L)^{-1} = 1 + \varphi_i L + (\varphi_i L)^2 + \dots = \sum_{j=0}^{\infty} \varphi_i^j L^j, \qquad |\varphi_i| < 1,
$$
which is equivalent to $|z_i| > 1$ for all $i$ since $\varphi_i = 1/z_i$.

For example, let’s consider an autoregressive process of order 1, i.e.
$$
y_t = \phi_1 y_{t-1} + e_t \iff \phi(L) y_t = e_t \iff y_t = \phi^{-1}(L) e_t.
$$
In fact,
$$
\phi(L) y_t = y_t - \phi_1 y_{t-1} = y_t - \phi_1 L y_t = (1 - \phi_1 L) y_t.
$$
Considering such a polynomial, its inverse $\phi^{-1}(L)$, defined such that $\phi(L)\phi^{-1}(L) = 1$, is given by the geometric series
$$
\phi^{-1}(L) = 1 + \phi_1 L + (\phi_1 L)^2 + \dots = \sum_{j=0}^{\infty} \phi_1^j L^j = \frac{1}{1 - \phi_1 L}, \qquad |\phi_1| < 1,
$$
which converges if and only if $|\phi_1| < 1$. Moreover, if $|\phi_1| < 1$ it is possible to prove that $\phi^{-1}(L)$ is indeed the inverse polynomial of $\phi(L)$; in fact:
$$
\begin{aligned}
\phi(L)\phi^{-1}(L) &= (1 - \phi_1 L) \sum_{j=0}^{\infty} (\phi_1 L)^j \\
&= \sum_{j=0}^{\infty} (\phi_1 L)^j - \phi_1 L \sum_{j=0}^{\infty} (\phi_1 L)^j \\
&= \sum_{j=0}^{\infty} (\phi_1 L)^j - \sum_{j=0}^{\infty} (\phi_1 L)^{j+1} = 1.
\end{aligned}
$$
Therefore, the process $y_t$ can be equivalently expressed as
$$
y_t = \phi^{-1}(L) e_t = \sum_{j=0}^{\infty} \phi_1^j e_{t-j}.
$$
The factorization of any polynomial of the form of $\phi(L)$ is connected to the convergence of the geometric series
$$
\sum_{j=0}^{\infty} \phi^j = \frac{1}{1 - \phi}, \qquad |\phi| < 1.
$$
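The MA($\infty$) representation can be checked numerically (a sketch; the infinite sum is truncated at $J = 50$ lags, where $\phi_1^J$ is negligible for $|\phi_1| < 1$):

```r
set.seed(10)
phi1 <- 0.6
n_burn <- 50; n <- 200
e <- rnorm(n_burn + n)
# AR(1) recursion y_t = phi1 * y_{t-1} + e_t (zero initial condition)
y <- stats::filter(e, filter = phi1, method = "recursive")
# Truncated MA(infinity) representation at the last time point:
# y_t ~ sum_{j=0}^{J} phi1^j * e_{t-j}
t_last <- n_burn + n
J <- 50
y_ma <- sum(phi1^(0:J) * e[t_last - (0:J)])
c(recursive = y[t_last], ma_representation = y_ma)
```

The two values agree up to the truncation error, which is of order $\phi_1^{J+1}/(1 - \phi_1)$ and hence negligible here.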

Convergent series (I)
# *************************************************
#                      Inputs 
# *************************************************
a <- c(0.7, -0.7)
min_i <- 0
max_i <- 20
i <- seq(min_i, max_i - 1, 1)
y_breaks <- seq(min_i, max_i, 2)
x_labels <- quantile(i, 0.85)
# *************************************************
# Convergent series 0 < a < 1
series1 <- cumsum(a[1]^i)
limit1 <- 1/(1 - a[1])
# Convergent series -1 < a < 0
series2 <- cumsum(a[2]^i)
limit2 <- 1/(1 - a[2])
(a) $0 < a < 1$.
(b) $-1 < a < 0$.
Figure 18.2: Convergent series for AR(1) parameter (I).

Another important series is convergent if and only if $|a| < 1$, i.e.
$$
\sum_{i=0}^{\infty} a^{2i} = \frac{1}{1 - a^2}, \qquad |a| < 1.
$$
Due to the square, in this case we do not distinguish between $0 < a < 1$ and $-1 < a < 0$, since they lead to the same result.

Convergent series (II)
# Convergent series |a| < 1 
series1 <- cumsum(a[1]^(i*2))
limit1 <- 1/(1 - a[1]^2)
Figure 18.3: Convergent series for AR(1) parameter (II).