18  Time series

A stochastic process is a collection of random variables $\{X_t\}_{t \in T}$ defined on a probability space $(\Omega, \mathcal{B}, P)$ and assuming values in $\mathbb{R}^k$ with $k \ge 1$. The index set $T$ is usually the half-line $[0, \infty)$, but it can also be an interval $[a, b]$ or a subset of $\mathbb{R}^k$.

While a stochastic process has a crystal-clear mathematical definition, a time series is a less precise notion, and people use the term to refer to two related but different objects. On the one hand, a time series can be seen as a stochastic process indexed by the integers, or by some regular, incremental unit of time that can in a sense be mapped to the integers (e.g. minutely, hourly, daily, monthly data). On the other hand, a time series can be understood as a collection of time–value data points, while a stochastic process is the mathematical description of a distribution of time series. In this view, a time series is a realization of a stochastic process, and one can use a stochastic process as a model to generate time series.

Definition 18.1 (Filtration)
Let $(\Omega, \mathcal{B}, P)$ be a probability space and let $T$ be an index set. If for every $t \in T$ one has that $\mathcal{F}_t$ is a sub-$\sigma$-algebra of $\mathcal{B}$, and for every $k \le t$ it holds that $\mathcal{F}_k \subseteq \mathcal{F}_t$, then the family of sub-$\sigma$-algebras, denoted by $\{\mathcal{F}_t\}_{t \in T}$, is called a filtration.

For example, consider a stochastic process $\{X_n\}_{n \in \mathbb{N}}$ and the sequence of sub-$\sigma$-algebras generated by $X_1, X_2, \dots, X_n$. Then each $\mathcal{F}_n$ is a $\sigma$-algebra and $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$ is a filtration, called the natural filtration with respect to $\{X_n\}_{n \in \mathbb{N}}$. Formally, $\mathcal{F}_n$ is the smallest $\sigma$-algebra containing all events observable up to the index $n$, i.e.
$$
\begin{aligned}
\mathcal{F}_0 &= \sigma(X_0) \\
\mathcal{F}_1 &= \mathcal{F}_0 \vee \sigma(X_1) = \sigma(X_0, X_1) \\
\mathcal{F}_2 &= \mathcal{F}_1 \vee \sigma(X_2) = \sigma(X_0, X_1, X_2) \\
&\;\;\vdots \\
\mathcal{F}_n &= \mathcal{F}_{n-1} \vee \sigma(X_n) = \sigma(X_0, X_1, X_2, \dots, X_n)
\end{aligned}
$$

18.1 Stationarity

Definition 18.2 (Strongly stationary)
A stochastic process $\{X_t\}_{t \in T}$ is said to be strongly stationary if and only if, for every set of indices $\{t_1, t_2, \dots, t_n\} \subseteq T$ and for every $h \ge 0$,
$$
P(X_{t_1}, X_{t_2}, \dots, X_{t_n}) = P(X_{t_1 + h}, X_{t_2 + h}, \dots, X_{t_n + h}),
$$
meaning that the joint distribution of an arbitrary number of random variables $X_{t_1}, X_{t_2}, \dots, X_{t_n}$ does not change when the process is shifted forward or backward by a step $h$.

Definition 18.3 (Weakly stationary)
A stochastic process {Xt}tT is said to be weakly stationary (or covariance stationary) if and only if

  1. $\mathbb{E}\{X_t\} = \mu$ with $|\mu| < \infty$ for every $t \in T$;
  2. $\mathbb{E}\{X_t^2\} < \infty$ for every $t \in T$;
  3. $\mathrm{Cov}\{X_t, X_{t+h}\} = \gamma(h)$ with $|\gamma(h)| < \infty$ for every $t \in T$ and $h \ge 0$.

Hence, the autocovariance $\gamma(h)$ of a weakly stationary process does not depend on the time $t$, but only on the temporal lag $h$ between two observations.
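This can be illustrated numerically (a sketch in base R; the AR(1) model with coefficient 0.5 and the sample sizes are arbitrary choices): the sample autocovariance of a weakly stationary series computed on two disjoint time windows estimates the same function of the lag $h$, regardless of where in time the window sits.

```r
set.seed(42)
# Simulate a weakly stationary AR(1) process: y_t = 0.5 * y_{t-1} + e_t
y <- arima.sim(model = list(ar = 0.5), n = 5000)
# Sample autocovariance gamma(h), h = 0, ..., 3, on two disjoint halves
gamma_1 <- drop(acf(y[1:2500],    lag.max = 3, type = "covariance", plot = FALSE)$acf)
gamma_2 <- drop(acf(y[2501:5000], lag.max = 3, type = "covariance", plot = FALSE)$acf)
# Both windows estimate the same gamma(h): it depends on h, not on t
round(cbind(first_half = gamma_1, second_half = gamma_2), 2)
```

Both columns approximate the theoretical autocovariances $\gamma(h) = 0.5^h/(1 - 0.25)$ of this AR(1), up to sampling error.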

Strong stationarity does not imply weak stationarity, and vice versa

In general, if a process is strongly stationary (Definition 18.2), this does not imply that it is also weakly stationary. For example, an independent and identically distributed Cauchy process is strongly stationary, but since its expectation and variance are not finite, the process is not weakly stationary.
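A quick simulation (a sketch; the sample size and seed are arbitrary) makes the failure tangible: the running mean of an IID Cauchy sample never settles down, since the sample mean of $n$ IID standard Cauchy variables is itself standard Cauchy, while the running mean of an IID Normal sample converges.

```r
set.seed(7)
# Running mean of IID standard Cauchy draws: it does not converge
x <- rcauchy(10000)
running_mean <- cumsum(x) / seq_along(x)
# Compare with IID N(0,1), whose running mean settles near 0
z <- rnorm(10000)
running_mean_z <- cumsum(z) / seq_along(z)
c(cauchy_tail_range = diff(range(running_mean[5001:10000])),
  normal_tail_range = diff(range(running_mean_z[5001:10000])))
```

The Cauchy running mean typically keeps jumping by large amounts even late in the sample, while the Normal one is essentially flat.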

18.2 Notable processes

Definition 18.4 (IID process)
A time series $\{X_t\}_{t \in T}$ where the $X_t$ are mutually independent and share the same distribution for all $t$ is called an independent and identically distributed (IID) process. Such a process, usually denoted as $X_t \sim \mathrm{IID}(0, \sigma^2)$, is strongly stationary (Definition 18.2). Moreover, if the mean and variance are finite, the covariance is zero and the process is also weakly stationary (Definition 18.3), i.e.
$$
\gamma_t(h) = \mathrm{Cov}\{X_t, X_{t+h}\} = \mathbb{E}\{X_t X_{t+h}\} = \mathbb{E}\{X_t\}\,\mathbb{E}\{X_{t+h}\} = 0.
$$

Definition 18.5 (White noise)
A stochastic process $\{X_t\}_{t \in T}$, commonly denoted as
$$
X_t \sim \mathrm{WN}(0, \sigma^2), \tag{18.1}
$$
is called White Noise if it satisfies the following properties:

  1. The expectation is equal to zero, i.e. $\mathbb{E}\{X_t\} = 0$ for all $t \in T$.
  2. The variance is finite and constant, i.e. $\mathbb{V}\{X_t\} = \sigma^2 < \infty$ for all $t \in T$.
  3. The process is uncorrelated over time, i.e. $\mathrm{Cov}\{X_t, X_k\} = 0$ for all $t \ne k$.

A White Noise process is weakly stationary (Definition 18.3). In fact, the autocovariance function of the process depends on the lag but not on time: it is equal to the variance for $t = k$ and is zero otherwise. This process is more general than an IID process (Definition 18.4), since it does not require stochastic independence of the observations across $t$.
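A classical construction of a White Noise process that is not IID (a sketch; the choice $X_t = Z_t Z_{t-1}$ with $Z_t \sim \mathrm{IID}\,N(0,1)$ is one standard textbook example) is serially uncorrelated with mean zero and unit variance, yet the squares $X_t^2$ are correlated over time, revealing the dependence:

```r
set.seed(123)
# X_t = Z_t * Z_{t-1} with Z_t ~ IID N(0,1): white noise but not IID
n <- 20000
z <- rnorm(n + 1)
x <- z[-1] * z[-(n + 1)]
# Sample autocorrelation at lag 1 of the levels (theoretically 0)...
rho_x  <- acf(x,   lag.max = 1, plot = FALSE)$acf[2]
# ...and of the squares (theoretically 0.25, so X_t is not independent)
rho_x2 <- acf(x^2, lag.max = 1, plot = FALSE)$acf[2]
c(levels = rho_x, squares = rho_x2)
```

The theoretical lag-1 correlation of the squares is $\mathrm{Cov}(X_t^2, X_{t+1}^2)/\mathbb{V}(X_t^2) = 2/8 = 0.25$.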

18.2.1 Martingales

Definition 18.6 (Martingale) Let’s consider a probability space $(\Omega, \mathcal{B}, P)$ and a stochastic process $\{M_t\}_{t \in T}$. Then, given a filtration $\{\mathcal{F}_t\}_{t \in T}$, the stochastic process $M_t$ is a martingale with respect to $\{\mathcal{F}_t\}_{t \in T}$ if

  1. $M_t$ is adapted to $\mathcal{F}_t$, in the sense that $M_t$ is included in the information contained in $\mathcal{F}_t$, i.e. $M_t$ is $\mathcal{F}_t$-measurable.
  2. For any $0 \le k < t$,
$$
\mathbb{E}\{M_t \mid \mathcal{F}_k\} = M_k. \tag{18.2}
$$

Definition 18.7 (Martingale difference sequence)
Let’s consider a probability space $(\Omega, \mathcal{B}, P)$ and a stochastic process $\{D_t\}_{t \in T}$. Then, given a filtration $\{\mathcal{F}_t\}_{t \in T}$, the stochastic process $D_t$ is a martingale difference sequence (MDS) with respect to $\{\mathcal{F}_t\}_{t \in T}$ if $D_t$ is $\mathcal{F}_t$-measurable and, for any $0 \le k < t$,
$$
\mathbb{E}\{D_t \mid \mathcal{F}_k\} = 0. \tag{18.3}
$$
This implies that $D_t$ is a mean-zero process uncorrelated with any information contained in $\mathcal{F}_{t-1}$. The definition can be extended to the case where the filtration $\mathcal{F}_{t-1}$ also includes other processes $X_t$. In this case, $D_t$ is said to be an MDS conditionally on $X_t$ if the same condition in (18.3) holds.

Super and sub-martingales

A stochastic process $\{X_t\}_{t \in T}$ is said to be a sub-martingale if, instead of (18.2), we have that for any $0 \le k < t$,
$$
\mathbb{E}\{X_t \mid \mathcal{F}_k\} \ge X_k. \tag{18.4}
$$
On the other hand, it is said to be a super-martingale if
$$
\mathbb{E}\{X_t \mid \mathcal{F}_k\} \le X_k. \tag{18.5}
$$
From the above definitions it follows that, to be a martingale (or an MDS), a stochastic process must be both a super- and a sub-martingale.

Simulate Martingales
set.seed(1)
# Number of observations
N <- 100
# Number of simulations
j_bar <- 20
# Predictable process
A_t <- rep(0.2, N)
X_sub <- X_mar <- X_sup <- list()
for(j in 1:j_bar){
  # Martingale increments (IID, mean zero)
  M_t <- rnorm(N, 0, 1)
  # Sub-martingale: positive drift
  X_sub[[j]] <- dplyr::tibble(j = j, t = 1:N, X = cumsum(A_t + M_t))
  # Martingale: driftless random walk
  X_mar[[j]] <- dplyr::tibble(j = j, t = 1:N, X = cumsum(M_t))
  # Super-martingale: negative drift
  X_sup[[j]] <- dplyr::tibble(j = j, t = 1:N, X = cumsum(-A_t + M_t))
}
Figure 18.1: Simulation of a sub-martingale, martingale and super-martingale with expected value (red).

The concept of martingales is connected to the concept of predictability of a stochastic process.

Definition 18.8 (Predictable process) Let’s consider a probability space $(\Omega, \mathcal{B}, P)$ and a stochastic process $\{A_t\}_{t \in T}$, and let $\{\mathcal{F}_t\}_{t \in T}$ be a sequence of sub-$\sigma$-algebras of $\mathcal{B}$. Then the stochastic process $A_t$ is predictable if

  1. $A_0 \in \mathcal{F}_0$, i.e. $A_0$ is $\mathcal{F}_0$-measurable;
  2. For any $t \ge 0$ we have that $A_{t+1}$ is $\mathcal{F}_t$-measurable.

Then, we call the process predictable and increasing if $0 = A_0 \le A_1 \le A_2 \le \dots$

Theorem 18.1 (Doob decomposition) Any sub-martingale $\{X_t\}_{t \in T}$ can be written in a unique way as the sum of a martingale $\{M_t\}_{t \in T}$ and a predictable increasing process $\{A_t\}_{t \in T}$, i.e.
$$
X_t = M_t + A_t.
$$
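The decomposition can be illustrated numerically (a sketch; the constant drift 0.2 mirrors the predictable process $A_t$ used in the simulation above): for a random walk with positive drift $X_t = \sum_{s \le t} (0.2 + e_s)$, the predictable increasing part is $A_t = 0.2\,t$, and $M_t = X_t - A_t$ is a driftless random walk, i.e. a martingale.

```r
set.seed(1)
n <- 1000
e <- rnorm(n)
X <- cumsum(0.2 + e)   # sub-martingale: random walk with positive drift
A <- 0.2 * seq_len(n)  # predictable, increasing part (deterministic here)
M <- X - A             # martingale part: a driftless random walk
# Martingale increments should have (approximately) mean zero
mean(diff(M))
```

Here $A_t$ is deterministic, hence trivially predictable; in general $A_{t+1}$ only needs to be $\mathcal{F}_t$-measurable.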

18.3 Lag operator

The lag operator $L$ is a function that shifts a time series in time. In general, the lag operator associates with $y_t$ its lagged value $y_{t-1}$, i.e.
$$
L(y_t) = y_{t-1}. \tag{18.6}
$$
More formally, $L$ is an operator that takes one whole time series and produces another; the second time series is the same as the first, but moved backward or forward one point in time. From the definition, we list some properties of the lag operator:

  1. Backward: $L^k(y_t) = y_{t-k}$.
  2. Forward: $L^{-k}(y_t) = y_{t+k}$.
  3. Linearity: $L(a y_t + b x_t) = a y_{t-1} + b x_{t-1}$.

18.3.1 Polynomial of Lag operator

Given a time series $y_t$, it is possible to define polynomials of the lag operator, i.e.
$$
\phi(L) y_t = y_t + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} = \sum_{j=0}^{p} \phi_j y_{t-j},
$$
where, with $\phi_0 = 1$,
$$
\phi(L) = 1 + \phi_1 L + \phi_2 L^2 + \dots + \phi_p L^p = \sum_{j=0}^{p} \phi_j L^j. \tag{18.7}
$$
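Applying a lag polynomial to an observed series is a finite one-sided convolution; a sketch using `stats::filter` (the coefficients 0.5 and −0.2 are arbitrary illustration values):

```r
phi <- c(0.5, -0.2)   # phi_1, phi_2 (arbitrary illustration values)
y <- c(1, 2, 3, 4, 5, 6)
# phi(L) y_t = y_t + phi_1 * y_{t-1} + phi_2 * y_{t-2};
# sides = 1 applies the coefficients to current and past values only
out <- stats::filter(y, filter = c(1, phi), method = "convolution", sides = 1)
out   # the first p = 2 values are NA: no lagged observations available
```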

For the polynomial $\phi(L)$ the factorization
$$
\phi(L) = \left(1 - \tfrac{1}{z_1} L\right)\left(1 - \tfrac{1}{z_2} L\right)\cdots\left(1 - \tfrac{1}{z_p} L\right) = \prod_{i=1}^{p} \left(1 - \tfrac{1}{z_i} L\right)
$$
holds, where $z_1, \dots, z_p$ are the complex solutions of the characteristic equation
$$
\phi(z) = 1 + \phi_1 z + \phi_2 z^2 + \dots + \phi_p z^p = 0.
$$
The factorization allows us to define the inverse of the lag polynomial, i.e.
$$
\phi^{-1}(L) = \prod_{i=1}^{p} \left(1 - \tfrac{1}{z_i} L\right)^{-1},
$$
which exists if and only if
$$
|z_i| > 1 \iff \frac{1}{|z_i|} < 1 \qquad \forall i.
$$
In other words, the modulus of each solution must lie outside the unit circle; otherwise the geometric series below is not convergent and the inverse does not exist. In fact, setting $\varphi_i = 1/z_i$, the inverse of the $i$-th term can be expressed with a Taylor expansion as an infinite sum if and only if $|\varphi_i| < 1$, i.e.
$$
(1 - \varphi_i L)^{-1} = 1 + \varphi_i L + (\varphi_i L)^2 + \dots = \sum_{j=0}^{\infty} \varphi_i^j L^j, \qquad |\varphi_i| < 1,
$$
which is equivalent to $|z_i| > 1$ for all $i$ since $\varphi_i = 1/z_i$.

For example, let’s consider an autoregressive process of order 1, i.e.
$$
y_t = \phi_1 y_{t-1} + e_t \iff \phi(L) y_t = e_t \iff y_t = \phi^{-1}(L) e_t.
$$
In fact,
$$
\phi(L) y_t = y_t - \phi_1 y_{t-1} = y_t - \phi_1 L y_t = (1 - \phi_1 L) y_t.
$$
Considering such a polynomial, its inverse $\phi^{-1}(L)$, defined such that $\phi(L)\phi^{-1}(L) = 1$, is given by the geometric series
$$
\phi^{-1}(L) = 1 + \phi_1 L + (\phi_1 L)^2 + \dots = \sum_{j=0}^{\infty} \phi_1^j L^j = \frac{1}{1 - \phi_1 L}, \qquad |\phi_1| < 1,
$$
which converges if and only if $|\phi_1| < 1$. Moreover, if $|\phi_1| < 1$ it is possible to prove that $\phi^{-1}(L)$ is indeed the inverse polynomial of $\phi(L)$; in fact:
$$
\begin{aligned}
\phi(L)\phi^{-1}(L) &= (1 - \phi_1 L) \sum_{j=0}^{\infty} (\phi_1 L)^j \\
&= \sum_{j=0}^{\infty} (\phi_1 L)^j - \phi_1 L \sum_{j=0}^{\infty} (\phi_1 L)^j \\
&= \sum_{j=0}^{\infty} (\phi_1 L)^j - \sum_{j=0}^{\infty} (\phi_1 L)^{j+1} = 1.
\end{aligned}
$$
Therefore, the process $y_t$ can be equivalently expressed as
$$
y_t = \phi^{-1}(L) e_t = \sum_{j=0}^{\infty} \phi_1^j e_{t-j}.
$$
The factorization of any polynomial of the form of $\phi(L)$ is connected to the convergence of the geometric series
$$
\sum_{j=0}^{\infty} \phi^j = \frac{1}{1 - \phi}, \qquad |\phi| < 1.
$$
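The MA($\infty$) representation can be checked numerically (a sketch; the infinite sum is truncated at $J = 50$ lags, where $\phi_1^J$ is negligible for $|\phi_1| < 1$):

```r
set.seed(10)
phi1 <- 0.6
n_burn <- 50; n <- 200
e <- rnorm(n_burn + n)
# AR(1) recursion y_t = phi1 * y_{t-1} + e_t (zero initial condition)
y <- stats::filter(e, filter = phi1, method = "recursive")
# Truncated MA(infinity) representation at the last time point:
# y_t ~ sum_{j=0}^{J} phi1^j * e_{t-j}
t_last <- n_burn + n
J <- 50
y_ma <- sum(phi1^(0:J) * e[t_last - (0:J)])
c(recursive = y[t_last], ma_representation = y_ma)
```

The two values agree up to the truncation error, which is of order $\phi_1^{J+1}/(1 - \phi_1)$ and hence negligible here.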

Convergent series (I)
# *************************************************
#                      Inputs 
# *************************************************
a <- c(0.7, -0.7)
min_i <- 0
max_i <- 20
i <- seq(min_i, max_i - 1, 1)
y_breaks <- seq(min_i, max_i, 2)
x_labels <- quantile(i, 0.85)
# *************************************************
# Convergent series 0 < a < 1
series1 <- cumsum(a[1]^i)
limit1 <- 1/(1 - a[1])
# Convergent series -1 < a < 0
series2 <- cumsum(a[2]^i)
limit2 <- 1/(1 - a[2])
(a) $0 < a < 1$.
(b) $-1 < a < 0$.
Figure 18.2: Convergent series for AR(1) parameter (I).

Another important series is convergent if and only if $|a| < 1$, i.e.
$$
\sum_{i=0}^{\infty} a^{2i} = \frac{1}{1 - a^2}, \qquad |a| < 1.
$$
Due to the square, in this case we do not distinguish between $0 < a < 1$ and $-1 < a < 0$, since they lead to the same result.

Convergent series (II)
# Convergent series |a| < 1 
series1 <- cumsum(a[1]^(i*2))
limit1 <- 1/(1 - a[1]^2)
Figure 18.3: Convergent series for AR(1) parameter (II).