26 Stationarity tests

Setup

library(dplyr)
library(knitr)
library(ggplot2)

26.1 Dickey–Fuller test

The Dickey–Fuller test tests the null hypothesis that a unit root is present in an autoregressive (AR) model. The alternative hypothesis depends on which version of the test is used, and is usually “stationarity” or “trend-stationarity”. Let’s consider an AR(1) model, i.e. $\begin{matrix} (26.1) & x_{t} = μ + δ t + ϕ_{1} x_{t - 1} + u_{t}, \end{matrix}$ or equivalently, subtracting $x_{t - 1}$ from both sides, $\begin{matrix} (26.2) & Δ x_{t} = μ + δ t + (ϕ_{1} - 1) x_{t - 1} + u_{t} . \end{matrix}$ The hypotheses of the Dickey–Fuller test are: $\begin{aligned} H_{0} &: ϕ_{1} = 1 \quad \text{(non-stationarity)} \\ H_{1} &: ϕ_{1} < 1 \quad \text{(stationarity)} \end{aligned}$ The Dickey–Fuller statistic (DF) is computed from the OLS estimate $\hat{ϕ}_{1}$ as: $DF = \frac{\hat{ϕ}_{1} - 1}{Sd (\hat{ϕ}_{1} - 1)}$ However, under the null hypothesis the regressor $x_{t - 1}$ is itself non-stationary, so this ratio does not follow a t-distribution, not even asymptotically, and the usual t critical values cannot be used. Instead, the statistic $DF$ follows a specific non-standard distribution (the Dickey–Fuller distribution), whose critical values are tabulated.
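As an illustration of regression (26.2), the following is a minimal sketch in base R that estimates the coefficient on $x_{t - 1}$ by OLS and returns its ratio to the standard error. The helper name `df_stat` and the simulated series are assumptions made for this example only. The resulting value must be compared with Dickey–Fuller critical values (approximately -3.4 at the 5% level for the specification with constant and trend), not with t-distribution quantiles.

```r
# Hypothetical helper: DF statistic from the regression
# diff(x)_t = mu + delta * t + gamma * x_{t-1} + u_t, with gamma = phi_1 - 1
df_stat <- function(x) {
  n <- length(x)
  dx <- diff(x)            # first differences, t = 2, ..., n
  x_lag <- x[-n]           # lagged level x_{t-1}
  trend <- seq_len(n - 1)  # deterministic time trend
  fit <- lm(dx ~ trend + x_lag)
  est <- summary(fit)$coefficients["x_lag", ]
  # estimate of gamma divided by its standard error
  unname(est["Estimate"] / est["Std. Error"])
}

set.seed(1)
u <- rnorm(500)
random_walk <- cumsum(u)  # phi_1 = 1: unit root
# phi_1 = 0.5: stationary AR(1); stats:: avoids the dplyr::filter mask
ar1 <- as.numeric(stats::filter(u, 0.5, method = "recursive"))

df_rw <- df_stat(random_walk)
df_ar <- df_stat(ar1)
```

For the stationary AR(1) series the statistic is expected to be strongly negative, while for the random walk it typically stays above the critical value.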

26.2 Augmented Dickey–Fuller test

The augmented Dickey–Fuller test extends the Dickey–Fuller test to higher-order autoregressive models by adding $p$ lagged differences to the regression, i.e. $Δ x_{t} = μ + δ t + γ x_{t - 1} + \sum_{i = 1}^{p} ϕ_{i} Δ x_{t - i} + u_{t}$ The hypotheses of the augmented Dickey–Fuller test are: $\begin{aligned} H_{0} &: γ = 0 \quad \text{(non-stationarity)} \\ H_{1} &: γ < 0 \quad \text{(stationarity)} \end{aligned}$ The augmented Dickey–Fuller statistic (ADF) is computed from the OLS estimate $\hat{γ}$ as: $ADF = \frac{\hat{γ}}{Sd (\hat{γ})}$ As in the simpler case, the statistic is compared with critical values taken from the specific tables for the Dickey–Fuller test.
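The augmented regression can be sketched in base R as follows. The helper name `adf_stat`, the choice `p = 2` and the simulated series are assumptions made for illustration. The sketch only computes the statistic; it does not look up critical values.

```r
# Hypothetical helper: ADF statistic with p >= 1 lagged differences
adf_stat <- function(x, p = 1) {
  stopifnot(p >= 1)
  dx <- diff(x)
  m <- length(dx)
  # embed() puts the current difference in column 1 and
  # its lags 1, ..., p in columns 2, ..., p + 1
  lags <- embed(dx, p + 1)
  y <- lags[, 1]
  dx_lags <- lags[, -1, drop = FALSE]
  x_lag <- x[(p + 1):m]   # lagged level x_{t-1}, aligned with y
  trend <- seq_along(y)   # time trend (the shift in origin is absorbed by mu)
  fit <- lm(y ~ trend + x_lag + dx_lags)
  est <- summary(fit)$coefficients["x_lag", ]
  # estimate of gamma divided by its standard error
  unname(est["Estimate"] / est["Std. Error"])
}

set.seed(1)
u <- rnorm(500)
ar1 <- as.numeric(stats::filter(u, 0.5, method = "recursive"))  # stationary
adf_ar <- adf_stat(ar1, p = 2)
adf_rw <- adf_stat(cumsum(u), p = 2)                            # unit root
```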

26.3 Kolmogorov–Smirnov test

The Kolmogorov–Smirnov two-sample test (KS) can be used to test whether two samples come from the same distribution. Let’s define the empirical distribution function $F_{n}$ of $n$ independent and identically distributed ordered observations $X_{(i)}$ as $F_{n} (x) = \frac{1}{n} \sum_{i = 1}^{n} 1_{(- \infty, x]} (X_{(i)}) .$ The KS statistic quantifies the distance between the empirical distribution functions of the two samples. The distribution of the KS statistic under the null hypothesis assumes that the two samples are drawn from the same distribution. When the two samples are sub-series of the same time series $X$, the hypotheses read: $\begin{aligned} H_{0} &: X \text{ is stationary} \\ H_{1} &: X \text{ is not stationary} \end{aligned}$ The test statistic for two samples of sizes $n_{1}$ and $n_{2}$ is defined as: ${KS}_{n_{1}, n_{2}} = \sup_{x} | F_{n_{1}} (x) - F_{n_{2}} (x) |,$ and for large samples $H_{0}$ is rejected at the significance level $α$ if: ${KS}_{n_{1}, n_{2}} > \sqrt{- \frac{1}{2 n_{2}} \ln (\frac{α}{2}) (1 + \frac{n_{2}}{n_{1}})} .$ Hence, since the statistic is always greater than or equal to zero, solving the rejection condition for $α$ gives the p-value $α = 2 P (X > {KS}_{n_{1}, n_{2}})$ of an observed statistic ${KS}_{n_{1}, n_{2}}$, where: $P (X > {KS}_{n_{1}, n_{2}}) = \exp (- \frac{2 n_{2}}{1 + \frac{n_{2}}{n_{1}}} {KS}_{n_{1}, n_{2}}^{2})$
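The large-sample rejection level and the p-value obtained by inverting it can be written as two small R functions. This is a sketch under the asymptotic approximation only; the names `ks_critical` and `ks_pvalue` and the sample sizes are hypothetical. The factor $\frac{1}{n_{2}} (1 + \frac{n_{2}}{n_{1}})$ is written in the equivalent form $\frac{n_{1} + n_{2}}{n_{1} n_{2}}$.

```r
# Hypothetical helper: asymptotic rejection level at significance level alpha
ks_critical <- function(n1, n2, alpha = 0.05) {
  sqrt(-0.5 * log(alpha / 2) * (n1 + n2) / (n1 * n2))
}

# Hypothetical helper: asymptotic two-sided p-value, i.e. the alpha at which
# the observed statistic equals the rejection level (capped at 1)
ks_pvalue <- function(ks, n1, n2) {
  min(1, 2 * exp(-2 * ks^2 * (n1 * n2) / (n1 + n2)))
}

# Consistency check: at the rejection level the p-value equals alpha
crit <- ks_critical(100, 150, alpha = 0.05)
p_at_crit <- ks_pvalue(crit, 100, 150)
```

By construction, `p_at_crit` recovers the significance level used to compute `crit`, so rejecting when the statistic exceeds the rejection level is equivalent to rejecting when the p-value falls below $α$.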

KS test for time series

To apply the test in a time series setting, use a random index to split the original series into two sub-series. The KS test can then be applied to the two sub-series as usual.
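The splitting procedure can be sketched with the built-in `stats::ks.test()`. The wrapper `ks_split` and its arguments are hypothetical names introduced for this example; the split index is drawn from `1` to `n - 1` so that both sub-series are non-empty, and it can also be supplied explicitly.

```r
# Hypothetical helper: split a series at an index and compare the two parts
ks_split <- function(x, t_split = NULL, alpha = 0.05) {
  n <- length(x)
  # random split index, chosen so that both sub-series are non-empty
  if (is.null(t_split)) t_split <- sample(n - 1, 1)
  test <- stats::ks.test(x[1:t_split], x[(t_split + 1):n])
  list(
    t_split   = t_split,
    statistic = unname(test$statistic),
    p.value   = test$p.value,
    reject    = test$p.value < alpha
  )
}

set.seed(42)
# Series with a large mean shift at t = 250 (illustrative values)
x_break <- c(rnorm(250, 0, 1), rnorm(250, 3, 1))
res_random <- ks_split(x_break)                 # random split index
res_fixed  <- ks_split(x_break, t_split = 250)  # split exactly at the break
```

A single random split that falls close to either end of the series leaves one very short sub-series, so the test may have little power in that case.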

26.3.1 Examples

Check for stationarity

Example 26.1 Let’s consider 500 simulated observations of the random variable $X$ drawn from a population distributed as $X \sim N (0.4, 1)$ . Then, treating the sample as a time series, let’s draw a random index and split the series at that point. Finally, as shown in Table 26.1, the null hypothesis, i.e. the two samples come from the same distribution, is not rejected at the significance level $α = 5 \%$ .

KS-test on a stationary time series

# ================ Setups ================
set.seed(5) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of simulated observations
x <- rnorm(n, 0.4, 1) # stationary series 
# ========================================
# Random time for splitting (at most n - 1, so that x2 is never empty)
t_split <- sample(n - 1, 1)
# Split the time series
x1 <- x[1:t_split]
x2 <- x[(t_split+1):n]
# Number of elements for each sub-series 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for computing KS-statistic
x_min <- quantile(x, 0.015)
x_max <- quantile(x, 0.985)
grid <- seq(x_min, x_max, length.out = 200)
# Empirical cdfs 
cdf_n1 <- ecdf(x1)
cdf_n2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_n1(grid) - cdf_n2(grid)))
# Rejection level with probability alpha 
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
# Asymptotic two-sided p-value: 2 * P(X > KS), capped at 1
p.value <- min(1, 2 * exp(- (2 * n2) / (1 + n2/n1) * ks_stat^2))

KS-test plot

y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_n1(grid) - cdf_n2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_n1(grid), ymin = cdf_n2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_n1(grid)))+
  geom_line(aes(grid, cdf_n2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, y = cdf_n1(grid_max), yend = cdf_n2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_n1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_n2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()

Figure 26.1: Empirical cdfs of the two sub-series and KS statistic (magenta) for a stationary time series.

KS-test (stationary)

kab <- tibble(
  t_split = t_split,
  ci = ci,
  n1 = n1,
  n2 = n2,
  KS = ks_stat,
  p.value = p.value,
  rejection_lev = rejection_lev,
  H0 = ifelse(KS > rejection_lev, "Rejected", "Non-Rejected")
)  %>%
  mutate_if(is.numeric, format, digits = 4, scientific = FALSE)
colnames(kab) <- c("$\\textbf{Index split}$","$\\alpha$", "$n_1$", "$n_2$", 
                   "$KS_{n_1, n_2}$", "p.value", "$\\textbf{Critical level}$", "$H_0$")
knitr::kable(kab, escape = FALSE)

Table 26.1: KS test for a stationary time series.

Index split	$α$	$n_{1}$	$n_{2}$	$KS_{n_{1}, n_{2}}$	p.value	Critical level	$H_{0}$
348	0.05	348	152	0.07093	0.6282	0.132	Non-Rejected
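The grid used in the example evaluates the difference between the two empirical cdfs at a finite set of points between the 1.5% and 98.5% quantiles, so the resulting maximum can only understate the supremum. Since both empirical cdfs are step functions that jump only at observed values, evaluating the difference at the pooled sample points gives the exact statistic. A minimal sketch, where `ks_exact` is a hypothetical helper and the samples are simulated for illustration, cross-checked against `stats::ks.test()`:

```r
# Hypothetical helper: exact two-sample KS statistic
ks_exact <- function(x1, x2) {
  pooled <- sort(c(x1, x2))  # the cdf difference can only change here
  max(abs(ecdf(x1)(pooled) - ecdf(x2)(pooled)))
}

set.seed(123)
a <- rnorm(300)
b <- rnorm(200)

# Grid-based approximation, built as in the example above
grid <- seq(quantile(c(a, b), 0.015), quantile(c(a, b), 0.985),
            length.out = 200)
ks_grid <- max(abs(ecdf(a)(grid) - ecdf(b)(grid)))

ks_full <- ks_exact(a, b)                   # exact statistic
ks_ref <- unname(ks.test(a, b)$statistic)   # reference value from stats
```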

Check for non-stationarity

Example 26.2 Let’s consider 250 simulated observations drawn from a population distributed as $Y_{1, t} \sim N (0, 1)$ , followed by 250 observations drawn from $Y_{2, t} \sim N (0.3, 1)$ . The resulting non-stationary series has a structural break at $t = 250$ and is given by: $X_{t} = \begin{cases} Y_{1, t} & t \leq 250 \\ Y_{2, t} & t > 250 \end{cases}$ As in Example 26.1, let’s split the time series at a random index and apply the KS test. In this case, as shown in Table 26.2, the null hypothesis, i.e. the two samples come from the same distribution, is rejected at the significance level $α = 5 \%$ .

KS-test on a non-stationary time series

# ============== Setups ==============
set.seed(2) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of simulated observations
# Simulated non-stationary sample 
x1 <- rnorm(n/2, 0, 1)
x2 <- rnorm(n/2, 0.3, 1)
x <- c(x1, x2)
# ====================================
# Random split of the time series (at most n - 1, so that x2 is never empty)
t_split <- sample(n - 1, 1)
x1 <- x[1:t_split]
x2 <- x[(t_split+1):n]
# Number of elements for each sub sample 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for KS-statistic
grid <- seq(quantile(x, 0.015), quantile(x, 0.985), 0.01)
# Empiric cdfs 
cdf_1 <- ecdf(x1)
cdf_2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_1(grid) - cdf_2(grid)))
# Rejection level  
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
# Asymptotic two-sided p-value: 2 * P(X > KS), capped at 1
p.value <- min(1, 2 * exp(- (2 * n2) / (1 + n2/n1) * ks_stat^2))

KS-test plot

y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_1(grid) - cdf_2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_1(grid), ymin = cdf_2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_1(grid)))+
  geom_line(aes(grid, cdf_2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, 
                   y = cdf_1(grid_max), yend = cdf_2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()

Figure 26.2: Empirical cdfs of the two sub-series and KS statistic (magenta) for a non-stationary time series.

KS-test (non-stationary)

kab <- dplyr::tibble(
  t_split = t_split,
  ci = ci,
  n1 = n1,
  n2 = n2,
  KS = ks_stat,
  p.value = p.value,
  rejection_lev = rejection_lev,
  H0 = ifelse(KS > rejection_lev, "Rejected", "Non-Rejected"))  %>%
  mutate_if(is.numeric, format, digits = 4, scientific = FALSE)
colnames(kab) <- c("$\\textbf{Index split}$","$\\alpha$", "$n_1$", "$n_2$", 
                   "$KS_{n_1, n_2}$", "p.value", "$\\textbf{Critical level}$", "$H_0$")
knitr::kable(kab, escape = FALSE)

Table 26.2: KS test for a non-stationary time series.

Index split	$α$	$n_{1}$	$n_{2}$	$KS_{n_{1}, n_{2}}$	p.value	Critical level	$H_{0}$
166	0.05	166	334	0.1643	0.000005831	0.129	Rejected