26  Stationarity tests

Setup
library(dplyr)
library(knitr)
library(ggplot2)

26.1 Dickey–Fuller test

The Dickey–Fuller test tests the null hypothesis that a unit root is present in an autoregressive (AR) model. The alternative hypothesis depends on which version of the test is used, and is usually “stationarity” or “trend-stationarity”. Let’s consider an AR(1) model, i.e.
$$x_t = \mu + \delta t + \phi_1 x_{t-1} + u_t, \tag{26.1}$$
or equivalently, subtracting $x_{t-1}$ from both sides,
$$\Delta x_t = \mu + \delta t + (\phi_1 - 1) x_{t-1} + u_t. \tag{26.2}$$
The hypotheses of the Dickey–Fuller test are:
$$H_0: \phi_1 = 1 \;(\text{non-stationarity}) \qquad H_1: \phi_1 < 1 \;(\text{stationarity})$$
The Dickey–Fuller statistic (DF) is computed as:
$$DF = \frac{\hat{\phi}_1 - 1}{\operatorname{Sd}\{\hat{\phi}_1 - 1\}}$$
where $\hat{\phi}_1$ is the least squares estimate of $\phi_1$. However, under the null hypothesis the regressor $x_{t-1}$ is non-stationary, so the statistic does not follow the usual t-distribution and the t-distribution cannot be used to provide critical values. The DF statistic has its own specific distribution, whose critical values are tabulated.
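
As a rough illustration of how the DF statistic can be obtained in practice, the sketch below simulates an AR(1) series and estimates the regression (26.2) by least squares with `lm`. The simulated parameters (`phi1 = 0.5`, `n = 300`, no drift and no trend in the data) and all variable names are assumptions made for this example only.

```r
# Simulate a stationary AR(1) series (phi1 = 0.5 is an illustrative choice)
set.seed(1)
n    <- 300
phi1 <- 0.5
u    <- rnorm(n)
x    <- numeric(n)
for (t in 2:n) x[t] <- phi1 * x[t - 1] + u[t]

# Regression of Delta x_t on a constant, a time trend and x_{t-1}, as in (26.2)
dx    <- diff(x)   # x_t - x_{t-1} for t = 2, ..., n
x_lag <- x[-n]     # x_{t-1}; its coefficient estimates phi_1 - 1
trend <- 2:n       # deterministic time trend t
fit   <- lm(dx ~ trend + x_lag)

# DF statistic: estimated coefficient on x_{t-1} divided by its standard error
coefs   <- summary(fit)$coefficients
df_stat <- coefs["x_lag", "Estimate"] / coefs["x_lag", "Std. Error"]
df_stat
```

The value of `df_stat` is then compared with the Dickey–Fuller critical values rather than with t quantiles. For the version with constant and trend, the asymptotic 5% critical value is about -3.41, and values below it lead to rejecting the unit root.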

26.2 Augmented Dickey–Fuller test

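The augmented Dickey–Fuller (ADF) test is a more general version of the Dickey–Fuller test for a general AR(p) model, i.e.
$$\Delta x_t = \mu + \delta t + \gamma x_{t-1} + \sum_{i=1}^{p} \phi_i \Delta x_{t-i} + u_t$$
The hypotheses of the augmented Dickey–Fuller test are:
$$H_0: \gamma = 0 \;(\text{non-stationarity}) \qquad H_1: \gamma < 0 \;(\text{stationarity})$$
The augmented Dickey–Fuller statistic (ADF) is computed as:
$$ADF = \frac{\hat{\gamma}}{\operatorname{Sd}\{\hat{\gamma}\}}$$
where $\hat{\gamma}$ is the least squares estimate of $\gamma$. As in the simpler case, the critical values are taken from the specific table for the Dickey–Fuller test.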
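
The ADF regression can be sketched in base R in the same spirit. The example below is only illustrative: it simulates a random walk (a series with a unit root), fixes the number of lagged differences at `p = 2` by assumption, and uses `embed` to line up $\Delta x_t$ with its lags. All names are hypothetical.

```r
set.seed(2)
n <- 300
p <- 2                  # number of lagged differences (illustrative choice)
x <- cumsum(rnorm(n))   # random walk, i.e. a unit-root series

dx <- diff(x)           # Delta x_t, length n - 1
# embed() puts Delta x_t in column 1 and its lags 1, ..., p in the other columns
E      <- embed(dx, p + 1)
dx_t   <- E[, 1]
dx_lag <- E[, -1, drop = FALSE]
colnames(dx_lag) <- paste0("dlag", 1:p)

# x_{t-1} and the time index aligned with the rows of E
x_lag <- x[(p + 1):(n - 1)]
trend <- (p + 2):n

# ADF regression: constant, trend, x_{t-1} and p lagged differences
fit   <- lm(dx_t ~ trend + x_lag + dx_lag)
coefs <- summary(fit)$coefficients

# ADF statistic: estimate of gamma divided by its standard error
adf_stat <- coefs["x_lag", "Estimate"] / coefs["x_lag", "Std. Error"]
adf_stat
```

Ready-made implementations exist in contributed packages, for instance `adf.test` in the tseries package. The manual version is shown only to make the link with the regression explicit.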

26.3 Kolmogorov-Smirnov test

The Kolmogorov–Smirnov two-sample test (KS) can be used to test whether two samples came from the same distribution. Let’s define the empirical distribution function $F_n$ of $n$ independent and identically distributed ordered observations $X_{(i)}$ as
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{(-\infty, x]}(X_{(i)}).$$
The two-sample KS statistic quantifies the distance between the empirical distribution functions of the two samples. The distribution of the KS statistic is derived under the null hypothesis that the samples are drawn from the same distribution, i.e.
$$H_0: X \text{ is stationary} \qquad H_1: X \text{ is not stationary}$$
The test statistic for two samples of size $n_1$ and $n_2$ is defined as:
$$KS_{n_1,n_2} = \sup_x \left| F_{n_1}(x) - F_{n_2}(x) \right|,$$
and for large samples $H_0$ is rejected at significance level $\alpha$ (confidence level $1-\alpha$) if:
$$KS_{n_1,n_2} > \sqrt{-\frac{1}{2 n_2} \ln\left(\frac{\alpha}{2}\right) \left(1 + \frac{n_2}{n_1}\right)}.$$
Solving this inequality for $\alpha$ gives the smallest level at which $H_0$ is rejected. Hence, since the statistic is always greater than or equal to zero, for a given statistic $KS_{n_1,n_2}$ the p-value is $\alpha = 2 P(X > KS_{n_1,n_2})$, where:
$$P(X > KS_{n_1,n_2}) = \exp\left(-\frac{2 n_2}{1 + \frac{n_2}{n_1}} KS_{n_1,n_2}^2\right) = \exp\left(-\frac{2 n_1 n_2}{n_1 + n_2} KS_{n_1,n_2}^2\right)$$
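
To make the definition of the statistic concrete, the following sketch computes $KS_{n_1,n_2}$ and the large-sample critical level for two simulated samples, and cross-checks the statistic against `ks.test` from the stats package. The sample sizes, the seed and the variable names are assumptions chosen for illustration.

```r
set.seed(3)
x1 <- rnorm(120, 0, 1)   # first sample
x2 <- rnorm(80, 0, 1)    # second sample, same distribution
n1 <- length(x1)
n2 <- length(x2)
alpha <- 0.05            # significance level

# Both empirical cdfs are step functions that jump only at observed values,
# so the supremum can be evaluated on the pooled sample
pooled  <- sort(c(x1, x2))
ks_stat <- max(abs(ecdf(x1)(pooled) - ecdf(x2)(pooled)))

# Large-sample critical level at significance level alpha
crit <- sqrt(-0.5 * log(alpha / 2)) * sqrt((n1 + n2) / (n1 * n2))

# Cross-check of the statistic with the built-in two-sample test
ref <- ks.test(x1, x2)
all.equal(ks_stat, unname(ref$statistic))

# H0 is rejected when the statistic exceeds the critical level
ks_stat > crit
```

Evaluating the difference of the two empirical cdfs at the pooled observations gives the supremum exactly, whereas a grid of evaluation points gives an approximation from below.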

KS test for time series

To apply the test in a time series setting, use a random index to split the original series into two sub-series. Then the KS test can be applied as usual.
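
The splitting step can be wrapped in a small helper. This is a minimal sketch under stated assumptions: the function name `ks_split_test` is hypothetical, and it relies on `ks.test` from the stats package for the statistic and the p-value rather than on the large-sample formulas above.

```r
# Hypothetical helper: split a series at a random index and compare the two parts
ks_split_test <- function(x, t_split = sample(length(x) - 1, 1)) {
  n  <- length(x)
  x1 <- x[1:t_split]         # observations up to the split
  x2 <- x[(t_split + 1):n]   # observations after the split
  test <- ks.test(x1, x2)    # two-sample KS test
  list(t_split   = t_split,
       n1        = length(x1),
       n2        = length(x2),
       statistic = unname(test$statistic),
       p.value   = test$p.value)
}

set.seed(4)
x   <- rnorm(500, 0.4, 1)    # i.i.d. draws, hence a stationary series
res <- ks_split_test(x)
str(res)
```

The split index is drawn from 1 to `length(x) - 1` so that both sub-series contain at least one observation.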

26.3.1 Examples

Check for stationarity

Example 26.1 Let’s consider 500 simulated observations of the random variable $X$ drawn from a population distributed as $X \sim N(0.4, 1)$. Then, considering it as a time series, let’s sample a random index and split the series at that point. As shown in Table 26.1, the null hypothesis, i.e. the two samples come from the same distribution, is not rejected at the significance level $\alpha = 5\%$.

KS-test on a stationary time series
# ================ Setups ================
set.seed(5) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of observations
x <- rnorm(n, 0.4, 1) # stationary series 
# ========================================
# Random time for splitting (at most n - 1, so the second sub-series is never empty)
t_split <- sample(n - 1, 1)
# Split the time series
x1 <- x[1:t_split]
x2 <- x[(t_split+1):n]
# Number of elements for each sub-series 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for computing the KS-statistic
# (the supremum is approximated by the maximum over this grid)
x_min <- quantile(x, 0.015)
x_max <- quantile(x, 0.985)
grid <- seq(x_min, x_max, length.out = 200)
# Empirical cdfs 
cdf_n1 <- ecdf(x1)
cdf_n2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_n1(grid) - cdf_n2(grid)))
# Rejection level with probability alpha 
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
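# Large-sample p-value: solve the rejection rule for alpha,
# alpha = 2 * exp(-2 * n1 * n2 / (n1 + n2) * KS^2), capped at 1
p.value <- min(1, 2 * exp(-2 * n1 * n2 / (n1 + n2) * ks_stat^2))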
KS-test plot
y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_n1(grid) - cdf_n2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_n1(grid), ymin = cdf_n2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_n1(grid)))+
  geom_line(aes(grid, cdf_n2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, y = cdf_n1(grid_max), yend = cdf_n2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_n1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_n2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()
Figure 26.1: Empirical cdfs of the two samples and KS-statistic (magenta) for a stationary time series.
KS-test (stationary)
kab <- tibble(
  t_split = t_split,
  ci = ci,
  n1 = n1,
  n2 = n2,
  KS = ks_stat,
  p.value = p.value,
  rejection_lev = rejection_lev,
  H0 = ifelse(KS > rejection_lev, "Rejected", "Non-Rejected")
)  %>%
  mutate_if(is.numeric, format, digits = 4, scientific = FALSE)
colnames(kab) <- c("$\\textbf{Index split}$","$\\alpha$", "$n_1$", "$n_2$", 
                   "$KS_{n_1, n_2}$", "p.value", "$\\textbf{Critical level}$", "$H_0$")
knitr::kable(kab, escape = FALSE)
Table 26.1: KS test for a stationary time series.
| Index split | $\alpha$ | $n_1$ | $n_2$ | $KS_{n_1,n_2}$ | p.value | Critical level | $H_0$ |
|---|---|---|---|---|---|---|---|
| 348 | 0.05 | 348 | 152 | 0.07093 | 0.6282 | 0.132 | Non-Rejected |

Check for non-stationarity

Example 26.2 Let’s consider 250 simulated observations drawn from a population distributed as $Y_{1,t} \sim N(0, 1)$ and the following 250 from $Y_{2,t} \sim N(0.3, 1)$. The resulting non-stationary series has a structural break at $t = 250$ and is given by:
$$X_t = \begin{cases} Y_{1,t} & t \le 250 \\ Y_{2,t} & t > 250 \end{cases}$$
As in Example 26.1, let’s split the time series at a random index and apply the KS-test. In this case, as shown in Table 26.2, the null hypothesis, i.e. the two samples come from the same distribution, is rejected at the significance level $\alpha = 5\%$.

KS-test on a non-stationary time series
# ============== Setups ==============
set.seed(2) # random seed
ci <- 0.05  # significance level (alpha)
n <- 500    # number of observations
# Simulated non-stationary sample 
x1 <- rnorm(n/2, 0, 1)
x2 <- rnorm(n/2, 0.3, 1)
x <- c(x1, x2)
# ====================================
# Random split of the time series (at most n - 1, so the second sub-series is never empty)
t_split <- sample(n - 1, 1)
x1 <- x[1:t_split]
x2 <- x[(t_split+1):n]
# Number of elements for each sub sample 
n1 <- length(x1)
n2 <- length(x2)
# Grid of values for the KS-statistic
# (the supremum is approximated by the maximum over this grid)
grid <- seq(quantile(x, 0.015), quantile(x, 0.985), 0.01)
# Empirical cdfs
cdf_1 <- ecdf(x1)
cdf_2 <- ecdf(x2)
# KS-statistic 
ks_stat <- max(abs(cdf_1(grid) - cdf_2(grid)))
# Rejection level  
rejection_lev <- sqrt(-0.5*log(ci/2))*sqrt((n1+n2)/(n1*n2))
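# Large-sample p-value: solve the rejection rule for alpha,
# alpha = 2 * exp(-2 * n1 * n2 / (n1 + n2) * KS^2), capped at 1
p.value <- min(1, 2 * exp(-2 * n1 * n2 / (n1 + n2) * ks_stat^2))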
KS-test plot
y_breaks <- seq(0, 1, 0.2)
y_labels <- paste0(format(y_breaks*100, digits = 2), "%")
grid_max <- grid[which.max(abs(cdf_1(grid) - cdf_2(grid)))]
ggplot()+
  geom_ribbon(aes(grid, ymax = cdf_1(grid), ymin = cdf_2(grid)), 
              alpha = 0.5, fill = "green") +
  geom_line(aes(grid, cdf_1(grid)))+
  geom_line(aes(grid, cdf_2(grid)), color = "red")+
  geom_segment(aes(x = grid_max, xend = grid_max, 
                   y = cdf_1(grid_max), yend = cdf_2(grid_max)), 
               linetype = "solid", color = "magenta")+
  geom_point(aes(grid_max, cdf_1(grid_max)), color = "magenta")+
  geom_point(aes(grid_max, cdf_2(grid_max)), color = "magenta")+
  scale_y_continuous(breaks = y_breaks, labels = y_labels)+
  labs(x = "x", y = "cdf")+
  theme_bw()
Figure 26.2: Empirical cdfs of the two samples and KS-statistic (magenta) for a non-stationary time series.
KS-test (non-stationary)
kab <- dplyr::tibble(
  t_split = t_split,
  ci = ci,
  n1 = n1,
  n2 = n2,
  KS = ks_stat,
  p.value = p.value,
  rejection_lev = rejection_lev,
  H0 = ifelse(KS > rejection_lev, "Rejected", "Non-Rejected"))  %>%
  mutate_if(is.numeric, format, digits = 4, scientific = FALSE)
colnames(kab) <- c("$\\textbf{Index split}$","$\\alpha$", "$n_1$", "$n_2$", 
                   "$KS_{n_1, n_2}$", "p.value", "$\\textbf{Critical level}$", "$H_0$")
knitr::kable(kab, escape = FALSE) 
Table 26.2: KS test for a non-stationary time series.
| Index split | $\alpha$ | $n_1$ | $n_2$ | $KS_{n_1,n_2}$ | p.value | Critical level | $H_0$ |
|---|---|---|---|---|---|---|---|
| 166 | 0.05 | 166 | 334 | 0.1643 | 0.000005831 | 0.129 | Rejected |