What is this all about?

This is about detecting changes in time series (models), for example, in their mean, variance or other characteristics/parameters. We assume implicitly that these structural changes persist for a considerable period of time (in particular, allowing for reasonable estimation). In this sense, the changes are not “outliers” but rather are associated with the intrinsic structure of the time series in question.

We shall also focus on retrospective change point detection, as opposed to sequential (monitoring) change point detection. Moreover, we are interested in abrupt changes, rather than in modeling slowly evolving changes. Nor shall we model the occurrences of change points themselves, as is done in e.g. Markov switching models.

Motivating thoughts

A typical (preprocessed) fMRI data set consists of BOLD signal measurements for brain regions of interest (ROIs) across various brain networks (e.g. auditory, visual, postsalience, and so on). For example, the plots below show the measurements for 3 ROIs in the auditory network for an individual in the resting state.

fmri.ts <- read.csv("fmri.csv")

par(mfrow=c(3,1))   # three stacked panels, one per ROI
plot.ts(fmri.ts[1:150,"auditory1"],ylab="aud 1")   # first 150 scans
plot.ts(fmri.ts[1:150,"auditory2"],ylab="aud 2")
plot.ts(fmri.ts[1:150,"auditory3"],ylab="aud 3")

Changes in the dependence structure are expected across ROIs (and networks) over time, for example,

  • As individuals perform different tasks
  • As one looks across different groups of individuals (e.g. brain disease versus non-disease)

What are ways to detect such changes? (For basic background information on fMRI signals and their statistical analysis, see e.g. Lindquist (2008).)

Data/Theory goals of the lecture

The goal of the lecture is to discuss several change point detection techniques applicable in the time series context, starting with the detection of changes in the time series mean, and to illustrate them on simulated and real data. The focus throughout will be on univariate series, with multivariate series discussed in the notes.

From the R perspective, the main sources are the packages \(\verb!strucchange!\) (see Zeileis et al. (2001)) and \(\verb!changepoint!\) (see Killick and Eckley (2014)), as well as bits and pieces of added code. These are not “perfect” but are in most cases sufficient for the illustration purposes here.

Testing for changes in time series mean

Available methods for change point detection are easiest and most instructive to introduce in the case of testing for changes in mean. Consider first the case of IID residuals/observations. That is, the model \[ X_t = \mu_t + Z_t,\quad t=1,\ldots,T, \] where \(\{Z_t\}\sim \mbox{IID}(0,\sigma^2_Z)\). The case \[ \mu_t \equiv \bar \mu_0 \] corresponds to no change in mean, and the case \[ \mu_t \equiv \bar \mu_i,\quad t = T_{i-1}+1,\ldots,T_i,\quad i=1,\ldots,m+1, \] to \(m\) changes in mean at the times \(T_1,T_2,\ldots,T_m\) (with \(T_0=0\) and \(T_{m+1}=T\)). One is interested in testing the hypothesis \[ \mbox{H}_0 : \mu_t \equiv \bar \mu_0\quad (\mbox{no change point in mean}). \] There are several well-known tests and approaches for this task (some of which can then be generalized to more complex problems), including:

  • “Fluctuation” tests (CUSUM, etc.)
  • \(F\)-tests (Chow, etc.)
  • Model selection

These are explained and illustrated next.

“Fluctuation” tests (CUSUM, etc.)

With the data \(x_1,\ldots,x_T\), let \(\bar x_{(1:t)} = \frac{1}{t} \sum_{i=1}^t x_i\) be the average of the observations up to time \(t\), and \(\bar x = \bar x_{(1:T)}\) be the average of all \(T\) observations. Consider the OLS and recursive residuals defined as: \[ \widehat z_t = x_t - \bar x,\quad \widetilde z_t = x_t - \bar x_{(1:t)} \] and the corresponding variance estimators \[ \widehat \sigma^2 = \frac{1}{T-1} \sum_{t=1}^T \widehat z_t^2,\quad \widetilde \sigma^2 = \frac{1}{T-1} \sum_{t=1}^T (\widetilde z_t - \bar{\widetilde z})^2. \] The corresponding CUSUM processes are then defined as: \[ W_T^0(u) = \frac{1}{\widehat \sigma \sqrt{T}} \sum_{t=1}^{[Tu]} \widehat z_t,\quad W_T(u) = \frac{1}{\widetilde \sigma \sqrt{T}} \sum_{t=1}^{[Tu]} \widetilde z_t,\quad u\in [0,1]. \] One can show that \[ W_T^0(u) \approx W^0(u) = W(u) - uW(1),\quad W_T(u) \approx W(u) \] where \(\{W(u)\}_{u\in [0,1]}\) is the standard Brownian motion (Wiener process) and \(\{W^0(u)\}_{u\in [0,1]}\) is its Brownian bridge. (Why?)

Under AMOC (at most one change point) at time \(T_1\) (or, after rescaling, at \(u_1=T_1/T\)), the recursive residuals have zero mean only up to \(u_1\). Hence the path of the process \(W_T(u)\) should be close to \(0\) up to \(u_1\) and drift away from \(0\) afterwards. Similarly, the path of \(W_T^0(u)\) should have a peak around \(u_1\). In fact, with AMOC at \(u_1\) as the alternative hypothesis \(\mbox{H}_1\) and under Gaussianity, the likelihood ratio test statistic is related to \(W_T^0(u_1)\), and it is natural to estimate the change point \(u_1\) as \[ \widehat u_1 = \mathop{\rm argmax}_{u} |W_T^0(u)| \] (see e.g. Csorgo and Horvath (1997)).
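
The following minimal sketch (base R, with illustrative variable names) computes the OLS-based CUSUM process by hand for a series with a single change in mean, and estimates the change point as the argmax of \(|W_T^0(u)|\):

set.seed(123)
x <- c(rnorm(120, 0, 1), rnorm(80, 1, 1))    # AMOC at t = 120, i.e. u_1 = 0.6
n <- length(x)
z.hat <- x - mean(x)                         # OLS residuals
sigma.hat <- sqrt(sum(z.hat^2)/(n - 1))      # variance estimator
W0 <- cumsum(z.hat)/(sigma.hat*sqrt(n))      # W_T^0 evaluated at u = t/T
plot(seq_len(n)/n, W0, type = "l", xlab = "u", ylab = "OLS-based CUSUM")
which.max(abs(W0))/n                         # estimate of u_1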

The idea common to all generalized fluctuation tests is that the null hypothesis of “no structural change” should be rejected when the fluctuation of the empirical process \(efp(u)\), here either \(W_T^0(u)\) or \(W_T(u)\), becomes improbably large compared to the fluctuation of the limiting process. The comparison is performed through an appropriate boundary \(b(u)\), which the limiting process crosses with a given probability \(\alpha\). Thus, if \(efp(u)\) crosses either \(b(u)\) or \(-b(u)\) for some \(u\), the fluctuation is improbably large and the null hypothesis can be rejected at significance level \(\alpha\).

For the limiting Brownian motion and Brownian bridge, it would seem natural to use boundaries proportional to the standard deviation function of the corresponding limit process, i.e., \[ b(u) = \lambda \sqrt{u},\quad b(u) = \lambda \sqrt{u(1-u)}, \] for the Recursive CUSUM and the OLS-based CUSUM paths, respectively, where \(\lambda\) determines the significance level. Other commonly used boundaries are linear, because a closed-form solution for the crossing probability is known. The standard boundaries for the two processes are thus of the type \[ b(u) = \lambda (1+2u), \quad b(u) = \lambda, \] in the same order.
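
For intuition, here is a small base R sketch of the two boundary shapes for the Brownian bridge (OLS-based CUSUM); \(\lambda = 1.358\) is the familiar 5% constant for \(\sup_u |W^0(u)|\), used here purely for illustration:

u <- seq(0, 1, by = 0.01)
lambda <- 1.358
plot(u, rep(lambda, length(u)), type = "l", ylim = c(0, 1.5), ylab = "b(u)")  # constant boundary
lines(u, lambda*sqrt(u*(1 - u)), lty = 2)   # boundary proportional to the sd of the bridge
legend("bottomright", c("constant", "sd-proportional"), lty = 1:2)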

set.seed(12)
m.data <- c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 0.2, 1))   # mean shifts at t = 100 and t = 200
ts.plot(m.data, xlab = "Index")

library("strucchange")
temp.cus1 <- efp(m.data~1, type="OLS-CUSUM")
plot(temp.cus1, alpha = 0.01, alt.boundary = TRUE)

plot(temp.cus1, alpha = 0.01, alt.boundary = FALSE)

temp.cus2 <- efp(m.data~1, type="Rec-CUSUM")
plot(temp.cus2, alpha = 0.01, alt.boundary = TRUE)

plot(temp.cus2, alpha = 0.01, alt.boundary = FALSE)

These (non)crossings can also be put into a formal test, based on, for example, the statistic \[ \sup_u \frac{|efp(u)|}{f(u)} \] where \(f(u)\) appears in \(b(u) = \lambda f(u)\).

sctest(m.data~1, type="OLS-CUSUM",alt.boundary = TRUE)
## 
##  OLS-based CUSUM test with alternative boundaries
## 
## data:  m.data ~ 1
## S0 = 4.7939, p-value = 1e-04
sctest(m.data~1, type="OLS-CUSUM",alt.boundary = FALSE)
## 
##  OLS-based CUSUM test
## 
## data:  m.data ~ 1
## S0 = 2.2599, p-value = 7.33e-05

Multiple change points can be dealt with by using e.g. binary segmentation, to be discussed below.

\(F\)-tests (Chow, etc.)

Supposing AMOC, one could also consider a test based on the \(F\)-statistic (also known as the Chow test, after Chow (1960)): for a known change point at \(t_0\), the \(F\)-statistic is defined as \[ F_{t_0} = \frac{(RSS_r - RSS_f)/1}{RSS_f/(T-2)}, \] where \(RSS_r\) is the residual sum of squares under the “restricted” model in which the mean is constant throughout, and \(RSS_f\) is the residual sum of squares under the “full” model having two means. Under the null hypothesis of a constant mean, the \(F\)-statistic has (approximately) an \(F\) distribution with \((1, T-2)\) degrees of freedom.
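
A minimal by-hand computation of this statistic at a given candidate \(t_0\) (a sketch; the function name is illustrative):

chow.F <- function(x, t0) {
  n <- length(x)
  rss.r <- sum((x - mean(x))^2)                       # restricted: one common mean
  rss.f <- sum((x[1:t0] - mean(x[1:t0]))^2) +
           sum((x[(t0+1):n] - mean(x[(t0+1):n]))^2)   # full: two means
  (rss.r - rss.f)/(rss.f/(n - 2))
}

set.seed(3)
x <- c(rnorm(100, 0, 1), rnorm(100, 0.5, 1))
chow.F(x, 100)                  # compare with qf(0.95, 1, length(x) - 2)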

Not knowing \(t_0\) and as with empirical fluctuation processes, one could look at the statistics \(F_t\) across \(t\), devise a crossing boundary and build formal tests.

set.seed(12)
m2.data <- c(rnorm(100, 0, 1), rnorm(100, .5, 1))
ts.plot(m2.data, xlab = "Index")

fs <- Fstats(m2.data~1)
plot(fs)

sctest(fs, type="supF")
## 
##  supF test
## 
## data:  fs
## sup.F = 16.749, p-value = 0.00108

Model selection

From the model selection perspective, another possibility is to choose a model that minimizes the objective function (with respect to the number of change points \(m\), and the change points \(T_i\), \(i=1,\ldots,m\)) \[ \sum_{i=1}^{m+1} {\cal C}(x_{(T_{i-1}+1:T_i)}) + \beta g(m), \] where \({\cal C}(x_{(T_{i-1}+1:T_i)})\) is a cost associated with the segment \(x_{T_{i-1}+1},\ldots,x_{T_i}\) of the data and \(\beta g(m)\) is a penalty. A typical cost is twice the negative (Gaussian) log-likelihood, in this case \[ {\cal C}(x_{(T_{i-1}+1:T_i)}) = \sum_{t=T_{i-1}+1}^{T_i} (x_t - \bar x_{(T_{i-1}+1:T_i)})^2. \] A penalty that is linear in \(m\) is often considered, e.g.

\[ \beta g(m) = \beta m, \] where \(\beta = p \log T\) corresponds to SIC (BIC), with \(p=1\) being the number of additional parameters introduced by adding a change point.

The optimization problem can be solved by several methods, including

  • Binary segmentation (approximate but fast; e.g. Bai (1997); see the sketch after this list)
  • Dynamic programming (exact but slower, with quadratic cost in \(T\); e.g. Bai and Perron (2003), Jackson et al. (2005))
  • Pruned exact linear time (PELT of Killick et al. (2012); exact and fast under mild assumptions)

(Discuss.)
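
Below is a rough sketch of binary segmentation for changes in mean (illustrative only, not the \(\verb!changepoint!\) package's implementation). It assumes unit noise variance, so that the segment RSS is, up to constants, twice the negative Gaussian log-likelihood; each segment is split at the point minimizing the total RSS, and the recursion continues while the improvement exceeds the penalty \(\beta\):

binseg <- function(x, lo = 1, hi = length(x), beta = log(length(x)), min.len = 5) {
  n <- hi - lo + 1
  if (n < 2*min.len) return(integer(0))
  seg <- x[lo:hi]
  cost0 <- sum((seg - mean(seg))^2)                     # cost without a split
  cand <- min.len:(n - min.len)
  cost1 <- sapply(cand, function(k)
    sum((seg[1:k] - mean(seg[1:k]))^2) +
    sum((seg[(k+1):n] - mean(seg[(k+1):n]))^2))         # cost with a split at k
  if (cost0 - min(cost1) < beta) return(integer(0))     # split not worth the penalty
  tau <- lo + cand[which.min(cost1)] - 1                # change point in original indexing
  c(binseg(x, lo, tau, beta, min.len), tau, binseg(x, tau + 1, hi, beta, min.len))
}

set.seed(10)
x <- c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 0, 1))
binseg(x)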

library("changepoint")

set.seed(10)
m2.data <- c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 0, 1), rnorm(100, 0.2, 1))   # mean shifts at t = 100, 200, 300
ts.plot(m2.data, xlab = "Index")

m2.pelt <- cpt.mean(m2.data, method = "PELT")
plot(m2.pelt, type = "l", cpt.col = "blue", xlab = "Index", cpt.width = 4)

cpts(m2.pelt)
## [1]  97 192
m2.binseg <- cpt.mean(m2.data, method = "BinSeg")
plot(m2.binseg, type = "l", xlab = "Index", cpt.width = 4)

cpts(m2.binseg)
## [1]  79 192

Notes

  • Some theoretical results concerning the asymptotic framework, confidence intervals, consistency, and so on, can be found in Bai and Perron (1998), Lavielle and Moulines (2000).
  • Other approaches: e.g. the fused lasso of Tibshirani et al. (2005), minimizing with respect to \(\mu_t\), \[ \sum_{t=1}^T (x_t - \mu_t)^2 + \lambda_1 \sum_{t=1}^T |\mu_t| + \lambda_2 \sum_{t=2}^T |\mu_t - \mu_{t-1}|. \] (A small sketch follows.)
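
A hedged sketch using the \(\verb!genlasso!\) package (assuming it is installed): \(\verb!fusedlasso1d!\) computes the solution path of the 1d fused lasso, and its \(\verb!gamma!\) argument sets the ratio \(\lambda_1/\lambda_2\), with the default \(\verb!gamma = 0!\) dropping the pure sparsity term:

library("genlasso")

set.seed(5)
x <- c(rnorm(100, 0, 1), rnorm(100, 1, 1))
fl <- fusedlasso1d(x)              # solution path over lambda_2
mu <- coef(fl, lambda = 2)$beta    # fitted piecewise-constant means at lambda_2 = 2
plot(x, type = "l")
lines(as.numeric(mu), col = 2, lwd = 2)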

Question: What does one do if the residuals/observations are temporally correlated? What difference does it make?

set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)

ar.pelt <- cpt.mean(ar.data, method = "PELT")
plot(ar.pelt, type = "l", cpt.col = "blue", xlab = "Index", cpt.width = 4)

#ar.binseg <- cpt.mean(ar.data, method = "BinSeg")
#plot(ar.binseg, type = "l", xlab = "Index", cpt.width = 4)

Important point: Whether a time series should be modeled as changes in mean with superimposed IID noise, or as changes in mean with superimposed temporally dependent noise, is ultimately motivated by physical considerations (and, if these are not available, by the usefulness of the model).

“Fluctuation” tests (CUSUM, etc.) with time dependent residuals

“Fluctuation” tests still work for dependent residuals, but with one important modification. For example, in \[ W_T^0(u) = \frac{1}{\widehat \sigma \sqrt{T}} \sum_{t=1}^{[Tu]} \widehat z_t,\quad W_T^0(u) \approx W^0(u),\quad u\in [0,1], \] the variance estimator \(\widehat \sigma^2\) now needs to estimate not \(\mbox{Var}(Z_t)\) but rather the so-called long-run variance \[ \sigma^2 = \sum_{h=-\infty}^\infty \gamma_Z(h) = \lim_{T\to\infty} \frac{1}{T} \mbox{Var}\Big( \sum_{t=1}^T Z_t \Big). \] The long-run variance is a tricky quantity to estimate, and there is a substantial body of work on how to do this reasonably well (e.g. Andrews (1991)). For example, \[ \widehat \sigma^2 = \sum_{h=-(T-1)}^{T-1} \widehat \gamma_Z(h) \] will NOT work (since it is always equal to \(0\)). A common choice is to take \[ \widehat \sigma^2 = \sum_{h=-(T-1)}^{T-1} K\Big(\frac{h}{S_T}\Big) \widehat \gamma_Z(h), \] where \(K\) is a kernel function, e.g. the Bartlett kernel \(K(x)=(1-|x|)1_{|x|\leq 1}\), and \(S_T\) is a “bandwidth.”
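
A minimal sketch of a Bartlett-kernel long-run variance estimator in base R; the default bandwidth here is ad hoc (not the data-driven rule of Andrews (1991)), and the function name is illustrative:

lrv.bartlett <- function(z, S = floor(length(z)^(1/3))) {
  g <- acf(z, lag.max = S, type = "covariance", plot = FALSE)$acf[, 1, 1]
  # gamma(0) plus twice the Bartlett-weighted autocovariances at lags 1, ..., S
  g[1] + 2*sum((1 - (1:S)/(S + 1))*g[-1])
}

set.seed(10)
z <- arima.sim(list(ar = 0.7), n = 400)
c(var(z), lrv.bartlett(z))   # the long-run variance is much larger than Var(Z_t) here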

In the illustrations below: CUSUM = as above, with \(S_T\) corresponding to the optimal bandwidth assuming an AR\((1)\) model; CUSUM-MAC = as above, but estimating the long-run variance through the spectral domain; CUSUM-RO = working with the residuals obtained by fitting a parametric ARMA\((p, q)\) model to the given data.

source(file="Mult-breaks-library.R")

set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)


data = ar.data
#out1 = supF_multi(data)
out2 = cusum_multi(data)
#out3 = cusum_multiJX(data) 
out4 = cusum_multiRO(data)
out5 = cusum_multiMAC(data)

c(out2$iter, out4$iter, out5$iter)-1   # estimated numbers of change points
## [1] 0 0 1
# Plotting
plot(data, type="l", ylab="Series", xlab="")
points(out2$fitted, type="l", col="red", lwd=2)
points(out4$fitted, type="l", col="blue", lwd=2)
points(out5$fitted, type="l", col="green", lwd=2)
legend(20, 3, c("CUSUM","RO","MAC"), cex=1.0, col=c("red", "blue", "green"), lty=c(1, 1, 1), lwd=2, horiz=TRUE)

set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)
ar.data <- ar.data + c(rep(0,200),rep(1,200))


data = ar.data
#out1 = supF_multi(data)
out2 = cusum_multi(data)
#out3 = cusum_multiJX(data) 
out4 = cusum_multiRO(data)
out5 = cusum_multiMAC(data)

c(out2$iter, out4$iter, out5$iter)-1
## [1] 1 1 1
# Plotting
plot(data, type="l", ylab="Series", xlab="")
points(out2$fitted, type="l", col="red", lwd=2)
points(out4$fitted, type="l", col="blue", lwd=2)
points(out5$fitted, type="l", col="green", lwd=2)
legend(20, 3, c("CUSUM","RO","MAC"), cex=1.0, col=c("red", "blue", "green"), lty=c(1, 1, 1), lwd=2, horiz=TRUE)

Approaches based on decorrelated residuals

Regress on the past values and work with the residuals (in \(\verb!efp!\), setting \(\verb!dynamic = TRUE!\) below includes the lagged observations as regressors).

set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)

temp.cus.ar <- efp(ar.data~1, type="OLS-CUSUM")
plot(temp.cus.ar, alpha = 0.01, alt.boundary = TRUE)

plot(temp.cus.ar, alpha = 0.01, alt.boundary = FALSE)

temp.cus.ar2 <- efp(ar.data~1, type="OLS-CUSUM",dynamic = TRUE)
plot(temp.cus.ar2, alpha = 0.01, alt.boundary = TRUE)

plot(temp.cus.ar2, alpha = 0.01, alt.boundary = FALSE)
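
An alternative, by-hand decorrelation (a sketch): fit an AR model first, and then run the OLS-based CUSUM test on the resulting residuals.

set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)
fit <- ar(ar.data, order.max = 5)   # AIC-selected AR fit
res <- na.omit(fit$resid)           # drop the leading NAs
sctest(res ~ 1, type = "OLS-CUSUM")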

Testing for changes in time series model parameters

The methods discussed above extend straightforwardly to time series models, in particular, AR models \[ X_t = \phi_0 + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t \] or other time series regression models (e.g. with covariates), where the analysis is carried out on the residuals. Moreover, a change is now defined in terms of the model parameter vector (not just the mean). Here are the key modifications:

  • For example, for the AR model, the residuals are defined as \(\widehat z_t = x_t - \widehat \phi_0 - \widehat \phi_1 x_{t-1} -\ldots - \widehat \phi_p x_{t-p}\), where \(\widehat \phi_j\)’s are the OLS estimators.
  • Instead of defining fluctuation processes on the basis of residuals, they can be equally well based on estimates of the unknown regression coefficients. These fluctuation processes are multivariate.
  • The limiting \(F\) distribution for \(F\)-statistic has the degrees of freedom \((k,T-2k)\), where \(k\) is the number of parameters in the time series regression model.

See Zeileis et al. (2001) for more details.
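
For example, a change in AR\((1)\) dynamics can be examined by regressing the series on its lagged values and applying the tools above to that regression (a sketch; the data frame construction is illustrative):

set.seed(11)
y <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)
d <- data.frame(y = y[-1], y.lag = y[-length(y)])   # regress x_t on x_{t-1}
fs.ar <- Fstats(y ~ y.lag, data = d)
plot(fs.ar)
sctest(fs.ar, type = "supF")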

Data example

An error correction model (ECM) example from the package \(\verb!strucchange!\). The data set contains the aggregated monthly personal income and personal consumption expenditures (in billions of US dollars) between January 1959 and February 2001, seasonally adjusted at annual rates.

data(USIncExp)

USIncExp2 <- window(USIncExp, start = c(1985,12))
coint.res <- residuals(lm(expenditure ~ income, data = USIncExp2))
coint.res <- lag(ts(coint.res, start = c(1985,12), freq = 12), k = -1)
USIncExp2 <- cbind(USIncExp2, diff(USIncExp2), coint.res)
USIncExp2 <- window(USIncExp2, start = c(1986,1), end = c(2001,2))
colnames(USIncExp2) <- c("income", "expenditure", "diff.income","diff.expenditure", "coint.res")
plot.ts(USIncExp2[,c("diff.income","diff.expenditure", "coint.res")],main="Time series")

ecm.model <- diff.expenditure ~ coint.res + diff.income
ocus <- efp(ecm.model, type="OLS-CUSUM", data=USIncExp2)
me <- efp(ecm.model, type="fluctuation", data=USIncExp2, h=0.2)

bound.ocus <- boundary(ocus, alpha=0.05)
plot(ocus, boundary = FALSE)
lines(bound.ocus, col = 4)
lines(-bound.ocus, col = 4)

plot(me, functional = NULL)

sctest(ocus)
## 
##  OLS-based CUSUM test
## 
## data:  ocus
## S0 = 1.5511, p-value = 0.01626
sctest(ecm.model, type="OLS-CUSUM", data=USIncExp2)
## 
##  OLS-based CUSUM test
## 
## data:  ecm.model
## S0 = 1.5511, p-value = 0.01626

Some final notes

  • Monitoring structural change (e.g. Zeileis et al. (2001)); sequential change point detection (e.g. Choi et al. (2008), Polunchenko and Tartakovsky (2012)).
  • Multivariate time series: Most of the methods discussed extend without much difficulty to the case of multivariate series, though they also tend to “break down” as the dimension gets higher.
  • A few other works to keep in mind: Davis et al. (2006) (who use MDL), Chan et al. (2014) (who use the group LASSO), etc. Some issues with Chow-type tests are discussed in Candelon and Lütkepohl (2001).
  • What do you do with all this?

Questions/Homework

Q0: Try some of the considered change point detection methods on real time series.

Q1: Relate the test statistic \(W_T^0(u)\) to the likelihood ratio test statistics when the alternative consists of a single change point. Also, argue that the cumulative sum process of recursive residuals can be approximated by Brownian motion.

Q2: Explain in more detail the dynamic programming approach and the PELT algorithm to estimating change points.

Q3: Explain how exactly the fluctuation statistic is defined for recursive regression model coefficient estimates and what its asymptotic limit is expected to be.

References

Andrews, D. W. K. 1991. “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation.” Econometrica 59 (3): 817–58.

Bai, J. 1997. “Estimating Multiple Breaks One at a Time.” Econometric Theory 13 (3): 315–52.

Bai, J., and P. Perron. 1998. “Estimating and Testing Linear Models with Multiple Structural Changes.” Econometrica 66 (1): 47–78.

———. 2003. “Computation and Analysis of Multiple Structural Change Models.” Journal of Applied Econometrics 18 (1): 1–22.

Candelon, B., and H. Lütkepohl. 2001. “On the Reliability of Chow-Type Tests for Parameter Constancy in Multivariate Dynamic Models.” Economics Letters 73 (2): 155–60.

Chan, N. H., C. Y. Yau, and R.-M. Zhang. 2014. “Group LASSO for Structural Break Time Series.” Journal of the American Statistical Association 109 (506): 590–99.

Choi, H., H. Ombao, and B. Ray. 2008. “Sequential Change-Point Detection Methods for Nonstationary Time Series.” Technometrics 50 (1): 40–52.

Chow, G. C. 1960. “Tests of Equality Between Sets of Coefficients in Two Linear Regressions.” Econometrica 28: 591–605.

Csörgo, M., and L. Horváth. 1997. Limit Theorems in Change-Point Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester.

Davis, R. A., T. C. M. Lee, and G. A. Rodriguez-Yam. 2006. “Structural Break Estimation for Nonstationary Time Series Models.” Journal of the American Statistical Association 101 (473): 223–39.

Jackson, B., J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, and T. T. Tsai. 2005. “An Algorithm for Optimal Partitioning of Data on an Interval.” IEEE Signal Processing Letters 12 (2): 105–8.

Killick, R., and I. Eckley. 2014. “Changepoint: An R Package for Changepoint Analysis.” Journal of Statistical Software 58 (3): 1–19.

Killick, R., P. Fearnhead, and I. A. Eckley. 2012. “Optimal Detection of Changepoints with a Linear Computational Cost.” Journal of the American Statistical Association 107 (500): 1590–8.

Lavielle, M., and E. Moulines. 2000. “Least-Squares Estimation of an Unknown Number of Shifts in a Time Series.” Journal of Time Series Analysis 21 (1): 33–59.

Lindquist, M. A. 2008. “The Statistical Analysis of FMRI Data.” Statistical Science 23 (4): 439–64.

Polunchenko, A. S., and A. G. Tartakovsky. 2012. “State-of-the-Art in Sequential Change-Point Detection.” Methodology and Computing in Applied Probability 14 (3): 649–84.

Tibshirani, R., M. Saunders, S. Rosset, J. Zhu, and K. Knight. 2005. “Sparsity and Smoothness via the Fused Lasso.” Journal of the Royal Statistical Society. Series B. Statistical Methodology 67 (1): 91–108.

Zeileis, A., F. Leisch, K. Hornik, and C. Kleiber. 2001. “strucchange: An R Package for Testing for Structural Change in Linear Regression Models.” Preprint.