What is this all about?

This is about a special class of (univariate) stationary time series models for time series data that have slowly decaying ACFs (autocorrelation functions). These models stand in sharp contrast to short memory time series models, whose ACFs decay exponentially, and often need to be dealt with using special techniques.

Long memory (also known as long-range dependent, strongly dependent, etc.) models are used across a range of areas, including Economics and Finance (e.g. Robinson (2003)), Engineering (e.g. Park and Willinger (2000)), Geophysics (e.g. Dmowska and Saltzman (1999)), Environmental Sciences, Neuroscience, and others (e.g. Rangarajan and Ding (2003)).

Motivating examples

Example 1: Annual net accumulation from Svalbard, Norway, ice cores for years 1657 to 1973 (in centimeters).

data     <- read.table("accumulation.dat")
svalbard <- rev(data[,2])
ts       <- ts(svalbard,start=c(1657),end=c(1973),frequency=1)
plot.ts(ts,ylab = 'Accumulation in cm',xlab='Year')

years <- time(ts)
lm.svalbard <- lm(ts ~ years)
resid.ts <- resid(lm.svalbard)
resid.ts <- ts(resid.ts ,start=c(1657),end=c(1973),frequency=1)
plot(resid.ts,ylab = 'Accumulation in cm',xlab='Year')

#par(mfrow = c(1, 2)) 
#plot(resid.ts,ylab = 'Accumulation in cm',xlab='Year')
#acf(resid.ts, lag.max=20, ylim=c(-1, 1))
forecast::auto.arima(resid.ts,max.p=10,max.q=0,ic="aic",allowmean  = FALSE)
## Series: resid.ts 
## ARIMA(5,0,0) with zero mean     
## 
## Coefficients:
##          ar1      ar2      ar3     ar4      ar5
##       1.0150  -0.0862  -0.4010  0.3532  -0.1253
## s.e.  0.0561   0.0785   0.0753  0.0785   0.0562
## 
## sigma^2 estimated as 25.58:  log likelihood=-961.91
## AIC=1935.82   AICc=1936.09   BIC=1958.37
ar5.model <- arima(resid.ts, order=c(5,0,0), include.mean = FALSE, method = "ML")

resid.ar5 <- resid(ar5.model)
par(mfrow = c(1, 3)) 
plot(resid.ar5,ylab = 'Residuals',xlab='Year')
acf(resid.ar5, lag.max=20, ylim=c(-1, 1))
pacf(resid.ar5, lag.max=20, ylim=c(-1, 1))

ar6.model <- arima(resid.ts, order=c(6,0,0), include.mean = FALSE, method = "ML")

resid.ar6 <- resid(ar6.model)
par(mfrow = c(1, 3)) 
plot(resid.ar6,ylab = 'Residuals',xlab='Year')
acf(resid.ar6, lag.max=20, ylim=c(-1, 1))
pacf(resid.ar6, lag.max=20, ylim=c(-1, 1))

Example 2: The Irish wind speed data (after square rooting and removing seasonality).

library(gstat)
data("wind")

wind$time = ISOdate(wind$year+1900, wind$month, wind$day)
wind$jday = as.numeric(format(wind$time, '%j'))


windsqrt = sqrt(0.5148 * as.matrix(wind[4:15]))  # 0.5148 converts knots to m/s
Jday = 1:366
windsqrt = windsqrt - mean(windsqrt)
daymeans = sapply(split(windsqrt, wind$jday), mean)
meanwind = lowess(daymeans ~ Jday, f = 0.1)$y[wind$jday]
velocity = apply(windsqrt, 2, function(x) { x - meanwind })

ts.wind = velocity[,1]
par(mfrow = c(1, 2)) 
plot(ts.wind,ylab = 'sqrt wind speed',xlab='Day',type="l")
acf(ts.wind,lag.max=50)

Example 3: Financial data: time series of US Dollar to Euro exchange rates recorded daily from January 4, 1999 to February 3, 2017.

library(forecast)
library(zoo)

exch <- read.csv("DEXUSEU.csv")

# linearly interpolate any missing rates
r_approx <- na.approx(exch$Rate)

exch$RawRate <- exch$Rate
exch$Rate <- as.numeric(r_approx)

ts <- ts(exch$Rate)
par(mfrow = c(1, 2)) 
plot(ts)

diffmod <- Arima(as.numeric(ts),order=c(0,1,0), method="ML")

res <- resid(diffmod)
res <- na.omit(res)
plot(res)

par(mfrow=c(1,2))
acf(ts(res),lag.max=20,ylim=c(-1,1),na.action=na.pass)
pacf(res,lag.max=20,ylim=c(-1,1),na.action=na.pass)

par(mfrow=c(1,2))
acf(ts(res^2),lag.max=20,ylim=c(-1,1),na.action=na.pass)
pacf(res^2,lag.max=20,ylim=c(-1,1),na.action=na.pass)

par(mfrow=c(1,2))
log_res <- log(abs(res)+.01) 
acf(log_res,lag.max=20,ylim=c(-1,1),na.action=na.pass)
pacf(log_res,lag.max=20,ylim=c(-1,1),na.action=na.pass)

Data/Theory goals of the lecture

The goal of the lecture is to describe some basic long memory models that seem more appropriate for time series data such as the above. When describing the models, the emphasis will be on differences from the short memory models introduced earlier.

Long memory models: FARIMA models

One of the most basic long memory models is the so-called FARIMA\((0,d,0)\) model (“F” stands for “Fractionally” and “I” for “Integrated”) with \(d\in (0,1/2)\). It is defined as \[ X_t = (I- B)^{-d} Z_t = \Big( \sum_{j=0}^\infty b_j B^j \Big) Z_t = \sum_{j=0}^\infty b_j Z_{t-j}, \] where \(\{Z_t\}\sim \rm{WN}(0,\sigma^2_Z)\) and \(b_j\), \(j\geq 0\), are the coefficients in the Taylor expansion \((1-z)^{-d} = \sum_{j=0}^\infty b_j z^j\) (and hence depend on \(d\)). It is known that: the coefficients in the linear representation satisfy \[ b_j \sim C_1 j^{d-1},\quad j\to \infty, \] the ACF satisfies \[ \gamma_X(h) = \mbox{Cov}(X_t,X_{t+h})\sim C_2 h^{2d-1},\quad h\to\infty, \] and the spectral density satisfies \[ f_X(w) \sim C_3 w^{-2d},\quad w\to 0^+, \] where the constants \(C_1,C_2,C_3>0\). These conditions are known to be equivalent under mild assumptions. The parameter \(d\) is called the long memory parameter.
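As a quick numerical check on the first asymptotic relation, the coefficients \(b_j\) can be computed from the recursion \(b_0 = 1\), \(b_j = b_{j-1}(j-1+d)/j\), which follows from the Taylor expansion of \((1-z)^{-d}\). A minimal sketch in base R (the value of \(d\) is an arbitrary choice for illustration; here \(C_1 = 1/\Gamma(d)\)):

```r
# Coefficients b_j of (1 - z)^(-d): b_0 = 1, b_j = b_{j-1} * (j - 1 + d) / j
d <- 0.4
J <- 2000
b <- numeric(J + 1)        # b[j + 1] stores b_j
b[1] <- 1
for (j in 1:J) b[j + 1] <- b[j] * (j - 1 + d) / j

# b_j ~ C1 * j^(d - 1) with C1 = 1 / gamma(d): the ratio stabilizes for large j
j <- 1:J
tail(b[j + 1] / j^(d - 1), 3)
```

For \(d=0.4\) the ratios settle near \(1/\Gamma(0.4)\approx 0.45\), in line with the stated power-law decay of \(b_j\).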

Notes

  • Note that the condition \(d<1/2\) ensures that the linear series above is well-defined in the \(L^2(\Omega)\)–sense (since \(b_j \sim C_1 j^{d-1}\) gives \(\sum_j b_j^2 < \infty\) exactly when \(2(d-1)<-1\)), and similarly, that the ACF decays to \(0\) for large lags. The larger the value of \(d\), the stronger the dependence.

  • When \(d=0\), the FARIMA\((0,d,0)\) series is a WN series. When \(d=1\), the FARIMA\((0,d,0)\) series is the integrated series (random walk) considered earlier, which is not stationary. Since \((I-B)^{-d}\) with integer \(d\), e.g. \(d=1\), can be viewed as an “integration” operation of integer order, the operation is referred to as “fractional integration” for non-integer values \(d>0\). It can also be shown that the FARIMA\((0,d,0)\) series satisfies the equation \[ (I-B)^d X_t = Z_t. \]

  • The conditions above stand in sharp contrast to similar conditions for ARMA and other short memory models. E.g. note that for long memory models \[ \sum_h |\gamma_X(h)| = \infty \] whereas for ARMA and many other short memory models, the ACF decays exponentially fast and hence \[ \sum_h |\gamma_X(h)| < \infty. \] Similarly, for ARMA and many other short memory models, the spectral density function satisfies \(f_X(w) \sim C_3\) as \(w\to 0\), that is, associated with the case \(d=0\).

  • The spectral domain condition also suggests what long memory series look like: they are characterized by the presence of very low-frequency waves and appear to exhibit changes in local mean level. See the plots of generated series below.
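The contrast in summability of the ACF can be illustrated numerically. For FARIMA\((0,d,0)\), the exact ACF satisfies \(\rho(1) = d/(1-d)\) and the recursion \(\rho(h) = \rho(h-1)\,(h-1+d)/(h-d)\); a small sketch comparing its partial sums with those of an AR\((1)\) ACF (the values of \(d\) and \(\phi\) are arbitrary choices for illustration):

```r
# Exact ACF of FARIMA(0,d,0): rho(1) = d/(1-d), rho(h) = rho(h-1)*(h-1+d)/(h-d)
d   <- 0.3
H   <- 10000
rho <- numeric(H)
rho[1] <- d / (1 - d)
for (h in 2:H) rho[h] <- rho[h - 1] * (h - 1 + d) / (h - d)

# AR(1) ACF with phi = 0.6 decays exponentially: rho(h) = phi^h
phi     <- 0.6
rho.ar1 <- phi^(1:H)

# FARIMA partial sums keep growing (like H^(2d)); the AR(1) sum converges
c(farima = sum(abs(rho)), ar1 = sum(abs(rho.ar1)))
```

Increasing H makes the FARIMA sum grow without bound, while the AR\((1)\) sum stays near \(\phi/(1-\phi) = 1.5\).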

The FARIMA\((0,d,0)\) model is naturally extended to FARIMA\((p,d,q)\) models, which allow greater flexibility in capturing short memory effects.

Definition 5.1: A time series model \(\{X_t\}\) is called FARIMA\((p,d,q)\) (fractionally integrated ARMA) if it satisfies the equation \[ (I-B)^d \phi(B) X_t = \theta(B) Z_t, \] where \(\{Z_t\}\sim \rm{WN}(0,\sigma^2_Z)\), \(\phi(z) = 1 - \phi_1 z - \ldots -\phi_p z^p\) is the AR polynomial and \(\theta(z) = 1 + \theta_1 z + \ldots +\theta_q z^q\) is the MA polynomial.

library(arfima)

set.seed(123456)
ts.sim1 <- arfima.sim(1000, model = list(dfrac = .45))

set.seed(123456)
ts.sim2 <- arfima.sim(1000, model = list(phi = .2, dfrac = .3))

par(mfrow=c(1,2))
plot.ts(ts.sim1)
plot.ts(ts.sim2)

Testing for long memory

Note that the spectral domain condition for long memory can be expressed as \[ \log f_X(w) \sim \log C_3 + (-2d) \log w,\quad w\to 0^+. \] This suggests that, in practice, \(d = d(m)\) could be estimated in the regression \[ \log I_X(w_k) \sim C + (-2d) \log w_k,\quad k=1,\ldots,m, \] where \(I_X(w_k)\) is the periodogram at the Fourier frequency \(w_k = \frac{2\pi k}{T}\). The resulting estimator \(\widehat d = \widehat d(m)\) is known as the GPH estimator of long memory, after Geweke and Porter-Hudak (1983). For detecting long memory, the plot of \(\widehat d = \widehat d(m)\) is examined for smaller values of \(m\).
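The confidence bands used in the examples below come from the standard asymptotic result for the GPH estimator (under suitable assumptions on the bandwidth \(m\) and the underlying series): \[ \widehat d(m) \approx \mathcal{N}\Big( d, \frac{\pi^2}{24\, m} \Big), \] so that an approximate 95% confidence interval is \(\widehat d(m) \pm 1.96\, \sqrt{\pi^2/(24 m)}\), which is exactly the margin of error computed in the code.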

Example 1: Annual net accumulation from Svalbard, Norway, ice cores for years 1657 to 1973 (in centimeters).

y = resid.ts

y.spec = spectrum(y, plot=FALSE)
lhs = log(y.spec$spec)
rhs = log(4*(sin(y.spec$freq/2))^2)

M = 100 
d = vector()
d.up = vector()
d.lo = vector()
for (m in 1:M){
  gph.reg = lm(lhs[1:m] ~ rhs[1:m])
  gph.sum = summary(gph.reg)
  d[m] = - gph.reg$coefficients[2]
  me = sqrt(pi^2/24)*qnorm(.975)/sqrt(m)
  d.up[m] = d[m] + me
  d.lo[m] = d[m] - me
}

m = seq(1,M)  
plot.new()
par(mfrow=c(1,1))
plot(m,d,type="l",ylim=c(-.5, 1))
lines(m,d.up,lty=2)
lines(m,d.lo,lty=2)

# ad-hoc choice for m
length(y)^(1/2)
## [1] 17.80449

Example 2: The Irish wind speed data (after square rooting and removing seasonality).

y = ts.wind

y.spec = spectrum(y, plot=FALSE)
lhs = log(y.spec$spec)
rhs = log(4*(sin(y.spec$freq/2))^2)

M = 500 
d = vector()
d.up = vector()
d.lo = vector()
for (m in 1:M){
  gph.reg = lm(lhs[1:m] ~ rhs[1:m])
  gph.sum = summary(gph.reg)
  d[m] = - gph.reg$coefficients[2]
  me = sqrt(pi^2/24)*qnorm(.975)/sqrt(m)
  d.up[m] = d[m] + me
  d.lo[m] = d[m] - me
}

m = seq(1,M)  
plot.new()
par(mfrow=c(1,1))
plot(m,d,type="l",ylim=c(-.5, 1))
lines(m,d.up,lty=2)
lines(m,d.lo,lty=2)

# ad-hoc choice for m
length(y)^(1/2)
## [1] 81.08021

Example 3: Financial data.

y = res^2

y.spec = spectrum(y, plot=FALSE)
lhs = log(y.spec$spec)
rhs = log(4*(sin(y.spec$freq/2))^2)

M = 500 
d = vector()
d.up = vector()
d.lo = vector()
for (m in 1:M){
  gph.reg = lm(lhs[1:m] ~ rhs[1:m])
  gph.sum = summary(gph.reg)
  d[m] = - gph.reg$coefficients[2]
  me = sqrt(pi^2/24)*qnorm(.975)/sqrt(m)
  d.up[m] = d[m] + me
  d.lo[m] = d[m] - me
}

m = seq(1,M)  
plot.new()
par(mfrow=c(1,1))
plot(m,d,type="l",ylim=c(-.5, 1))
lines(m,d.up,lty=2)
lines(m,d.lo,lty=2)

# ad-hoc choice for m
length(y)^(1/2)
## [1] 68.46167

Example 4: Short memory series, AR\((1)\) model:

set.seed(12345)
y = arima.sim(list(order = c(1,0,0), ar = 0.6), n = 500)

y.spec = spectrum(y, plot=FALSE)
lhs = log(y.spec$spec)
rhs = log(4*(sin(y.spec$freq/2))^2)

M = 100
d = vector()
d.up = vector()
d.lo = vector()
for (m in 1:M){
  gph.reg = lm(lhs[1:m] ~ rhs[1:m])
  gph.sum = summary(gph.reg)
  d[m] = - gph.reg$coefficients[2]
  me = sqrt(pi^2/24)*qnorm(.975)/sqrt(m)
  d.up[m] = d[m] + me
  d.lo[m] = d[m] - me
}

m = seq(1,M)  
plot.new()
par(mfrow=c(1,1))
plot(m,d,type="l",ylim=c(-.5, 1))
lines(m,d.up,lty=2)
lines(m,d.lo,lty=2)

# ad-hoc choice for m
length(y)^(1/2)
## [1] 22.36068

Fitting FARIMA models

This can be done through the usual Gaussian MLE in the time domain. Back to Example 1 above:

library(arfima)

fit_0d2 <- arfima(resid.ts,order = c(0,0,2), numeach = c(2,1), dmean = FALSE)
## Note: autoweed is ON.  It is possible, but not likely,
##  that unique modes may be lost.
## Beginning the fits with 4 starting values.
fit_0d2
## Number of modes: 1 
## 
## Call:
## arfima(z = resid.ts, order = c(0, 0, 2), numeach = c(2, 1), dmean = FALSE)
## 
## Coefficients for fits:
##             Coef.1:       SE.1:     
## theta(1)    -0.926372      0.0380369
## theta(2)    -0.820954      0.0385181
## d.f          0.182127      0.0481277
## zbar        -3.51863e-17            
## logl        -494.507                
## sigma^2      22.6328                
## theta_p(1)  -0.508729               
## theta_p(2)  -0.820954               
## Starred fits are close to invertibility/stationarity boundaries
ts <- residuals(fit_0d2)
acf(ts$Mode1)

AIC(fit_0d2)
## [1] 999.0144
AIC(ar6.model)
## [1] 1936.995

Some final notes

  • What do you expect to be different for long memory series compared to short memory ones?

  • For more on long memory, see Beran et al. (2013), Giraitis et al. (2012), Pipiras and Taqqu (2017).

  • One important practical issue with long memory models is their confusion with changes in mean and other nonstationary models. For statistical tests to distinguish between the two phenomena, see e.g. Baek and Pipiras (2014) and references therein.

References

Baek, C., and V. Pipiras. 2014. “On Distinguishing Multiple Changes in Mean and Long-Range Dependence Using Local Whittle Estimation.” Electronic Journal of Statistics 8 (1): 931–64.

Beran, J., Y. Feng, S. Ghosh, and R. Kulik. 2013. Long-Memory Processes: Probabilistic Properties and Statistical Methods. Springer.

Dmowska, R., and B. Saltzman. 1999. Long-Range Persistence in Geophysical Time Series. Advances in Geophysics. Elsevier Science.

Geweke, J., and S. Porter-Hudak. 1983. “The Estimation and Application of Long Memory Time Series Models.” Journal of Time Series Analysis 4 (4): 221–38.

Giraitis, L., H. L. Koul, and D. Surgailis. 2012. Large Sample Inference for Long Memory Processes. London: Imperial College Press.

Park, K., and W. Willinger, eds. 2000. Self-Similar Network Traffic and Performance Evaluation. New York: J. Wiley & Sons, Inc.

Pipiras, V., and M. S. Taqqu. 2017. Long-Range Dependence and Self-Similarity. Cambridge University Press.

Rangarajan, G., and M. Ding. 2003. Processes with Long-Range Correlations: Theory and Applications. Vol. 621. Springer Science & Business Media.

Robinson, P. M. 2003. Time Series with Long Memory. Advanced Texts in Econometrics. Oxford University Press.