What is this all about?

Another common approach to modeling (nonstationary) time series is based on differencing and the related class of ARIMA models. The goal here is to understand the differencing operation, its implications for modeling, and when it should be used.

Definition 2.1: The difference operator \(\Delta\) is defined as \[ \Delta X_t = (I-B)X_t = X_t - X_{t-1},\quad t\in\mathbb{Z}, \] where \(X=\{X_t\}_{t\in\mathbb{Z}}\) is a time series, \(B\) is the backward shift operator and \(I=B^0\).

The differencing operation can be iterated. E.g. \[ \Delta^2 X_t = (I-B) (I-B) X_t = (I-B)(X_t - X_{t-1}) = X_t - 2X_{t-1} + X_{t-2}, \] which is also the same as \((I-B)^2X_t=(I-2B+B^2)X_t\).

Note that \(\Delta X_t = b\) for \(X_t = a+bt\), \(\Delta^2 X_t = 2c\) for \(X_t = a+bt+ct^2\), etc. Then, why not apply \(\Delta\) or \(\Delta^2\) or … to a nonstationary series (in order to remove any trends) and model the residuals as a stationary series?
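As a quick illustration in R (arbitrary coefficients, unrelated to the Lake Huron data): second differencing removes a quadratic trend, leaving only (twice-differenced) noise.

set.seed(1)
tt <- 1:100
x <- 2 + 0.5*tt + 0.1*tt^2 + rnorm(100)      # a + b*t + c*t^2 + noise
par(mfrow = c(1, 2))
plot(x, type="l")
plot(diff(x, differences=2), type="l")       # quadratic trend removed; fluctuates around 2c = 0.2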

Example: back to Lake Huron data

Recall that we fitted the model \(X_t = 625.55 -0.0242 t + Y_t\) with \(\{Y_t\}\) being the AR(2) series satisfying \(Y_t = 1.0050Y_{t-1} -0.2925Y_{t-2}+Z_t\) with \(\{Z_t\}\sim {\rm WN}(0,0.457)\). What about differencing instead?

data(LakeHuron)
ts <- LakeHuron

ts_diff <- diff(ts, lag=1)   # first difference of the series

par(mfrow = c(1, 3)) 
plot(ts_diff,ylab = 'Feet',xlab='Year')
acf(ts_diff, lag.max = 20, ylim=c(-1, 1))  
pacf(ts_diff, lag.max = 20, ylim=c(-1, 1))  

library(forecast)

auto.arima(ts_diff,max.p=5,max.q=5,ic="aic")
## Series: ts_diff 
## ARIMA(0,0,0) with zero mean     
## 
## sigma^2 estimated as 0.5553:  log likelihood=-109.11
## AIC=220.22   AICc=220.26   BIC=222.79
auto.arima(ts_diff,max.p=5,max.q=5,ic="bic")
## Series: ts_diff 
## ARIMA(0,0,0) with zero mean     
## 
## sigma^2 estimated as 0.5553:  log likelihood=-109.11
## AIC=220.22   AICc=220.26   BIC=222.79
# See the package manual for "allowdrift"

auto.arima(ts,max.p=5,max.q=5,max.d=2,ic="aic",allowdrift=TRUE)
## Series: ts 
## ARIMA(0,1,0)                    
## 
## sigma^2 estimated as 0.5588:  log likelihood=-109.11
## AIC=220.22   AICc=220.26   BIC=222.79
auto.arima(ts,max.p=5,max.q=5,max.d=2,ic="bic",allowdrift=TRUE)
## Series: ts 
## ARIMA(0,1,0)                    
## 
## sigma^2 estimated as 0.5588:  log likelihood=-109.11
## AIC=220.22   AICc=220.26   BIC=222.79
# this uses functions from the "forecast" package

arima.model <- Arima(ts, order=c(0,1,0), method = "ML",include.drift = FALSE)
arima.model 
## Series: ts 
## ARIMA(0,1,0)                    
## 
## sigma^2 estimated as 0.5588:  log likelihood=-109.11
## AIC=220.22   AICc=220.26   BIC=222.79
h = 5
ts.forecast <- forecast(arima.model, h)
plot(ts.forecast)

Note: The forecasts and prediction intervals are quite different from those of the previous model.
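A rough way to see where the difference comes from (a sketch based on the fitted models, not a full derivation): for the ARIMA(0,1,0) model without drift, the \(h\)-step forecast is flat at the last observation and its error variance grows linearly, \[ \widehat{X}_{n+h} = X_n, \qquad {\rm Var}(X_{n+h}-\widehat{X}_{n+h}) = h\,\sigma^2, \] so the prediction intervals widen like \(\sqrt{h}\), whereas for the trend-plus-AR(2) model of the previous lecture the forecasts revert to the trend line and the prediction variance stays bounded by the stationary variance of \(\{Y_t\}\).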

Data/Theory goals of the lecture

Understand exactly what model is fitted above and why the prediction intervals are different. Which model, the one above or the one used in the previous lecture, should be preferred?

Testing for unit roots

Whether a model with a unit root or a (possibly trend) stationary model (with a near unit root) should be used is a statistical question (especially for smaller sample sizes). For example, below are plots of an AR\((1)\) series with \(\phi_1=0.85\) and of a random walk (\(\phi_1=1\)).

set.seed(123456)
ts_1 <- arima.sim(list(order = c(1,0,0), ar = 0.85), n = 75)
ts_2 <- arima.sim(list(order = c(0,1,0)), n = 75)

par(mfrow = c(1, 2)) 
plot(ts_1, type="l")
plot(ts_2, type="l")

par(mfrow = c(1, 2)) 
acf(ts_1, lag.max=20, ylim=c(-1, 1))
acf(ts_2, lag.max=20, ylim=c(-1, 1))

par(mfrow = c(1, 2)) 
pacf(ts_1, lag.max=20, ylim=c(-1, 1))
pacf(ts_2, lag.max=20, ylim=c(-1, 1))

The distinction is even more delicate in the presence of a trend: e.g. a trend-stationary series (which looks more nonstationary) is more easily confused with a random walk with drift.
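For instance (a small simulation with arbitrarily chosen parameters, for illustration only):

set.seed(123)
n <- 75
ts_trend <- 0.1*(1:n) + arima.sim(list(order = c(1,0,0), ar = 0.85), n = n)   # trend stationary
ts_rwd   <- cumsum(0.1 + rnorm(n))                                            # random walk with drift
par(mfrow = c(1, 2))
plot(ts_trend, type="l")
plot(ts_rwd, type="l")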

There are a number of statistical tests to help one decide between the two models. One of the best known is the (augmented) Dickey-Fuller (DF or ADF) test. For example, a typical hypothesis testing problem is:

\(H_0:\) \(\{X_t\}\) is an AR(\(p\)) series with a unit root (that is, \(\phi(1)=0\))
\(H_1:\) \(\{X_t\}\) is a stationary AR(\(p\)) series (that is, \(\phi(1)\neq 0\))

More generally, a (linear) drift or a more general trend can be added to \(\{X_t\}\). The model to be tested is written, in general, as \[ X_t = \beta_0 + \beta_1 t + \beta_2 \frac{t^2}{2} + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t \] or \[ \Delta X_t = \beta_1 + \beta_2 t + \pi X_{t-1} + \sum_{j=1}^k \gamma_j \Delta X_{t-j} + Z_t \quad (k=p-1) \] with the unit root corresponding to \(\pi = \phi_1 + \phi_2+\ldots+\phi_p-1= 0\) and the null hypothesis to \(H_0: \pi = 0\); \(\beta_2\neq 0\) corresponding to a trend, and \(\beta_1\neq 0,\beta_2=0\) corresponding to a drift.
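The second (differenced) form is the regression that the ADF test fits. As a sketch, the drift case (no trend term, \(k=1\) lagged difference) can be run "by hand" for the Lake Huron series; this should mirror the type='drift' output from ur.df further below.

x  <- as.numeric(LakeHuron)
dx <- diff(x)                            # Delta X_t
n  <- length(dx)
adf.lm <- lm(dx[-1] ~ x[2:n] + dx[-n])   # Delta X_t ~ X_{t-1} + Delta X_{t-1}, with intercept
summary(adf.lm)$coefficients
# the t-statistic of the x[2:n] (i.e. pi) coefficient is the ADF test statistic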

The test is based on the usual regression but the critical values are no longer those of a \(t\) distribution. (Why?)
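One way to see this is to simulate under the null. The sketch below (a pure random walk, no drift and no lagged differences; the sample size and number of replications are chosen arbitrarily) computes the t-statistic of the lagged level and compares its quantiles with the tau1 critical values reported by ur.df further below and with the corresponding normal quantiles.

set.seed(42)
tstat <- replicate(2000, {
  x   <- cumsum(rnorm(100))               # random walk under H0
  dx  <- diff(x)
  fit <- lm(dx ~ x[-length(x)] - 1)       # Delta X_t ~ X_{t-1}, no intercept
  summary(fit)$coefficients[1, 3]         # t-statistic of the X_{t-1} coefficient
})
quantile(tstat, c(0.01, 0.05, 0.10))      # close to the tau1 critical values
qnorm(c(0.01, 0.05, 0.10))                # the (wrong) normal benchmark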

ts_model <- auto.arima(ts,max.p=5,max.q=0,max.d=0,ic="aic")
ts_model
## Series: ts 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2  intercept
##       1.0436  -0.2495   579.0473
## s.e.  0.0983   0.1008     0.3319
## 
## sigma^2 estimated as 0.4939:  log likelihood=-103.63
## AIC=215.27   AICc=215.7   BIC=225.61
# moduli of the roots of the fitted AR polynomial 1 - phi1*z - phi2*z^2: both exceed 1, i.e. no unit root
abs(polyroot(c(1,-ts_model$coef[1],-ts_model$coef[2])))
## [1] 1.486430 2.696429
library(urca)
# See Pfaff's book in the references below

ur.no <- ur.df(ts, lags=1, type='none')
summary(ur.no)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9351 -0.5560 -0.0311  0.4486  2.0002 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## z.lag.1    -3.405e-05  1.295e-04  -0.263    0.793
## z.diff.lag  1.319e-01  1.001e-01   1.318    0.191
## 
## Residual standard error: 0.7344 on 94 degrees of freedom
## Multiple R-squared:  0.01891,    Adjusted R-squared:  -0.001967 
## F-statistic: 0.9058 on 2 and 94 DF,  p-value: 0.4077
## 
## 
## Value of test-statistic is: -0.263 
## 
## Critical values for test statistics: 
##      1pct  5pct 10pct
## tau1 -2.6 -1.95 -1.61
plot(ur.no)

ur.dr <- ur.df(ts, lags=1, type='drift')
summary(ur.dr)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71212 -0.42575  0.00349  0.43147  1.69043 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 124.94994   32.06259   3.897 0.000183 ***
## z.lag.1      -0.21584    0.05538  -3.898 0.000183 ***
## z.diff.lag    0.23757    0.09714   2.446 0.016337 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6846 on 93 degrees of freedom
## Multiple R-squared:  0.156,  Adjusted R-squared:  0.1379 
## F-statistic: 8.596 on 2 and 93 DF,  p-value: 0.0003754
## 
## 
## Value of test-statistic is: -3.8977 7.6333 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1  6.70  4.71  3.86
plot(ur.dr)

The unit root testing flowchart from Pfaff (2008) is given below, though the steps are nearly impossible to understand just by reading Pfaff (2008); see instead Enders (1995), Chapter 4, Section 7.

library(urca)

ur.tr <- ur.df(ts, lags=1, type='trend')
summary(ur.tr)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6029 -0.4416  0.0248  0.3907  1.6720 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 161.785553  38.979026   4.151  7.4e-05 ***
## z.lag.1      -0.279036   0.067172  -4.154  7.3e-05 ***
## tt           -0.004999   0.003063  -1.632  0.10609    
## z.diff.lag    0.278779   0.099536   2.801  0.00621 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6785 on 92 degrees of freedom
## Multiple R-squared:  0.1798, Adjusted R-squared:  0.153 
## F-statistic: 6.721 on 3 and 92 DF,  p-value: 0.0003764
## 
## 
## Value of test-statistic is: -4.1541 6.0678 9.0636 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2  6.50  4.88  4.16
## phi3  8.73  6.49  5.47

Note: What is the conclusion for the Lake Huron time series?

Note: There are also tests for unit roots in the moving average part of an ARMA model. (What purpose do they serve?)

Some final notes

  • As seen above, I used the book by Pfaff (2008) for the R illustrations and, in particular, the R package “urca”. But there are many more books on the subject of testing for unit roots (sometimes also in connection with cointegration and regime switching); in fact, much of the econometrics literature of the 1980s-90s was driven by unit roots. See also Maddala and Kim (1998) and Choi (2015). There are also other R packages for unit root testing, including “fUnitRoots”, “uroot” and “tseries”.

Questions/Homework

Q0: Supplement your analysis of the time series data from Q1, Lecture I, with unit root tests if it seems appropriate.

Q1:* What is the connection between the model selected by auto.arima and the model suggested by the ADF test? Why was there a contradiction for the Lake Huron data above?

Q2: Prove the asymptotics for the ADF statistic under the null (in one of the trend/drift or no trend/drift cases).

Q3: What other tests are there besides the ADF test? What (dis)advantages do they have?

Q4: What are fractionally differenced series?

Q5: Can you think of a situation/model that would “break” the unit root testing flowchart?

References

Choi, I. 2015. Almost All About Unit Roots. Themes in Modern Econometrics. Cambridge University Press, New York.

Enders, W. 1995. Applied Econometric Time Series. John Wiley & Sons.

Maddala, G. S., and I.-M. Kim. 1998. Unit Roots, Cointegration and Structural Change. Themes in Modern Econometrics. Cambridge University Press, Cambridge.

Pfaff, B. 2008. Analysis of Integrated and Cointegrated Time Series with R. Second edition. Use R! Springer, New York.