What is this all about?

This is about using regression on several non-stationary (integrated, stochastic trend) series, and/or finding relationships among such series. We shall also go back to “lower dimension” for the time being.

Punchline: Using regression on stochastic trend series themselves has issues. Regressing their stationary increment series is better but, more interestingly, the so-called cointegrating relationships are even more revealing and more powerful.

Motivation and Inspiration

Motivating data example: UK consumption/income/wealth

library(tseries)
library(urca)

data(Raotbl3)
attach(Raotbl3)

#   lc = log consumption expenditure
#   li = log income
#   lw = log wealth
lc <- ts(lc, start=c(1966,4), end=c(1991,2), frequency=4)
li <- ts(li, start=c(1966,4), end=c(1991,2), frequency=4)
lw <- ts(lw, start=c(1966,4), end=c(1991,2), frequency=4)

# restricting as considered in Pfaff, Ch. 7
ukcons <- window(cbind(lc, li, lw), start=c(1967, 2), end=c(1991,2))

ts.plot(ukcons,main="Quarterly UK log consumption/income/wealth",xlab="Year",lty=c(1,2,3))
legend('right',c('consumption','income','wealth'),lty=c(1,2,3))

par(mfrow = c(1, 3))
ts.plot(lc,main="Quarterly UK log consumption",xlab="Year")
ts.plot(li,main="Quarterly UK log income",xlab="Year")
ts.plot(lw,main="Quarterly UK log wealth",xlab="Year")

library(urca)
 
#ur.lc <- ur.df(lc, lags=3, type='trend')
#ur.lc <- ur.df(lc, lags=3, type='drift')
#summary(ur.lc)
#plot(ur.lc)

#ur.li <- ur.df(li, lags=0, type='trend')
#ur.li <- ur.df(li, lags=0, type='drift')
#summary(ur.li)
#plot(ur.li)

#ur.lw <- ur.df(lw, lags=0, type='trend')
#ur.lw <- ur.df(lw, lags=0, type='drift')
ur.lw <- ur.df(lw, lags=0, type='none')
summary(ur.lw)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.158325 -0.020807  0.005545  0.024886  0.088494 
## 
## Coefficients:
##          Estimate Std. Error t value Pr(>|t|)  
## z.lag.1 0.0006217  0.0003254   1.911    0.059 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04231 on 97 degrees of freedom
## Multiple R-squared:  0.03627,    Adjusted R-squared:  0.02633 
## F-statistic:  3.65 on 1 and 97 DF,  p-value: 0.05902
## 
## 
## Value of test-statistic is: 1.9106 
## 
## Critical values for test statistics: 
##      1pct  5pct 10pct
## tau1 -2.6 -1.95 -1.61
plot(ur.lw)

Conclusions following the unit root testing flowchart from Lecture Univariate II:

log consumption (lc): the model \(\Delta X_t = \beta_1 \Delta X_{t-1} + \beta_2 \Delta X_{t-2} + \beta_3 \Delta X_{t-3} + Z_t\) with \(\{Z_t\}\sim\mbox{WN}(0,\sigma^2_Z)\)

log income (li): the model \(\Delta X_t = Z_t\) with \(\{Z_t\}\sim\mbox{WN}(0,\sigma^2_Z)\) (this conclusion is at the 1% significance level).

log wealth (lw): the model \(\Delta X_t = Z_t\) with \(\{Z_t\}\sim\mbox{WN}(0,\sigma^2_Z)\)

summary(lm(lc ~ li, data=ukcons))
## 
## Call:
## lm(formula = lc ~ li, data = ukcons)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.065902 -0.016474  0.001461  0.017803  0.057107 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.19799    0.14997   -1.32     0.19    
## li           1.00894    0.01376   73.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.024 on 95 degrees of freedom
## Multiple R-squared:  0.9826, Adjusted R-squared:  0.9825 
## F-statistic:  5377 on 1 and 95 DF,  p-value: < 2.2e-16
summary(lm(li ~ lw, data=ukcons))
## 
## Call:
## lm(formula = li ~ lw, data = ukcons)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27407 -0.06104  0.05010  0.08426  0.19018 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.52455    0.54894   10.06  < 2e-16 ***
## lw           0.40887    0.04176    9.79 4.64e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1263 on 95 degrees of freedom
## Multiple R-squared:  0.5022, Adjusted R-squared:  0.497 
## F-statistic: 95.85 on 1 and 95 DF,  p-value: 4.636e-16

Spurious regression

library(lmtest)

set.seed(123456)
e1 <- rnorm(500)
e2 <- rnorm(500)
#trd <- 1:500
# y1 <- 0.8*trd + cumsum(e1)
# y2 <- 0.6*trd + cumsum(e2)
y1 <- cumsum(e1)
y2 <- cumsum(e2)

par(mfrow = c(1, 2))
ts.plot(y1)
ts.plot(y2)

sr.reg <- lm(y1 ~ y2)
sr.dw <- dwtest(sr.reg)$statistic

summary(sr.reg)
## 
## Call:
## lm(formula = y1 ~ y2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5789  -2.3936  -0.2456   2.1271   9.7228 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.99093    0.26909   40.84   <2e-16 ***
## y2          -0.53774    0.01809  -29.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.434 on 498 degrees of freedom
## Multiple R-squared:  0.6395, Adjusted R-squared:  0.6388 
## F-statistic: 883.6 on 1 and 498 DF,  p-value: < 2.2e-16
sr.dw
##        DW 
## 0.1021637
# Small Durbin-Watson statistic; see e.g. https://en.wikipedia.org/wiki/Durbin–Watson_statistic
# Rule of thumb: a regression is suspected to be spurious when R^2 exceeds the DW statistic
For example, the following result is known (Hamilton (1994), p. 560). Consider a bivariate time series \(\mathbf{Y} = \{(Y_{1t},Y_{2t})'\}_{t\in\mathbb{Z}}\) where \(\{Y_{1t}\}_{t\in\mathbb{Z}}\) and \(\{Y_{2t}\}_{t\in\mathbb{Z}}\) are two independent random walks defined through \[\begin{eqnarray} Y_{1t} & = & Y_{1,t-1} + \epsilon_{1t},\\ Y_{2t} & = & Y_{2,t-1} + \epsilon_{2t}, \end{eqnarray}\]

where \(\{\epsilon_{1t}\}_{t\in\mathbb{Z}}\) and \(\{\epsilon_{2t}\}_{t\in\mathbb{Z}}\) are independent series of i.i.d. variables having means \(0\) and variances \(\sigma^2_1\) and \(\sigma^2_2\), respectively. Then, the OLS estimators \(\widehat \alpha_T\) and \(\widehat \gamma_T\) in the regression \(Y_{1t} = \alpha + \gamma Y_{2t} + u_t\) can be shown to satisfy, for large sample size \(T\), \[ \left( \begin{array}{c} T^{-1/2}\widehat \alpha_T\\ \widehat \gamma_T \end{array} \right) \approx \left( \begin{array}{c} \sigma_1 \cdot h_1\\ \frac{\sigma_1}{\sigma_2}\cdot h_2 \end{array} \right), \] where \(h_1\) and \(h_2\) are certain non-Gaussian random variables (that do not depend on \(\sigma_1^2\) or \(\sigma_2^2\)).
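
A small Monte Carlo sketch (an illustration added here, not taken from Hamilton) makes the practical consequence of this result visible: the slope estimate \(\widehat \gamma_T\) does not concentrate around \(0\) as \(T\) grows, even though the two random walks are independent; its spread stays of the same order of magnitude.

# Monte Carlo sketch (illustration only): the spurious-regression slope
# does not converge to 0 as the sample size grows
set.seed(1)
nrep <- 500
spurious.slope <- function(Tn) {
  y1 <- cumsum(rnorm(Tn))
  y2 <- cumsum(rnorm(Tn))
  unname(coef(lm(y1 ~ y2))[2])
}
slopes.T100  <- replicate(nrep, spurious.slope(100))
slopes.T1000 <- replicate(nrep, spurious.slope(1000))
# Spread of the slope estimates is comparable for T=100 and T=1000,
# unlike in a regression of i.i.d. noise on i.i.d. noise
c(sd(slopes.T100), sd(slopes.T1000))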

Some remedies for spurious regression, and other observations

One “obvious” remedy is to consider stationary increments. For other remedies (e.g. including lagged variables), see Hamilton (1994), p. 561.

summary(lm(diff(lc) ~ diff(li), data=ukcons))
## 
## Call:
## lm(formula = diff(lc) ~ diff(li), data = ukcons)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.047274 -0.006248  0.001609  0.006785  0.047421 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.004858   0.001424   3.411 0.000957 ***
## diff(li)    0.244151   0.073899   3.304 0.001350 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01312 on 94 degrees of freedom
## Multiple R-squared:  0.104,  Adjusted R-squared:  0.09451 
## F-statistic: 10.92 on 1 and 94 DF,  p-value: 0.00135
summary(lm(diff(li) ~ diff(lw), data=ukcons))
## 
## Call:
## lm(formula = diff(li) ~ diff(lw), data = ukcons)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.049859 -0.009938  0.000173  0.010431  0.053542 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 0.006413   0.001898   3.379  0.00106 **
## diff(lw)    0.017723   0.044081   0.402  0.68856   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0183 on 94 degrees of freedom
## Multiple R-squared:  0.001717,   Adjusted R-squared:  -0.008903 
## F-statistic: 0.1616 on 1 and 94 DF,  p-value: 0.6886

Disadvantages of differencing (Pfaff (2008), Ch. 4, p. 75): “First, differencing greatly attenuates large positive residual autocorrelation; hence, false inferences upon the coefficients in the regression equation could be drawn. Second, most economic theories are expressed in levels, and the implications of the long-run relationships between variables are deduced.” (For an explanation of the first disadvantage, see Hamilton (1994), p. 562.) A third disadvantage is that, by looking only at differences, one may miss other important relationships, namely the cointegrating relationships introduced below.

There is something interesting happening with the residuals of the regressions above.

library(urca)

lc.eq <- summary(lm(lc ~ li, data=ukcons))
lw.eq <- summary(lm(li ~ lw, data=ukcons))

error.lc <- ts(resid(lc.eq), start=c(1967,2), end=c(1991,2), frequency=4)
error.lw <- ts(resid(lw.eq), start=c(1967,2), end=c(1991,2), frequency=4)

par(mfrow = c(1, 2))
ts.plot(error.lc)
ts.plot(error.lw)

par(mfrow = c(1, 2))
pacf(error.lc, lag.max = 20)
pacf(error.lw, lag.max = 20)

ci.lc <- ur.df(error.lc, lags=1, type='none')
ci.lw <- ur.df(error.lw, lags=0, type='none')
#ci.lw <- ur.df(error.lw, lags=0, type='trend')

summary(ci.lc)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059909 -0.011226  0.001923  0.012520  0.040189 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.20001    0.07676  -2.606 0.010681 *  
## z.diff.lag -0.36809    0.09652  -3.814 0.000246 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01646 on 93 degrees of freedom
## Multiple R-squared:  0.2717, Adjusted R-squared:  0.2561 
## F-statistic: 17.35 on 2 and 93 DF,  p-value: 3.945e-07
## 
## 
## Value of test-statistic is: -2.6055 
## 
## Critical values for test statistics: 
##      1pct  5pct 10pct
## tau1 -2.6 -1.95 -1.61
summary(ci.lw)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.046366 -0.011545 -0.000162  0.018536  0.074973 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)  
## z.lag.1 -0.03721    0.01994  -1.867   0.0651 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02448 on 95 degrees of freedom
## Multiple R-squared:  0.03537,    Adjusted R-squared:  0.02522 
## F-statistic: 3.484 on 1 and 95 DF,  p-value: 0.06505
## 
## 
## Value of test-statistic is: -1.8665 
## 
## Critical values for test statistics: 
##      1pct  5pct 10pct
## tau1 -2.6 -1.95 -1.61

The test statistics need to be compared with the following 5% critical value, computed from the response surface coefficients in MacKinnon (2010):

T <- length(lc)
# N=1, 5%, "nc"
binf <- -1.94100
b1 <- -0.2686
b2 <-  -3.365
b3 <- 31.223

cr_value <- binf  + b1/T + b2/T^2 + b3/T^3
cr_value
## [1] -1.944024

This suggests that a linear combination of lc and li is stationary, but that this is not the case for li and lw. Let us examine the following example, in which a linear combination of two variables is stationary, and also see how differencing would fare.

Example 3.1: Consider the bivariate series \(\mathbf{Y} = \{(Y_{1t},Y_{2t})'\}_{t\in\mathbb{Z}}\) defined as \[\begin{eqnarray} Y_{1t} & = & \gamma Y_{2,t-1} + \gamma_1 \Delta Y_{2,t-1} + \epsilon_{1t},\\ Y_{2t} & = & Y_{2,t-1} + \epsilon_{2t}, \end{eqnarray}\]

where \(\gamma\), \(\gamma_1\) are parameters, and \(\{\epsilon_{1t}\}_{t\in\mathbb{Z}}\) and \(\{\epsilon_{2t}\}_{t\in\mathbb{Z}}\) are independent series of i.i.d. variables having means \(0\) and variances \(\sigma^2_1\) and \(\sigma^2_2\), respectively. That is, \(\{Y_{2t}\}\) is a random walk, and the linear combination \(Y_{1t} - \gamma Y_{2t} = -\gamma \Delta Y_{2t} + \gamma_1 \Delta Y_{2,t-1} + \epsilon_{1t}\) is stationary.

Note: Analogy when \(\gamma=1\), \(\gamma_1=0\): \(Y_{2t}=\) position of a drunkard, and \(Y_{1t}=\) position of his (drunk?) dog on a leash.

set.seed(123456)
n <- 100 # n is T
sigma1 <- 1 
sigma2 <- 1
gamma <- 0.4 
gamma1 <- 0
#gamma <- 1 
y20 <- 3 

y1 = NULL ;
y2 = NULL ;
y2[1] = y20 + rnorm(1,0,sigma2)
y1[1] = y2[1] + rnorm(1,0,sigma1)

for (i in 2:n) {
  y2[i] = y2[i-1] + rnorm(1,0,sigma2)
  y1[i] = gamma*y2[i-1] + gamma1*(y2[i] - y2[i-1]) + rnorm(1,0,sigma1)
}

plot.ts(y1,ylim=c(-8,10))
lines(y2, lty='dotted')

Note: The stationary differences of \(\mathbf{Y} = \{(Y_{1t},Y_{2t})'\}_{t\in\mathbb{Z}}\) do NOT have a VAR representation (Hamilton (1994), p. 573). Note also that the stationary differences of \(\mathbf{Y} = \{(Y_{1t},Y_{2t})'\}_{t\in\mathbb{Z}}\) satisfy \[\begin{eqnarray} \Delta Y_{1t} & = & (\gamma Y_{2,t-1} - Y_{1,t-1}) + \gamma_1 \Delta Y_{2,t-1} + \epsilon_{1t},\\ \Delta Y_{2t} & = & \epsilon_{2t}. \end{eqnarray}\]

where \(\gamma Y_{2,t-1} - Y_{1,t-1}\) is stationary.

# Differenced series
dy1 = ts(diff(y1))
dy2 = ts(diff(y2))

# Combine for later
series <- window(cbind(y1, y2))
Vt <- window(cbind(dy1,dy2))

# Plot series
ts.plot(series,main="Both time series",xlab="Time",lty=c(1,2))
legend('topleft',c('y1','y2'),lty=c(1,2))

# Plot differenced series
ts.plot(Vt,main="Both differenced time series",xlab="Time",lty=c(1,2))
legend('topleft',c('dy1','dy2'),lty=c(1,2))

###################### MODEL 1 ########################
# Simple VAR approach

library(vars)
VARselect(Vt, lag.max=4, type="const")$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      4      2      2      4
mod1 <- VAR(Vt, p=2, type="none")


###################### MODEL 2 ########################
# Cointegration approach: Two-step Engle-Granger procedure

# Step 1: regress variables and get error term
lr.reg <- lm(y1 ~ 0+y2)         #long-run equation
summary(lr.reg)                 #estimate should be close to gamma
## 
## Call:
## lm(formula = y1 ~ 0 + y2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5503 -0.5182  0.1684  0.9158  2.3794 
## 
## Coefficients:
##    Estimate Std. Error t value Pr(>|t|)    
## y2  0.38860    0.01818   21.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.053 on 99 degrees of freedom
## Multiple R-squared:  0.8219, Adjusted R-squared:  0.8201 
## F-statistic:   457 on 1 and 99 DF,  p-value: < 2.2e-16
error <- ts(residuals(lr.reg))  #error terms

#Check: error ought to be stationary
ts.plot(error)

# Step 2: create ECM model
lerror <- error[-c(n-1,n)] #lagged error
ldy1 <- dy1[-(n-1)] #lagged differences
ldy2 <- dy2[-(n-1)] 

# Regress on own history, lagged changes of other variable and regression error
mod2.y1 <- lm(dy1[-1] ~ lerror + ldy1 + ldy2)
mod2.y2 <- lm(dy2[-1] ~ lerror + ldy1 + ldy2)
##################### Comparison of methods #######################

### Comparison 1: Fit model to first half of data, 
#                 then compute 1-step ahead
#                 predictions for second half of data

# Generate next n observations:
for (i in (n+1):(2*n)) {
  y2[i] <- y2[i-1] + rnorm(1,0,sigma2)
  y1[i] <- gamma*y2[i] + rnorm(1,0,sigma1)
}
dy1 <- ts(diff(y1))
dy2 <- ts(diff(y2))
Vt <- window(cbind(dy1,dy2))

# For the cointegration model, need the errors again
error.new <- ts(residuals(lm(y1 ~ 0+y2)))

# Forecast 1-step-ahead predictions
mod1.pred1 = NULL
mod2.pred1 = NULL
for (i in (n+1):(2*n-1)) {
  mod1.pred1[i] = ( Bcoef(mod1)[,1:2] %*% matrix(Vt[i-1,]) + Bcoef(mod1)[,3:4] %*% matrix(Vt[i-2,]))[1]
  mod2.pred1[i] = mod2.y1$coefficients %*% c(1,error.new[i-1],dy1[i-1],dy2[i-1])
}

preds1 <- window(cbind(dy1, mod1.pred1[1:(2*n-1)],mod2.pred1[1:(2*n-1)]))
ts.plot(preds1,lty=c(1,2,3),lwd = c(1,2,2),main="Comparison of models",xlim=c(n-2,2*n),col=c("black","red","blue"))
legend('topleft',c('dy1','VAR prediction','ECM prediction'),lty=c(1,2,3),lwd = c(1,2,2),col=c("black","red","blue"))

# Sum of squared prediction error
sum((mod1.pred1[(n+1):(2*n-1)] - dy1[(n+1):(2*n-1)])^2)
## [1] 140.7708
sum((mod2.pred1[(n+1):(2*n-1)] - dy1[(n+1):(2*n-1)])^2)
## [1] 101.7806
### Comparison 2: Plot 1-step ahead predictions at each step

mod1.pred2 = NULL ;
mod2.pred2 = NULL ;
for (i in 3:n) {
  mod1.pred2[i] = ( Bcoef(mod1)[,1:2] %*% matrix(Vt[i-1,]) + Bcoef(mod1)[,3:4] %*% matrix(Vt[i-2,]))[1]
  mod2.pred2[i] = mod2.y1$coefficients %*% c(1,error[i-1],dy1[i],dy2[i])
}

plot(2:n,dy1[1:n-1],lty=1,main="Comparison of models",xlab="Time",ylab="dy1")
lines(2:n,dy1[1:n-1])
points(3:n,mod1.pred2[3:n],pch=2,col="red")
points(3:n,mod2.pred2[3:n],pch=3,col="blue")
legend('bottomright',c('dy1','VAR prediction','ECM prediction'),pch=c(1,2,3),col=c("black","red","blue"))
grid()

# Sum of squared error
sum((mod1.pred2[3:n] - dy1[2:(n-1)])^2)
## [1] 371.9602
sum((mod2.pred2[3:n] - dy1[2:(n-1)])^2)
## [1] 100.4467

Data/Theory goals of the lecture

The more theoretical goals are to lay a basic foundation for the phenomenon of cointegration, and to explain what basic models are being used. We shall then come back to the data set of UK consumption/income/wealth to fit such a model.

Cointegration

Definition 3.1: The component series \(\{X_{1t}\},\ldots,\{X_{dt}\}\) of the vector series \(\{\mathbf{X}_t\}\) are said to be cointegrated if (a) all components of \(\mathbf{X}_t\) are \(I(1)\) (that is, integrated of order \(1\)) and (b) a vector \(\mathbf{\alpha}\neq 0\) exists such that \(Z_t = \mathbf{\alpha}'\mathbf{X}_t\) is \(I(0)\) (that is, stationary). The vector \(\mathbf{\alpha}\) is called a cointegrating vector. The number \(r\) of linearly independent cointegrating vectors is called the cointegrating rank of the vector series \(\{\mathbf{X}_t\}\).

Notes

  • This definition can be extended to the series whose components are \(I(c)\) for arbitrary \(c\geq 1\), requiring a linear combination to be \(I(c')\) with \(c'<c\).
  • Deterministic trend components can also be included in the definition and subsequent models.

In the data example above, the UK log consumption and income series would be called cointegrated.
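
As a self-contained illustration of Definition 3.1 (a simulated sketch added here, separate from the UK data), one can build two \(I(1)\) series that share the same stochastic trend, so that a cointegrating vector, here \(\mathbf{\alpha} = (1,-2)'\), is known in advance and the corresponding linear combination can be checked for stationarity.

# Sketch illustrating Definition 3.1 with a known cointegrating vector:
# both components contain the same random-walk trend, which alpha = (1,-2)'
# eliminates
library(urca)

set.seed(42)
nobs <- 500
w  <- cumsum(rnorm(nobs))   # common I(1) stochastic trend
x1 <- 2*w + rnorm(nobs)     # I(1)
x2 <-   w + rnorm(nobs)     # I(1)
z  <- x1 - 2*x2             # alpha'X with alpha = (1,-2)': should be I(0)

# ADF test statistics: x1 and x2 should look I(1), while z should reject a unit root
ur.df(x1, lags=1, type='none')@teststat
ur.df(x2, lags=1, type='none')@teststat
ur.df(z,  lags=1, type='none')@teststat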

Vector Error Correction Models (VECMs)

In the discussion concerning the bivariate series \(\mathbf{Y}_t = (Y_{1t},Y_{2t})'\) in Example 3.1 above, it proved useful to fit the model (after changing the notation from \(\mathbf{Y}_t\) to \(\mathbf{X}_t\)) \[ \Delta \mathbf{X}_t = A Z_{t-1} + \Gamma \Delta \mathbf{X}_{t-1} + \mathbf{U}_t, \] where \(A\) is a \(2\times 1\) vector, \(\Gamma\) is a \(2\times 2\) matrix, \(\{\mathbf{U}_t\}\sim \mbox{WN}(0,\Sigma_{\mathbf U})\) and \(Z_t = \alpha' \mathbf{X}_t\) is thought of as stationary. In this relation, \(\alpha' \mathbf{X}_t = \alpha_1 X_{1t} + \alpha_2 X_{2t}=0\) is viewed as a long-run equilibrium between the series \(\{X_{1t}\}\) and \(\{X_{2t}\}\). The term \(AZ_{t-1}\) is the error-correction term: it determines explicitly how \(X_{1t}\) and \(X_{2t}\) react to deviations \(Z_{t-1}\) from the long-run equilibrium.

Note that the model above can also be expressed as \[ \Delta \mathbf{X}_t = \Pi \mathbf{X}_{t-1} + \Gamma \Delta \mathbf{X}_{t-1} + \mathbf{U}_t, \] where \(\Pi = A\alpha'\) is a \(2\times 2\) matrix having rank \(1\). This model is generalized to the following vector error correction model (VECM).
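
For intuition, here is a tiny numerical sketch (with made-up values of \(A\) and \(\alpha\), purely for illustration) of how the outer product \(A\alpha'\) produces a reduced-rank matrix \(\Pi\):

# Illustration only: with made-up A and alpha, Pi = A alpha' is 2 x 2 but has rank 1
A     <- matrix(c(-0.3, 0.1), ncol = 1)   # loadings (speed of adjustment)
alpha <- matrix(c(1, -0.9), ncol = 1)     # cointegrating vector
Pi    <- A %*% t(alpha)
Pi
qr(Pi)$rank                               # = 1, the cointegrating rank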

Definition 3.2: A vector error correction model, VECM for short, is expressed as either \[ \Delta \mathbf{X}_t = \Pi \mathbf{X}_{t-p} + \Gamma_1 \Delta \mathbf{X}_{t-1} + \ldots + \Gamma_{p-1} \Delta \mathbf{X}_{t-p+1} + \mathbf{U}_t \] in the so-called long-run form, or \[ \Delta \mathbf{X}_t = \Pi \mathbf{X}_{t-1} + \widetilde \Gamma_1 \Delta \mathbf{X}_{t-1} + \ldots + \widetilde \Gamma_{p-1} \Delta \mathbf{X}_{t-p+1} + \mathbf{U}_t \] in the so-called transitory form. (Note that these models can also be written as possibly non-stable VAR\((p)\) models.)
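
In the urca package used below, the Johansen procedure ca.jo can be fitted in either parameterization through its spec argument (the default is the long-run form); both are reparameterizations of the same underlying VAR\((p)\), as noted above. A minimal sketch on the UK data:

# Sketch: Johansen procedure in the long-run vs. transitory VECM form
# (ukcons is the UK consumption/income/wealth data from above)
vecm.lr <- ca.jo(ukcons, spec = 'longrun')     # Pi attached to X_{t-K} (here K = 2, the default)
vecm.tr <- ca.jo(ukcons, spec = 'transitory')  # Pi attached to X_{t-1}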

Three cases of matrices \(\Pi\) are considered:

  1. \(\mbox{rk}(\Pi) = d\) [\(\mathbf{X}_t\) stationary; no cointegration]
  2. \(\mbox{rk}(\Pi) = 0\) [differences stationary; no cointegration]
  3. \(0<\mbox{rk}(\Pi) < d\) [cointegration]

Case 1 corresponds to a stationary VAR series \(\mathbf{X}\). Case 2 corresponds to a VAR model in first differences. The interesting case is the third one: it can be shown to correspond to the component series of \(\mathbf{X}\) being cointegrated, with cointegrating rank \(r = \mbox{rk}(\Pi)\).

There are several tests for estimating the cointegrating rank (through the matrix rank \(\mbox{rk}(\Pi)\)) and the cointegrating vectors. Here is a simulated data example from Pfaff (2008).

library(urca)

set.seed(12345)
e1 <- rnorm(250, 0, 0.5)
e2 <- rnorm(250, 0, 0.5)
e3 <- rnorm(250, 0, 0.5)
u1.ar1 <- arima.sim(model = list(ar = 0.75),
                    innov = e1, n = 250)
u2.ar1 <- arima.sim(model = list(ar = 0.3),
                    innov = e2, n = 250)
y3 <- cumsum(e3)
y1 <- 0.8 * y3 + u1.ar1
y2 <- -0.3 * y3 + u2.ar1
y.mat <- data.frame(y1, y2, y3)

vecm <- ca.jo(y.mat)
jo.results <- summary(vecm)
jo.results
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: maximal eigenvalue statistic (lambda max) , with linear trend 
## 
## Eigenvalues (lambda):
## [1] 0.27036254 0.15474942 0.01884032
## 
## Values of teststatistic and critical values of test:
## 
##           test 10pct  5pct  1pct
## r <= 2 |  4.72  6.50  8.18 11.65
## r <= 1 | 41.69 12.91 14.90 19.19
## r = 0  | 78.17 18.90 21.07 25.75
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##           y1.l2      y2.l2     y3.l2
## y1.l2  1.000000  1.0000000 1.0000000
## y2.l2 -4.732436  0.2273774 0.1513858
## y3.l2 -2.129850 -0.6657324 2.3153224
## 
## Weights W:
## (This is the loading matrix)
## 
##             y1.l2       y2.l2        y3.l2
## y1.d -0.034235501 -0.29705774 -0.008294582
## y2.d  0.145988517 -0.08137695  0.003335110
## y3.d  0.002429191  0.01025326 -0.010523045
vecm.r2 <- cajorls(vecm, r = 2)
vecm.r2
## $rlm
## 
## Call:
## lm(formula = substitute(form1), data = data.mat)
## 
## Coefficients:
##           y1.d       y2.d       y3.d     
## ect1      -0.331293   0.064612   0.012682
## ect2       0.094473  -0.709385  -0.009165
## constant   0.168371  -0.027019   0.025255
## y1.dl1    -0.227677   0.027012   0.068158
## y2.dl1     0.144452  -0.715607   0.040487
## y3.dl1     0.123467  -0.290828  -0.075251
## 
## 
## $beta
##             ect1      ect2
## y1.l2  1.0000000 0.0000000
## y2.l2  0.0000000 1.0000000
## y3.l2 -0.7328534 0.2951962
#class(jo.results)
#slotNames(jo.results)

Back to data example: UK consumption/income/wealth

vecm.uk <- ca.jo(ukcons)
jo.results.uk <- summary(vecm.uk)
jo.results.uk
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: maximal eigenvalue statistic (lambda max) , with linear trend 
## 
## Eigenvalues (lambda):
## [1] 0.22714805 0.04881869 0.00419976
## 
## Values of teststatistic and critical values of test:
## 
##           test 10pct  5pct  1pct
## r <= 2 |  0.40  6.50  8.18 11.65
## r <= 1 |  4.75 12.91 14.90 19.19
## r = 0  | 24.48 18.90 21.07 25.75
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##             lc.l2      li.l2      lw.l2
## lc.l2  1.00000000  1.0000000  1.0000000
## li.l2 -0.93033927 -0.3751971 -1.0868675
## lw.l2 -0.06256788 -0.3657988 -0.2263311
## 
## Weights W:
## (This is the loading matrix)
## 
##          lc.l2       li.l2         lw.l2
## lc.d 0.1025904 -0.03008945 -0.0071639639
## li.d 0.5783467 -0.01519047  0.0003830183
## lw.d 0.3317454  0.08143889 -0.0300075356
vecm.r1 <- cajorls(vecm.uk, r = 1)
vecm.r1
## $rlm
## 
## Call:
## lm(formula = substitute(form1), data = data.mat)
## 
## Coefficients:
##           lc.d      li.d      lw.d    
## ect1       0.10259   0.57835   0.33175
## constant   0.02262   0.10149   0.06064
## lc.dl1    -0.21064   0.41451   0.01562
## li.dl1     0.18167  -0.55146  -0.10535
## lw.dl1     0.07722   0.01799   0.18509
## 
## 
## $beta
##              ect1
## lc.l2  1.00000000
## li.l2 -0.93033927
## lw.l2 -0.06256788

Example of US income/consumption

The following example was considered in the Multivariate I lecture. Are the cumulative sums of income/consumption cointegrated?

library(fpp)
data(usconsumption)

in2 <- cumsum(usconsumption[,1])
co2 <- cumsum(usconsumption[,2])

us2 <- data.frame(in2,co2)

vecm.us2 <- ca.jo(us2)
jo.results.us2 <- summary(vecm.us2)
jo.results.us2
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: maximal eigenvalue statistic (lambda max) , with linear trend 
## 
## Eigenvalues (lambda):
## [1] 0.09429336 0.01005329
## 
## Values of teststatistic and critical values of test:
## 
##           test 10pct  5pct  1pct
## r <= 1 |  1.64  6.50  8.18 11.65
## r = 0  | 16.04 12.91 14.90 19.19
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##           in2.l2     co2.l2
## in2.l2  1.000000  1.0000000
## co2.l2 -1.074504 -0.9064512
## 
## Weights W:
## (This is the loading matrix)
## 
##           in2.l2       co2.l2
## in2.d 0.02978937 -0.011096130
## co2.d 0.19524476 -0.003174373
vecm.us2.r1 <- cajorls(vecm.us2, r = 1)
vecm.us2.r1
## $rlm
## 
## Call:
## lm(formula = substitute(form1), data = data.mat)
## 
## Coefficients:
##           in2.d     co2.d   
## ect1       0.02979   0.19524
## constant   0.58768   1.28709
## in2.dl1    0.31202   0.58639
## co2.dl1    0.06219  -0.36227
## 
## 
## $beta
##             ect1
## in2.l2  1.000000
## co2.l2 -1.074504

Question: Which model would you prefer?

Some final notes

  • Cointegration can be interpreted in the spectral domain: the cointegrating rank \(r\) turns out to be \(d-\mbox{rk}(\Gamma_{\Delta \mathbf{X}}(0))\), where \(\Gamma_{\Delta \mathbf{X}}(w)\) is the spectral density matrix of \(\Delta \mathbf{X}\) at frequency \(w\) (a brief sketch of why appears after these notes).
  • Most introductory econometrics and econometric time series textbooks cover the basics of cointegration, e.g. Hamilton (1994), Lütkepohl (2005) and others. (A number of economic theories concern hypotheses about cointegration.) Several textbooks are dedicated specifically to cointegration, e.g. Banerjee et al. (1993), Johansen (1995), Hansen and Johansen (1998). Many of the R examples used above are taken from Pfaff (2008). The “drunk and her dog” analogy is discussed in Murray (1994).
  • Cointegration in “high dimension” is considered in Croux and Wilms (2015), Koo et al. (2016), etc.
  • My own research interests related to cointegration concern(ed) matrix rank estimation and the so-called fractional cointegration.
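
A rough sketch of the spectral-domain interpretation mentioned in the first note (added here as a reasoning aid): by the Granger representation theorem, the differences of a cointegrated series admit a moving-average form \(\Delta \mathbf{X}_t = C(B)\mathbf{\epsilon}_t\), whose spectral density matrix at frequency \(0\) is \[ \Gamma_{\Delta \mathbf{X}}(0) = \frac{1}{2\pi}\, C(1) \Sigma_\epsilon C(1)', \] and each cointegrating vector \(\mathbf{\alpha}\) satisfies \(\mathbf{\alpha}' C(1) = 0\). With \(r\) linearly independent such vectors and \(\Sigma_\epsilon\) of full rank, \(C(1)\), and hence \(\Gamma_{\Delta \mathbf{X}}(0)\), has rank \(d-r\), giving \(r = d - \mbox{rk}(\Gamma_{\Delta \mathbf{X}}(0))\).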

Questions/Homework

Q0: Supplement your analysis of multivariate time series with VECM modeling if it seems appropriate. (Even if the series are stationary, are their cumulative series cointegrated?)

Q1: Show that the OLS estimators in the regression of independent random walks indeed have the asymptotics stated above.

Q2: Show that in the VECM, the cointegration rank is indeed the rank of the matrix \(\Pi\) (when the latter is strictly between \(0\) and \(d\)).

Q3: Explain how the Johansen eigenvalue test statistic for cointegration is obtained and prove its asymptotics.

Q4: Interpret cointegration in the spectral domain as mentioned above.

References

Banerjee, A., J. J. Dolado, J. W. Galbraith, and D. Hendry. 1993. Co-Integration, Error Correction, and the Econometric Analysis of Non-Stationary Data. Oxford University Press.

Croux, C., and I. Wilms. 2015. “Sparse Cointegration.” Preprint.

Hamilton, J. D. 1994. Time Series Analysis. Princeton University Press, Princeton, NJ.

Hansen, P. R., and S. Johansen. 1998. Workbook on Cointegration. Oxford University Press.

Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press.

Koo, B., H. M. Anderson, M. H. Seo, and W. Yao. 2016. “High-Dimensional Predictive Regression in the Presence of Cointegration.” Preprint.

Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin.

MacKinnon, J. G. 2010. “Critical Values for Cointegration Tests.” Queen’s Economics Department Working Paper.

Murray, M. P. 1994. “A Drunk and Her Dog: An Illustration of Cointegration and Error Correction.” The American Statistician 48 (1): 37–39.

Pfaff, B. 2008. Analysis of Integrated and Cointegrated Time Series with R. Second edition. Use R! Series. Springer, New York.