What is this all about?

This is about regressing a stationary series on its own lags, as well as on other related series and their lags. Including other series (variables) might be natural from the application standpoint, lead to more accurate forecasts, clarify relationships between temporally evolving variables, and help with other tasks.

Vector autoregressions (VARs) are canonical models for multivariate (multiple, vector) stationary series. The goal here is to review basic facts about VARs, and illustrate VARs through applications. Some general facts about multivariate time series will also be discussed.

There will be many familiar themes (ACVF, spectrum, autoregression, etc.) from the univariate case, but some emphasis will also be placed on what is different or new.

Setting

Suppose we are given a multivariate time series \(\mathbf{X}_t = (X_{1,t},\ldots,X_{d,t})'\), \(t=1,\ldots,T\), of dimension \(d\), whose component series \(X_{j,t}\), \(j=1,\ldots,d\), are univariate. We focus here on component series that “look” stationary.

Motivating example: US quarterly income/consumption

library(vars)  # VAR estimation, diagnostics, Granger causality, impulse responses
library(fpp)   # provides the usconsumption data set

# Percentage changes in quarterly personal consumption expenditure and personal disposable income 
# for the US, 1970 to 2010

data(usconsumption)
plot(usconsumption)

par(mar=c(3,3,3,3))
acf(usconsumption,lag.max = 50, type = "correlation", plot = TRUE)

cor(usconsumption[,1],usconsumption[,2])
## [1] 0.4320562
cor(usconsumption[1:163,1],usconsumption[2:164,2])
## [1] 0.315334
cor(usconsumption[1:162,1],usconsumption[3:164,2])
## [1] 0.1085605
cor(usconsumption[1:161,1],usconsumption[4:164,2])
## [1] 0.3194669
cor(usconsumption[1:163,2],usconsumption[2:164,1])
## [1] 0.2457187
cor(usconsumption[1:162,2],usconsumption[3:164,1])
## [1] 0.08131708
cor(usconsumption[1:161,2],usconsumption[4:164,1])
## [1] 0.1492798
par(mar=c(3,3,3,3))
acf(usconsumption,lag.max = 50, type = "partial", plot = TRUE)

var1 <- VAR(usconsumption, p=1, type="const")
var1$varresult
## $consumption
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## consumption.l1       income.l1           const  
##        0.30891         0.08267         0.46203  
## 
## 
## $income
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## consumption.l1       income.l1           const  
##         0.5633         -0.2316          0.4841
var2 <- VAR(usconsumption, p=2, type="const")
var2$varresult
## $consumption
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## consumption.l1       income.l1  consumption.l2       income.l2  
##        0.27170         0.03429         0.24275        -0.06504  
##          const  
##        0.39221  
## 
## 
## $income
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## consumption.l1       income.l1  consumption.l2       income.l2  
##        0.56241        -0.24975         0.07712        -0.04247  
##          const  
##        0.46505
ts_spec <-  spec.pgram(usconsumption,spans=5,log = "no",taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE,plot=FALSE)

par(mfrow = c(3,1),mar=c(2,2,2,2))
plot(ts_spec)
plot.spec.phase(ts_spec)
plot.spec.coherency(ts_spec)

var <- VAR(usconsumption, p=1, type="const")
var$varresult
## $consumption
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## consumption.l1       income.l1           const  
##        0.30891         0.08267         0.46203  
## 
## 
## $income
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## consumption.l1       income.l1           const  
##         0.5633         -0.2316          0.4841

Data/Theory goals of the lecture

From the data perspective, the goal is to understand what is done with the bivariate series above, a few other analyses that can be carried out, and several pitfalls to be aware of. We shall also look into the models used and some of their more theoretical aspects.

Stationarity, spectrum, etc.

The notions of stationarity, ACF, spectrum, spectral density, and others used in the univariate case naturally have multivariate analogues. For example:

Definition 1.1: A \(d\)-vector time series (model) \(\mathbf{X} = \{\mathbf{X}_t\}_{t\in\mathbb{Z}}\) is called stationary if \(E\|\mathbf{X}_t\|^2<\infty\) for all \(t\in\mathbb{Z}\), \[ E \mathbf{X}_t = m,\quad \mbox{for all}\quad t\in\mathbb{Z}, \] that is, it has a constant mean, and \[ \mbox{Cov}(\mathbf{X}_{t+h},\mathbf{X}_t) = E(\mathbf{X}_{t+h} - E\mathbf{X}_{t+h})(\mathbf{X}_{t}-E\mathbf{X}_{t})' = E \mathbf{X}_{t+h} \mathbf{X}_{t}' - mm' = \gamma_{\mathbf{X}}(h),\quad \mbox{for all}\ t,h\in\mathbb{Z}, \] for a matrix-valued function \(\gamma_{\mathbf{X}}\), called the autocovariance function (ACVF) of \(\mathbf{X}\); that is, the covariance between \(\mathbf{X}_{t+h}\) and \(\mathbf{X}_t\) depends only on the lag \(h\). Note that \[ \gamma_{\mathbf{X}}(h) = \Big( \gamma_{\mathbf{X},jk}(h) \Big)_{j,k=1,\ldots,d} \] with e.g. \[ \gamma_{\mathbf{X},12}(h) = E X_{1,{t+h}} X_{2,t} - m_1m_2. \]

Note: Unlike in the univariate case, \(\gamma_{\mathbf{X}}\) is not necessarily symmetric (and one needs to check which convention is used for the definition of ACVF). In fact, we have \[ \gamma_{\mathbf{X}}(-h) = \gamma_{\mathbf{X}}(h)'. \]

The ACF matrix function is defined as \[ \rho_{\mathbf{X}}(h) = \Big( \frac{\gamma_{\mathbf{X},jk}(h)}{\sqrt{\gamma_{\mathbf{X},jj}(0)\gamma_{\mathbf{X},kk}(0)}} \Big)_{j,k=1,\ldots,d}. \]

The sample ACVF and ACF are defined by replacing expectations by averages. PACF and sample PACF are defined in a way analogous to the univariate case.
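As a quick illustration with the data example above (a sketch; it assumes the usconsumption series loaded earlier and uses R's acf(), whose exact lag/ordering convention should be checked in its help page), the sample ACF matrices can be extracted directly rather than only plotted:

# Sketch: extract sample cross-correlation matrices from acf()
# (assumes library(vars), library(fpp) and data(usconsumption) from above)
rho_hat <- acf(usconsumption, lag.max = 4, type = "correlation", plot = FALSE)

# rho_hat$acf is an array of dimension (lag.max + 1) x d x d;
# the slice below is the estimated lag-1 correlation matrix (see ?acf for the
# exact convention on which series is shifted)
rho_hat$acf[2, , ]

# roughly comparable to the "by hand" lagged correlation computed earlier,
# up to edge effects and the centering/scaling used internally by acf()
cor(usconsumption[1:163, 1], usconsumption[2:164, 2])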

Caution about interpreting cross correlations:

One cannot use the usual \(\pm \frac{1.96}{\sqrt{T}}\) bounds to judge whether two series are cross-correlated.

  set.seed(13)
  x1 <- arima.sim(n = 250, list(ar = c(0.9)), sd = sqrt(1)) 
  x2 <- arima.sim(n = 250, list(ar = c(0.9)), sd = sqrt(1)) 
  
  par(mfrow = c(1,2))
  plot(x1)
  plot(x2)

  x <- cbind(x1,x2) 
  acf(x)

Basic result: For many pairs of independent time series models \(\{X_{1t}\}\) and \(\{X_{2t}\}\), one can show that (for fixed \(h\)) \[ \widehat \rho_{12}(h) \approx {\cal N}\Big(0,\frac{1}{T}\sum_{j=-\infty}^{\infty} \rho_{11}(j)\rho_{22}(j)\Big). \] Moreover, the \(\widehat \rho_{12}(h)\) have a non-trivial dependence across \(h\).
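A minimal numerical check of this result (a sketch, assuming two independent AR(1) series with coefficient \(\phi = 0.9\), as in the simulation above): for AR(1), \(\sum_j \rho_{11}(j)\rho_{22}(j) = \sum_j \phi^{2|j|} = (1+\phi^2)/(1-\phi^2)\), so the asymptotic standard deviation of \(\widehat\rho_{12}(h)\) is much larger than \(1/\sqrt{T}\).

# Sketch: naive 1.96/sqrt(T) bound vs. the Bartlett-type bound for two
# independent AR(1) series with coefficient phi (cf. the simulation above)
phi <- 0.9
T <- 250

bartlett_sd <- sqrt((1 + phi^2) / (1 - phi^2) / T)  # sd from the formula above
naive_sd <- 1 / sqrt(T)
c(naive = 1.96 * naive_sd, bartlett = 1.96 * bartlett_sd)

# quick Monte Carlo check at lag h = 0
set.seed(1)
rho12_hat <- replicate(1000, {
  x1 <- arima.sim(n = T, list(ar = phi))
  x2 <- arima.sim(n = T, list(ar = phi))
  cor(x1, x2)
})
sd(rho12_hat)  # should be close to bartlett_sd, not to naive_sd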

To test whether two time series are cross-correlated, a common suggestion is to “prewhiten” them first and then test for cross-correlation in the residuals.

  set.seed(13)
  x1 <- arima.sim(n = 250, list(ar = c(0.9)), sd = sqrt(1)) 
  x2 <- arima.sim(n = 250, list(ar = c(0.9)), sd = sqrt(1)) 
  
  x1_ar <-arima(x1, order=c(1,0,0), include.mean = FALSE, method = "ML")
  x2_ar <-arima(x2, order=c(1,0,0), include.mean = FALSE, method = "ML")
  
  y1 <- residuals(x1_ar)
  y2 <- residuals(x2_ar)
  
  y <- cbind(y1,y2) 
  
  par(mar=c(3,3,3,3))
  acf(y)

  library(forecast)

  x1 <- usconsumption[,"consumption"]
  x2 <- usconsumption[,"income"] 
  
  
  auto.arima(x1,max.p=5,max.q=5,max.P = 0,max.Q=0,ic="aic",allowmean = TRUE)
## Series: x1 
## ARIMA(2,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  intercept
##       1.4441  -0.6194  -1.2185  0.5618     0.7551
## s.e.  0.2852   0.2200   0.2621  0.1397     0.0940
## 
## sigma^2 estimated as 0.3936:  log likelihood=-153.9
## AIC=319.8   AICc=320.34   BIC=338.4
  auto.arima(x2,max.p=5,max.q=5,max.P = 0,max.Q=0,ic="aic",allowmean = TRUE)
## Series: x2 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##       intercept
##          0.7366
## s.e.     0.0732
## 
## sigma^2 estimated as 0.8844:  log likelihood=-222.13
## AIC=448.26   AICc=448.33   BIC=454.46
  x1_ar <-arima(x1, order=c(2,0,2), include.mean = TRUE, method = "ML")
  x2_ar <-arima(x2, order=c(0,0,0), include.mean = TRUE, method = "ML")
  
  y1 <- residuals(x1_ar)
  y2 <- residuals(x2_ar)
  
  par(mar=c(3,3,3,3))
  y <- cbind(y1,y2) 
  acf(y)

Definition 1.2: The spectral density of a stationary time series \(\{\mathbf{X}_t\}\) with ACVF \(\gamma_{\mathbf{X}}(h)\) is defined as \[ f_{\mathbf{X}}(w) = \Big( f_{\mathbf{X},jk}(w) \Big)_{j,k=1,\ldots,d} = \frac{1}{2\pi} \sum_{h=-\infty}^\infty e^{ihw} \gamma_{\mathbf{X}}(h),\quad w\in [0,2\pi). \] Since \(\gamma_{\mathbf{X}}(h) \neq \gamma_{\mathbf{X}}(-h)\) in general, the off-diagonal elements of \(f_{\mathbf{X}}(w)\) can be complex-valued. But we always have \[ f_{\mathbf{X}}(w)^* = f_{\mathbf{X}}(w), \] that is, the spectral density matrix is Hermitian symmetric.

The sample analogue of the spectral density is (up to a constant) the periodogram \[ I(w_k) = \frac{1}{T} \Big( \sum_{t=1}^T \mathbf{X}_t e^{iw_k t} \Big) \Big( \sum_{t=1}^T \mathbf{X}_t e^{iw_k t} \Big)^*,\quad w_k = \frac{2\pi k}{T}, \] which is now a \(d\times d\) complex-valued matrix for each fixed \(w_k\). As in the univariate case, it approximates \((2\pi)f_{\mathbf X}(w_k)\), with similar asymptotic properties, smoothed extensions, and so on also considered.

Example 1.1 (Multivariate white noise): A stationary multivariate time series \(\mathbf{Z} = \{\mathbf Z_t\}_{t\in\mathbb{Z}}\) is called multivariate white noise if \(E \mathbf{Z}_t = 0\) for all \(t\in\mathbb{Z}\) and \[ \gamma_{\mathbf{Z}}(h) = \left\{ \begin{array}{cl} \Sigma_{\mathbf Z} = E \mathbf{Z}_t \mathbf{Z}_t', & \mbox{if}\ h=0,\\ 0, & \mbox{if}\ h\neq 0. \end{array} \right. \] The notation is \(\{\mathbf{Z}_t\}\sim \mbox{WN}(0,\Sigma_{\mathbf Z})\). Its spectral density matrix is \[ f_{\mathbf{Z}}(w) = \frac{1}{2\pi} \Sigma_{\mathbf Z}. \]
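A minimal sketch of simulating such noise with a prescribed \(\Sigma_{\mathbf Z}\) (one possible approach, using a Cholesky factor \(L\) with \(LL'=\Sigma_{\mathbf Z}\) applied to iid standard normals; the particular \(\Sigma_{\mathbf Z}\) below is hypothetical):

# Sketch: simulate T observations of bivariate white noise WN(0, Sigma)
Sigma <- matrix(c(1.0, 0.4,
                  0.4, 0.8), nrow = 2, byrow = TRUE)   # hypothetical Sigma_Z
L <- t(chol(Sigma))     # lower-triangular factor, L %*% t(L) = Sigma

set.seed(1)
T <- 500
Z <- t(L %*% matrix(rnorm(2 * T), nrow = 2))   # T x 2 matrix, rows ~ WN(0, Sigma)

cov(Z)                 # close to Sigma
acf(Z, lag.max = 10)   # no correlation at nonzero lags, only contemporaneous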

For \(j\neq k\), the quantity \[ f_{\mathbf{X},jk}(w) = \frac{1}{2\pi} \sum_{h=-\infty}^\infty e^{ihw} \gamma_{\mathbf{X},jk}(h) \] is called the cross-spectral density of the component series \(\{X_{jt}\}\) and \(\{X_{kt}\}\). It is complex-valued in general and can be expressed as \[ f_{\mathbf{X},jk}(w) = |f_{\mathbf{X},jk}(w)| \exp\{i\mbox{Arg}(f_{\mathbf{X},jk}(w))\}, \] where \(|f_{\mathbf{X},jk}(w)|\) is the amplitude spectrum and \(\mbox{Arg}(f_{\mathbf{X},jk}(w))\) is the phase spectrum.

Phase spectrum: This quantity is not particularly easy to interpret. A few points can be made. First, the phase spectrum is zero (for all component pairs) if and only if \(\gamma_{\mathbf{X}}(h) = \gamma_{\mathbf{X}}(-h)\) (such time series models are called time-reversible). Second, some understanding of the common shapes of the phase spectra seen in practice can be gained through the following simple example.

Example 1.2: Consider the following bivariate series \[ X_{1t} = Z_{1t},\quad X_{2t} = \phi X_{1,t-d} + Z_{2t}, \] where \(\{(Z_{1t},Z_{2t})'\}\sim {\rm WN}(0,\sigma^2I)\). Then, \[ \gamma_{\mathbf{X},21}(h) = \left\{ \begin{array}{cl} \phi \sigma^2, & h = d, \\ 0, & h\neq d \end{array} \right. \] so that \[ f_{\mathbf{X},21}(w) = \frac{\phi \sigma^2}{2\pi} e^{idw} = \frac{\phi \sigma^2}{2\pi} \exp\Big\{i((dw+\pi) \mbox{mod}\ (2\pi) - \pi )\Big\}. \] Note that the slope \(d\) in the phase spectrum is exactly the “delay” \(d\) in the model. A similar interpretation can be made for the slope at a given frequency where it plays the role of the delay in the component series at that frequency (see Remark 5 in Section 11.6, Brockwell and Davis (2006)).

set.seed(2)
d <- 5
# d<- 1
T <- 100
phi <- 0.7
xx <- rnorm(T+d)
x1 <- xx[(1+d):(T+d)]
x2 <- phi*xx[1:T] + rnorm(T)
#x2 <- phi*xx[1:T] 
x <- cbind(x1,x2)

par(mar=c(3,3,3,3))
acf(x)

x_spec <-  spec.pgram(x,spans=5,log = "no",taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE,plot=FALSE)

par(mfrow = c(3,1),mar=c(2,2,2,2))
plot(x_spec)
plot.spec.phase(x_spec)
plot.spec.coherency(x_spec)

Squared coherence: For a component pair \(j,k\), the squared coherence is defined as \[ {\cal H}_{jk}^2(w) = \frac{|f_{\mathbf{X},jk}(w)|^2}{f_{\mathbf{X},jj}(w)f_{\mathbf{X},kk}(w)}. \] The quantity can be shown to satisfy \(0\leq {\cal H}_{jk}^2(w)\leq 1\). It can be thought of as the squared correlation of the DFT coefficients of the component series at frequency \(w\). More insight can be found in Brockwell and Davis (2006), Section 11.6, for example:

  • If \(X_{2t} = \sum_j \psi_j X_{1,t-j}\), then \({\cal H}_{21}^2(w) = 1\). Appropriate extensions to \(X_{2t} = \sum_j \psi_j X_{1,t-j}+ Z_t\) can also be analyzed.
  • If \(\{X_{1t}\}\) and \(\{X_{2t}\}\) have squared coherence \({\cal H}_{21}^2(w)\), then the filtered series \(Y_{1t} = \sum_j \psi_j X_{1,t-j}\) and \(Y_{2t} = \sum_j \phi_j X_{2,t-j}\) have the same squared coherence.
  • Squared coherence is related to partial correlations in multivariate time series (more later).

Example 1.2 (cont’ed): In Example 1.2 above, we have \[ f_{\mathbf{X},11}(w) = \frac{\sigma^2}{2\pi},\quad f_{\mathbf{X},22}(w) = \frac{\sigma^2 (1+\phi^2)}{2\pi}, \quad f_{\mathbf{X},21}(w) = \frac{\phi \sigma^2}{2\pi} e^{idw} \] and hence \[ {\cal H}_{21}^2(w) = \frac{\phi^2}{1+\phi^2} \in [0,1). \]
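As a rough check against the simulation of Example 1.2 above (a sketch; it reuses x_spec computed earlier, whose coh component holds the estimated squared coherency, and \(\phi=0.7\)), the estimated squared coherence should fluctuate around \(\phi^2/(1+\phi^2)\approx 0.33\) across frequencies:

# Sketch: theoretical squared coherence of Example 1.2 vs. the smoothed
# periodogram estimate (x_spec was computed above with spans = 5)
phi <- 0.7
phi^2 / (1 + phi^2)              # theoretical value, about 0.33

summary(as.vector(x_spec$coh))   # estimated squared coherency over frequencies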

Note: \(f_{\mathbf{X}}(w)\) is also non-negative definite (non-obvious) for each \(w\), that is, \[ z^*f_{\mathbf{X}}(w)z\geq 0,\quad \mbox{for any}\ z\in\mathbb{C}^d. \] In the univariate case, this reduces to \(f_X(w)\geq 0\).

Vector Autoregression (VAR)

Definition 1.3: A vector autoregression time series of order \(p\) (VAR(\(p\))) and dimension \(d\) (with zero mean) satisfies the equation \[ \mathbf{X}_t = \Phi_1 \mathbf{X}_{t-1} + \ldots + \Phi_p \mathbf{X}_{t-p} + \mathbf{Z}_t, \] where \(\Phi_k\), \(k=1,\ldots,p\), are \(p\) matrices of dimension \(d\times d\), and \(\{\mathbf{Z}_t\} \sim \mbox{WN}(0,\Sigma_{\mathbf Z})\). Using the backshift \(B\) notation, the model is also written as \[ \Phi(B)\mathbf{X}_t = \mathbf{Z}_t, \] where \[ \Phi(z) = I_d - \Phi_1 z - \ldots - \Phi_p z^p \] is the so-called characteristic (matrix) polynomial.

If the mean \(\mu = E\mathbf{X}_t\) is not zero, the model is written as \(\mathbf{X}_t = \nu+ \Phi_1 \mathbf{X}_{t-1} + \ldots + \Phi_p \mathbf{X}_{t-p} + \mathbf{Z}_t\) or \(\mathbf{X}_t - \mu = \Phi_1 (\mathbf{X}_{t-1} -\mu) + \ldots + \Phi_p (\mathbf{X}_{t-p} -\mu) + \mathbf{Z}_t\), where \(\nu = (I_d - \Phi_1 - \ldots - \Phi_p)\mu\).

Some known facts about VARs:

  • Stability: It is known that the condition \(\mbox{det}(\Phi(z)) = \mbox{det}(I_d - \Phi_1z - \ldots - \Phi_p z^p) \neq 0\), for \(|z|\leq 1\), ensures a unique causal stationary solution to the VAR equation.

  • ACVFs: Except in some special cases, VAR models do not have simple explicit formulae for the ACVF. This is easy to understand through the VAR\((1)\) model: \(\gamma_{\mathbf X}(h) = \Phi_1^h\, \gamma_{\mathbf X}(0)\) for \(h\geq 0\), where \(\gamma_{\mathbf X}(0)\) solves \(\gamma_{\mathbf X}(0) = \Phi_1 \gamma_{\mathbf X}(0) \Phi_1' + \Sigma_{\mathbf Z}\), and this equation generally has no simple entrywise solution.

  • Spectra: The spectral density, on the other hand, can be expressed as \[ f_{\mathbf X}(w) = \frac{1}{2\pi}\Big( I_d - \Phi_1 e^{iw} - \ldots - \Phi_p e^{iwp} \Big)^{-1} \Sigma_{\mathbf Z} \Big( \Big( I_d - \Phi_1 e^{iw} - \ldots - \Phi_p e^{iwp} \Big)^{-1}\Big)^* \] and follows from analogous results for spectral densities under linear filtering.

  • VARMA models: As in the univariate case, more general VARMA\((p,q)\) can be defined as satisfying the equation \[ \mathbf{X}_t = A_1 \mathbf{X}_{t-1} + \ldots + A_p \mathbf{X}_{t-p} + \mathbf{Z}_t + \Theta_1 \mathbf{Z}_{t-1} +\ldots + \Theta_q \mathbf{Z}_{t-q} \] but the class of these models is no longer identifiable. Example?

  • VAR\((p)\) models can be “embedded” into VAR\((1)\) models (see the companion-form sketch after this list). Analysis of the latter model then carries special significance. Examples: stability of the model; impulse response analysis below.

  • Structural VARs: The variance-covariance matrix \(\Sigma_{\mathbf Z}\) of \(\mathbf{Z}_t\) is not necessarily diagonal, so that contemporaneous dependence/correlation among the component series of \(\mathbf{X}_t\) can be captured. More on this and related structural VARs in a few lectures.
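To make the “embedding” and the stability condition concrete: a VAR\((p)\) can be stacked into a companion VAR\((1)\) form. For \(p=2\), \[ \left( \begin{array}{c} \mathbf{X}_t\\ \mathbf{X}_{t-1} \end{array} \right) = \left( \begin{array}{cc} \Phi_1 & \Phi_2\\ I_d & 0 \end{array} \right) \left( \begin{array}{c} \mathbf{X}_{t-1}\\ \mathbf{X}_{t-2} \end{array} \right) + \left( \begin{array}{c} \mathbf{Z}_t\\ 0 \end{array} \right), \] and stability is equivalent to all eigenvalues of the companion matrix having modulus less than one. Below is a minimal sketch in R; the coefficient matrices Phi1 and Phi2 are hypothetical, not estimated from data.

# Sketch: companion ("embedded VAR(1)") matrix of a bivariate VAR(2)
# and its eigenvalue-based stability check; Phi1, Phi2 are hypothetical
Phi1 <- matrix(c(0.5, 0.1,
                 0.2, 0.3), nrow = 2, byrow = TRUE)
Phi2 <- matrix(c(0.2, 0.0,
                 0.0, 0.1), nrow = 2, byrow = TRUE)

companion <- rbind(cbind(Phi1, Phi2),
                   cbind(diag(2), matrix(0, 2, 2)))

Mod(eigen(companion)$values)   # all moduli < 1  <=>  stable (causal stationary) VAR
# For a fitted model, roots() from the vars package returns these moduli (used below).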

Some notes on estimation:

Ordinary least-squares estimation (OLS): \[ (\widehat \Phi_1,\ldots,\widehat \Phi_p) = \mbox{argmin}_{\Phi_1,\ldots,\Phi_p} \sum_{t=p+1}^T \| \mathbf{X}_t - \Phi_1 \mathbf{X}_{t-1} - \ldots - \Phi_p \mathbf{X}_{t-p}\|^2, \] which is equivalent to running a separate regression for each VAR equation. The estimate \(\widehat \Sigma_{\mathbf Z}\) is then defined as the sample variance-covariance matrix of the residuals.
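To see the “one regression per equation” point concretely, here is a minimal sketch (reusing usconsumption and the var1 fit from above, with a hand-built lag-1 design matrix): an ordinary lm() of each component on the lagged values of all components, plus an intercept, reproduces the VAR\((1)\) coefficients.

# Sketch: OLS for a VAR(1) is just one lm() per component series
# (compare with var1$varresult from the fit above)
n <- nrow(usconsumption)
y <- usconsumption[-1, ]     # X_t,     t = 2, ..., T
x <- usconsumption[-n, ]     # X_{t-1}, t = 2, ..., T
lagged <- data.frame(consumption.l1 = x[, "consumption"],
                     income.l1      = x[, "income"])

coef(lm(y[, "consumption"] ~ ., data = lagged))   # consumption equation
coef(lm(y[, "income"] ~ ., data = lagged))        # income equation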

Multivariate (Generalized) least squares, Maximum Likelihood Estimation: Surprise: essentially the same as LSE! Why? See Lutkepohl (2005), Section 3.2.1, p. 70-71, who writes: “This result is due to Zellner (1962) who showed that GLS and LS estimation in a multiple equation model are identical if the regressors in all equations are the same.”

The estimators are asymptotically normal under mild assumptions, with an explicit asymptotic variance-covariance matrix.

Model selection, model diagnostics, forecasting, etc: The principles are similar to what was done for AR\((p)\) models in the univariate case.

Back to data example

VARselect(usconsumption, lag.max=8, type="const")$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      1      1      5
var <- VAR(usconsumption, p=1, type="const")
summary(var)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: consumption, income 
## Deterministic variables: const 
## Sample size: 163 
## Log Likelihood: -354.304 
## Roots of the characteristic polynomial:
## 0.3845 0.3072
## Call:
## VAR(y = usconsumption, p = 1, type = "const")
## 
## 
## Estimation results for equation consumption: 
## ============================================ 
## consumption = consumption.l1 + income.l1 + const 
## 
##                Estimate Std. Error t value Pr(>|t|)    
## consumption.l1  0.30891    0.08142   3.794  0.00021 ***
## income.l1       0.08267    0.06007   1.376  0.17070    
## const           0.46203    0.07730   5.977 1.43e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.6499 on 160 degrees of freedom
## Multiple R-Squared: 0.1379,  Adjusted R-squared: 0.1272 
## F-statistic:  12.8 on 2 and 160 DF,  p-value: 6.966e-06 
## 
## 
## Estimation results for equation income: 
## ======================================= 
## income = consumption.l1 + income.l1 + const 
## 
##                Estimate Std. Error t value Pr(>|t|)    
## consumption.l1   0.5633     0.1101   5.119 8.73e-07 ***
## income.l1       -0.2316     0.0812  -2.853  0.00491 ** 
## const            0.4841     0.1045   4.632 7.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.8785 on 160 degrees of freedom
## Multiple R-Squared: 0.143,   Adjusted R-squared: 0.1323 
## F-statistic: 13.35 on 2 and 160 DF,  p-value: 4.343e-06 
## 
## 
## 
## Covariance matrix of residuals:
##             consumption income
## consumption      0.4224 0.2260
## income           0.2260 0.7718
## 
## Correlation matrix of residuals:
##             consumption income
## consumption      1.0000 0.3959
## income           0.3959 1.0000

The estimated variance-covariance matrix of the parameter estimates is also computed. It is useful e.g. in various formal statistical tests.

vcov(var)
##                            consumption:(Intercept)
## consumption:(Intercept)               0.0059756783
## consumption:consumption.l1           -0.0034383791
## consumption:income.l1                -0.0010690070
## income:(Intercept)                    0.0031977427
## income:consumption.l1                -0.0018399671
## income:income.l1                     -0.0005720538
##                            consumption:consumption.l1
## consumption:(Intercept)                  -0.003438379
## consumption:consumption.l1                0.006628590
## consumption:income.l1                    -0.002116272
## income:(Intercept)                       -0.001839967
## income:consumption.l1                     0.003547133
## income:income.l1                         -0.001132473
##                            consumption:income.l1 income:(Intercept)
## consumption:(Intercept)            -0.0010690070       0.0031977427
## consumption:consumption.l1         -0.0021162724      -0.0018399671
## consumption:income.l1               0.0036084610      -0.0005720538
## income:(Intercept)                 -0.0005720538       0.0109190718
## income:consumption.l1              -0.0011324731      -0.0062827860
## income:income.l1                    0.0019309824      -0.0019533454
##                            income:consumption.l1 income:income.l1
## consumption:(Intercept)             -0.001839967    -0.0005720538
## consumption:consumption.l1           0.003547133    -0.0011324731
## consumption:income.l1               -0.001132473     0.0019309824
## income:(Intercept)                  -0.006282786    -0.0019533454
## income:consumption.l1                0.012112106    -0.0038669637
## income:income.l1                    -0.003866964     0.0065935685

Model diagnostics …

roots(var)
## [1] 0.3844935 0.3072100
resid.var <- resid(var)
plot.ts(resid.var,ylab = 'Residuals',xlab='Year')

acf(resid.var)

pacf(resid.var)

# fit model of order 5 instead?


qqnorm(resid.var[,1], main="")
qqline(resid.var[,1])

qqnorm(resid.var[,2], main="")
qqline(resid.var[,2])

 normality.test(var)
## $JB
## 
##  JB-Test (multivariate)
## 
## data:  Residuals of VAR object var
## Chi-squared = 49.425, df = 4, p-value = 4.761e-10
## 
## 
## $Skewness
## 
##  Skewness only (multivariate)
## 
## data:  Residuals of VAR object var
## Chi-squared = 11.525, df = 2, p-value = 0.003143
## 
## 
## $Kurtosis
## 
##  Kurtosis only (multivariate)
## 
## data:  Residuals of VAR object var
## Chi-squared = 37.9, df = 2, p-value = 5.89e-09
 # rejecting normality
pred_var <- predict(var, n.ahead = 8, ci = 0.95)
pred_var
## $consumption
##           fcst      lower    upper       CI
## [1,] 0.7631103 -0.5106941 2.036915 1.273804
## [2,] 0.7714386 -0.5857709 2.128648 1.357209
## [3,] 0.7588242 -0.6110038 2.128652 1.369828
## [4,] 0.7588330 -0.6127505 2.130417 1.371583
## [5,] 0.7573437 -0.6145095 2.129197 1.371853
## [6,] 0.7572296 -0.6146621 2.129121 1.371892
## [7,] 0.7570449 -0.6148526 2.128942 1.371898
## [8,] 0.7570172 -0.6148812 2.128916 1.371898
## 
## $income
##           fcst      lower    upper       CI
## [1,] 0.8912373 -0.8306391 2.613114 1.721876
## [2,] 0.7075199 -1.1397109 2.554751 1.847231
## [3,] 0.7547652 -1.1033771 2.612908 1.858142
## [4,] 0.7367158 -1.1232256 2.596657 1.859941
## [5,] 0.7409015 -1.1192241 2.601027 1.860126
## [6,] 0.7390930 -1.1210639 2.599250 1.860157
## [7,] 0.7394476 -1.1207129 2.599608 1.860161
## [8,] 0.7392614 -1.1208997 2.599422 1.860161
par(mar=c(3,3,1,1))
plot(pred_var)

Question: Is one doing better than just using univariate models?

  # time series length for out-of-sample forecasting
  T0 <- 10
  # length of training data set
  T <- dim(usconsumption)[1] - T0

  ts <- usconsumption[1:T,]
  ts1 <- ts[,1]
  ts2 <- ts[,2]
  
  # univariate prediction
  
  auto.arima(ts1,max.p=5,max.q=5,max.P = 0,max.Q=0,ic="aic",allowmean = TRUE)
## Series: ts1 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1      ma1  intercept
##       0.7334  -0.4893     0.7901
## s.e.  0.1092   0.1298     0.0954
## 
## sigma^2 estimated as 0.3969:  log likelihood=-145.92
## AIC=299.84   AICc=300.11   BIC=311.99
  auto.arima(ts2,max.p=5,max.q=5,max.P = 0,max.Q=0,ic="aic",allowmean = TRUE)
## Series: ts2 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##       intercept
##          0.7931
## s.e.     0.0724
## 
## sigma^2 estimated as 0.8127:  log likelihood=-202.04
## AIC=408.09   AICc=408.16   BIC=414.16
  ts1_ar <- arima(ts1, order=c(1,0,1), include.mean = TRUE, method = "ML")
  ts2_ar <- arima(ts2, order=c(0,0,0), include.mean = TRUE, method = "ML")
  
  ts1.forecast <- predict(ts1_ar, T0)
  ts2.forecast <- predict(ts2_ar, T0)
  
  diff1 <- mean((ts1.forecast$pred[1:T0] - usconsumption[(T+1):(T+T0),1])^2)
  diff2 <- mean((ts2.forecast$pred[1:T0] - usconsumption[(T+1):(T+T0),2])^2)
  
  diff1
## [1] 0.7876984
  diff2
## [1] 2.034029
  diff1+diff2
## [1] 2.821728
  # multivariate prediction
  
  var0 <- VAR(ts, p=1, type="const")
  pred_var0 <- predict(var0, n.ahead = T0, ci = 0.95)
  diff1_m <- mean((pred_var0$fcst$consumption[,1] - usconsumption[(T+1):(T+T0),1])^2)
  diff2_m <- mean((pred_var0$fcst$income[,1] - usconsumption[(T+1):(T+T0),2])^2)
  
  diff1_m
## [1] 1.046991
  diff2_m
## [1] 1.659539
  diff1_m+diff2_m
## [1] 2.706531

Granger causality analysis

As in the data example above, consider the bivariate case \(d=2\) and VAR\((1)\) model, that is, assuming mean \(0\) for simplicity, \[ \left( \begin{array}{c} X_{1t}\\ X_{2t} \end{array} \right) = \left( \begin{array}{cc} \phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22} \end{array} \right) \left( \begin{array}{c} X_{1,t-1}\\ X_{2,t-1} \end{array} \right) + \left( \begin{array}{c} Z_{1t}\\ Z_{2t} \end{array} \right). \] For example, the case \[ \phi_{12} = 0 \] seems special, and has the following interpretation: \[ \phi_{12} = 0\quad \Leftrightarrow\quad X_{1t}(1|\{\mathbf{X}_s,s\leq t\}) = X_{1t}(1|\{X_{1,s},s\leq t\}), \] where \(X_{1t}(1|\Omega_t)\) is the best linear (one-step ahead) predictor of \(X_{1,t+1}\) given \(\Omega_t\), and also \[ \phi_{12} = 0\quad \Leftrightarrow\quad E\Big(X_{1,t+1} - X_{1t}(1|\{\mathbf{X}_s,s\leq t\}) \Big)^2 = E\Big(X_{1,t+1} - X_{1t}(1|\{X_{1,s},s\leq t\}) \Big)^2. \] and also \[ \phi_{12} = 0\quad \Leftrightarrow\quad X_{1t}(h|\{\mathbf{X}_s,s\leq t\}) = X_{1t}(h|\{X_{1,s},s\leq t\}),\ h=1,2,\ldots \]

In these equivalent cases when \(\phi_{12}=0\), one says that \(X_{2t}\) does not Granger-cause \(X_{1t}\). And vice versa, if \(\phi_{12}\neq 0\), one says that \(X_{2t}\) Granger-causes \(X_{1t}\), in which case \[ \phi_{12} \neq 0\quad \Leftrightarrow\quad E\Big(X_{1,t+1} - X_{1t}(1|\{\mathbf{X}_s,s\leq t\}) \Big)^2 < E\Big(X_{1,t+1} - X_{1t}(1|\{X_{1s},s\leq t\}) \Big)^2. \] One can then obviously test for Granger causality by testing whether \(\phi_{12}=0\).

Note that the interpretation of Granger causality is consistent with the data example above. The roles of \(X_{1t}\) and \(X_{2t}\) can obviously be reversed.

This notion of Granger causality can be extended to subvector series of a general multidimensional vector series \(\mathbf{X}_t\). See Lutkepohl (2005), Section 2.3.1 and Section 3.6. In the general case, Granger causality is still about testing some of the coefficients to be zero (e.g. using a Wald-type test).

Note: The above is a “practical” definition. More generally, the MSE of not necessarily linear predictors may be used, a more general information set may be used for conditioning, etc.
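For the bivariate VAR\((1)\) fitted above, this single-restriction test can essentially be read off the summary output: the t statistic of income.l1 in the consumption equation (about 1.38, p ≈ 0.17) squares to roughly the F statistic (≈ 1.89) reported by causality() below. A minimal sketch (reusing the VAR\((1)\) fit of usconsumption from above, which is also refit just below):

# Sketch: "by hand" check of phi_12 = 0 in the consumption equation,
# i.e. whether income Granger-causes consumption (cf. causality() below)
fit_cons <- summary(var$varresult$consumption)
fit_cons$coefficients["income.l1", ]              # estimate, s.e., t value, p-value
fit_cons$coefficients["income.l1", "t value"]^2   # single-restriction Wald statistic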

var <- VAR(usconsumption, p=1, type="const")

var_causality_2 <- causality(var,cause="income",boot=TRUE)
var_causality_2$Granger
## 
##  Granger causality H0: income do not Granger-cause consumption
## 
## data:  VAR object var
## F-Test = 1.8938, boot.runs = 100, p-value = 0.19
var_causality_1 <- causality(var,cause="consumption",boot=TRUE)
var_causality_1$Granger
## 
##  Granger causality H0: consumption do not Granger-cause income
## 
## data:  VAR object var
## F-Test = 26.201, boot.runs = 100, p-value < 2.2e-16

There is also a related notion of instantaneous Granger causality. In the special bivariate case above, it is said that there is NO instantaneous causality between \(X_{1t}\) and \(X_{2t}\) if \[ X_{1t}(1|\{\mathbf{X}_s,s\leq t\},\{X_{2,t+1}\}) = X_{1t}(1|\{\mathbf{X}_s,s\leq t\}), \] which can be shown to be equivalent to \[ E(Z_{1t} Z_{2t}) = 0. \] This notion of instantaneous Granger causality can be extended to subvector series of a general multidimensional vector series \(\mathbf{X}_t\). See Lutkepohl (2005), Section 2.3.1 and Section 3.6. In the general case, instantaneous Granger causality is still about testing certain relations in \(\Sigma_{\mathbf{Z}}\).

var <- VAR(usconsumption, p=1, type="const")
var_causality_1 <- causality(var,cause="consumption",boot=TRUE)
var_causality_1$Instant
## 
##  H0: No instantaneous causality between: consumption and income
## 
## data:  VAR object var
## Chi-squared = 22.084, df = 1, p-value = 2.61e-06
var_causality_2 <- causality(var,cause="income",boot=TRUE)
var_causality_2$Instant
## 
##  H0: No instantaneous causality between: income and consumption
## 
## data:  VAR object var
## Chi-squared = 22.084, df = 1, p-value = 2.61e-06

Note: Some interesting words of caution about Granger causality can be found in Lutkepohl (2005), pp. 48-51. Punchline: “The previous critical remarks are meant to caution the reader and multiple time series analyst against overinterpreting the evidence from a VAR model. Still, causality analyses are useful tools in practice if these critical points are kept in mind. At the very least, a Granger-causality analysis tells the analyst whether a set of variables contains useful information for improving the predictions of another set of variables.”

Impulse response analysis

Consider the bivariate VAR\((1)\) model above \[ \mathbf{X}_t = \left( \begin{array}{c} X_{1t}\\ X_{2t} \end{array} \right) = \left( \begin{array}{cc} \phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22} \end{array} \right) \left( \begin{array}{c} X_{1,t-1}\\ X_{2,t-1} \end{array} \right) + \left( \begin{array}{c} Z_{1t}\\ Z_{2t} \end{array} \right) = \Phi \left( \begin{array}{c} X_{1,t-1}\\ X_{2,t-1} \end{array} \right) + \left( \begin{array}{c} Z_{1t}\\ Z_{2t} \end{array} \right). \] Assume \(\mathbf{X}_{-1}=0\). Note that a shock \[ \left( \begin{array}{c} Z_{1,0}\\ Z_{2,0} \end{array} \right) = \left( \begin{array}{c} 1\\ 0 \end{array} \right) \] propagates through the system as \[ \left( \begin{array}{c} X_{1,0}\\ X_{2,0} \end{array} \right) = \left( \begin{array}{c} 1\\ 0 \end{array} \right) \] \[ \left( \begin{array}{c} X_{1,1}\\ X_{2,1} \end{array} \right) = \left( \begin{array}{cc} \phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22} \end{array} \right) \left( \begin{array}{c} 1\\ 0 \end{array} \right) = \left( \begin{array}{c} \phi_{11} \\ \phi_{21} \end{array} \right), \] \[ \left( \begin{array}{c} X_{1,2}\\ X_{2,2} \end{array} \right) = \left( \begin{array}{cc} \phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22} \end{array} \right) \left( \begin{array}{c} \phi_{11} \\ \phi_{21} \end{array} \right) = \left( \begin{array}{c} \phi_{11}^2 + \phi_{12}\phi_{21} \\ \phi_{21}\phi_{11} + \phi_{22}\phi_{21} \end{array} \right), \] etc. In fact, one can show that \[ \left( \begin{array}{c} X_{1,n}\\ X_{2,n} \end{array} \right) = \mbox{the first column of}\ \left( \begin{array}{cc} \phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22} \end{array} \right)^n = \mbox{the first column of}\ \Phi^n. \] Similarly, a shock \[ \left( \begin{array}{c} Z_{1,0}\\ Z_{2,0} \end{array} \right) = \left( \begin{array}{c} 0\\ 1 \end{array} \right) \] propagates through the system as \[ \left( \begin{array}{c} X_{1,n}\\ X_{2,n} \end{array} \right) = \mbox{the second column of}\ \left( \begin{array}{cc} \phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22} \end{array} \right)^n = \mbox{the second column of}\ \Phi^n. \] The analysis of these true/estimated columns is referred to as impulse response analysis.

Note that these columns also appear in the MA\((\infty)\) representation of the VAR\((1)\) model \[ \mathbf{X}_t = (I - \Phi B)^{-1} \mathbf{Z}_t = (I + \Phi B + \Phi^2 B^2 + \Phi^3 B^3 + \ldots ) \mathbf{Z}_t = \mathbf{Z}_t +\Phi \mathbf{Z}_{t-1} + \Phi^2 \mathbf{Z}_{t-2} + \ldots \]

This is also the case for more general VAR\((p)\) models. The impulse response coefficients appear as columns in the matrices \(\Theta_i\) of the MA\((\infty)\) expansion \[ \mathbf{X}_t = \mathbf{Z}_t +\Theta_1 \mathbf{Z}_{t-1} + \Theta_2 \mathbf{Z}_{t-2} + \ldots. \]
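A minimal sketch of this identity for the fitted bivariate VAR\((1)\) (it assumes Acoef() from the vars package for extracting \(\widehat\Phi_1\), and the non-orthogonalized irf() call used below): the \(n\)-step responses are the columns of \(\widehat\Phi_1^n\).

# Sketch: non-orthogonalized impulse responses of a VAR(1) are columns of Phi^n
# (var is the VAR(1) fit of usconsumption; Acoef() returns the coefficient matrices)
Phi_hat <- Acoef(var)[[1]]

Phi_hat                 # responses at horizon n = 1
Phi_hat %*% Phi_hat     # responses at horizon n = 2

# compare with the corresponding rows of irf(var, ortho = FALSE, boot = FALSE)$irf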

Question: What is an orthogonal impulse response analysis?

Note: The impulse response coefficients can be used conveniently to express forecast error.

var <- VAR(usconsumption, p=1, type="const")
var_irf <- irf(var,n.ahead = 10,ortho = FALSE,boot = TRUE, ci = 0.95,cumulative = FALSE)
# ortho = TRUE: "renormalizes" the shocks so that Sigma = I
# cumulative = TRUE: if want cumulative effect of the shocks

par(mfrow = c(2,1),mar=c(2,2,2,2))
plot(var_irf)

var <- VAR(usconsumption, p=1, type="const")
var_irf <- irf(var,n.ahead = 10,ortho = TRUE,boot = TRUE, ci = 0.95,cumulative = FALSE)
# ortho = TRUE: "renormalizes" the shocks so that Sigma = I
# cumulative = TRUE: if want cumulative effect of the shocks

par(mfrow = c(2,1),mar=c(2,2,2,2))
plot(var_irf)

Note: Some interesting words of critique about impulse response analysis can be found in Lutkepohl (2005), pp. 62-63.

Some final notes

  • Many packages (e.g. vars used above) also allow one to include other exogenous variables in VAR models. The most common exogenous variables are deterministic trends. “Stochastic” trends will be discussed in a few lectures.
  • Several books on time series analysis have introductory material on multivariate time series, e.g. Brockwell and Davis (2006). A few other books focus specifically on such time series, most notably Lutkepohl (2005) (but without spectral analysis), as referenced on many occasions above. See also Hannan (1970), Reinsel (1997).

Questions/Homework

Q1: Analyze real multivariate time series using the methods described in “Multivariate Lectures”.

Q2: Prove some of the facts stated about the spectral coherence above.

Q3: Show that the spectral density matrix \(f_X(w)\) is non-negative definite.

Q4: Discuss and characterize the more general forms of Granger causality and impulse response analyses.

Q5: Show that OLS and GLS/ML are essentially equivalent for VAR estimation.

References

Brockwell, P. J., and R. A. Davis. 2006. Time Series: Theory and Methods. Springer Series in Statistics. Springer, New York.

Hannan, E. J. 1970. Multiple Time Series. John Wiley & Sons, Inc., New York-London-Sydney.

Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin.

Reinsel, G. C. 1997. Elements of Multivariate Time Series Analysis. Second edition. Springer Series in Statistics. Springer-Verlag, New York.

Zellner, A. 1962. “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias.” Journal of the American Statistical Association 57: 348–68.