What is this all about?

This is about building (sparse) VAR-like models that are “consistent” with (sparse) partial correlations among the components of multivariate time series data. Source: I will follow Chapter 5 in the book by Wilson et al. (2015). The focus is on lower-dimensional VARs.

Partial correlations

Definition 4.1: Given a set of (zero mean) random variables \(X_1,X_2,\ldots,X_n\), the partial correlation \(\pi(X_i,X_j)\) between two random variables \(X_i\) and \(X_j\) included in \(\mathbf{X} = (X_1,X_2,\ldots,X_n)'\) is defined as the correlation between \(X_i\) and \(X_j\) conditioned on all the other variables in \(\mathbf{X}\), that is, \[ \pi(X_i,X_j) = \rho(X_i,X_j\ |\ \mathbf{X}\setminus \{X_i,X_j\}). \]

It is known that (\(i\neq j\)) \[ \pi(X_i,X_j) = - \frac{W_{ij}}{\sqrt{W_{ii}W_{jj}}}, \] where \(W = (W_{kl})\) is the inverse, that is, \(W = V^{-1}\), of the covariance matrix \(V = E \mathbf{X}\mathbf{X}'\) with \(\mathbf{X} = (X_1,X_2,\ldots,X_n)'\). Name: \(W\) is known as a precision matrix.

Partial correlations are often represented in a graph, the so-called partial correlation graph (PCG), where an edge corresponds to a non-zero partial correlation. In the Gaussian case, the PCG is also a CIG (Conditional Independence Graph). For example, for \(n=5\) variables, the PCG could be as follows.

library("igraph")

g <- graph_from_literal( X1-X2-X3-X4, X1-X4, X2-X4, X4-X5 )
plot.igraph(g, vertex.color="white", edge.curved=0, edge.width = 2, vertex.size=30)

For small dimension \(n\), given a sample of iid observations \(\mathbf{X}_1,\ldots,\mathbf{X}_T\), the partial correlations and the PCG are estimated as follows:

  1. The sample covariance matrix \(\widehat V\) is computed and the sample inverse covariance matrix \(\widehat W = (\widehat W_{ij})\) is computed as \(\widehat W = \widehat V^{-1}\). The sample partial correlations (\(i\neq j\)) are \[ \widehat \pi(X_i,X_j) = - \frac{\widehat W_{ij}}{\sqrt{\widehat W_{ii}\widehat W_{jj}}}. \]

  2. The null hypothesis that \(\pi(X_i,X_j)=0\) is rejected at level \(\alpha\) if \[ |\widehat \pi(X_i,X_j)| > \frac{t_{\alpha/2,T-n+1}}{\sqrt{t^2_{\alpha/2,T-n+1}+(T-n+1)}}, \] where \(t_{\alpha/2,T-n+1}\) is the corresponding critical value of the \(t_{T-n+1}\) distribution.

The rationale for the last step comes from linear regression; see e.g. p. 136 in Wilson et al. (2015).

Since multiple testing is performed, a practical recommendation is to repeat the procedure for several values of \(\alpha\).
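
For instance, both steps can be wrapped into a short helper (a minimal sketch; the function name pcg.hat and the assumption of iid rows in the input matrix are ours):

pcg.hat <- function(X, alpha = 0.05) {
  # X: Tn x n data matrix with iid rows
  Tn <- nrow(X); n <- ncol(X)
  W <- solve(cov(X))                     # sample precision matrix
  Pi <- -cov2cor(W); diag(Pi) <- 1       # sample partial correlations
  tq <- qt(1 - alpha/2, df = Tn - n + 1) # t critical value
  thr <- tq/sqrt(tq^2 + (Tn - n + 1))    # threshold from step 2
  list(pcor = Pi, edges = which(abs(Pi) > thr & upper.tri(Pi), arr.ind = TRUE))
}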

Note: For larger dimension \(n\), the so-called graphical lasso is used: one estimates the precision matrix \(W=V^{-1}\) through \[ \widehat W = \mathop{\rm argmax}_{W} \Big( \log \mbox{det}(W) - \mbox{tr}(\widehat V W) - \lambda \|W\|_1 \Big), \] where \(\lambda\) is a penalty parameter. See Friedman et al. (2008).
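
In R, the graphical lasso is available, e.g., through the glasso package (a minimal sketch on toy data; the penalty value rho = 0.1 is arbitrary):

library("glasso")

set.seed(1)
X <- matrix(rnorm(200*5), ncol = 5)  # toy data: T = 200, n = 5
fit <- glasso(cov(X), rho = 0.1)     # rho plays the role of lambda
What <- fit$wi                       # estimated (sparse) precision matrix
Pihat <- -cov2cor(What); diag(Pihat) <- 1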

Data example: Flour prices

The data are the monthly flour price indices in Buffalo, Minneapolis and Kansas City from August 1972 to November 1980.

prices <- read.table("Flourprice.txt")
prices <- ts(prices, start=c(1972,8), end=c(1980,11), frequency=12, names = c("Buff","Minn","Kans"))

matplot(prices, type = "l")
legend('top', c('Buff','Minn','Kans'), lty = 1:3, col = 1:3)

library(vars)

VARselect(prices, lag.max=8, type="const")$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2
var <- VAR(prices, p=2, type="const")

summary(var)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Buff, Minn, Kans 
## Deterministic variables: const 
## Sample size: 98 
## Log Likelihood: -781.229 
## Roots of the characteristic polynomial:
## 0.9858 0.9316 0.9316 0.4811 0.1671 0.1671
## Call:
## VAR(y = prices, p = 2, type = "const")
## 
## 
## Estimation results for equation Buff: 
## ===================================== 
## Buff = Buff.l1 + Minn.l1 + Kans.l1 + Buff.l2 + Minn.l2 + Kans.l2 + const 
## 
##         Estimate Std. Error t value Pr(>|t|)    
## Buff.l1 -0.29952    0.35686  -0.839 0.403484    
## Minn.l1  1.37412    0.40465   3.396 0.001016 ** 
## Kans.l1 -0.01335    0.19401  -0.069 0.945287    
## Buff.l2  1.39134    0.36558   3.806 0.000256 ***
## Minn.l2 -1.75052    0.41235  -4.245 5.25e-05 ***
## Kans.l2  0.24441    0.19837   1.232 0.221079    
## const    7.53267    4.57862   1.645 0.103382    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 6.828 on 91 degrees of freedom
## Multiple R-Squared: 0.938,   Adjusted R-squared: 0.934 
## F-statistic: 229.6 on 6 and 91 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation Minn: 
## ===================================== 
## Minn = Buff.l1 + Minn.l1 + Kans.l1 + Buff.l2 + Minn.l2 + Kans.l2 + const 
## 
##         Estimate Std. Error t value Pr(>|t|)    
## Buff.l1 -0.84455    0.38154  -2.214  0.02936 *  
## Minn.l1  1.93768    0.43264   4.479 2.18e-05 ***
## Kans.l1 -0.01731    0.20743  -0.083  0.93369    
## Buff.l2  0.97897    0.39086   2.505  0.01404 *  
## Minn.l2 -1.40460    0.44088  -3.186  0.00198 ** 
## Kans.l2  0.28931    0.21209   1.364  0.17591    
## const    8.08874    4.89532   1.652  0.10191    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 7.301 on 91 degrees of freedom
## Multiple R-Squared: 0.937,   Adjusted R-squared: 0.9328 
## F-statistic: 225.4 on 6 and 91 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation Kans: 
## ===================================== 
## Kans = Buff.l1 + Minn.l1 + Kans.l1 + Buff.l2 + Minn.l2 + Kans.l2 + const 
## 
##         Estimate Std. Error t value Pr(>|t|)    
## Buff.l1  -0.5512     0.4243  -1.299 0.197215    
## Minn.l1   0.8799     0.4812   1.829 0.070724 .  
## Kans.l1   0.8295     0.2307   3.596 0.000526 ***
## Buff.l2   0.7152     0.4347   1.645 0.103381    
## Minn.l2  -1.2951     0.4903  -2.641 0.009724 ** 
## Kans.l2   0.3461     0.2359   1.467 0.145799    
## const    10.7319     5.4445   1.971 0.051747 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 8.12 on 91 degrees of freedom
## Multiple R-Squared: 0.9354,  Adjusted R-squared: 0.9312 
## F-statistic: 219.8 on 6 and 91 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##       Buff  Minn  Kans
## Buff 46.63 48.17 48.24
## Minn 48.17 53.30 53.21
## Kans 48.24 53.21 65.93
## 
## Correlation matrix of residuals:
##        Buff   Minn   Kans
## Buff 1.0000 0.9664 0.8700
## Minn 0.9664 1.0000 0.8976
## Kans 0.8700 0.8976 1.0000
# residual correlation matrix, copied from the summary output above
# (it could presumably also be extracted directly, e.g. as summary(var)$corres)
V <- matrix(c(1.0000, 0.9664, 0.8700, 0.9664, 1.0000, 0.8976, 0.8700, 0.8976, 1.0000), ncol = 3)
W <- solve(V)  # precision matrix

# partial correlations: Pi_ij = -W_ij/sqrt(W_ii W_jj)
Pi <- -diag(diag(W)^(-1/2)) %*% W  %*% diag(diag(W)^(-1/2))
diag(Pi) <- rep(1,3)
Pi
##            [,1]      [,2]       [,3]
## [1,] 1.00000000 0.8534361 0.02258778
## [2,] 0.85343613 1.0000000 0.44843024
## [3,] 0.02258778 0.4484302 1.00000000
# critical value at alpha = 0.05, based on the normal approximation
# t_{0.025, 96} ~ 1.96 (here T = 98, n = 3, so T - n + 1 = 96)
1.96/sqrt(1.96^2 + (98-3+1))
## [1] 0.1961554

The corresponding PCG (CIG):

library("igraph")

g <- graph_from_literal( e1t-e2t, e2t-e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)

Partial correlations, regressions and DAGs

Continuing with the preceding example, one can always write \[\begin{eqnarray} e_{1t} & = & a_{1t} \\ e_{2t} & = & \theta_{2,1} e_{1t} + a_{2t} \\ e_{3t} & = & \theta_{3,1} e_{1t} + \theta_{3,2} e_{2t} + a_{3t}, \end{eqnarray}\] where \(a_{1t}\), \(a_{2t}\) and \(a_{3t}\) are “orthogonal” residuals (i.e. \(Ea_{1t}a_{2t}=0\), etc). (Why?) But since there is no edge between \(e_{1t}\) and \(e_{3t}\) in the corresponding PCG, we have \(\theta_{3,1}=0\) and hence \[\begin{eqnarray} e_{1t} & = & a_{1t} \\ e_{2t} & = & \theta_{2,1} e_{1t} + a_{2t} \\ e_{3t} & = & \theta_{3,2} e_{2t} + a_{3t}. \end{eqnarray}\]

This is graphically represented as:

g <- graph_from_literal( e1t-+e2t, e2t-+e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
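
As a quick numerical check that \(\theta_{3,1}\approx 0\) here, the projection coefficients of \(e_{3t}\) on \((e_{1t},e_{2t})\) can be computed from the residual correlation matrix \(V\) above via the normal equations:

V <- matrix(c(1.0000, 0.9664, 0.8700,
              0.9664, 1.0000, 0.8976,
              0.8700, 0.8976, 1.0000), ncol = 3)
theta <- solve(V[1:2,1:2], V[1:2,3])  # (theta_{3,1}, theta_{3,2})
theta                                 # first entry is close to zero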

Definition 4.2: A diagram such as the one plotted above is referred to as a directed acyclic graph (DAG).

(Why focus on DAGs? See below in connection to structural VARs.)

Note that the PCG above can also be associated to the following regressions and DAG (not the same as above up to permutation): \[\begin{eqnarray} e_{2t} & = & b_{1t} \\ e_{1t} & = & \eta_{2,2} e_{2t} + b_{2t} \\ e_{3t} & = & \eta_{3,2} e_{2t} + b_{3t}. \end{eqnarray}\] But it cannot be associated with \[\begin{eqnarray} e_{1t} & = & c_{1t} \\ e_{2t} & = & c_{2t} \\ e_{3t} & = & \upsilon_{3,1} e_{1t} +\upsilon_{3,2} e_{2t} + c_{3t}. \end{eqnarray}\]

(Why?) (Explain through matrices.)

The corresponding DAGs are:

g1 <- graph_from_literal( e1t+-e2t, e2t-+e3t )
g2 <- graph_from_literal( e1t-+e2t, e2t+-e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)

par(mfrow = c(1, 2))
plot.igraph(g1, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
plot.igraph(g2, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)

Some general rules

From a matrix perspective, a directed link in a DAG translates into a link in the PCG, so the DAG is a “subset” of the PCG.

Another possibility in building a DAG is to start with a “saturated” DAG and have some rules by which DAG edges can be removed based on the corresponding PCG. The Markovian properties below are helpful in this regard.

Which DAG(s) are consistent with the following PCG?

library("igraph")

g <- graph_from_literal( e1t-e2t, e2t-e3t, e3t-e4t, e4t-e5t, e5t-e6t, e1t-e4t, e2t-e4t, e3t-e6t, e4t-e6t )
l <- matrix(c(0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,2,0,-2,0,2,0,-2,0), vertex.color="white", edge.width = 2, vertex.size=20, margin = 0.1)

Local Markov property: Each variable, given its neighbors, is conditionally independent of all the remaining variables.

Example: In the PCG above, \(e_5\) is conditionally independent of \(e_1,e_2,e_3\), given its neighbors \(e_4,e_6\).

Global Markov property: If there are no links between any pair of variables, one taken from each of two blocks, then the two blocks are conditionally independent of each other given all the remaining variables.

Example: In the PCG above, the blocks \(\{e_1,e_2\}\) and \(\{e_5,e_6\}\) (without links) are conditionally independent given the rest of the variables \(\{e_3,e_4\}\).

Applying the local/global Markov property to the “saturated” DAG leads to the following DAGs consistent with the above PCG.

library("igraph")

g <- graph_from_literal( e1t+-e2t, e2t+-e3t, e3t+-e4t, e4t+-e5t, e5t+-e6t, e1t+-e4t, e2t+-e4t, e3t+-e6t, e4t+-e6t )
l <- matrix(c(0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,-2,2,0,0,-2,2,0), vertex.color="white", edge.width = 2, vertex.size=20, edge.arrow.size=.5, margin =0.1)

Moralization rule: Given a DAG, to obtain a consistent PCG (CIG):

  1. For each node of the DAG, insert an undirected edge between pairs of its parent nodes, unless they are already linked by an edge.

  2. Replace each directed edge in the DAG by an undirected edge.
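
These two steps are easy to script (a sketch with igraph; the helper name moralize is ours):

moralize <- function(dag) {
  g <- dag
  # step 1: "marry" the parents of each node
  for (v in V(dag)) {
    pa <- as.vector(neighbors(dag, v, mode = "in"))
    if (length(pa) > 1) {
      prs <- combn(pa, 2)
      for (k in seq_len(ncol(prs)))
        if (!are_adjacent(g, prs[1,k], prs[2,k])) g <- add_edges(g, prs[,k])
    }
  }
  # step 2: drop the edge directions
  as.undirected(g, mode = "collapse")
}

Applying moralize to the DAG plotted below indeed reproduces the PCG above.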

For example, the following DAG is also consistent with the PCG above:

library("igraph")

g <- graph_from_literal( e1t+-e2t, e2t+-e3t, e3t+-e4t, e4t+-e5t, e1t+-e4t, e2t+-e4t, e3t+-e6t, e4t+-e6t )
l <- matrix(c(0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,-2,2,0,0,-2,2), vertex.color="white", edge.width = 2, vertex.size=20, edge.arrow.size=.5, margin =0.1)

In practice, the selection between different possible DAG explanations of an estimated PCG (CIG) is assisted by comparing the goodness of fit of the estimated models and testing for significance of the coefficients that correspond to links in the competing models. (To be illustrated below.) The so-called product rules for moralized links are also used: e.g. in the DAG and its associated PCG given below, one has \[ \pi(X,Y) = - \pi(X,Z) \times \pi(Y,Z). \]

g1 <- graph_from_literal( X-+Z, Z+-Y )
g2 <- graph_from_literal( X-Z, Z-Y, X-Y )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)

par(mfrow = c(1, 2))
plot.igraph(g1, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
plot.igraph(g2, layout = l, edge.curved=c(0,2,0), vertex.color="white", edge.width = 2, vertex.size=30)

This product rule can be extended to more sophisticated DAGs that have moralized links in their PCG. See Section 5.8 in Wilson et al. (2015). But for more sophisticated graphs, the product rules also become more complex and may not necessarily be easy to keep track of (instead, the various models may be fitted and examined).
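
A quick numerical illustration of the product rule for the DAG \(X \to Z \leftarrow Y\) (simulated data; the design, with independent \(X\) and \(Y\), is ours):

set.seed(123)
N <- 1e5
X <- rnorm(N); Y <- rnorm(N)
Z <- X + Y + rnorm(N)          # DAG: X -> Z <- Y
W <- solve(cov(cbind(X, Y, Z)))
Pi <- -cov2cor(W); diag(Pi) <- 1
c(Pi[1,2], -Pi[1,3]*Pi[2,3])   # both approximately -0.5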

DAG for flour price innovations

Based on one of the DAGs suggested for the flour price innovations, the following DAG model is obtained through OLS regression (with \(t\) values in parentheses):

g <- graph_from_literal( e1t-+e2t, e2t-+e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30,edge.label= c("1.033 (37)", "0.998 (20.1)"))

The (approximate) likelihood ratio test statistic (deviance; based on minus twice the log-likelihood, or sample size times the log of the sum of squared residuals) of the DAG regression model against the saturated regression model (including \(e_{1,t}\) in the regression of \(e_{3,t}\)) turns out to be \(0.05\), which, compared to a critical value of the \(\chi^2(1)=\chi^2_1\) distribution, suggests that the reduced DAG regression model is preferred. The same conclusion is reached through model selection criteria (AIC, SIC, etc.); see Wilson et al. (2015), p. 145.
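
A sketch of these regressions and of the deviance computation on the VAR(2) residuals from above (values close to those quoted should result):

E <- as.data.frame(residuals(var))     # innovations: e1 = Buff, e2 = Minn, e3 = Kans
fit2 <- lm(Minn ~ Buff - 1, data = E)  # e2 on e1
fit3 <- lm(Kans ~ Minn - 1, data = E)  # e3 on e2 (reduced DAG model)
fit3.sat <- lm(Kans ~ Buff + Minn - 1, data = E)  # saturated model

# approximate deviance of the reduced model against the saturated one
nrow(E) * (log(sum(resid(fit3)^2)) - log(sum(resid(fit3.sat)^2)))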

The other DAG regression model consistent with the PCG gives exactly the same deviance and information criteria; such models are called likelihood equivalent. The DAG regression model considered above that is NOT consistent with the PCG gives a deviance difference of \(139.6\), suggesting that it is extremely inadequate.

Partial correlations, VAR models and structural VARs

The reason that the VAR error terms (innovations) are correlated, as e.g. in the flour price example above, is that they account for correlation of the components of the series \(\mathbf{X}_t\) for the same \(t\). To have uncorrelated error terms, one could fit instead a (zero mean) structural VAR model (SVAR model, for short) of the form: \[ \Phi_0 \mathbf{X}_t = \Phi_1 \mathbf{X}_{t-1} + \Phi_2 \mathbf{X}_{t-2} + \ldots + \Phi_p \mathbf{X}_{t-p} + \mathbf{a}_t, \] where the components of \(\mathbf{a}_t\) are uncorrelated, with a diagonal matrix \(D = E\mathbf{a}_t \mathbf{a}_t'\).

Note that the above structural VAR model corresponds to the canonical VAR model: \[\begin{eqnarray} \mathbf{X}_t & = & \Phi_0^{-1} \Phi_1 \mathbf{X}_{t-1} + \Phi_0^{-1}\Phi_2 \mathbf{X}_{t-2} + \ldots + \Phi_0^{-1}\Phi_p \mathbf{X}_{t-p} + \Phi_0^{-1}\mathbf{a}_t\\ & = & \Phi_1^* \mathbf{X}_{t-1} + \Phi_2^* \mathbf{X}_{t-2} + \ldots + \Phi_p^* \mathbf{X}_{t-p} + \mathbf{\epsilon}_t, \end{eqnarray}\]

where the components of \(\mathbf{\epsilon}_t\) are possibly correlated, with the covariance matrix \(V = E\mathbf{\epsilon}_t \mathbf{\epsilon}_t'\). The connection between \(V\) and \(D\) is expressed as: \[ \Phi_0 V \Phi_0' = D, \] that is, \(\Phi_0\) diagonalizes \(V\) (by congruence), or equivalently \(V^{-1} = \Phi_0' D^{-1} \Phi_0\). In particular, a sparse \(V^{-1}\) is associated with a sparse \(\Phi_0\).

While a structural VAR corresponds to one canonical VAR, the converse is not true: the same canonical VAR can be associated with different structural VARs. This is because the matrix \(\Phi_0\) diagonalizing \(V\) is not unique. In fact, restrictions need to be imposed for identifiability of the structural VAR, and this is often done by taking a lower-triangular (or upper-triangular) \(\Phi_0\) with \(1\)’s on the diagonal.
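
A numerical sketch of this triangular identification, using the residual covariance matrix of the flour price VAR from above (the Cholesky factorization is one standard route to a unit-lower-triangular \(\Phi_0\)):

V <- matrix(c(46.63, 48.17, 48.24,
              48.17, 53.30, 53.21,
              48.24, 53.21, 65.93), ncol = 3)  # covariance of the VAR residuals
L <- t(chol(V))                     # V = L %*% t(L) with L lower triangular
Phi0 <- diag(diag(L)) %*% solve(L)  # unit-lower-triangular
D <- Phi0 %*% V %*% t(Phi0)         # diagonal, equal to diag(diag(L)^2)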

Note: Another name for structural VAR is simultaneous equations modeling.

Back to the example of flour prices

We shall fit an SVAR(2) model consistent with the partial correlations between the series and its values at lags 1 and 2. The above SVAR model will be thought of as the underlying regression equations for a DAG. Note that this situation is slightly different from that considered in the general context above: we have only 3 equations, and expect only variables at past lags to have directed links into “future” variables.

# stack the current values and lags 1 and 2: a 98 x 9 data matrix
X <- cbind(prices[3:100,],prices[2:99,],prices[1:98,])
VX <- cov(X)
WX <- solve(VX)  # 9 x 9 precision matrix

# partial correlations among current and lagged values
PiX <- -diag(diag(WX)^(-1/2)) %*% WX  %*% diag(diag(WX)^(-1/2))
diag(PiX) <- rep(1,9)
PiX
##              [,1]        [,2]        [,3]        [,4]       [,5]
##  [1,]  1.00000000  0.85321333  0.02312921  0.45154411 -0.2884547
##  [2,]  0.85321333  1.00000000  0.44829127 -0.49678010  0.5219660
##  [3,]  0.02312921  0.44829127  1.00000000  0.12991046 -0.4017109
##  [4,]  0.45154411 -0.49678010  0.12991046  1.00000000  0.8308836
##  [5,] -0.28845473  0.52196598 -0.40171085  0.83088355  1.0000000
##  [6,] -0.01158511 -0.29944807  0.65761071 -0.11511950  0.5717615
##  [7,]  0.47807639 -0.30062063 -0.13167848  0.47155380 -0.4902154
##  [8,] -0.41159896  0.26388327  0.05436952 -0.41577161  0.5961812
##  [9,] -0.03597379  0.03647912  0.05781995  0.04755999 -0.3546225
##              [,6]        [,7]        [,8]        [,9]
##  [1,] -0.01158511  0.47807639 -0.41159896 -0.03597379
##  [2,] -0.29944807 -0.30062063  0.26388327  0.03647912
##  [3,]  0.65761071 -0.13167848  0.05436952  0.05781995
##  [4,] -0.11511950  0.47155380 -0.41577161  0.04755999
##  [5,]  0.57176150 -0.49021541  0.59618115 -0.35462254
##  [6,]  1.00000000  0.13041190 -0.35595222  0.62381694
##  [7,]  0.13041190  1.00000000  0.89001905 -0.05385604
##  [8,] -0.35595222  0.89001905  1.00000000  0.45925781
##  [9,]  0.62381694 -0.05385604  0.45925781  1.00000000
PiX[,1:3]
##              [,1]        [,2]        [,3]
##  [1,]  1.00000000  0.85321333  0.02312921
##  [2,]  0.85321333  1.00000000  0.44829127
##  [3,]  0.02312921  0.44829127  1.00000000
##  [4,]  0.45154411 -0.49678010  0.12991046
##  [5,] -0.28845473  0.52196598 -0.40171085
##  [6,] -0.01158511 -0.29944807  0.65761071
##  [7,]  0.47807639 -0.30062063 -0.13167848
##  [8,] -0.41159896  0.26388327  0.05436952
##  [9,] -0.03597379  0.03647912  0.05781995
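
With \(n=9\) variables the \(t\)-based thresholds can be computed as before (a sketch; small differences from the values quoted below may come from the exact degrees of freedom used):

crit <- function(alpha, Tn, n) {
  tq <- qt(1 - alpha/2, df = Tn - n + 1)
  tq/sqrt(tq^2 + (Tn - n + 1))
}
sapply(c(0.10, 0.05, 0.01), crit, Tn = 98, n = 9)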

Critical values for \(\alpha=0.1\), \(0.05\) and \(0.01\) are, respectively, \(0.171\), \(0.202\) and \(0.262\). This leads to the following PCG:

gflour <- graph_from_literal( X1t-X2t, X2t-X3t, X1tm1-X1t, X2tm1-X2t, X3tm1-X3t, X1tm1-X2t, X2tm1-X1t, X3tm1-X2t, X2tm1-X3t, X1tm2, X1tm2-X2t, X1tm2-X1t, X2tm2-X1t, X2tm2-X2t, X2tm2, X3tm2)
lflour <- matrix(c(0, 0, 0, -1, -1, -1, -2, -2, -2, 3, 2, 1, 3, 2, 1, 3, 2, 1), ncol=2)
plot.igraph(gflour, layout = lflour, edge.curved=c(0,0,0,-.75,0,0,0,0,0,0,0.75,0,0,0), edge.width = 2, vertex.color="white", vertex.size=50, margin = 0.1, edge.label= c("0.85", "0.45", "-0.29", "0.48","-0.41","0.45","-0.5","0.52","-0.3","-0.3","0.26","-0.4",".66"))

The suggested DAG consistent with this PCG is given below, along with the estimated coefficients in the OLS regression and their \(t\) values. It is motivated by the following considerations:

  • A link between \(X_{2t}\) and \(X_{3t}\) is likely in the direction \(X_{2t} \to X_{3t}\). (Why?)
  • The link between \(X_{2t}\) and \(X_{3,t-1}\) is a strong candidate for a moralization link. (Why?)
  • The choice of the direction between \(X_{1t}\) and \(X_{2t}\): in the model where the direction is chosen as \(X_{1t}\to X_{2t}\), the coefficient of the link \(X_{1,t-1}\to X_{1t}\) in the estimated DAG has a \(t\) value of only \(0.95\); therefore, the direction \(X_{1t}\to X_{2t}\) is kept, the link \(X_{1,t-1}\to X_{1t}\) is excluded, and the corresponding link in the PCG is attributed to moralization (\(X_{1,t-1}\) and \(X_{1t}\) are both parents of \(X_{2t}\)).
gflour <- graph_from_literal( X1t-+X2t, X2t-+X3t, X1tm1-+X2t, X2tm1-+X2t, X3tm1-+X3t, X2tm1-+X1t, X2tm1-+X3t, X1tm2-+X2t, X1tm2-+X1t, X2tm2-+X1t, X2tm2-+X2t, X3tm2)
lflour <- matrix(c(0, 0, 0, -1, -1, -1, -2, -2, -2, 3, 2, 1, 3, 2, 1, 3, 2, 1), ncol=2)
plot.igraph(gflour, layout = lflour, edge.curved=c(0,0,0,0,0,0,0,0.75,0,0,-.75), edge.width = 2, vertex.color="white", vertex.size=50, margin = 0.1, edge.label= c("1.04 (37.5)", "1.00 (20.6)", "-0.54 (-5.8)","1.15 (12.8)","0.52 (5.0)","-0.92 (-15.5)","0.91 (18.9)","1.09 (5.9)","-0.47 (4.4)","-1.28 (-6.5)","0.45 (4.2)"))

The conclusion is that \(X_{3t}\) depends in a simple manner only on \(X_{2t}\) and \(X_{3,t-1}\). The series \(X_{1t}\) and \(X_{2t}\) are NOT dependent on \(X_{3t}\) at all, and their dependence is qualitatively symmetric.

The DAG corresponds to the following SVAR model: \[ \Phi_0 = \left( \begin{matrix} 1 & 0 & 0 \\ -1.04 & 1 & 0\\ 0 & -1.01 & 1 \end{matrix} \right),\quad \Phi_1 = \left( \begin{matrix} 0 & 1.16 & 0 \\ -0.54 & 0.52 & 0\\ 0 & -0.92 & 0.89 \end{matrix}\right),\quad \Phi_2 = \left( \begin{matrix} 1.14 & -1.13 & 0 \\ -0.45 & 0.43 & 0\\ 0 & 0 & 0 \end{matrix} \right) \] Against the saturated DAG model, the deviance is \(15.56\), which should be compared to a critical value of \(\chi^2(10)\) (e.g. \(18.307\) at 5%). Thus, the sparse DAG model is preferred. The same conclusion is reached through model selection criteria.
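
As a consistency check, the implied canonical VAR coefficients \(\Phi_j^* = \Phi_0^{-1}\Phi_j\) can be compared with the unrestricted VAR(2) estimates reported earlier (a sketch using the rounded coefficients quoted above):

Phi0 <- matrix(c( 1,     0,     0,
                 -1.04,  1,     0,
                  0,    -1.01,  1), ncol = 3, byrow = TRUE)
Phi1 <- matrix(c( 0,     1.16,  0,
                 -0.54,  0.52,  0,
                  0,    -0.92,  0.89), ncol = 3, byrow = TRUE)
Phi2 <- matrix(c( 1.14, -1.13,  0,
                 -0.45,  0.43,  0,
                  0,     0,     0), ncol = 3, byrow = TRUE)
solve(Phi0) %*% Phi1  # compare with the lag-1 coefficient estimates
solve(Phi0) %*% Phi2  # compare with the lag-2 coefficient estimates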

The obtained model also lends itself to easier interpretation. Essentially (approximately), we have: \[ X_{3t} = X_{2t} + N_t,\quad N_t = 0.91 N_{t-1} + A_t, \] \[ X_{1t} = X_{1,t-2} + (X_{2,t-1} - X_{2,t-2}) + A_{1t}, \] \[ X_{2t} = X_{1t} + 0.5 (X_{2,t-1} + X_{2,t-2}) - 0.5 (X_{1,t-1} + X_{1,t-2}) + A_{2t}. \] (For the first relation, note that the third SVAR equation reads, approximately, \(X_{3t} = X_{2t} + 0.91(X_{3,t-1} - X_{2,t-1}) + A_t\), so \(N_t = X_{3t} - X_{2t}\) is an AR(1) series.) Further model diagnostics are carried out in Wilson et al. (2015), pp. 151-152.

Summary

The following steps are suggested to build an SVAR (Wilson et al. (2015), pp. 152-153):

  • Use the AIC to determine the order of a saturated VAR model that represents the series well.
  • Construct the sample PCG of the estimated innovation series derived from this VAR model.
  • Use this to explore possible DAG representations of the innovation series.
  • Estimate these DAG models and check their likelihood criteria and residual cross-correlations.
  • Construct the sample PCG of the series and its values lagged up to the determined order.
  • Use this to explore possible DAG representations of an SVAR model for the series, basing the selected dependencies between current variables on those explored in the innovation series.
  • Estimate these DAG models and check their likelihood criteria and lagged residual auto- and cross-correlations. If necessary, include or exclude terms from the model to achieve an adequate sparse model.
  • Check that the PCG implied by the model reasonably well matches the sample PCG; apply other comparisons of the model and sample spectral properties and examine an out-of-sample set of forecasts from the model to assess its consistency in this respect.

Some final notes

  • An SVAR model can be and is often fitted without reference to PCG/DAG.
  • Structural VECMs can also be defined/estimated; see e.g. Lütkepohl (2006).
  • Other types of graphical modeling for multivariate time series are also available, e.g. NETS by Barigozzi and Brownlees (2016).
  • What extensions of the above approach are available in higher dimensions?
  • Spectral domain perspective: a zero “partial correlation” between two components of a multivariate stationary series can be shown to be equivalent to the inverse of the spectral density matrix having the respective entry equal to zero at all frequencies.

Questions/Homework

Q0: Supplement your analysis of multivariate time series with graphical modeling if it seems appropriate.

Q1: Why does it make sense to fit SVARs according to non-zero partial correlations?

Q2: Compare the model fitted to the data in this lecture to the models fitted through (A-)LASSO or other variable selection methods.

Q3: Report on generalizations of the product rule for moralized links mentioned above.

Q4: Prove the spectral domain connection of zero partial correlations mentioned above.

References

Barigozzi, M., and C. T. Brownlees. 2016. “NETS: Network Estimation for Time Series.” Preprint.

Friedman, J., T. Hastie, and R. Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.

Lütkepohl, H. 2006. “Structural Vector Autoregressive Analysis for Cointegrated Variables.” In Modern Econometric Analysis: Surveys on Recent Developments, edited by O. Hübler and J. Frohn, 73–86. Berlin, Heidelberg: Springer Berlin Heidelberg.

Wilson, G.T., M. Reale, and J. Haywood. 2015. Models for Dependent Time Series. Chapman & Hall/Crc Monographs on Statistics & Applied Probability. CRC Press.