Multivariate V: Factor models

What is this all about?

It is often convenient/necessary to think of time series as driven by several underlying latent series. For example, for high-dimensional multivariate series, these latent, lower-dimensional series could be the “factors” that are loaded onto the high-dimensional multivariate series and that explain much of the variability in the original series. Another example concerns the so-called blind source separation.

Motivating data examples

Multiple macroeconomic quarterly indicators

A multivariate (high-dimensional) time series derived from multiple macroeconomic quarterly indicators (e.g. GDP, GNP, industrial product, etc.). 144 component series, spanning 190 quarters. Source: Stock and Watson (2008).

load("usquarter.Rdata")
Ye <- Y

q <- dim(Ye)[1]
q

## [1] 144

T <- dim(Ye)[2]
T

## [1] 190

nf <- 12 ## Number of component series to plot 
par(mfrow=c(nf/3,3)) 
par(mar=c(2,2,2,2))
for (l in (1:nf)) {
  plot.ts(Ye[l,])
}

Note that several of the component series look visually similar, and could potentially be driven by one underlying “factor” series.

Audio data

A time series of audio data from two sensors (courtesy of P. Hoff, Duke):

load("audio_data") 

Ya <- Y
dim(Ya)

## [1] 80000     2

par(mfrow=c(1,2)) 
plot(Ya[1000:1500,1],type="l")
plot(Ya[1000:1500,2],type="l")

library(audio) 

#play(.1*Ya[,1]/sd(Ya[,1]) ,rate=8000)
#play(.1*Ya[,2]/sd(Ya[,2]) ,rate=8000)

How does one filter out the two underlying audio signals? (The problem is known as blind source separation or “cocktail party” in signal processing.)

Data/Theory goals of the lecture

The goal of the lecture is to discuss and illustrate some basic techniques related to the questions raised above: dynamic factor models (DFMs) and independent component analysis (ICA).

Dynamic factor modeling (DFM)

Suppose a \(q\)-vector time series \(\mathbf{Y}_t\), \(t=1,\ldots, T\), is stationary-like. Suppose the component series have been standardized to have mean 0 and variance 1. We wish to define a factor model for it.

Definition 5.1: A \(q\)-vector time series \(\{\mathbf{Y}_t\}\) follows a dynamic factor model (DFM, for short) if \[ \mathbf{Y}_t = \Lambda \mathbf{F}_t + \varepsilon_t, \] where \(\Lambda\) is a \(q\times r\) loading matrix with \(r<<q\), \(\{\mathbf{F}_t\}\) is a \(r\)-vector time series whose components are \(r\) factors, and \(\{\varepsilon_t\}\) is \(\mbox{WN}(0,\Sigma_\varepsilon)\) series, uncorrelated with \(\{\mathbf{F}_t\}\). Moreover, it is often assumed that the factors \(\{\mathbf{F}_t\}\) follow a VAR\((p)\) model \[ \Phi(B) \mathbf{F}_t = \mathbf{Z}_t, \] where \(\Phi(B) = I - \Phi_1 B - \ldots - \Phi_p B^p\) with \(r\times r\) matrices \(\Phi_i\) and \(\{\mathbf{Z}_t\}\sim \mbox{WN}(0,\Sigma_{\mathbf Z})\).

Notes

The above DFM is in the so-called static form. The so-called dynamic form allows \(\Lambda\) to be a matrix polynomial \(\Lambda(B)\) in the backshift operator \(B\).
The loading matrix \(\Lambda\) and the VAR model \(\Phi(B)\) are identifiable only up to a rotation. The estimation method given below estimates these up to a particular rotation.
DFM has a convenient interpretation through the columns of \(\Lambda\) acting as loadings of factors for the original \(q\) component series.
DFM can be considered in the case \(q>T\). In fact, the commonly considered asymptotics is as \(q,T\to\infty\).

Estimation for a given number of factors

For a fixed number of factors \(r\), the DFM parameters can be estimated through the following steps:

Diagonalize the sample covariance matrix \(\widehat \Sigma_{\mathbf Y}\) of \(\mathbf{Y}_t\), \(t=1,\ldots,T\), as \[ \widehat \Sigma_{\mathbf Y} = \frac{1}{T} \sum_{t=1}^T \mathbf{Y}_t \mathbf{Y}_t' = \widehat U \widehat D \widehat U', \] where \(\widehat D = \mbox{diag}(\widehat d_1,\ldots,\widehat d_q)\) with \(\widehat d_1\geq \ldots \geq \widehat d_q\), and \(\widehat U = (\widehat U_1,\ldots,\widehat U_q)\) is the orthogonal eigenvector matrix such that \(\widehat U \widehat U' = \widehat U' \widehat U = I_q\).
The principle component estimators of \(\mathbf{F}_t\) and \(\Lambda\) are given by \[ \widehat{\mathbf{F}}_t = \frac{1}{q} \widehat \Lambda' \mathbf{Y}_t,\quad \widehat \Lambda = \sqrt{q} (\widehat U_1,\ldots,\widehat U_r), \] where \(r\) is the number of factors used.
Use the OLS to estimate the parameters of VAR\((p)\) model based on \(\widehat{\mathbf{F}}_t\).

If needed, the covariance matrices of the error terms can naturally be estimated through sample covariance matrices of the residuals.

Why is this expected to work? See Bai and Ng (2008), Doz et al. (2011), Stock and Watson (2011) for assumptions and proofs. Note that the assumptions are such that \(q,T\to\infty\).

Notes

The principle component estimators in Step 2 are the solutions \(\mathbf{F}_t\) and \(\Lambda\) minimizing the sum of squares \[ \mbox{SSE}(r) = \frac{1}{qT} \sum_{t=1}^T (\mathbf{Y}_t - \Lambda \mathbf{F}_t)'(\mathbf{Y}_t - \Lambda \mathbf{F}_t), \] subject to the normalization \(q^{-1}\Lambda'\Lambda = I_r\). As with the usual principal component analysis (PCA), the factors are thus (orthogonal) linear combinations of the data vectors that explain most of the variability in a suitable sense. (This is one of possible criterions for choosing factors. Another one is discussed below.)
Sometimes, another step is added to the algorithm above corresponding to the Kalman smoother estimation of the factors. See e.g. Doz et al. (2011), though the improvements seem minimal from practical perspective.
The stationary factors \(\mathbf{F}_t\) and loadings \(\Lambda\) are estimated consistently without reference to VAR. Only these estimates will be shown in examples below.
DFMs for nonstationary data are available as well.

Estimation of the number of factors

One popular approach is based on the information criteria (e.g. Bai and Ng (2007)(2008)), with the number of factors estimated by \[ \widehat r = \mathop{\mbox{argmin}}_r \Big\{ \log\Big(\mbox{SSE}(r)\Big) + r g(q,T)\Big\}, \] where \(g(q,T)\) is one of the following three functions, \[ g_1(q,T) = \frac{q+T}{qT} \log\Big( \frac{q+T}{qT} \Big),\quad g_2(q,T) = \frac{q+T}{qT} \log( q\wedge T),\quad g_2(q,T) = \frac{\log( q\wedge T)}{q\wedge T} \] with \(q\wedge T = \min(q,T)\).

Other approaches include ad-hoc eigenvalue scree plots, statistical tests based on random matrix theory (e.g. Onatski (2009)), etc.

Forecasting with DFM

DFM are particularly liked by practitioners because they can be used easily and seem to do well in forecasting. Given a time series \(\mathbf{Y}_t\), \(t=1,\ldots,T\), and its DFM with estimated factors \(\widehat{\mathbf F}_t\) and loading matrix \(\widehat \Lambda\), the \(h\)-step-ahead forecast \(\widehat{\mathbf{Y}}_T(h)\) is defined as \[ \widehat{\mathbf{Y}}_T(h) = \widehat \Lambda \widehat{\mathbf F}_T(h), \] where \(\widehat{\mathbf F}_T(h)\) is the \(h\)-step-ahead forecast of \(\widehat{\mathbf F}_{T+h}\) based on the VAR\((p)\) model fitted to the factors \(\widehat{\mathbf F}_t\).

Examples of DFM

Multiple macroeconomic quarterly indicators

library(vars)
source("DFM-VAR-library.R")

ICselect(Ye, maxr=5)

## $err
## [1] 0.7576303 0.6797813 0.6333382 0.5912211 0.5569279
## 
## $psum1
## [1] 0.8114133 0.7873473 0.7946872 0.8063531 0.8258429
## 
## $psum2
## [1] 0.8182998 0.8011203 0.8153467 0.8338991 0.8602755
## 
## $psum3
## [1] 0.7921429 0.7488065 0.7368760 0.7292715 0.7294909
## 
## $order
## [1] 2 2 4

out <- DFM.VAR(Ye, r=2)


## Factors 
#out$Fh
plot.ts(t(out$Fh))

## Loadings are
#out$Lam
plot.ts(out$Lam)

State flu data set

stateflu <- read.csv("states_abbrev.csv", sep=",", header=TRUE)
stateflu <- as.matrix(stateflu[,c(-1)])
stateflu <- log(na.omit(stateflu))
statefluR <- stateflu[,c("CT","ME","MA","NH","RI","VT","NJ","NY","DE","DC","MD","PA","VA","WV","AL","FL","GA","KY","MS","NC","SC","TN", "IL","IN","MI","MN","OH","WI","AR","LA","NM","OK","TX","IA","KS","MO","NE","CO","MT","ND","SD","UT","WY","AZ","CA","HI","NV", "AK","ID","OR","WA")]
statefluR <- statefluR - matrix(1,dim(statefluR)[1],1) %*% colMeans(statefluR)

statefluR <- statefluR %*% diag(diag(var(statefluR))^(-1/2))

ICselect(t(statefluR), maxr=5)

## $err
## [1] 0.09769093 0.06579661 0.05026991 0.03969935 0.03348386
## 
## $psum1
## [1] 0.1849141 0.2402429 0.3119394 0.3885920 0.4695996
## 
## $psum2
## [1] 0.1887778 0.2479704 0.3235306 0.4040469 0.4889183
## 
## $psum3
## [1] 0.1747856 0.2199859 0.2815538 0.3480778 0.4189570
## 
## $order
## [1] 1 1 1

vvflu <- svd(cov(statefluR))
qflu <- dim(t(statefluR))[1]
Lamflu <- vvflu$u[,1]*sqrt(qflu)
Fhflu <- 1/qflu*t(Lamflu) %*% t(statefluR)


## Factors 
#out$Fh
plot.ts(t(Fhflu))

## Loadings are
#out$Lam
plot(Lamflu,type="l")
abline(h = 0, v = c(6,8,14,22,28,33,37,43,47,51), col = "black")

Note: Out-of-sample forecasts for DFM are worse than those based on sparse VAR model.

Audio data

ICselect(t(Ya), maxr=2)

## $err
## [1] 1.188254e-01 2.506199e-32
## 
## $psum1
## [1] 0.4653951 0.6931395
## 
## $psum2
## [1] 0.4654076 0.6931645
## 
## $psum3
## [1] 0.4653990 0.6931472
## 
## $order
## [1] 1 1 1

r <- 1
vv <- svd(cov(Ya))
qa <- dim(t(Ya))[1]
Lam <- vv$u[,1:r]*sqrt(qa)
Fh <- 1/qa*t(Lam)%*%t(Ya)

#play(.1*Fh[1,]/sd(Fh[1,]) ,rate=8000)
#play(.1*Fh[2,]/sd(Fh[2,]) ,rate=8000)

Independent component analysis (ICA)

The goal with the audio data set is, in fact, quite different. There are two audio sources, the piano and the radio, and there are two microphones recording the data, one closer to the piano and the other closer to the radio. The data can be thought as \[ \mathbf{Y}_t = A \mathbf{Z}_t + \varepsilon_t, \] where \(\mathbf{Z}_t\) is a \(2\)-vector time series representing the two audio sources (say standardized), \(A\) is a \(2\times 2\) matrix reflecting the distance of the sources to the microphones, and \(\varepsilon_t\) is \(\mbox{WN}(0,\Sigma_\varepsilon)\). The component series \(Z_{1t}\) and \(Z_{2t}\) can be assumed to be uncorrelated/independent. The goal is to estimate the underlying uncorrelated/independent component series \(Z_{1t}\) and \(Z_{2t}\).

One possibility might seem to decorrelate \(\mathbf{Y}_t\). As in DFM estimation, Step 1, consider the sample covariance matrix and its decomposition \[ \widehat \Sigma_{\mathbf Y} = \frac{1}{T} \sum_{t=1}^T \mathbf{Y}_t \mathbf{Y}_t' = \widehat U \widehat D \widehat U', \] where \(\widehat D = \mbox{diag}(\widehat d_1,d_2)\) with \(\widehat d_1\geq \widehat d_2\), and \(\widehat U = (\widehat U_1,\widehat U_2)\) is the orthogonal eigenvector matrix such that \(\widehat U \widehat U' = \widehat U' \widehat U = I_2\). Estimate the underlying sources as \[ \widehat{\mathbf{Z}}_t = \widehat \Sigma_{\mathbf Y}^{-1/2} \mathbf{Y}_t \] (which is implemented below through SVD of the data). Would this work and when?

sYa <- svd(Ya)
UV <- sYa$u %*% t(sYa$v)


# play(.1*UV[,1]/sd(UV[,1]) ,rate=8000)
# play(.1*UV[,2]/sd(UV[,2]) ,rate=8000)


par(mfrow=c(1,2)) 
plot(UV[1000:1500,1],type="l")
plot(UV[1000:1500,2],type="l")

But …

theta <- pi/4
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)

Yatheta <- Ya %*% R
#play(.1*Yatheta[,1]/sd(Yatheta[,1]) ,rate=8000)
#play(.1*Yatheta[,2]/sd(Yatheta[,2]) ,rate=8000)

sYatheta <- svd(Yatheta)
UVtheta <- sYatheta$u %*% t(sYatheta$v)


# play(.1*UVtheta[,1]/sd(UVtheta[,1]) ,rate=8000)
# play(.1*UVtheta[,2]/sd(UVtheta[,2]) ,rate=8000)



par(mfrow=c(1,2)) 
plot(UVtheta[1000:1500,1],type="l")
plot(UVtheta[1000:1500,2],type="l")

Independent component analysis, on the other hand, seems to do a nice job …

library(fastICA)

YaICA <- fastICA(Ya, 2, alg.typ = "deflation", fun = "logcosh", alpha = 1,
                 method = "R", row.norm = FALSE, maxit = 200,
                 tol = 0.0001, verbose = TRUE)


#play(.1*YaICA$S[,1]/sd(YaICA$S[,1]) ,rate=8000)
#play(.1*YaICA$S[,2]/sd(YaICA$S[,2]) ,rate=8000)

par(mfrow=c(1,2)) 
plot(YaICA$S[1000:1500,1],type="l")
plot(YaICA$S[1000:1500,2],type="l")

YaICAtheta <- fastICA(Yatheta, 2, alg.typ = "deflation", fun = "logcosh", alpha = 1,
                 method = "R", row.norm = FALSE, maxit = 200,
                 tol = 0.0001, verbose = TRUE)

#play(.1*YaICAtheta$S[,1]/sd(YaICAtheta$S[,1]) ,rate=8000)
#play(.1*YaICAtheta$S[,2]/sd(YaICAtheta$S[,2]) ,rate=8000)


par(mfrow=c(1,2)) 
plot(YaICAtheta$S[1000:1500,1],type="l")
plot(YaICAtheta$S[1000:1500,2],type="l")

Basics of fast independent component analysis (fast ICA)

A simplified setting is \[ \mathbf{Y} = A \mathbf{Z}, \] where \(\mathbf{Y}\) is a random \(q\)-vector, \(A\) is a \(q\times r\) mixing matrix (with \(q\geq r\)) and the \(r\) components of \(\mathbf{Z}\) are independent, non-Gaussian.

In the case of “first” factor, one looks for a “direction” (\(q\)-vector) \(\mathbf{w}\) such that \(\mathbf{w}'\mathbf{Y}\) is most non-Gaussian, which in practice is implemented by maximizing the following non-Gaussianity measure \[ J(\mathbf{w}) = | E G(\mathbf{w}'\mathbf{Y}) - E G({\cal N}(0,1)) |^2, \] where \(G(u) = \frac{1}{\alpha}\log \mbox{cosh} (\alpha u)\) or \(G(u) = - \exp(u^2/2)\). (Why does this make sense?)

The numerical optimization algorithm is described in Hyvarinen and Oja (2000). It is an iterative algorithm that uses an initial random vector \(\mathbf{w}_0\) of unit norm. To get other factors, the algorithm is repeated \(r\) times and the Gramm-Schmitt-like decorrelation is applied to have the estimated factors to be (numerically) uncorrelated.

Some final notes

Bai and Ng (2008), Stock and Watson (2011) are two nice review articles on DFMs. My own recent involvement was to develop a “periodic” version of DFM to use with data exhibiting cyclical variations (Baek et al. (2017)).
ICA is a general purpose method of signal processing and data analysis with interesting applications in a range of areas, for example, feature extraction in image analysis, identifying underlying sources of brain activity in neuroscience, etc. Reading the review paper of Hyvarinen and Oja (2000) is highly recommended.
A demo of blind source separation can be found here.

Questions/Homework

Q0: Supplement your analysis of multivariate time series with DFM or ICA if seems appropriate.

Q1: State and discuss the conditions used in the literature needed for the above DFM estimation method to work.

Q2: What is the rotation that the above DFM estimation method estimates the loadings and the factors up to?

Q3: How is estimation carried out for DFM in the dynamic form?

Q4: Explain why the above methods to estimate the number of factors in DFM are expected to work?

Q5: In connection to ICA, discuss other measures of non-Gaussianity such as kurtosis, and negentropy and its approximations.

Q6: Describe connections of ICA to mutual information, MLE and projection pursuit.

References

Baek, C., R. A. Davis, and V. Pipiras. 2017. “Periodic Dynamic Factor Models.” Preprint.

Bai, J., and S. Ng. 2007. “Determining the Number of Primitive Shocks in Factor Models.” Journal of Business & Economic Statistics 25 (1).

———. 2008. Large Dimensional Factor Analysis. Now Publishers Inc.

Doz, C., D. Giannone, and L. Reichlin. 2011. “A Two-Step Estimator for Large Approximate Dynamic Factor Models Based on Kalman Filtering.” Journal of Econometrics 164: 188–205.

Hyvärinen, A., and E. Oja. 2000. “Independent Component Analysis: Algorithms and Applications.” Neural Networks 13 (4): 411–30.

Onatski, A. 2009. “Testing Hypotheses About the Number of Factors in Large Factor Models.” Econometrica 77 (5): 1447–79.

Stock, J. H., and M. W. Watson. 2008. “Forecasting in Dynamic Factor Models Subject to Structural Instability.” Preprint.

———. 2011. “Dynamic Factor Models.” Oxford Handbook of Economic Forecasting. Oxford University Press, USA, 35–59.