  • What is this all about?
  • Motivating example: Lake Huron data
  • Data/Theory goals of the lecture
  • General discussion on classical decomposition
  • Estimating/Removing trends and seasonality
  • Stationary time series models
  • Fitting AR models and choosing their order
  • Model diagnostics
  • Forecasting
  • Some final notes
  • Questions/Homework
  • References

What is this all about?

Suppose we are given univariate time series data $x_t$, $t=1,\dots,T$, where $t$ refers to time and $T$ to the sample size. We would like to have a statistical/probability model $\{X_t\}_{t\in\mathbb{Z}}$ for the series, where the $X_t$'s are random variables. Having a statistical model, we can use it for forecasting (prediction), which is one of the basic tasks of time series analysis.

Note: We shall focus on the situations where there is some dependence across times t. Typically, there should be some “physical” reason for temporal dependence. Examples?

This lecture is about a classical approach to time series modeling based on the decomposition (possibly after a preliminary transformation) $$X_t = T_t + S_t + Y_t, \qquad t\in\mathbb{Z},$$ where $\{T_t\}_{t\in\mathbb{Z}}$ is a (typically deterministic) model for the trend, $\{S_t\}_{t\in\mathbb{Z}}$ is a (typically deterministic, periodic) model for seasonal variations, and $\{Y_t\}_{t\in\mathbb{Z}}$ is a stationary time series model.
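As a quick illustration of this decomposition (not part of the Lake Huron example below), base R's decompose() can be applied to a built-in seasonal series; a minimal sketch:

data(co2)               # built-in monthly CO2 concentrations (frequency 12)
dec <- decompose(co2)   # classical additive decomposition: trend + seasonal + remainder
plot(dec)               # panels: observed, trend, seasonal, random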

Motivating example: Lake Huron data

Annual measurements of the level, in feet, of Lake Huron 1875–1972.

data(LakeHuron)
ts <- LakeHuron
ts
## Time Series:
## Start = 1875 
## End = 1972 
## Frequency = 1 
##  [1] 580.38 581.86 580.97 580.80 579.79 580.39 580.42 580.82 581.40 581.32
## [11] 581.44 581.68 581.17 580.53 580.01 579.91 579.14 579.16 579.55 579.67
## [21] 578.44 578.24 579.10 579.09 579.35 578.82 579.32 579.01 579.00 579.80
## [31] 579.83 579.72 579.89 580.01 579.37 578.69 578.19 578.67 579.55 578.92
## [41] 578.09 579.37 580.13 580.14 579.51 579.24 578.66 578.86 578.05 577.79
## [51] 576.75 576.75 577.82 578.64 580.58 579.48 577.38 576.90 576.94 576.24
## [61] 576.84 576.85 576.90 577.79 578.18 577.51 577.23 578.42 579.61 579.05
## [71] 579.26 579.22 579.38 579.10 577.95 578.12 579.75 580.85 580.41 579.96
## [81] 579.61 578.76 578.18 577.21 577.13 579.10 578.25 577.91 576.89 575.96
## [91] 576.80 577.68 578.38 578.52 579.74 579.31 579.89 579.96
par(mfrow = c(1, 2)) 
plot.ts(ts,ylab = 'Depth in Feet',xlab='Year')
lts <- length(ts)
plot(y=ts[2:lts],x=ts[1:(lts-1)],ylab = 'Depth in feet',xlab='Previous Depth in Feet')

cor(ts[2:lts],ts[1:(lts-1)])
## [1] 0.8388905

Another possibility is to first remove a linear trend.

years <- time(LakeHuron)
lm.LakeHuron <- lm(LakeHuron ~ years)
summary(lm.LakeHuron)
## 
## Call:
## lm(formula = LakeHuron ~ years)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50997 -0.72726  0.00083  0.74402  2.53565 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 625.554918   7.764293  80.568  < 2e-16 ***
## years        -0.024201   0.004036  -5.996 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared:  0.2725, Adjusted R-squared:  0.2649 
## F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08
ts2 <- resid(lm.LakeHuron)
ts2 <- ts(ts2,start=c(1875),end=c(1972),frequency=1)

par(mfrow = c(1, 2)) 
plot(ts2,ylab = 'Feet',xlab='Year')
lts2 <- length(ts2)
plot(y=ts2[2:lts2],x=ts2[1:(lts2-1)],ylab = 'Depth in feet',xlab='Previous Depth in Feet')

cor(ts2[2:lts2],ts2[1:(lts2-1)])
## [1] 0.7763291
lm.LakeHuron2 <- lm(ts2[2:lts2] ~ ts2[1:(lts2-1)])
summary(lm.LakeHuron2)
## 
## Call:
## lm(formula = ts2[2:lts2] ~ ts2[1:(lts2 - 1)])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.95881 -0.49932  0.00171  0.41780  1.89561 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.01529    0.07272    0.21    0.834    
## ts2[1:(lts2 - 1)]  0.79112    0.06590   12.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7161 on 95 degrees of freedom
## Multiple R-squared:  0.6027, Adjusted R-squared:  0.5985 
## F-statistic: 144.1 on 1 and 95 DF,  p-value: < 2.2e-16
ts3 <- resid(lm.LakeHuron2)

par(mfrow = c(1, 2)) 
plot(ts3, type="l")
lts3 <- length(ts3)
plot(y=ts3[2:lts3],x=ts3[1:(lts3-1)],ylab = 'ts3 current',xlab='ts3 previous')

cor(ts3[2:lts3],ts3[1:(lts3-1)])
## [1] 0.2183583
Possible conclusion:

Use the model $X_t = a + \phi X_{t-1} + Z_t$, or $X_t = a + bt + Y_t$ with $Y_t = \phi Y_{t-1} + Z_t$, where the $Z_t$'s are uncorrelated across time $t$?

Data/Theory goals of the lecture

In fact, after fitting a linear trend $X_t = a + bt + Y_t$, we will choose the model $Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + Z_t$ for the residuals and use it in forecasting to arrive at the following forecasting plot:

By the end of the lecture, you should understand how the model is fitted and where such a forecasting plot comes from. On the way, a number of important notions of time series analysis, such as stationarity, some stationary models, ACF, partial ACF, and others, will also be introduced and discussed.

Note: Though it may sometimes appear “simple”, the material of this lecture is among the most important: it lays the foundation for whatever will be done later in the course. Even when dealing with multivariate or high-dimensional or nonlinear or … time series, univariate modeling based on the models introduced here is always the baseline that is to be improved upon. The lecture will also give you an idea of the mathematical level of the course.

General discussion on classical decomposition

Stationary time series models

After removing trend and seasonal components (if necessary), much of the focus of the univariate time series analysis is on modeling stationary time series.

Definition 1.1: A time series (model) $X=\{X_t\}_{t\in\mathbb{Z}}$ is called stationary if $E X_t^2 < \infty$ for all $t\in\mathbb{Z}$, $$E X_t = m \qquad \text{for all } t\in\mathbb{Z},$$ that is, it has a constant mean, and $$\mathrm{Cov}(X_t, X_{t+h}) = E(X_t - E X_t)(X_{t+h} - E X_{t+h}) = E X_t X_{t+h} - m^2 = \gamma_X(h) \qquad \text{for all } t,h\in\mathbb{Z},$$ for a function $\gamma_X$ called the autocovariance function (ACVF) of $X$; that is, the covariance between $X_t$ and $X_{t+h}$ depends only on the lag $h$.

Note: This is the notion of weak stationarity. It is implied by so-called strict stationarity (given finite second moments), while the converse implication holds in general only for Gaussian series.

A notion related to the ACVF is the autocorrelation function (ACF) of a stationary series $X$, defined as $$\mathrm{Corr}(X_t, X_{t+h}) = \frac{\mathrm{Cov}(X_t, X_{t+h})}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(X_{t+h})}} = \frac{\gamma_X(h)}{\gamma_X(0)} =: \rho_X(h), \qquad \text{for all } t,h\in\mathbb{Z}.$$ The ACF $\rho_X(h)$ measures the correlation in the time series $X$ at lag $h$. It satisfies $\rho_X(0)=1$, $|\rho_X(h)|\le 1$, $\rho_X(-h)=\rho_X(h)$, and it is non-negative definite.

Sample quantities:

The sample ACVF of $x_1,\dots,x_T$ is defined as $$\hat\gamma(h) = \frac{1}{T}\sum_{t=1}^{T-h}(x_{t+h}-\bar x)(x_t-\bar x)$$ for $h=0,1,\dots,T-1$, and $\hat\gamma(h)=\hat\gamma(-h)$ for $h=-(T-1),\dots,-1$, where $\bar x = \frac{1}{T}\sum_{t=1}^{T} x_t$ is the sample mean. The sample ACF of $x_1,\dots,x_T$ is defined as $$\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}, \qquad h=-(T-1),\dots,T-1.$$ The plot of $\hat\rho(h)$ against $h=0,\dots,M$ is called a correlogram, and is used as a graphical tool in choosing a model.
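As a sanity check of these definitions, the sample ACF at a given lag can be computed directly and compared with R's acf(); a minimal sketch on the Lake Huron data:

x <- as.numeric(LakeHuron)
n <- length(x)
h <- 1
xbar <- mean(x)
gamma.hat  <- sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n  # sample ACVF at lag h
gamma0.hat <- sum((x - xbar)^2) / n                                   # sample ACVF at lag 0
gamma.hat / gamma0.hat                     # sample ACF at lag 1, by hand
acf(x, lag.max = 1, plot = FALSE)$acf[2]   # same value from acf() (index 1 is lag 0)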

Example 1.1 (White noise): A stationary series $Z=\{Z_t\}_{t\in\mathbb{Z}}$ with $E Z_t = 0$ and $$\gamma_Z(h) = \begin{cases} E Z_t^2 = \sigma_Z^2, & h=0,\\ 0, & h\neq 0, \end{cases}$$ is called white noise. The notation is $\{Z_t\}\sim \mathrm{WN}(0,\sigma_Z^2)$. Any series of iid random variables with finite variance is white noise, in which case one writes $\{Z_t\}\sim \mathrm{IID}(0,\sigma_Z^2)$.

wn1 <- rnorm(100, mean=0, sd=1)
par(mfrow = c(1, 2)) 
plot(wn1, type="l")
acf(wn1, lag.max=20, ylim=c(-1, 1))

wn2 <- rpois(100,lambda=1)
par(mfrow = c(1, 2)) 
plot(wn2, type="l")
acf(wn2, lag.max=20, ylim=c(-1, 1))

Example 1.2 (Autoregressive series): A stationary series $X=\{X_t\}_{t\in\mathbb{Z}}$ with $E X_t = 0$ is called autoregressive of order $p$ (AR($p$), for short) if it satisfies the AR equation $$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + Z_t,$$ where $\{Z_t\}\sim \mathrm{WN}(0,\sigma_Z^2)$. The AR model can be conveniently written using the backward shift operator $B$, defined as $B^k X_t = X_{t-k}$, $t,k\in\mathbb{Z}$. The AR equation then reads $\phi(B)X_t = Z_t$, where $\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p$ is referred to as the AR polynomial.

Much is known about AR series. For example:

  • The AR equation has a unique stationary solution if and only if $\phi(z)\neq 0$ for $|z|=1$ (that is, the polynomial $\phi(z)$ has no roots on the unit circle). Moreover, the stationary solution is “causal” in terms of $\{Z_t\}$ and identifiable when $\phi(z)\neq 0$ for $|z|\le 1$.

  • There are methods to compute explicitly its ACVF and ACF. For example, for an AR(1) series, $$\gamma_X(h) = \frac{\sigma_Z^2\,\phi_1^{|h|}}{1-\phi_1^2}, \qquad \rho_X(h) = \phi_1^{|h|}.$$ For an AR(2) series, when $\xi_1 = r e^{i\theta}$ and $\xi_2 = r e^{-i\theta}$ are two complex roots of the polynomial $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ with $r>1$ and $0<\theta<\pi$, $$\gamma_X(h) = C_1(r,\theta)\, r^{-|h|} \sin\big(\theta|h| + C_2(r,\theta)\big).$$

  • Quite different behaviors can already be captured using AR(1) and AR(2) series. AR(1) with $0<\phi_1<1$: positively correlated time series; AR(1) with $-1<\phi_1<0$: negatively correlated time series; AR(2) with two complex roots as above: series exhibiting cyclical behavior.

Remark: AR($p$) series are special cases of the more general ARMA($p,q$) time series satisfying the equation $$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}.$$ (The AR model is obtained when $q=0$.) But AR series are flexible enough to capture almost any correlation structure. They are also ARIMA($p,d,q$) series, defined later, with $d=0$.
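Before simulating, the AR(1) formulas above can be checked numerically: ARMAacf() returns the theoretical ACF of an ARMA model, and polyroot() gives the roots of the AR polynomial. A minimal sketch with $\phi_1=0.7$:

phi <- 0.7
ARMAacf(ar = phi, lag.max = 5)   # theoretical ACF of AR(1)
phi^(0:5)                        # should coincide with the values above
Mod(polyroot(c(1, -phi)))        # root modulus of 1 - phi*z; > 1, so the solution is causal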

ar1_1 <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 100)
par(mfrow = c(1, 2)) 
plot(ar1_1, type="l")
acf(ar1_1, lag.max=20, ylim=c(-1, 1))

ar1_2 <- arima.sim(list(order = c(1,0,0), ar = -0.7), n = 100)
par(mfrow = c(1, 2)) 
plot(ar1_2, type="l")
acf(ar1_2, lag.max=20, ylim=c(-1, 1))

# r = 1.1, theta = pi/8 so that phi_1 = xi_1^(-1)+ xi_2^(-1) = 2*(1/r)*cos(theta), phi_2 = -xi_1^(-1)xi_2^(-1) = - 1/r^2

phi1 <- 2*(1/1.1)*cos(pi/8)
phi2 <- -1/1.1^2
ar2 <- arima.sim(list(order = c(2,0,0), ar = c(phi1,phi2) ), n = 200)
par(mfrow = c(1, 2)) 
plot(ar2, type="l")
acf(ar2, lag.max=40, ylim=c(-1, 1))

Fitting AR models and choosing their order

There are a number of ways to fit a stationary parametric model to a time series: e.g. Gaussian ML estimation through a fast calculation of the likelihood, (non-)linear least squares, moment estimation, and others. For an AR model with fixed order $p$, all these methods essentially correspond to OLS estimation. For more general parametric families, the methods are not asymptotically equivalent, and ML is the most efficient.

ar1.model <- arima(ts2, order=c(1,0,0), include.mean = FALSE, method = "ML")
ar1.model
## 
## Call:
## arima(x = ts2, order = c(1, 0, 0), include.mean = FALSE, method = "ML")
## 
## Coefficients:
##          ar1
##       0.7826
## s.e.  0.0635
## 
## sigma^2 estimated as 0.4975:  log likelihood = -105.32,  aic = 214.65
ar1.model <- arima(ts2, order=c(1,0,0), include.mean = FALSE, method = "CSS")
ar1.model
## 
## Call:
## arima(x = ts2, order = c(1, 0, 0), include.mean = FALSE, method = "CSS")
## 
## Coefficients:
##          ar1
##       0.7909
## s.e.  0.0649
## 
## sigma^2 estimated as 0.5024:  part log likelihood = -105.33
lm.LakeHuron2 <- lm(ts2[2:lts2] ~ ts2[1:(lts2-1)] -1)
summary(lm.LakeHuron2)
## 
## Call:
## lm(formula = ts2[2:lts2] ~ ts2[1:(lts2 - 1)] - 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.94335 -0.48386  0.01758  0.43251  1.91083 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## ts2[1:(lts2 - 1)]  0.79084    0.06556   12.06   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7125 on 96 degrees of freedom
## Multiple R-squared:  0.6025, Adjusted R-squared:  0.5984 
## F-statistic: 145.5 on 1 and 96 DF,  p-value: < 2.2e-16
ar2.model <- arima(ts2, order=c(2,0,0), include.mean = FALSE, method = "ML")
ar2.model
## 
## Call:
## arima(x = ts2, order = c(2, 0, 0), include.mean = FALSE, method = "ML")
## 
## Coefficients:
##          ar1      ar2
##       1.0050  -0.2925
## s.e.  0.0976   0.1002
## 
## sigma^2 estimated as 0.4572:  log likelihood = -101.26,  aic = 208.51
ar3.model <- arima(ts2, order=c(3,0,0), include.mean = FALSE, method = "ML")
ar3.model
## 
## Call:
## arima(x = ts2, order = c(3, 0, 0), include.mean = FALSE, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3
##       1.0240  -0.3566  0.0639
## s.e.  0.1023   0.1451  0.1047
## 
## sigma^2 estimated as 0.4554:  log likelihood = -101.07,  aic = 210.14
arma11.model <- arima(ts2, order=c(1,0,1), include.mean = FALSE, method = "ML")
arma11.model
## 
## Call:
## arima(x = ts2, order = c(1, 0, 1), include.mean = FALSE, method = "ML")
## 
## Coefficients:
##          ar1     ma1
##       0.6513  0.3577
## s.e.  0.0945  0.1148
## 
## sigma^2 estimated as 0.4573:  log likelihood = -101.27,  aic = 208.53

Confidence intervals: How are the s.e. and confidence intervals computed? What is different from the usual regression?
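A minimal sketch of how approximate 95% confidence intervals can be formed from the reported estimates and standard errors (Wald intervals based on the var.coef component of the arima fit obtained above):

est <- ar2.model$coef                   # ML estimates of phi1, phi2
se  <- sqrt(diag(ar2.model$var.coef))   # their estimated standard errors
cbind(lower = est - 1.96 * se, estimate = est, upper = est + 1.96 * se)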

Order selection:

The order p of AR model (more generally, the orders p,q of ARMA model) can be chosen through:

  • Using some information criterion (e.g. AIC, BIC, etc)
  • Partial ACF
  • Testing, e.g. likelihood ratio test
  • Cross validation

Using an information criterion: An information criterion chooses the order $p$ that minimizes $$-2\log L + r(T,p),$$ where $L$ is the likelihood (evaluated at the ML estimates) and $r(T,p)$ is a positive penalty depending on the sample size $T$ and the order $p$ (the larger the order/number of parameters, the larger the penalty, but also the smaller $-2\log L$). The following penalties correspond to the listed criteria:

  • FPE: $T\log\frac{T+p}{T-p}$
  • AIC: $2(p+1)$
  • AICC: $\frac{2(p+1)T}{T-p-2}$
  • BIC: $(p+1)\log T$

Notes:

  • One has $-2\log L = C + T\log\hat\sigma^2$, where one can think of $\hat\sigma^2$ as the sum of squared residuals (divided by $T$), and $C$ depends only on the sample size $T$.

  • AIC is known to have a tendency to choose larger p (overfit the model).
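A minimal sketch of order selection by AIC done “by hand”, refitting AR($p$) for $p=1,\dots,5$ with the same arima() call as above and comparing the reported AIC values:

# Fit AR(p) for p = 1,...,5 by ML and collect the AIC values; the smallest AIC wins
aic.values <- sapply(1:5, function(p)
  arima(ts2, order = c(p, 0, 0), include.mean = FALSE, method = "ML")$aic)
aic.values
which.min(aic.values)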

library(forecast)

auto.arima(ts2,max.p=5,max.q=5,ic="aic",allowmean = FALSE)
## Series: ts2 
## ARIMA(1,0,1) with zero mean     
## 
## Coefficients:
##          ar1     ma1
##       0.6513  0.3577
## s.e.  0.0945  0.1148
## 
## sigma^2 estimated as 0.4668:  log likelihood=-101.27
## AIC=208.53   AICc=208.79   BIC=216.29
auto.arima(ts2,max.p=5,max.q=5,ic="bic",allowmean  = FALSE)
## Series: ts2 
## ARIMA(2,0,0) with zero mean     
## 
## Coefficients:
##          ar1      ar2
##       1.0050  -0.2925
## s.e.  0.0976   0.1002
## 
## sigma^2 estimated as 0.4667:  log likelihood=-101.26
## AIC=208.51   AICc=208.77   BIC=216.27
auto.arima(ts2,max.p=5,max.q=0,ic="aic",allowmean  = FALSE)
## Series: ts2 
## ARIMA(2,0,0) with zero mean     
## 
## Coefficients:
##          ar1      ar2
##       1.0050  -0.2925
## s.e.  0.0976   0.1002
## 
## sigma^2 estimated as 0.4667:  log likelihood=-101.26
## AIC=208.51   AICc=208.77   BIC=216.27
auto.arima(ts2,max.p=5,max.q=0,ic="bic",allowmean  = FALSE)
## Series: ts2 
## ARIMA(2,0,0) with zero mean     
## 
## Coefficients:
##          ar1      ar2
##       1.0050  -0.2925
## s.e.  0.0976   0.1002
## 
## sigma^2 estimated as 0.4667:  log likelihood=-101.26
## AIC=208.51   AICc=208.77   BIC=216.27

Using the partial autocorrelation function:

Definition 1.2: The partial ACF (PACF) of $X=\{X_t\}_{t\in\mathbb{Z}}$ is defined as $$\alpha_X(0)=1, \qquad \alpha_X(h)=\phi_{hh}, \quad h\ge 1,$$ where $\phi_{hh}$ is the coefficient of $X_1$ in the linear predictor of $X_{h+1}$, that is, $$\hat X_{h+1} = \phi_{h0} + \phi_{h1} X_h + \dots + \phi_{hh} X_1.$$

The sample PACF $\hat\alpha(h)$, $h\ge 1$, is defined as the coefficient $\hat\phi_{hh}$ in the regression of $x_t$ on $x_{t-1},\dots,x_{t-h}$, that is, $$x_t = \hat\phi_{h0} + \hat\phi_{h1} x_{t-1} + \dots + \hat\phi_{hh} x_{t-h} + \hat z_t.$$

Note: There are other equivalent definitions, e.g. using the Frisch–Waugh–Lovell theorem.

Note: For AR($p$) series, it is known that $$\alpha_X(h) = \begin{cases} 1, & h=0,\\ \rho_X(1), & h=1,\\ \dots, & h=2,\dots,p-1,\\ \phi_p, & h=p,\\ 0, & h\ge p+1. \end{cases}$$

par(mfrow = c(1, 2)) 
plot(ts2,ylab = 'Feet',xlab='Year')
pacf(ts2, lag.max=20, ylim=c(-1, 1))
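As a check of Definition 1.2, the sample PACF at a given lag can be reproduced (approximately) by the defining regression; a minimal sketch at lag $h=2$. The values agree only approximately, since pacf() computes the PACF from the sample ACF via the Durbin–Levinson recursion rather than by OLS:

h <- 2
x <- as.numeric(ts2)
X <- embed(x, h + 1)                         # columns: x_t, x_{t-1}, ..., x_{t-h}
fit <- lm(X[, 1] ~ X[, -1])                  # regression defining the sample PACF
coef(fit)[h + 1]                             # hat(phi)_hh, the coefficient of x_{t-h}
pacf(ts2, lag.max = h, plot = FALSE)$acf[h]  # value reported by pacf() at lag h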

Testing (e.g. likelihood ratio): Use the fact that $$-2\log L(p) + 2\log L(p+1) = -2\log\left(\frac{L(p)}{L(p+1)}\right) \approx \chi^2(1)$$

under the null hypothesis that the reduced AR($p$) model is adequate (and hence should be preferred).

qchisq(.95, df=1) 
## [1] 3.841459
qchisq(.975, df=1) 
## [1] 5.023886
qchisq(.999, df=1) 
## [1] 10.82757
ar1.model <- arima(ts2, order=c(1,0,0), include.mean = FALSE, method = "ML")
ar2.model <- arima(ts2, order=c(2,0,0), include.mean = FALSE, method = "ML")
ar3.model <- arima(ts2, order=c(3,0,0), include.mean = FALSE, method = "ML")

t12 <- -2*ar1.model$loglik + 2*ar2.model$loglik
t12
## [1] 8.136994
t23 <- -2*ar2.model$loglik + 2*ar3.model$loglik
t23
## [1] 0.3712609

Using cross validation: Write an AR($p$) model as a linear regression $Y_t = \Phi(p)' X_t(p) + \epsilon_t$ with $Y_t = X_t$, $\Phi(p) = (\phi_1\ \phi_2\ \dots\ \phi_p)'$ and $X_t(p) = (X_{t-1}\ X_{t-2}\ \dots\ X_{t-p})'$. Choose $p$ as $\operatorname{argmin}_p \mathrm{CV}(p)$, where $\mathrm{CV}(p)$ is the $k$-fold cross-validation error for the AR($p$) model. What does the latter mean? (A typical choice is $k=10$.)
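A minimal sketch of such a $k$-fold cross validation for the AR order, based on the regression form above (random folds; ts2 is the detrended series, and the intercept is dropped since the series has already been detrended):

set.seed(1)
x <- as.numeric(ts2)
k <- 10
max.p <- 5
cv.err <- numeric(max.p)
for (p in 1:max.p) {
  X <- embed(x, p + 1)                  # col 1: x_t; cols 2..p+1: x_{t-1},...,x_{t-p}
  folds <- sample(rep(1:k, length.out = nrow(X)))
  sse <- 0
  for (j in 1:k) {
    train <- X[folds != j, , drop = FALSE]
    test  <- X[folds == j, , drop = FALSE]
    fit   <- lm(train[, 1] ~ train[, -1] - 1)           # AR(p) as no-intercept regression
    pred  <- test[, -1, drop = FALSE] %*% coef(fit)     # predict the held-out fold
    sse   <- sse + sum((test[, 1] - pred)^2)
  }
  cv.err[p] <- sse / nrow(X)
}
cv.err
which.min(cv.err)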

Model diagnostics

As with any regression model, a model fit should be examined through the model diagnostics.

resid.ar2 <- resid(ar2.model)
plot(resid.ar2,ylab = 'Residuals',xlab='Year')

par(mfrow = c(1, 2)) 
acf(resid.ar2, lag.max=20, ylim=c(-1, 1))
pacf(resid.ar2, lag.max=20, ylim=c(-1, 1))

qqnorm(resid.ar2, main="")
qqline(resid.ar2)

Box.test(resid.ar2, lag=32, type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  resid.ar2
## X-squared = 15.897, df = 32, p-value = 0.9922
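Since the residuals come from a fitted AR(2), one may also adjust the degrees of freedom of the Ljung-Box statistic for the two estimated AR parameters, via the fitdf argument of Box.test (a variant of the test above):

# Same test, with the degrees of freedom reduced by the 2 estimated AR parameters
Box.test(resid.ar2, lag = 32, type = "Ljung-Box", fitdf = 2)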

Forecasting

How does one forecast based on an AR(p) model? What is the forecast error?
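For instance, for the fitted AR(2) model, the one-step-ahead forecast is $\hat X_{T+1} = \hat\phi_1 X_T + \hat\phi_2 X_{T-1}$, with forecast standard error $\hat\sigma_Z$ at horizon one. A minimal check against the predict() output below:

phi <- ar2.model$coef                      # (phi1, phi2) from the AR(2) fit above
sum(phi * rev(tail(as.numeric(ts2), 2)))   # hat(X)_{T+1}; compare with the first predicted value
sqrt(ar2.model$sigma2)                     # one-step forecast s.e.; compare with the first s.e.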

h = 5
ts2.forecast <- predict(ar2.model, h)
round(ts2.forecast$pred, 3)
## Time Series:
## Start = 1973 
## End = 1977 
## Frequency = 1 
## [1] 1.545 0.930 0.483 0.213 0.073
round(ts2.forecast$se, 3)
## Time Series:
## Start = 1973 
## End = 1977 
## Frequency = 1 
## [1] 0.676 0.959 1.074 1.113 1.122
plus.or.minus <- 1.96 * ts2.forecast$se
lower <- ts2.forecast$pred - plus.or.minus
upper <- ts2.forecast$pred + plus.or.minus

fyears = (1973:1977)

plot(ts2, xlim=c(1875, 1977))
lines(fyears, ts2.forecast$pred, lwd=2)
lines(fyears, lower, lty=2, lwd=2)
lines(fyears, upper, lty=2, lwd=2)
title("Forecasts for the detrended time series")

trend.forecast <- lm.LakeHuron$coefficients[1] + lm.LakeHuron$coefficients[2]*fyears
ts.forecast.pred <- trend.forecast + ts2.forecast$pred 
lower = ts.forecast.pred - plus.or.minus
upper = ts.forecast.pred + plus.or.minus

plot(ts, xlim=c(1875, 1977))
lines(fyears, ts.forecast.pred, lwd=2)
lines(fyears, lower, lty=2, lwd=2)
lines(fyears, upper, lty=2, lwd=2)
title("Forecasts for the time series")

Some final notes

  • Preliminary transforms: The Box-Cox transformation of $x_t$: $$f_\lambda(x_t) = \begin{cases} \frac{x_t^\lambda - 1}{\lambda}, & x_t>0,\ \lambda>0,\\ \log x_t, & x_t>0,\ \lambda=0. \end{cases}$$ It can also be applied to $x_t + C$, especially when one needs $x_t + C > 0$. For example: quarterly earnings (dollars) per Johnson and Johnson share, 1960–80:
data("JohnsonJohnson")
par(mfrow = c(1, 2)) 
plot.ts(JohnsonJohnson)
plot.ts(log(JohnsonJohnson))

  • The approach to time series described above was popularized by George Box and Gwilym Jenkins in the 1970s; see e.g. Box and Jenkins (1970) (the most recent edition is Box et al. (2016)). It is now the basis of all introductory books on time series analysis, including:
    • Brockwell and Davis (2016). The more advanced, graduate version is Brockwell and Davis (2006).
    • Shumway and Stoffer (2011). It has nice accompanying examples and code in R.
    • Cowpertwait and Metcalfe (2009); Montgomery et al. (2008); and many others (in particular, in applied fields).
  • Various R packages for time series analysis are discussed in McLeod et al. (2011). A number of packages will be used as needed in the future lectures.

  • Many packages also allow one to include other exogenous variables in AR models; e.g., the arima function used above supports this (via its xreg argument).

Questions/Homework

Q1: Apply the methodology outlined above to obtain a classical decomposition model and use it in forecasting for some real time series.

Q2: What is the idea of some of the model selection criteria, and why do they work for time series models?

Q3: How does one set confidence intervals for the mean $\mu = E X_t$, the autocovariance $\gamma(h)$ (including the variance when $h=0$), and the autocorrelation $\rho(h)$ for fixed $h$?

Q4: Explain the various estimation methods (“ML”, “CSS”, etc.) referred to above. Are there any numerical pitfalls in performing estimation?

Q5: In what sense are AR models flexible enough to capture almost any correlation structure in stationary time series?

Q6: Discuss examples of parametric stationary models where different estimation methods (ML, moment estimation, etc.) have different asymptotic efficiency.

References

Box, G. E. P., and G. M. Jenkins. 1970. Time Series Analysis. Forecasting and Control. Holden-Day, San Francisco, Calif.-London-Amsterdam.

Box, G. E. P., G. M. Jenkins, G. C. Reinsel, and G. M. Ljung. 2016. Time Series Analysis. Fifth. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ.

Brockwell, P. J., and R. A. Davis. 2006. Time Series: Theory and Methods. Springer Series in Statistics. Springer, New York.

———. 2016. Introduction to Time Series and Forecasting. Third. Springer Texts in Statistics. Springer.

Cowpertwait, P. S. P., and A. V. Metcalfe. 2009. Introductory Time Series with R. Springer.

McLeod, A. I., H. Yu, and E. Mahdi. 2011. “Time Series Analysis with R.” Handbook of Statistics 30. Elsevier: 78–93.

Montgomery, D. C., C. L. Jennings, and M. Kulahci. 2008. Introduction to Time Series Analysis and Forecasting. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ.

Shumway, R. H., and D. S. Stoffer. 2011. Time Series Analysis and Its Applications. Third. Springer Texts in Statistics. Springer, New York.