Suppose we are given univariate time series data $x_t$, $t=1,\ldots,T$, where $t$ refers to time and $T$ to the sample size. We would like to have a statistical/probability model $\{X_t\}_{t\in\mathbb{Z}}$ for the series, where the $X_t$'s are random variables. Having a statistical model, we can use it for forecasting (prediction), which is one of the basic tasks of time series analysis.
Note: We shall focus on the situations where there is some dependence across times t. Typically, there should be some “physical” reason for temporal dependence. Examples?
This lecture is about a classical approach to time series modeling based on the decomposition (possibly after a preliminary transformation) $X_t = T_t + S_t + Y_t$, $t\in\mathbb{Z}$, where $\{T_t\}_{t\in\mathbb{Z}}$ is a (typically deterministic) model for the trend, $\{S_t\}_{t\in\mathbb{Z}}$ is a (typically deterministic, periodic) model for seasonal variations, and $\{Y_t\}_{t\in\mathbb{Z}}$ is a stationary time series model.
Annual measurements of the level, in feet, of Lake Huron 1875–1972.
data(LakeHuron)
ts <- LakeHuron
ts
## Time Series:
## Start = 1875
## End = 1972
## Frequency = 1
## [1] 580.38 581.86 580.97 580.80 579.79 580.39 580.42 580.82 581.40 581.32
## [11] 581.44 581.68 581.17 580.53 580.01 579.91 579.14 579.16 579.55 579.67
## [21] 578.44 578.24 579.10 579.09 579.35 578.82 579.32 579.01 579.00 579.80
## [31] 579.83 579.72 579.89 580.01 579.37 578.69 578.19 578.67 579.55 578.92
## [41] 578.09 579.37 580.13 580.14 579.51 579.24 578.66 578.86 578.05 577.79
## [51] 576.75 576.75 577.82 578.64 580.58 579.48 577.38 576.90 576.94 576.24
## [61] 576.84 576.85 576.90 577.79 578.18 577.51 577.23 578.42 579.61 579.05
## [71] 579.26 579.22 579.38 579.10 577.95 578.12 579.75 580.85 580.41 579.96
## [81] 579.61 578.76 578.18 577.21 577.13 579.10 578.25 577.91 576.89 575.96
## [91] 576.80 577.68 578.38 578.52 579.74 579.31 579.89 579.96
par(mfrow = c(1, 2))
plot.ts(ts,ylab = 'Depth in Feet',xlab='Year')
lts <- length(ts)
plot(y=ts[2:lts],x=ts[1:(lts-1)],ylab = 'Depth in feet',xlab='Previous Depth in Feet')
cor(ts[2:lts],ts[1:(lts-1)])
## [1] 0.8388905
Another possibility is to remove a linear trend.
years <- time(LakeHuron)
lm.LakeHuron <- lm(LakeHuron ~ years)
summary(lm.LakeHuron)
##
## Call:
## lm(formula = LakeHuron ~ years)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.50997 -0.72726 0.00083 0.74402 2.53565
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 625.554918 7.764293 80.568 < 2e-16 ***
## years -0.024201 0.004036 -5.996 3.55e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared: 0.2725, Adjusted R-squared: 0.2649
## F-statistic: 35.95 on 1 and 96 DF, p-value: 3.545e-08
ts2 <- resid(lm.LakeHuron)
ts2 <- ts(ts2,start=c(1875),end=c(1972),frequency=1)
par(mfrow = c(1, 2))
plot(ts2,ylab = 'Feet',xlab='Year')
lts2 <- length(ts2)
plot(y=ts2[2:lts2],x=ts2[1:(lts2-1)],ylab = 'Depth in feet',xlab='Previous Depth in Feet')
cor(ts2[2:lts2],ts2[1:(lts2-1)])
## [1] 0.7763291
lm.LakeHuron2 <- lm(ts2[2:lts2] ~ ts2[1:(lts2-1)])
summary(lm.LakeHuron2)
##
## Call:
## lm(formula = ts2[2:lts2] ~ ts2[1:(lts2 - 1)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.95881 -0.49932 0.00171 0.41780 1.89561
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.01529 0.07272 0.21 0.834
## ts2[1:(lts2 - 1)] 0.79112 0.06590 12.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7161 on 95 degrees of freedom
## Multiple R-squared: 0.6027, Adjusted R-squared: 0.5985
## F-statistic: 144.1 on 1 and 95 DF, p-value: < 2.2e-16
ts3 <- resid(lm.LakeHuron2)
par(mfrow = c(1, 2))
plot(ts3, type="l")
lts3 <- length(ts3)
plot(y=ts3[2:lts3],x=ts3[1:(lts3-1)],ylab = 'ts3 current',xlab='ts3 previous')
cor(ts3[2:lts3],ts3[1:(lts3-1)])
## [1] 0.2183583
Use the model $X_t = a + \phi X_{t-1} + Z_t$, or $X_t = a + bt + Y_t$ with $Y_t = \phi Y_{t-1} + Z_t$, where the $Z_t$'s are uncorrelated across time $t$?
In fact, after fitting a linear trend $X_t = a + bt + Y_t$, we will choose the model $Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + Z_t$ for the residuals and use it in forecasting, arriving at the forecast plot produced at the end of this lecture.
By the end of the lecture, you should understand how the model is fitted and where such a forecasting plot comes from. On the way, a number of important notions of time series analysis, such as stationarity, some stationary models, ACF, partial ACF, and others, will also be introduced and discussed.
Note: Though it may sometimes appear "simple", the material of this lecture is among the most important in the course. For example, it lays the foundation for what will be done later. Even when dealing with multivariate, high-dimensional, nonlinear, ... time series, univariate modeling based on the models introduced here is always the baseline to be improved upon. The lecture will also give you an idea of the mathematical level of the course.
There are several methods for estimating and removing trend and seasonal components:
In particular, a moving average estimate of, e.g., a trend is defined as
$$\hat T_t = \sum_{j=-\infty}^{\infty} a_j x_{t+j} = \ldots + a_{-2} x_{t-2} + a_{-1} x_{t-1} + a_0 x_t + a_1 x_{t+1} + a_2 x_{t+2} + \ldots$$
with a filter $a = (a_j)$. Often, $a_j = a_{-j}$ (symmetric) and $a_j = 0$ for $|j| \ge J$ (finite). (An issue is what to do at the "boundary".) Another basic requirement is $\sum_j a_j = 1$, so that $\sum_j a_j T_{t+j} = T_t$ for a constant trend $T_t = m$. Downside: how does one forecast such a trend into the future?
For example, Spencer's 15-point moving average filter has $(a_0\ a_1 \ldots a_7) = \tfrac{1}{320}(74\ 67\ 46\ 21\ 3\ {-5}\ {-6}\ {-3})$, with $a_j = a_{-j}$ and $a_j = 0$ for $j > 7$. It reproduces polynomials of degree 3, that is, $\sum_j a_j T_{t+j} = T_t$ for all $T_t = a + bt + ct^2 + dt^3$. Creating filters with "good" properties is almost a science in itself, and there are a number of well-known filters for various purposes (e.g. Hodrick-Prescott, Baxter-King).
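As a quick illustration, Spencer's filter can be applied with stats::filter (a sketch; the weights are those listed above, boundary values are returned as NA, and the Lake Huron series is used only as an example):
half <- c(74, 67, 46, 21, 3, -5, -6, -3) / 320        # (a_0, a_1, ..., a_7)
spencer <- c(rev(half[-1]), half)                      # full symmetric 15-point filter
sum(spencer)                                           # weights sum to 1
# the filter reproduces a cubic trend (away from the boundary)
tt <- 1:50
cubic <- 2 + 0.5 * tt - 0.01 * tt^2 + 0.001 * tt^3
max(abs(stats::filter(cubic, spencer, sides = 2) - cubic), na.rm = TRUE)   # numerically ~0
# moving average trend estimate for the Lake Huron series
trend.hat <- stats::filter(LakeHuron, spencer, sides = 2)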
In the absence of a trend, the seasonal component is estimated as follows: for $t = 1, \ldots, s$, and supposing $T = ms$,
$$\hat S_t = \frac{x_t + x_{t+s} + x_{t+2s} + \ldots + x_{t+(m-1)s}}{m},$$
and for $t > s$ it is extended periodically by setting $\hat S_t = \hat S_{t-s}$. That is, it is estimated by the seasonal mean.
In the presence of both trend and seasonal components, a preliminary estimate of the trend is often taken as
$$\hat T_t = \begin{cases} \frac{1}{s}\big(0.5\, x_{t-s/2} + x_{t-s/2+1} + \ldots + x_{t+s/2-1} + 0.5\, x_{t+s/2}\big), & \text{if } s \text{ is even},\\ \frac{1}{s}\big(x_{t-(s-1)/2} + x_{t-(s-1)/2+1} + \ldots + x_{t+(s-1)/2-1} + x_{t+(s-1)/2}\big), & \text{if } s \text{ is odd}.\end{cases}$$
The preliminary trend is removed and a seasonal component is estimated. The seasonal component is then removed and the trend is re-estimated, e.g. through some other method. This is the idea of the so-called classical decomposition method.
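For illustration, R's decompose function implements this classical scheme; a minimal sketch on the built-in monthly AirPassengers series (chosen here only as an example of a strongly seasonal series, log-transformed to stabilize the variance):
data(AirPassengers)
dec <- decompose(log(AirPassengers))   # moving-average trend, then seasonal means
plot(dec)
round(dec$figure, 3)                   # the 12 estimated monthly seasonal effects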
Remarks
Filtering/moving averaging is one of the most basic and important operations in time series analysis. For example, applying two filters $a$ and $b$ sequentially is the same as applying the single filter given by their convolution $a * b$, defined as $(a*b)_k = \sum_j a_j b_{k-j}$.
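A quick numerical check of this identity (a sketch; the two short filters are arbitrary, and one-sided filtering is used so that no alignment issues arise):
a <- c(0.25, 0.5, 0.25)
b <- c(0.5, 0.5)
# convolution (a*b)_k = sum_j a_j b_{k-j}, computed directly
ab <- rep(0, length(a) + length(b) - 1)
for (j in seq_along(a)) for (k in seq_along(b)) ab[j + k - 1] <- ab[j + k - 1] + a[j] * b[k]
x <- rnorm(50)
y1 <- stats::filter(stats::filter(x, a, sides = 1), b, sides = 1)   # apply a, then b
y2 <- stats::filter(x, ab, sides = 1)                               # apply a*b once
max(abs(y1 - y2), na.rm = TRUE)                                     # numerically ~0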
Depending on the application, the trend or the seasonal component may drive much of the variability and hence be the only component of practical interest. Thus, the existence of a whole science of filter design should not be surprising.
Filtering and various filters are best understood in the so-called spectral domain. The spectral perspective on time series will be discussed in another lecture.
Different methods will lead to different models. A particular model may be preferred because it is (relatively) simpler, has a more conventional fit (e.g. the residuals appear normal), or provides better out-of-sample forecasting performance.
After removing trend and seasonal components (if necessary), much of the focus of the univariate time series analysis is on modeling stationary time series.
Definition 1.1: A time series (model) $X = \{X_t\}_{t\in\mathbb{Z}}$ is called stationary if $E X_t^2 < \infty$ for all $t\in\mathbb{Z}$,
$$E X_t = m, \quad \text{for all } t\in\mathbb{Z},$$
that is, it has a constant mean, and
$$\mathrm{Cov}(X_t, X_{t+h}) = E(X_t - E X_t)(X_{t+h} - E X_{t+h}) = E X_t X_{t+h} - m^2 = \gamma_X(h), \quad \text{for all } t, h \in\mathbb{Z},$$
for a function $\gamma_X$, called the autocovariance function (ACVF) of $X$; that is, the covariance between $X_t$ and $X_{t+h}$ depends only on the lag $h$.
Note: This is the notion of weak stationarity. It is implied by the so-called strict stationarity, with the converse implication holding in general only for Gaussian series.
A notion related to the ACVF is the autocorrelation function (ACF) of a stationary series $X$, defined as
$$\mathrm{Corr}(X_t, X_{t+h}) = \frac{\mathrm{Cov}(X_t, X_{t+h})}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(X_{t+h})}} = \frac{\gamma_X(h)}{\gamma_X(0)} =: \rho_X(h), \quad \text{for all } t, h\in\mathbb{Z}.$$
The ACF $\rho_X(h)$ measures the correlation in the time series $X$ at lag $h$. It satisfies $\rho_X(0) = 1$, $|\rho_X(h)| \le 1$, $\rho_X(-h) = \rho_X(h)$, and it is non-negative definite.
Sample quantities:
The sample ACVF of $x_1,\ldots,x_T$ is defined as
$$\hat\gamma(h) = \frac{1}{T}\sum_{t=1}^{T-h}(x_{t+h} - \bar x)(x_t - \bar x)$$
for $h = 0, 1, \ldots, T-1$, and $\hat\gamma(h) = \hat\gamma(-h)$ for $h = -(T-1), \ldots, -1$, where $\bar x = \frac{1}{T}\sum_{t=1}^T x_t$ is the sample mean. The sample ACF of $x_1,\ldots,x_T$ is defined as $\hat\rho(h) = \hat\gamma(h)/\hat\gamma(0)$, $h = -(T-1),\ldots,T-1$. The plot of $\hat\rho(h)$ against $h = 0,\ldots,M$ is called a correlogram, and is used as a graphical tool in choosing a model.
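These are the quantities computed by R's acf function; a quick check by hand on the detrended Lake Huron residuals (a sketch):
x <- as.numeric(ts2)
n <- length(x)
xbar <- mean(x)
# sample ACVF and ACF at lag h, with the 1/T normalization used above
gamma.hat <- function(h) sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
rho.hat <- function(h) gamma.hat(h) / gamma.hat(0)
rho.hat(1)
acf(ts2, lag.max = 1, plot = FALSE)$acf[2]   # should agree with rho.hat(1)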
Example 1.1 (White noise): A stationary series $Z = \{Z_t\}_{t\in\mathbb{Z}}$ with $E Z_t = 0$ and
$$\gamma_Z(h) = \begin{cases} E Z_t^2 = \sigma_Z^2, & h = 0,\\ 0, & h \neq 0,\end{cases}$$
is called white noise. The notation is $\{Z_t\} \sim \mathrm{WN}(0, \sigma_Z^2)$. Any series of i.i.d. random variables with finite variance is white noise, in which case one writes $\{Z_t\} \sim \mathrm{IID}(0, \sigma_Z^2)$.
wn1 <- rnorm(100, mean=0, sd=1)
par(mfrow = c(1, 2))
plot(wn1, type="l")
acf(wn1, lag.max=20, ylim=c(-1, 1))
wn2 <- rpois(100, lambda=1)   # iid Poisson noise (mean 1 rather than 0, but still uncorrelated across time)
par(mfrow = c(1, 2))
plot(wn2, type="l")
acf(wn2, lag.max=20, ylim=c(-1, 1))
Example 1.2 (Autoregressive series): A stationary series $X = \{X_t\}_{t\in\mathbb{Z}}$ with $E X_t = 0$ is called autoregressive of order $p$ (AR($p$), for short) if it satisfies the AR equation
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \ldots + \phi_p X_{t-p} + Z_t,$$
where $\{Z_t\} \sim \mathrm{WN}(0, \sigma_Z^2)$. The AR model can be conveniently written using the backward shift operator $B$, defined as $B^k X_t = X_{t-k}$, $t, k\in\mathbb{Z}$. The AR equation then reads $\phi(B) X_t = Z_t$, where $\phi(z) = 1 - \phi_1 z - \ldots - \phi_p z^p$ is referred to as the AR polynomial.
Much is known about AR series. For example:
The AR equation has a unique stationary solution if and only if $\phi(z) \neq 0$ for $|z| = 1$ (that is, the polynomial $\phi(z)$ has no roots on the unit circle). Moreover, the stationary solution is "causal" in terms of $\{Z_t\}$ and identifiable when $\phi(z) \neq 0$ for $|z| \le 1$.
There are methods to compute its ACVF and ACF explicitly. For example, for an AR(1) series,
$$\gamma_X(h) = \frac{\sigma_Z^2\, \phi_1^{|h|}}{1 - \phi_1^2}, \qquad \rho_X(h) = \phi_1^{|h|}.$$
For an AR(2) series, when $\xi_1 = r e^{i\theta}$ and $\xi_2 = r e^{-i\theta}$ are two complex roots of the polynomial $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ with $r > 1$ and $0 < \theta < \pi$,
$$\gamma_X(h) = C_1(r, \theta)\, r^{-|h|} \sin\big(\theta |h| + C_2(r, \theta)\big).$$
Quite different behaviors can already be captured with AR(1) and AR(2) series: AR(1) with $0 < \phi_1 < 1$ gives a positively correlated time series; AR(1) with $-1 < \phi_1 < 0$ gives a negatively correlated time series; an AR(2) series with two complex roots as above exhibits cyclical behavior.
Remark: AR($p$) series are special cases of the more general ARMA($p,q$) time series satisfying the equation $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \ldots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \ldots + \theta_q Z_{t-q}$. (The AR model is obtained when $q = 0$.) But AR series are flexible enough to capture almost any correlation structure (see Q5 below). They are also ARIMA($p,d,q$) series, defined later, with $d = 0$.
ar1_1 <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 100)
par(mfrow = c(1, 2))
plot(ar1_1, type="l")
acf(ar1_1, lag.max=20, ylim=c(-1, 1))
ar1_2 <- arima.sim(list(order = c(1,0,0), ar = -0.7), n = 100)
par(mfrow = c(1, 2))
plot(ar1_2, type="l")
acf(ar1_2, lag.max=20, ylim=c(-1, 1))
# r = 1.1, theta = pi/8 so that phi_1 = xi_1^(-1)+ xi_2^(-1) = 2*(1/r)*cos(theta), phi_2 = -xi_1^(-1)xi_2^(-1) = - 1/r^2
phi1 <- 2*(1/1.1)*cos(pi/8)
phi2 <- -1/1.1^2
ar2 <- arima.sim(list(order = c(2,0,0), ar = c(phi1,phi2) ), n = 200)
par(mfrow = c(1, 2))
plot(ar2, type="l")
acf(ar2, lag.max=40, ylim=c(-1, 1))
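The theoretical ACFs given above can be computed with ARMAacf and compared with the AR(1) formula and with the sample ACFs in the plots (a sketch):
ARMAacf(ar = 0.7, lag.max = 5)                        # theoretical ACF of AR(1) with phi_1 = 0.7
0.7^(0:5)                                             # rho(h) = phi_1^|h|, matching the above
round(ARMAacf(ar = c(phi1, phi2), lag.max = 10), 3)   # damped oscillation for the cyclical AR(2)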
There are a number of ways to fit a stationary parametric model to a time series: e.g. Gaussian ML estimation through a fast calculation of the likelihood, (non-)linear least squares, moment estimation, and others. For an AR model with fixed $p$, all these methods essentially correspond to OLS estimation. For more general parametric families, the methods are not asymptotically equivalent, and ML is the most efficient.
ar1.model <- arima(ts2, order=c(1,0,0), include.mean = FALSE, method = "ML")
ar1.model
##
## Call:
## arima(x = ts2, order = c(1, 0, 0), include.mean = FALSE, method = "ML")
##
## Coefficients:
## ar1
## 0.7826
## s.e. 0.0635
##
## sigma^2 estimated as 0.4975: log likelihood = -105.32, aic = 214.65
ar1.model <- arima(ts2, order=c(1,0,0), include.mean = FALSE, method = "CSS")
ar1.model
##
## Call:
## arima(x = ts2, order = c(1, 0, 0), include.mean = FALSE, method = "CSS")
##
## Coefficients:
## ar1
## 0.7909
## s.e. 0.0649
##
## sigma^2 estimated as 0.5024: part log likelihood = -105.33
lm.LakeHuron2 <- lm(ts2[2:lts2] ~ ts2[1:(lts2-1)] -1)
summary(lm.LakeHuron2)
##
## Call:
## lm(formula = ts2[2:lts2] ~ ts2[1:(lts2 - 1)] - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.94335 -0.48386 0.01758 0.43251 1.91083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ts2[1:(lts2 - 1)] 0.79084 0.06556 12.06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7125 on 96 degrees of freedom
## Multiple R-squared: 0.6025, Adjusted R-squared: 0.5984
## F-statistic: 145.5 on 1 and 96 DF, p-value: < 2.2e-16
ar2.model <- arima(ts2, order=c(2,0,0), include.mean = FALSE, method = "ML")
ar2.model
##
## Call:
## arima(x = ts2, order = c(2, 0, 0), include.mean = FALSE, method = "ML")
##
## Coefficients:
## ar1 ar2
## 1.0050 -0.2925
## s.e. 0.0976 0.1002
##
## sigma^2 estimated as 0.4572: log likelihood = -101.26, aic = 208.51
ar3.model <- arima(ts2, order=c(3,0,0), include.mean = FALSE, method = "ML")
ar3.model
##
## Call:
## arima(x = ts2, order = c(3, 0, 0), include.mean = FALSE, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3
## 1.0240 -0.3566 0.0639
## s.e. 0.1023 0.1451 0.1047
##
## sigma^2 estimated as 0.4554: log likelihood = -101.07, aic = 210.14
arma11.model <- arima(ts2, order=c(1,0,1), include.mean = FALSE, method = "ML")
arma11.model
##
## Call:
## arima(x = ts2, order = c(1, 0, 1), include.mean = FALSE, method = "ML")
##
## Coefficients:
## ar1 ma1
## 0.6513 0.3577
## s.e. 0.0945 0.1148
##
## sigma^2 estimated as 0.4573: log likelihood = -101.27, aic = 208.53
Confidence intervals: How are the s.e. and confidence intervals computed? What is different from the usual regression?
The order p of AR model (more generally, the orders p,q of ARMA model) can be chosen through:
Using an information criterion: An information criterion chooses $p$ to minimize $-2\log L + r(T, p)$, where $L$ is the likelihood (evaluated at the ML estimates) and $r(T, p)$ is a positive penalty depending on the sample size $T$ and the order $p$ (the larger the order/the number of parameters, the larger the penalty; likewise, the smaller the negative log-likelihood). The following penalties correspond to the listed criteria:
$$T \log\frac{T+p}{T-p}:\ \text{FPE}, \qquad 2(p+1):\ \text{AIC}, \qquad \frac{2(p+1)T}{T-p-2}:\ \text{AICC}, \qquad (p+1)\log T:\ \text{BIC}.$$
Notes:
One has $-2\log L = C + T\log\hat\sigma^2$, where one can think of $\hat\sigma^2$ as the average of the squared residuals and $C$ depends only on the sample size $T$.
AIC is known to have a tendency to choose larger p (overfit the model).
library(forecast)
auto.arima(ts2,max.p=5,max.q=5,ic="aic",allowmean = FALSE)
## Series: ts2
## ARIMA(1,0,1) with zero mean
##
## Coefficients:
## ar1 ma1
## 0.6513 0.3577
## s.e. 0.0945 0.1148
##
## sigma^2 estimated as 0.4668: log likelihood=-101.27
## AIC=208.53 AICc=208.79 BIC=216.29
auto.arima(ts2,max.p=5,max.q=5,ic="bic",allowmean = FALSE)
## Series: ts2
## ARIMA(2,0,0) with zero mean
##
## Coefficients:
## ar1 ar2
## 1.0050 -0.2925
## s.e. 0.0976 0.1002
##
## sigma^2 estimated as 0.4667: log likelihood=-101.26
## AIC=208.51 AICc=208.77 BIC=216.27
auto.arima(ts2,max.p=5,max.q=0,ic="aic",allowmean = FALSE)
## Series: ts2
## ARIMA(2,0,0) with zero mean
##
## Coefficients:
## ar1 ar2
## 1.0050 -0.2925
## s.e. 0.0976 0.1002
##
## sigma^2 estimated as 0.4667: log likelihood=-101.26
## AIC=208.51 AICc=208.77 BIC=216.27
auto.arima(ts2,max.p=5,max.q=0,ic="bic",allowmean = FALSE)
## Series: ts2
## ARIMA(2,0,0) with zero mean
##
## Coefficients:
## ar1 ar2
## 1.0050 -0.2925
## s.e. 0.0976 0.1002
##
## sigma^2 estimated as 0.4667: log likelihood=-101.26
## AIC=208.51 AICc=208.77 BIC=216.27
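The AIC values reported above can be reproduced from the log-likelihood and the AIC penalty $2(p+1)$ (a quick check using the ML fit of the AR(2) model from earlier):
-2 * ar2.model$loglik + 2 * (2 + 1)   # -2 log L + 2(p+1) with p = 2, about 208.5
ar2.model$aic                         # matches the AIC reported by arima/auto.arima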
Using partial autocorrelation function:
Definition 1.2: The partial ACF (PACF) of $X = \{X_t\}_{t\in\mathbb{Z}}$ is defined as $\alpha_X(0) = 1$ and $\alpha_X(h) = \phi_{hh}$, $h \ge 1$, where $\phi_{hh}$ is the coefficient of $X_1$ in a linear predictor of $X_{h+1}$, that is, $\hat X_{h+1} = \phi_{h0} + \phi_{h1} X_h + \ldots + \phi_{hh} X_1$.
The sample PACF $\hat\alpha(h)$, $h \ge 1$, is defined as the coefficient $\hat\phi_{hh}$ in the regression of $x_t$ on $x_{t-1},\ldots,x_{t-h}$, that is, $x_t = \hat\phi_{h0} + \hat\phi_{h1} x_{t-1} + \ldots + \hat\phi_{hh} x_{t-h} + \hat z_t$.
Note: There are other equivalent definitions, e.g. using the Frisch-Waugh-Lovell theorem.
Note: For an AR($p$) series, it is known that
$$\alpha_X(h) = \begin{cases} 1, & h = 0,\\ \rho_X(1), & h = 1,\\ \ldots & h = 2, \ldots, p-1,\\ \phi_p, & h = p,\\ 0, & h \ge p+1.\end{cases}$$
In particular, the PACF of an AR($p$) series cuts off after lag $p$.
par(mfrow = c(1, 2))
plot(ts2,ylab = 'Feet',xlab='Year')
pacf(ts2, lag.max=20, ylim=c(-1, 1))
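In line with the definition of the sample PACF, $\hat\alpha(2)$ can be obtained as the last coefficient in a regression of $x_t$ on $x_{t-1}$ and $x_{t-2}$; a sketch (pacf itself uses the sample ACF and the Durbin-Levinson recursion, so the two values agree only approximately):
n <- length(ts2)
x0 <- ts2[3:n]; x1 <- ts2[2:(n - 1)]; x2 <- ts2[1:(n - 2)]
coef(lm(x0 ~ x1 + x2))["x2"]                  # regression estimate of alpha(2)
pacf(ts2, lag.max = 2, plot = FALSE)$acf[2]   # sample PACF at lag 2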
Testing (e.g. likelihood ratio): Use the fact that
$$-2\log L(p) + 2\log L(p+1) = -2\log\frac{L(p)}{L(p+1)} \sim \chi^2(1)$$
under the hypothesis that the reduced AR($p$) model should be preferred.
qchisq(.95, df=1)
## [1] 3.841459
qchisq(.975, df=1)
## [1] 5.023886
qchisq(.999, df=1)
## [1] 10.82757
ar1.model <- arima(ts2, order=c(1,0,0), include.mean = FALSE, method = "ML")
ar2.model <- arima(ts2, order=c(2,0,0), include.mean = FALSE, method = "ML")
ar3.model <- arima(ts2, order=c(3,0,0), include.mean = FALSE, method = "ML")
t12 <- -2*ar1.model$loglik + 2*ar2.model$loglik
t12
## [1] 8.136994
t23 <- -2*ar2.model$loglik + 2*ar3.model$loglik
t23
## [1] 0.3712609
Using cross-validation: Write an AR($p$) model as a linear regression $Y_t = \Phi(p)' \mathbf{X}_t(p) + \epsilon_t$ with $Y_t = X_t$, $\Phi(p) = (\phi_1\ \phi_2\ \ldots\ \phi_p)'$ and $\mathbf{X}_t(p) = (X_{t-1}\ X_{t-2}\ \ldots\ X_{t-p})'$. Choose $p$ as $\arg\min_p \mathrm{CV}(p)$, where $\mathrm{CV}(p)$ is the $k$-fold cross-validation error for the AR($p$) model. What does the latter mean? (A typical choice is $k = 10$.)
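A minimal sketch of this idea on the detrended series, assuming the regression rows may be split into k random folds (only one of several possible ways to define cross-validation for time series):
set.seed(1)
cv.ar <- function(x, p, k = 10) {
  x <- as.numeric(x); n <- length(x)
  # regression rows: response x[t], predictors x[t-1], ..., x[t-p]
  y <- x[(p + 1):n]
  X <- matrix(sapply(1:p, function(j) x[(p + 1 - j):(n - j)]), ncol = p)
  folds <- sample(rep(1:k, length.out = length(y)))
  errs <- numeric(k)
  for (i in 1:k) {
    fit <- lm.fit(cbind(1, X[folds != i, , drop = FALSE]), y[folds != i])
    pred <- cbind(1, X[folds == i, , drop = FALSE]) %*% fit$coefficients
    errs[i] <- mean((y[folds == i] - pred)^2)
  }
  mean(errs)
}
round(sapply(1:5, function(p) cv.ar(ts2, p)), 4)   # CV errors for p = 1, ..., 5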
As with any regression model, a model fit should be examined through the model diagnostics.
resid.ar2 <- resid(ar2.model)
plot(resid.ar2,ylab = 'Residuals',xlab='Year')
par(mfrow = c(1, 2))
acf(resid.ar2, lag.max=20, ylim=c(-1, 1))
pacf(resid.ar2, lag.max=20, ylim=c(-1, 1))
qqnorm(resid.ar2, main="")
qqline(resid.ar2)
Box.test(resid.ar2, lag=32, type="Ljung-Box")   # note: fitdf=2 could be used to account for the two estimated AR coefficients
##
## Box-Ljung test
##
## data: resid.ar2
## X-squared = 15.897, df = 32, p-value = 0.9922
How does one forecast based on an AR(p) model? What is the forecast error?
h = 5
ts2.forecast <- predict(ar2.model, h)
round(ts2.forecast$pred, 3)
## Time Series:
## Start = 1973
## End = 1977
## Frequency = 1
## [1] 1.545 0.930 0.483 0.213 0.073
round(ts2.forecast$se, 3)
## Time Series:
## Start = 1973
## End = 1977
## Frequency = 1
## [1] 0.676 0.959 1.074 1.113 1.122
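These point forecasts can be reproduced by iterating the fitted AR(2) equation with future noise terms set to zero; the forecast standard errors then follow from the causal (moving average) representation of the series. A sketch of the point-forecast recursion:
phi <- coef(ar2.model)                  # fitted (phi_1, phi_2)
x <- as.numeric(ts2)
n <- length(x)
fc <- numeric(h)
prev <- c(x[n], x[n - 1])               # (X_T, X_{T-1})
for (i in 1:h) {
  fc[i] <- phi[1] * prev[1] + phi[2] * prev[2]   # hat X_{T+i}
  prev <- c(fc[i], prev[1])
}
round(fc, 3)                            # should match ts2.forecast$pred above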
plus.or.minus <- 1.96 * ts2.forecast$se
lower <- ts2.forecast$pred - plus.or.minus
upper <- ts2.forecast$pred + plus.or.minus
fyears = (1973:1977)
plot(ts2, xlim=c(1875, 1977))
lines(fyears, ts2.forecast$pred, lwd=2)
lines(fyears, lower, lty=2, lwd=2)
lines(fyears, upper, lty=2, lwd=2)
title("Forecasts for the detrended time series")
trend.forecast <- lm.LakeHuron$coefficients[1] + lm.LakeHuron$coefficients[2]*fyears
ts.forecast.pred <- trend.forecast + ts2.forecast$pred
lower = ts.forecast.pred - plus.or.minus
upper = ts.forecast.pred + plus.or.minus
plot(ts, xlim=c(1875, 1977))
lines(fyears, ts.forecast.pred, lwd=2)
lines(fyears, lower, lty=2, lwd=2)
lines(fyears, upper, lty=2, lwd=2)
title("Forecasts for the time series")
data("JohnsonJohnson")
par(mfrow = c(1, 2))
plot.ts(JohnsonJohnson)
plot.ts(log(JohnsonJohnson))
Various R packages for time series analysis are discussed in McLeod et al. (2011). A number of packages will be used as needed in the future lectures.
Many packages also allow one to include exogenous variables in AR models. E.g. the arima function used above allows this through its xreg argument.
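For instance, the linear trend could be handled jointly with the AR(2) errors by passing the time variable as an external regressor (a sketch; this fits the trend and the AR coefficients simultaneously, rather than in two steps as above):
arima(LakeHuron, order = c(2, 0, 0), xreg = time(LakeHuron))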
Q1: Apply the methodology outlined above to obtain a classical decomposition model and use it in forecasting for some real time series.
Q2: What is the idea of some of the model selection criteria, and why do they work for time series models?
Q3: How does one set confidence intervals for the mean $\mu = E X_t$, the autocovariance $\gamma(h)$ (including the variance when $h = 0$), and the autocorrelation $\rho(h)$, for fixed $h$?
Q4: Explain the various estimation methods “ML”, “CSS”, etc, referred to above. Are there any numerical pitfalls in performing estimation?
Q5: In what sense are AR models flexible enough to capture almost any correlation structure in stationary time series?
Q6: Discuss examples of parametric stationary models where different estimation methods (ML, moment estimation, etc.) have different asymptotic efficiency.
Box, G. E. P., and G. M. Jenkins. 1970. Time Series Analysis. Forecasting and Control. Holden-Day, San Francisco, Calif.-London-Amsterdam.
Box, G. E. P., G. M. Jenkins, G. C. Reinsel, and G. M. Ljung. 2016. Time Series Analysis. Fifth. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ.
Brockwell, P. J., and R. A. Davis. 2006. Time Series: Theory and Methods. Springer Series in Statistics. Springer, New York.
———. 2016. Introduction to Time Series and Forecasting. Third. Springer Texts in Statistics. Springer.
Cowpertwait, P. S. P., and A. V. Metcalfe. 2009. Introductory Time Series with R. Springer.
McLeod, A. I., H. Yu, and E. Mahdi. 2011. “Time Series Analysis with R.” Handbook of Statistics 30. Elsevier: 78–93.
Montgomery, D. C., C. L. Jennings, and M. Kulahci. 2008. Introduction to Time Series Analysis and Forecasting. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ.
Shumway, R. H., and D. S. Stoffer. 2011. Time Series Analysis and Its Applications. Third. Springer Texts in Statistics. Springer, New York.