What is this all about?

Many real-life signals (time series) naturally consist of cyclical variations, e.g. related to breathing/heart beating, daily/seasonal/… patterns, business cycles, etc. Moreover, many of the signals include these cyclical variations at different “frequencies” (“periods”), and often their presence is not evident from examining time plots.

It is then natural to think of signals as being superpositions of cyclical variations, across different “frequencies.” One goal is to measure the strength of these cyclical variations in a signal across different “frequencies,” with various related tasks in mind, e.g. exploratory analysis, classification, etc.

These issues are now commonly considered under the umbrella of so-called Fourier or spectral analysis.

Data/Theory goals of the lecture

From the theoretical standpoint, the goal here is to understand and learn to think through this “spectral perspective.” From the data standpoint, the most important goal is to learn to interpret the so-called spectrum, which is a summary/numerical description of the strengths of the contributions of cyclical variations to the considered time series across a range of frequencies. For example, below is the plot of a time series and its (smoothed) spectrum (on the log scale):

# Heart rate time series from MIT depository
hr <- scan("http://ecg.mit.edu/time-series/hr.7257")

par(mfrow=c(1,2))
ts.plot(hr,xlab="Seconds")
spec.pgram(hr,spans = 25,log='yes',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE,ci = 0.95)

Sine/Cosine waves

The cyclical components will involve sine and cosine functions. A sine wave is the function \[ A \sin(wt + \psi), \] where \(w\) is frequency, \(A\) is amplitude, and \(\psi\) is phase shift. Note that \(\sin(wt + \psi) = \cos(\psi) \sin(wt) + \sin(\psi) \cos(wt)\), so that we will focus on both sine and cosine functions \[ \sin(wt),\quad \cos(wt) \] at frequency \(w\). It is often convenient to write these together under one complex exponential, that is, \[ e^{iwt} = \cos(wt) + i \sin(wt), \] where \(i\) is the imaginary unit with \(i^2=-1\); but the presentation will try to stick to the real domain.

High and low frequencies

What is the difference between large \(w\) (the so-called high frequencies) and small \(w\) (the so-called low frequencies)?

t <- 1:96 
cos1 <- cos(2*pi*t*4/96)
cos2 <- cos(2*pi*(t*14/96+.3))

plot(t,cos1, type='o', ylab='Cosines')
lines(t,cos2, lty='dotted', type='o', pch=4)

For time series indexed by \(t=1,\ldots,T\), we shall be using sines and cosines \[ \sin\Big( \frac{2\pi k}{T} t \Big),\quad \cos\Big( \frac{2\pi k}{T} t \Big),\quad k=0,1,\ldots,T/2. \] Note that, for fixed \(k\), these functions have \(k\) cycles over time \(t=1,\ldots,T\), that is, the period \(T/k\). See the plot above for illustration. They are also referred to as \(k\)th harmonics, and correspond to the (\(k\)th Fourier) frequency \[ w_k = \frac{2\pi k}{T}. \] Across \(k\), these frequencies fall in the frequency range \([0,\pi]\). Sometimes, the frequencies are taken instead as \[ w_k = \frac{k}{T} \] and then appear in \(\cos(2\pi w_k t),\sin (2\pi w_k t)\). In this case, the frequency range is \([0,1/2]\) instead of \([0,\pi]\). This is the default convention used in R.

Note: For a time series given for \(t=1,\ldots,T\), why do we focus on \(k=0,1,\ldots,T/2\)? Why not consider \(k=T\) or \(2T\)? Why not consider \(k=1/2\)?

Hertz

Instead of indexing and plotting quantities with respect to the frequency \(w\), another common unit of measurement is the hertz (Hz). One hertz corresponds to 1 cycle per second and is equivalent to \(2\pi\) radians per second. An advantage of using Hz is that it carries physical meaning.

Example 4.1: Suppose time series data is collected at \(4\) measurements per second, that is, at \(4\)Hz. Suppose the total number of data points is \(T=400\), that is, the data is collected for \(100\) seconds. We would then focus on the frequencies \[ w_k = \frac{2\pi k}{400},\quad k=0,1,\ldots, \frac{400}{2} (=200) \] or, switching to the \([0,1/2]\) range, \[ w_k = \frac{k}{400},\quad k=0,1,\ldots, \frac{400}{2} (=200). \] In terms of Hz, the frequency \(w_k\), which has \(k\) cycles in \(100\) seconds, corresponds to \[ \frac{k}{100} \] cycles per second. Thus, the Hz range would be \[ \frac{k}{100}, \quad k=0,1,\ldots, \frac{400}{2} (=200). \] Note also that the last frequency is just 2Hz (half of the sampling frequency, also known as the Nyquist frequency).
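This arithmetic can be checked directly in R (a small sketch of the conversion in Example 4.1):

fs <- 4               # sampling rate in Hz (4 measurements per second)
T <- 400              # number of data points, i.e. 100 seconds of data
k <- 0:(T/2)
freq.hz <- (k/T)*fs   # k cycles per T samples = k/100 cycles per second
max(freq.hz)          # Nyquist frequency fs/2 = 2 Hz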

Side notes

The average adult human can hear sounds between 20 Hz and 16,000 Hz. From Wikipedia: Several animal species are able to hear frequencies well beyond the human hearing range. Some dolphins and bats, for example, can hear frequencies up to 100 kHz. Elephants can hear sounds at 14–16 Hz, while some whales can hear subsonic sounds as low as 7 Hz (in water).


Fourier representation and periodogram/spectrum

Suppose \(T\) is even for simplicity. The functions \(\{1,\cos( \frac{2\pi k}{T} t), \sin( \frac{2\pi k}{T} t), k=1,\ldots, T/2-1, \cos(\pi t) \}\) provide an “orthogonal basis” for finite series \(x = \{x_t\}_{t=1,\ldots,T}\). In particular, any such \(x\) can be expressed as the weighted sum of these sines and cosines: \[ x_t = a_0 + \sum_{k=1}^{T/2-1}(a_k \cos( \frac{2\pi k}{T} t) + b_k \sin( \frac{2\pi k}{T} t)) + a_{T/2} \cos(\pi t), \] where \[ a_0 = \overline x,\quad a_k = \frac{2}{T}\sum_{s=1}^T x_s \cos( \frac{2\pi k}{T} s),\quad b_k = \frac{2}{T}\sum_{s=1}^T x_s \sin( \frac{2\pi k}{T} s),\quad k=1,\ldots,\frac{T}{2}-1, \] and \(a_{T/2} = \frac{1}{T}\sum_{s=1}^T (-1)^s x_s\) (note the different scaling at \(k=T/2\), since \(\sum_{s=1}^T \cos^2(\pi s) = T\), not \(T/2\)). The above representation of \(x_t\) is known as its Fourier representation. The collection \(\{a_0,a_1,b_1,\ldots,a_{T/2-1},b_{T/2-1},a_{T/2}\}\) is obtained by a linear transformation of the data, with the corresponding transformation matrix being “orthogonal.” Note that these coefficients can also be interpreted as those in the “harmonic” regression of \(x_t\).
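A quick numerical check of this representation (a minimal sketch, with the coefficients computed by direct summation):

set.seed(1)
T <- 8; t <- 1:T
x <- rnorm(T)
a0 <- mean(x)
k <- 1:(T/2 - 1)
a <- sapply(k, function(k) (2/T)*sum(x*cos(2*pi*k*t/T)))
b <- sapply(k, function(k) (2/T)*sum(x*sin(2*pi*k*t/T)))
aT2 <- mean(x*cos(pi*t))   # coefficient of cos(pi*t); note the 1/T scaling
x.rec <- a0 + sapply(t, function(s) sum(a*cos(2*pi*k*s/T) + b*sin(2*pi*k*s/T))) + aT2*cos(pi*t)
max(abs(x - x.rec))        # ~ 1e-15: the reconstruction is exact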

By Parseval’s Theorem, \[ \frac{T-1}{T} \mbox{var}(x_t) = \frac{1}{T} \sum_{t=1}^T (x_t - \overline x)^2 = \frac{1}{2}\sum_{k=1}^{T/2-1}(a_k^2 + b_k^2) + a_{T/2}^2. \] This corresponds to the “energy” decomposition across the frequencies.

Definition 4.1 (Periodogram/Spectrum): Let \(A_k^2 = a_k^2 + b_k^2\), \(k = 0,\ldots,T/2\) (with \(b_0 = b_{T/2} = 0\)). The periodogram of \(x_t\), \(t=1,\ldots,T\), is defined as \[ I(w_k) = \frac{T}{4} A_k^2,\quad k = 0,\ldots, \frac{T}{2}-1,\quad I(w_{T/2}) = T A_{T/2}^2, \] indexed by \(w_k = 2\pi k/T\) or \(w_k = k/T\). The plot of \(I(w_k)\) against \(w_k\) is also referred to as the periodogram. The spectrum of \(x_t\), \(t=1,\ldots,T\), is defined as the collection of \(A_k^2\), \(k = 0,\ldots,T/2\), though sometimes the two terms are used interchangeably.

Notes

  • The definition of the periodogram/spectrum varies, up to a multiplicative constant. Here, for the periodogram, I am following the output of the R function spec.pgram (which is the same as that of the function spectrum).
  • Some software may plot \(a_0^2\) which is \(\overline x^2\) (or its scaled value). If \(\overline x\neq 0\), this might dominate the periodogram/spectrum.
  • Note that we have \(\frac{T-1}{T} \mbox{var}(x_t) = \frac{2}{T} \sum_{k=1}^{T/2-1} I(w_k) + \frac{1}{T} I(w_{T/2})\).
  • \(I(w_k)\), \(k\neq 0\), is invariant to additive constant (in particular, invariant to the mean) since \(\sum_{s=1}^T \cos( \frac{2\pi k}{T} s) = 0\) and \(\sum_{s=1}^T \sin( \frac{2\pi k}{T} s) = 0\).
  • Note also that \[ a_k + i b_k = \frac{2}{T}\sum_{s=1}^T x_s e^{i \frac{2\pi k}{T} s},\quad k=1,\ldots,T/2-1. \] The transformation \[ \sum_{s=0}^{T-1} y_s e^{-i \frac{2\pi n}{T} s},\quad n=0,\ldots,T-1, \] is known as the discrete Fourier transform (DFT) of \(y_0,y_1,\ldots,y_{T-1}\). This is a linear transformation of the data, with the corresponding transformation matrix being unitary (the complex analogue of orthogonal) up to normalization. In fact, working with the DFT and using it to define the periodogram is more convenient: all the “annoying” constants above result from working in the real domain and using frequencies up to \(T/2\).

  • There is a fast algorithm to compute the DFT and hence the periodogram/spectrum of \(x_t\), known as Fast Fourier Transform (FFT). It is fast if \(T\) is highly composite (e.g. \(T=2^P\)). The complexity of the algorithm is then of the order \(O(T\log T)\).
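As a quick sanity check of these conventions (a sketch using R’s fft): the periodogram values returned by spec.pgram, as used throughout these notes, coincide with \(|{\rm DFT}|^2/T\) of the demeaned data at the Fourier frequencies \(k/T\), \(k=1,\ldots,T/2\).

x <- rnorm(64)
T <- length(x)
I.fft <- Mod(fft(x - mean(x)))^2/T   # |DFT|^2/T at k = 0,...,T-1
sp <- spec.pgram(x, log='no', taper=0, pad=0, fast=FALSE, demean=TRUE, detrend=FALSE, plot=FALSE)
max(abs(sp$spec - I.fft[2:(T/2 + 1)]))   # ~ 0 (entries k = 1,...,T/2)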

Examples of periodograms/spectra of real and simulated time series

t <- 1:96 
cos1 <- cos(2*pi*t*4/96)
cos2 <- cos(2*pi*(t*14/96+.3))

x <- 2*cos1 + 3*cos2
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

spec.x$freq
##  [1] 0.01041667 0.02083333 0.03125000 0.04166667 0.05208333 0.06250000
##  [7] 0.07291667 0.08333333 0.09375000 0.10416667 0.11458333 0.12500000
## [13] 0.13541667 0.14583333 0.15625000 0.16666667 0.17708333 0.18750000
## [19] 0.19791667 0.20833333 0.21875000 0.22916667 0.23958333 0.25000000
## [25] 0.26041667 0.27083333 0.28125000 0.29166667 0.30208333 0.31250000
## [31] 0.32291667 0.33333333 0.34375000 0.35416667 0.36458333 0.37500000
## [37] 0.38541667 0.39583333 0.40625000 0.41666667 0.42708333 0.43750000
## [43] 0.44791667 0.45833333 0.46875000 0.47916667 0.48958333 0.50000000
spec.x$spec
##  [1] 4.522242e-29 3.775028e-29 2.674616e-29 9.600000e+01 1.123616e-28
##  [6] 2.767279e-28 9.393576e-29 2.157928e-29 4.595572e-29 1.768363e-29
## [11] 6.544592e-29 7.068641e-29 3.285510e-28 2.160000e+02 2.978426e-28
## [16] 3.474275e-29 5.146592e-29 7.475617e-29 1.835696e-28 4.108651e-30
## [21] 8.567737e-29 7.410850e-31 1.372775e-28 4.297854e-29 6.543633e-29
## [26] 3.714220e-29 3.879296e-29 3.472631e-29 6.528252e-31 9.755785e-29
## [31] 3.896474e-29 3.286920e-32 1.966891e-30 1.030778e-28 1.895057e-28
## [36] 5.016657e-29 6.100931e-29 6.984706e-29 2.884852e-29 2.923651e-29
## [41] 5.411321e-29 6.354464e-29 2.705992e-29 2.659119e-29 8.482823e-29
## [46] 1.679736e-28 3.444872e-28 2.351997e-29
# check
T = length(x)
((T-1)/T)*var(x)
## [1] 6.5
(2/T)*sum(spec.x$spec[1:(T/2-1)]) + (1/T)*spec.x$spec[T/2]
## [1] 6.5
x <- rnorm(196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

spec.x$spec
##  [1] 1.276326309 0.795215295 1.313180053 1.630297142 0.640286001
##  [6] 0.004215409 0.607945908 0.087313106 0.172854929 3.220791944
## [11] 0.295212718 1.895732828 0.682352611 1.661080802 0.418941545
## [16] 0.126433181 1.468038598 0.338276814 2.095513830 0.122525523
## [21] 0.312259473 0.330935865 1.501659520 2.392759296 0.170956900
## [26] 0.751514850 1.173206779 1.040963047 0.113389627 0.323150515
## [31] 1.764112592 1.453935520 1.332086237 0.618294049 0.221127611
## [36] 0.725905345 0.103184018 0.505456092 3.308727326 1.415877780
## [41] 1.366643281 2.035289507 0.456046156 0.015116870 2.866615533
## [46] 1.292491342 0.526548988 0.171140993 0.203882836 0.546503247
## [51] 0.604704016 1.361116064 0.645705435 0.016553752 1.190667597
## [56] 0.090878824 1.615564970 0.464379376 0.487868295 0.285605854
## [61] 0.918200091 0.259338051 1.605557980 2.644303364 1.117050684
## [66] 1.123858487 1.223788373 0.188627797 1.592547444 1.159622199
## [71] 2.617190776 0.117174806 0.037928815 1.191715515 0.548438246
## [76] 1.355808360 1.480646439 1.416984155 0.394803702 1.423698691
## [81] 2.360097792 4.452456497 0.300854124 1.120779373 1.364505516
## [86] 1.283650511 1.474922142 0.263956047 0.022891417 0.412729217
## [91] 0.503065886 0.951870933 2.066964301 0.171083985 0.499322662
## [96] 1.366905925 0.845168337 1.936635807
T = length(x)
((T-1)/T)*var(x)
## [1] 0.9949203
(2/T)*sum(spec.x$spec[1:(T/2-1)]) + (1/T)*spec.x$spec[T/2]
## [1] 0.9949203
x <- arima.sim(list(order = c(1,0,0), ar = 0.8), n = 196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

x <- arima.sim(list(order = c(1,0,0), ar = -0.7), n = 196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

# r = 1.1, theta = pi/8 so that phi_1 = xi_1^(-1)+ xi_2^(-1) = 2*(1/r)*cos(theta), phi_2 = -xi_1^(-1)xi_2^(-1) = - 1/r^2

phi1 <- 2*(1/1.1)*cos(pi/8)
phi2 <- -1/1.1^2
x <- arima.sim(list(order = c(2,0,0), ar = c(phi1,phi2) ), n = 196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

To reduce variability, the spectra of real time series are usually locally smoothed; namely, the smoothed periodogram is considered, for example, \[ \widetilde I(w_k) = \frac{.5\, I(w_{k-q}) + I(w_{k-q+1})+\ldots + I(w_{k-1}) + I(w_k) + I(w_{k+1}) +\ldots + I(w_{k+q-1}) + .5\, I(w_{k+q})}{2q}. \]

In R, this corresponds (naturally) to spans \(= 2q\) and the smoothing “kernel” with weights \((.5\ 1\ 1\ \ldots 1\ 1\ .5)/(2q)\). For example:

x <- arima.sim(list(order = c(1,0,0), ar = -0.7), n = 196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,spans = 20, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

The smoothing “kernel” used above is:

plot(kernel("modified.daniell", 10),ylim=c(0,0.05))  

Other smoothing “kernels” can be applied as well.

plot(kernel("modified.daniell", c(5,5)))  # m=5 moving average of a m=5 moving average

Spectra of heart rate variability series

The following are heart rate variability series from the MIT-BIH Database Distribution Home Page. These are measured in BPM, and are typically obtained from the so-called RR (inter-beat; R’s from a QRS complex) interval series (for example, by counting the beats in a six-second interval and multiplying by ten), which is itself derived from e.g. an ECG.

Each series contains 1800 evenly-spaced measurements of instantaneous heart rate from a single subject. The two subjects were engaged in comparable activities for the duration of each series. The measurements (in units of beats per minute) occur at 0.5 second intervals, so that the length of each series is exactly 15 minutes.

hr1 <- scan("http://ecg.mit.edu/time-series/hr.11839")
hr2 <- scan("http://ecg.mit.edu/time-series/hr.7257")
#hr1 <- ts(hr1,frequency=2)
#hr2 <- ts(hr2,deltat=0.5)

par(mfrow=c(2,1), cex=0.7, bty="L", mar=c(2,2,2,0.5))
ts.plot(hr1,main="Time series plots",xlab="")
ts.plot(hr2,xlab="Seconds")

par(mfrow=c(2,1), cex=0.7, bty="L", mar=c(8,2,2,2))
spec1 <- spec.pgram(hr1,spans = 15, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE,xaxt = "n", xlab = "frequency (Hz)")
axis(side = 1, at = (0:25)/50, labels = 2*(0:25)/50)

spec2 <- spec.pgram(hr2,spans = 15, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE,xaxt = "n", xlab = "frequency (Hz)")
axis(side = 1, at = (0:25)/50, labels = 2*(0:25)/50)

Note: How is the frequency axis translated to Hz?

From the site above: The rapid oscillations visible in series 1 are reflected in its power spectrum by a peak near 0.1 Hz (most likely, the frequency of respiration in this subject); thus this component of HRV is probably respiratory sinus arrhythmia, a modulation of heart rate that is greatest in young subjects, and gradually decreases in amplitude with increasing age. By contrast, almost all of the power in series 2 is concentrated at a much lower frequency (about 0.02 Hz). These dynamics are commonly observed in the context of congestive heart failure, where circulatory delays interfere with regulation of carbon dioxide and oxygen in the blood, leading to slow oscillations of heart rate.

Qualitative spectrum of heart rate variability series:

Spectrum of (significant) wave height

(E.g. measured through buoys in oceans)

Spectrum of a time series model

One can show that \[ I(w_k) = \sum_{h=-(T-1)}^{T-1} e^{iw_kh} \widehat \gamma_X(h) = \sum_{h=-(T-1)}^{T-1} \cos(w_kh) \widehat \gamma_X(h) . \] This suggests that for large \(T\) and stationary series, the periodogram estimates \[ \sum_{h=-\infty}^\infty \cos(w h) \gamma_X(h) = \sum_{h=-\infty}^\infty e^{iw h} \gamma_X(h) =: (2\pi) f_X(w). \]
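This identity is easy to verify numerically (a minimal sketch, with \(\widehat\gamma_X\) computed by acf with type = "covariance"):

x <- rnorm(32)
T <- length(x)
g <- acf(x, lag.max = T-1, type = "covariance", plot = FALSE)$acf[,1,1]
k <- 3; wk <- 2*pi*k/T
I.k <- g[1] + 2*sum(g[-1]*cos(wk*(1:(T-1))))   # sum over h of cos(w_k h) gamma.hat(h)
sp <- spec.pgram(x, log='no', taper=0, pad=0, fast=FALSE, demean=TRUE, detrend=FALSE, plot=FALSE)
c(I.k, sp$spec[k])                             # the two values agree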

Definition 4.2 (Spectral density): The quantity \(f_X(w)\), \(w\in[0,2\pi)\), that is, \[ f_X(w) = \frac{1}{2\pi} \sum_{h=-\infty}^\infty e^{iw h} \gamma_X(h), \] is called the spectral density of the stationary time series \(X\) with ACVF \(\gamma_X\).

Notes

  • One can think of a time series model \(\{X_t\}\) as given by \[ X_t \approx \sum_j e^{itw_j} f_X(w_j)^{1/2} (w_{j+1} - w_j)^{1/2} \widetilde Z_j, \] where \(\{w_j\}\) form a fine partition of \([0,2\pi)\) and \(\widetilde Z_j\) are suitable complex-valued “uncorrelated” variables.
  • The definition above provides an expression for \(f_X(w)\) in terms of \(\gamma_X(h)\). Conversely, \[ \gamma_X(h) = \int_0^{2\pi} e^{-ihw} f_X(w) dw. \]
  • Note that \(f_X(w) = f_X(2\pi-w)\), so that usually \(f_X(w)\) is considered on \([0,\pi)\) only.

Example 4.2 (Spectral density of WN series): Recall that for WN\((0,\sigma_Z^2)\) series \(\{Z_t\}\), \[ \gamma_Z(h) = \left\{ \begin{array}{cl} EZ_t^2 = \sigma^2_Z, & h=0,\\ 0, & h\neq 0. \end{array} \right. \] Thus, its spectral density is \[ f_Z(w) = \frac{\sigma^2_Z}{2\pi}, \quad w\in [0,2\pi). \]

x <- rnorm(196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

x.freq <- spec.x$freq
fx.freq <- rep(1,length(x.freq))
points (x.freq, fx.freq, type="l", col="red",lwd=2)

Example 4.3 (Spectral density of AR(1) series): Recall that for the AR\((1)\) series \(\{X_t\}\) satisfying \(X_t = \phi_1 X_{t-1} + Z_t\) with \(\{Z_t\} \sim {\rm WN}(0,\sigma^2_Z)\), \[ \gamma_X(h) = \frac{\sigma^2_Z \phi_1^{|h|}}{1 - \phi_1^2} \] and thus it can be shown that \[ f_X(w) = \frac{\sigma^2_Z}{2\pi |1-\phi_1 e^{-i w}|^2},\quad w\in [0,2\pi). \]

x <- arima.sim(list(order = c(1,0,0), ar = 0.8), n = 196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)


x.freq <- spec.x$freq
fx.freq <- 1/abs(1-0.8*exp(1i*2*pi*x.freq))^2
points (x.freq, fx.freq, type="l", col="red",lwd=2)

Linear filtering

In fact, the spectral density of the AR\((1)\) series could have been obtained more easily by using the following very useful result related to linear filtering. Suppose \(\{Y_t\}\) is obtained from the series \(\{X_t\}\) by filtering linearly through a filter \(b = (b_j)\), that is, \[ Y_t = \sum_{j=-\infty}^\infty b_j X_{t-j}. \] Then, if \(\{X_t\}\) has the spectral density \(f_X(w)\), the spectral density of \(\{Y_t\}\) is given by \[ f_Y(w) = \Big| \sum_j b_j e^{-ijw} \Big|^2 f_X(w). \] The quantity \(\sum_j b_j e^{-ijw}\) is known as the Fourier transform of \(b = (b_j)\).

Example 4.3 (Spectral density of AR(1) series, cont’ed): AR\((1)\) series \(\{X_t\}\) satisfies \(X_t - \phi_1 X_{t-1} = Z_t\) with \(\{Z_t\} \sim {\rm WN}(0,\sigma^2_Z)\). Applying the result above, we get that \[ |1-\phi_1 e^{-i w}|^2 f_X(w) = f_Z(w) = \frac{\sigma^2_Z}{2\pi} \] or \[ f_X(w) = \frac{\sigma^2_Z}{2\pi} \frac{1}{|1-\phi_1 e^{-i w}|^2}. \]

Example 4.4 (Spectral density of AR(p) series): AR\((p)\) series \(\{X_t\}\) satisfies \(X_t - \phi_1 X_{t-1} -\ldots - \phi_p X_{t-p}= Z_t\) with \(\{Z_t\} \sim {\rm WN}(0,\sigma^2_Z)\). Applying the argument above, we get that \[ f_X(w) = \frac{\sigma^2_Z}{2\pi} \frac{1}{|1-\phi_1 e^{-i w} - \ldots - \phi_p e^{-ipw} |^2}. \]

# r = 1.1, theta = pi/8 so that phi_1 = xi_1^(-1)+ xi_2^(-1) = 2*(1/r)*cos(theta), phi_2 = -xi_1^(-1)xi_2^(-1) = - 1/r^2

phi1 <- 2*(1/1.1)*cos(pi/8)
phi2 <- -1/1.1^2
x <- arima.sim(list(order = c(2,0,0), ar = c(phi1,phi2) ), n = 196)
par(mfrow = c(1, 2)) 
plot(x, type="l")
spec.x <- spec.pgram(x,log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)



x.freq <- spec.x$freq
fx.freq <- 1/abs(1-phi1*exp(1i*2*pi*x.freq)-phi2*exp(1i*2*2*pi*x.freq))^2
points (x.freq, fx.freq, type="l", col="red",lwd=2)

High-pass/Low-pass filters

This also explains how various filters affect the series they are applied to. E.g. the differencing operation \(X_t - X_{t-1}\) corresponds to linear filtering with the filter \(b_0 = 1, b_1 = -1\), and hence its multiplicative effect on the spectral density (spectrum) is through \[ |1 - e^{iw}|^2 = 4 \sin^2(w/2), \] which is depicted in the following plot. Thus filtering through this \(b\) attenuates low frequencies (keeps high frequencies); such a \(b\) is called high-pass.

curve(abs(1-exp(1i*x))^2,0,pi)
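For instance, differencing white noise yields a periodogram that tracks the factor \(4\sin^2(w/2)\) (a small illustration in the style of the earlier examples; here \(\sigma^2=1\), so the red curve is \(4\sin^2(w/2)\) itself):

z <- rnorm(512)
y <- diff(z)   # filtering with b_0 = 1, b_1 = -1
spec.y <- spec.pgram(y, log='no', taper=0, pad=0, fast=FALSE, demean=TRUE, detrend=FALSE)
w <- 2*pi*spec.y$freq
points(spec.y$freq, 4*sin(w/2)^2, type="l", col="red", lwd=2)   # (2*pi) f_Y(w)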

Similarly, one would expect the opposite effect from the moving average/smoothing \(X_t + X_{t-1}\), which corresponds to linear filtering with the filter \(b_0 = 1, b_1 = 1\). Indeed, its multiplicative effect on the spectral density (spectrum) is through \[ |1 + e^{iw}|^2 = 4 \cos^2(w/2), \] which is depicted in the following plot. Thus filtering through this \(b\) attenuates high frequencies (keeps low frequencies); such a \(b\) is called low-pass.

curve(abs(1+exp(1i*x))^2,0,pi)

Estimation of the spectral density/Properties of the DFT of data

Suppose we are given time series data \(x_t\), \(t=1,\ldots,T\), viewed as a “realization” of a stationary time series model. Let \(w_k = \frac{2\pi k}{T}\) as before. Recall from above that the DFT of the data are the coefficients: \[ a_k = \frac{2}{T}\sum_{s=1}^T x_s \cos( w_k s),\quad b_k = \frac{2}{T}\sum_{s=1}^T x_s \sin( w_k s), \quad k=0,\ldots,\frac{T}{2}. \] The periodogram is defined as \[ I(w_k) = \frac{T}{4}(a_k^2 + b_k^2). \] What are the properties of the DFT coefficients \(a_k,b_k\) and the periodogram \(I(w_k)\)? As noted above, \(I(w_k)\) is expected to approximate \((2\pi)f_X(w)\) as \(w_k\to w\) and \(T\to\infty\). But what kind of an approximation is it? What about the approximation across different \(w_k\)’s? What about \(a_k,b_k\)?

Example: Note that \[ \left( \begin{array}{c} a_0\\ a_1\\ b_1\\ \vdots\\ a_{T/2-1}\\ b_{T/2-1}\\ a_{T/2} \end{array} \right) = F \left( \begin{array}{c} x_1\\ x_2 \\ x_3\\ \vdots\\ x_{T-2}\\ x_{T-1}\\ x_{T} \end{array} \right) \] where \(F\) is “orthogonal” (its rows are orthogonal). So, for example, if \(x_t\), \(t=1,\ldots,T\), are IID normal variables, the DFT coefficients \(a_0, a_1, b_1,\ldots, a_{T/2-1}, b_{T/2-1}, a_{T/2}\) will be independent normal (identically distributed for \(k\neq 0,T/2\)). In fact, the variance of \(a_k\) or \(b_k\) (\(k\neq 0,T/2\)) will be \(\frac{2}{T}Ex_t^2\). Then, \[ a_k^2 + b_k^2 \sim \Big(\frac{2}{T}Ex_t^2 \Big) \chi^2_2 = \Big(\frac{4}{T}Ex_t^2 \Big) \mbox{Exp}(1), \] where \(\chi^2_2\) is the chi-square distribution with \(2\) degrees of freedom, and \(\mbox{Exp}(1)\) is the exponential distribution with parameter \(1\). Note, in particular, that the DFT transforms \(T\) data points into \(T\) data points, with no averaging; hence the variability seen in the periodograms above is expected.

General known facts: What about a temporally dependent time series \(x_t\), \(t=1,\ldots,T\), generated according to a stationary time series model \(\{X_t\}\) having ACVF \(\gamma_X(h)\) and spectral density \(f_X(w)\)? Under “mild” assumptions, one can show that as \(w_k\to w\) and \(T\to\infty\) (\(k\neq 0,T/2\)), \[ \sqrt{T} a_k \approx \sqrt{2(2\pi) f_X(w)} {\cal N}(0,1),\quad \sqrt{T} b_k \approx \sqrt{2(2\pi)f_X(w)} {\cal N}(0,1), \] and \(a_k\), \(b_k\) are asymptotically independent. In particular, \[ I(w_k) = \frac{T}{4}(a_k^2 + b_k^2) \approx \frac{(2\pi) f_X(w)}{2} \chi^2_2 = (2\pi) f_X(w)\, \mbox{Exp}(1). \] That is, \(I(w_k)\) approximates \((2\pi)f_X(w)\) on average only, with the fluctuations around the mean following an exponential distribution. Note also that the variance is larger for larger \(f_X(w)\).
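A quick simulation check of the exponential limit (a sketch with Gaussian white noise, for which \((2\pi)f_X(w)=1\)):

x <- rnorm(4096)
sp <- spec.pgram(x, log='no', taper=0, pad=0, fast=FALSE, demean=TRUE, detrend=FALSE, plot=FALSE)
c(mean(sp$spec), var(sp$spec))   # both ~ 1, as for Exp(1)
qqplot(qexp(ppoints(length(sp$spec))), sp$spec,
       xlab="Exp(1) quantiles", ylab="periodogram ordinates")
abline(0, 1, col="red")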

Decorrelation property: In addition to the above, it is also known under “mild” assumptions that \(a_k\), \(b_k\) and \(I(w_k)\) are asymptotically independent at different frequencies \(w\). This independence property makes spectral analysis particularly appealing from the statistical standpoint. See below for some examples.

Question: why decorrelation?

The asymptotic results above show that the periodogram itself is not a consistent estimator for \((2\pi)f_X(w)\), and that it is natural to consider its smoothed version for a consistent estimator: \[ \widehat f(w_k) = \frac{1}{2\pi} \sum_{|j|\leq m_T} W_T(j) I(w_{j+k}), \] where \(m_T\to\infty\), \(m_T/T\to 0\), the weights \(W_T(j)\) satisfy \[ W_T(j) = W_T(-j),\quad \sum_{|j|\leq m_T} W_T(j) = 1,\quad \sum_{|j|\leq m_T} W_T(j)^2 \to 0. \] For arbitrary \(w\in (0,\pi)\), \(\widehat f(w)\) is defined as \(\widehat f(w_k)\) for \(w_k\) closest to \(w\).

Example 4.5: As an example of weights, one can take the Daniell weights as used above (after simplifying the weights at the two ends) \[ W_T(j) = \frac{1}{2m_T+1},\quad |j|\leq m_T. \] (Convolutions of these are also used, as well as other weights.)

For many time series models, one can show that (\(w,\lambda\in(0,\pi)\)):

  1. Asymptotic unbiasedness: \(\lim_{T\to\infty} E \widehat f(w) = f_X(w)\).

  2. Asymptotic decorrelation and consistency: \(\lim_{T\to\infty} (\sum_{|j|\leq m_T} W_T(j)^2)^{-1} \mbox{Cov}(\widehat f(w), \widehat f(\lambda)) = \left\{ \begin{array}{cl} (f_X(w))^2, & w = \lambda, \\ 0, & w\neq \lambda. \end{array} \right.\)

This may be used to construct confidence intervals.

Example 4.6: For the Daniell weights \(W_T(j) = \frac{1}{2m_T+1}\), \(|j|\leq m_T\), we have \(\sum_{|j|\leq m_T} W_T(j)^2 = \frac{1}{2m_T+1}\) and hence that \[ \mbox{Var}(\widehat f(w)) \approx \frac{(f_X(w))^2}{2m_T+1}. \]
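This variance approximation can be turned into rough pointwise confidence bands (a sketch that ignores the halved end-weights of the modified Daniell kernel and any bias):

x <- arima.sim(list(order = c(1,0,0), ar = 0.8), n = 1024)
m <- 10
sp <- spec.pgram(x, spans = 2*m, log='no', taper=0, pad=0, fast=FALSE, demean=TRUE, detrend=FALSE, plot=FALSE)
se <- sp$spec/sqrt(2*m + 1)   # sd of f.hat is approximately f.hat/sqrt(2m+1)
plot(sp$freq, sp$spec, type="l", ylab="smoothed periodogram")
lines(sp$freq, sp$spec + 1.96*se, lty=2)
lines(sp$freq, pmax(sp$spec - 1.96*se, 0), lty=2)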

Bias-variance tradeoff: The choice of “bandwidth” \(m_T\) (or rather \((2m_T+1)/T\)) is related to the so-called bias-variance trade-off. A larger \(m_T\) reduces the variance but increases the bias; a smaller \(m_T\) reduces the bias but increases the variance. The optimal balance in the bias-variance trade-off leads to the so-called optimal “bandwidths.” E.g. for the Daniell weights, one can show that \[ \mbox{Bias}(\widehat f(w)) \approx C \frac{m_T^2}{T^2} f_X''(w), \] which together with the expression for the asymptotic variance suggests that \[ m_{T,opt} = \Big( \frac{f_X(w)}{Cf_X''(w)}\Big)^{2/5} \frac{T^{4/5}}{8^{1/5}}. \] Note the presence of the unknown spectral density \(f_X(w)\), which would have to be replaced by an estimate. Ad-hoc “bandwidths” have also been suggested.

Note: The so-called Lag Window estimators of the spectral density are defined as \[ \widehat f_L(w) = \frac{1}{2\pi} \sum_{|h|\leq r} W\Big( \frac{h}{r} \Big) \widehat \gamma_X(h) e^{-ihw}, \] where \(W(\cdot)\) is called the lag window function and satisfies: it is even, piecewise continuous, \(W(0)=1\), \(|W(x)|\leq 1\) for all \(x\), and \(W(x)=0\) for \(|x|>1\). One can show that the Lag Window estimator is approximately the smoothed periodogram with the weights \[ W_T(j) = \frac{2\pi \widehat W(w_j)}{T},\quad |j|\leq T/2, \] where \(\widehat W(w) = \frac{1}{2\pi} \sum_{|h|\leq r} W(\frac{h}{r}) e^{-ihw}\) is called a “spectral window.” Lots of different windows are used (rectangular, Bartlett or triangular, Daniell, etc.).
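A minimal sketch of a Lag Window estimate, using the Bartlett (triangular) window \(W(x) = 1-|x|\) for \(|x|\leq 1\) and the sample ACVF:

x <- arima.sim(list(order = c(1,0,0), ar = 0.8), n = 512)
r <- 30
g <- acf(x, lag.max = r, type = "covariance", plot = FALSE)$acf[,1,1]
w <- seq(0, pi, length.out = 200)
f.hat <- sapply(w, function(w0)
  (g[1] + 2*sum((1 - (1:r)/r)*g[-1]*cos(w0*(1:r))))/(2*pi))
plot(w, f.hat, type="l", xlab="w", ylab="lag window estimate")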

Data example

The nino3 sea surface temperature data: the monthly averages of sea surface temperatures over the East Equatorial Pacific. (SSA: Anomaly; monthly means of entire record removed.)

nino <- read.table("nino3data.asc", skip = 3)
names(nino) <- c("Year", "SST", "SSA")

par(mfrow = c(1, 2))
plot(nino$Year, nino$SST, type = "l")
acf(nino$SST, lag.max = 12 * 20)

par(mfrow = c(1, 2))
plot(nino$Year, nino$SSA, type = "l")
acf(nino$SSA, lag.max = 12 * 20)

raw.specSST <- spec.pgram(nino$SST,log = "no",taper = 0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
raw.specSSA <- spec.pgram(nino$SSA,log = "no",taper = 0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

k = kernel("daniell", c(9, 9, 9))
par(mfrow = c(1, 2))
plot(k)
smooth.spec <- spec.pgram(nino$SST,kernel = k,log = "no",taper =0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)

library(ggplot2)
library(reshape2)
library(gridExtra)

# Create a vector of periods to label on the graph, units are in years
yrs.period <- rev(c(1/6, 1/5, 1/4, 1/3, 0.5, 1, 3, 5, 10, 100))
yrs.labels <- rev(c("1/6", "1/5", "1/4", "1/3", "1/2", "1", "3", "5", "10", 
    "100"))
yrs.freqs <- 1/yrs.period * 1/12  #Convert annual period to annual freq, and then to monthly freq


k = kernel("daniell", c(9, 9, 9))
smooth.spec <- spec.pgram(nino$SST, kernel = k, taper = 0, plot = FALSE)
spec.df <- data.frame(freq = smooth.spec$freq, `c(9,9,9)` = smooth.spec$spec)
names(spec.df) <- c("freq", "c(9,9,9)")
# Add other smooths
k <- kernel("daniell", c(9, 9))
spec.df[, "c(9,9)"] <- spec.pgram(nino$SST, kernel = k, taper = 0, plot = FALSE)$spec
k <- kernel("daniell", c(9))
spec.df[, "c(9)"] <- spec.pgram(nino$SST, kernel = k, taper = 0, plot = FALSE)$spec


# melt from wide format into long format
spec.df <- melt(spec.df, id.vars = "freq", value.name = "spec", variable.name = "kernel")
plot1 <- ggplot(data = subset(spec.df)) + geom_path(aes(x = freq, y = spec, 
    color = kernel)) + scale_x_continuous("Period (years)", breaks = yrs.freqs, 
    labels = yrs.labels) + scale_y_log10()

plot2 <- ggplot(data = subset(spec.df)) + geom_path(aes(x = freq, y = spec, 
    color = kernel)) + scale_x_log10("Period (years)", breaks = yrs.freqs, labels = yrs.labels) + 
    scale_y_log10()

grid.arrange(plot1, plot2)

Finer details (from Cowpertwait and Metcalfe, Ch. 9)

  • Padding: The simplest FFT algorithm assumes that the time series has a length that is some power of 2. A positive integer is highly composite if it has more divisors than any smaller positive integer. The FFT algorithm is most efficient when the length \(T\) is highly composite, and by default \(\verb!spec.pgram!\) pads the mean adjusted time series with zeros to reach the smallest highly composite number that is greater than or equal to the length of the time series. Padding can be avoided by setting the parameter \(\verb!fast=FALSE!\). A justification for padding is that the length of the time series is arbitrary and that adding zeros has no effect on the frequency composition. Adding zeros does reduce the variance, and this must be remembered when scaling the spectrum, so that its area equals the variance of the original time series.

  • Tapering: The length of a time series is not usually related to any underlying frequency composition. However, the discrete Fourier series keeps replicating the original time series as \(-\infty < t < \infty\), known as periodic extension of the original time series, and there will usually be a jump between the end of one replicate time series and the start of the next. These jumps can be avoided by reducing the magnitude of the values of the time series, relative to its mean, at the beginning and towards the end. The default with \(\verb!spec.pgram!\) is a taper applied to 10% of the data at the beginning and towards the end of the time series. Tapering increases the variance of Fourier line spectrum spikes but reduces the bias. It will also reduce the variance of the time series. The default proportion of data to which the taper is applied can be changed with the parameter taper.

Use of DFT/Spectrum beyond looking for “hidden periodicities”

The nice asymptotic properties of the DFT coefficients and the periodogram, combined with their non-parametric nature, are quite appealing from the statistical standpoint. As a result, various time series analysis tasks are often performed in the spectral domain.

Anything you can do in the time domain, I can do in the spectral domain (better?) The original song.

MLE in the spectral domain

Suppose \(\{X_t\}\) is Gaussian with mean zero, ACVF \(\gamma(\theta;h) = E X_0X_h\) and spectral density \(f(\theta;w) = \frac{1}{2\pi} \sum_{h=-\infty}^\infty \gamma(\theta;h) e^{ihw}\), where \(\theta\) consists of model parameters. The Gaussian MLE maximizes the likelihood \[ \frac{1}{(2\pi)^{T/2}(\mbox{det}(\Gamma_T(\theta)))^{1/2}} e^{-\frac{1}{2}\mathbf{X}_T' \Gamma_T(\theta)^{-1}\mathbf{X}_T}, \] where \(\Gamma_T(\theta) = (\gamma(\theta;j-k))_{j,k=1,\ldots,T}\) and \(\mathbf{X}_T = (X_1,\ldots,X_T)'\), or minimizes \[ \frac{1}{T} \log \mbox{det}(\Gamma_T(\theta)) + \frac{1}{T} \mathbf{X}_T' \Gamma_T(\theta)^{-1}\mathbf{X}_T. \] One can show that \[ \frac{1}{T} \log \mbox{det}(\Gamma_T(\theta)) \approx \frac{1}{2\pi} \int_0^{2\pi} \log f(\theta;w) dw \] (this is known as Szegő’s weak limit theorem) and \[ \frac{1}{T} \mathbf{X}_T' \Gamma_T(\theta)^{-1}\mathbf{X}_T \approx \frac{1}{2\pi}\int_0^{2\pi} \frac{I(w)}{f(\theta;w)}dw. \] (The constants may be wrong.) The MLE in the spectral domain then minimizes \[ \sum_{j=1}^T \Big( \log f(\theta;w_j) + \frac{I(w_j)}{ f(\theta;w_j)} \Big). \]

This is useful when, e.g., the spectral densities (and not the ACVFs) have an explicit parametric form (as in the ARMA\((p,q)\) family). The asymptotics are the same as those of the time-domain MLE estimators.
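As an illustration, here is a minimal sketch of the spectral-domain (Whittle) MLE for an AR(1) model, with \(\sigma^2_Z\) fixed at \(1\) so that \(\phi_1\) is the only parameter (whittle.ar1 is a helper written just for this example):

whittle.ar1 <- function(x) {
  sp <- spec.pgram(x, log='no', taper=0, pad=0, fast=FALSE, demean=TRUE, detrend=FALSE, plot=FALSE)
  I <- sp$spec      # estimates (2*pi) f(w_k)
  w <- 2*pi*sp$freq
  nll <- function(phi) {
    f2pi <- 1/Mod(1 - phi*exp(-1i*w))^2   # (2*pi) f(phi; w) with sigma^2 = 1
    sum(log(f2pi) + I/f2pi)               # Whittle objective (up to constants)
  }
  optimize(nll, c(-0.99, 0.99))$minimum
}
x <- arima.sim(list(order = c(1,0,0), ar = 0.8), n = 1024)
whittle.ar1(x)   # close to the true phi_1 = 0.8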

Testing for same (or given dependence) structure

Suppose we are given two independently observed time series \(x_t\) and \(y_t\), \(t=1,\ldots,T\), which look stationary or which you would model through stationary models \(\{X_t\}\) and \(\{Y_t\}\), respectively. Suppose you are interested in a formal statistical test of whether the underlying stationary time series models have the same dependence structure, that is, whether \(\rho_X \equiv \rho_Y\) or equivalently (supposing the models and series are standardized) \(f_X\equiv f_Y\); or, depending on the application, whether \(\gamma_X \equiv \gamma_Y\) or equivalently \(f_X\equiv f_Y\).

Data example: The following are two (preprocessed) fMRI signals from the same brain ROI (region of interest) for the same individual in two different emotional states.

x1 <- read.csv("fmri_1.csv",header = FALSE)
x2 <- read.csv("fmri_2.csv",header = FALSE)

x1 <- x1[,1]/sd(x1[,1])
x2 <- x2[,1]/sd(x2[,1])

par(mfrow=c(1,2))
plot.ts(x1)
plot.ts(x2)

par(mfrow=c(1,2))
acf(x1,lag.max = 20)
acf(x2,lag.max = 20)

One possibility is to try to do this in the time domain by formally comparing the sample ACFs \(\widehat \rho_X(h)\) and \(\widehat \rho_Y(h)\) for a range of lags \(h=1,\ldots,M\). For a formal statistical test, we need their asymptotic distributions, as given e.g. in Brockwell and Davis (2006):

That is, we expect that \[ \sqrt{T} \left( \begin{array}{c} \widehat \rho_X(1) - \rho_X(1)\\ \vdots\\ \widehat \rho_X(M) - \rho_X(M) \end{array} \right) \approx {\cal N}(0,W_X), \] where \(W_X\) is given by the matrix in the result above. The test statistic can then be of the Wald-type defined as \[ \widehat S := T \widehat{\mathbf{A}}' \widehat V^{-1} \widehat{\mathbf{A}}, \] where \(\widehat{\mathbf{A}}' = (\widehat \rho_X(1) - \widehat \rho_Y(1),\ldots,\widehat \rho_X(M)-\widehat \rho_Y(M))\) and \(\widehat V\) estimates the covariance matrix \(W_X + W_Y\) of \(\widehat{\mathbf{A}}\). Under the null \(H_0: \rho_X\equiv \rho_Y\), \[ \widehat S\approx \chi^2_{M}. \]

For the sample ACVF, the asymptotic result can be read from:

This leads to the Wald test implemented as follows (in the Gaussian case, when \(\eta=3\); see e.g. Lund et al. (2009)):

y1 <- read.csv("fmri_1.csv",header = FALSE)
y2 <- read.csv("fmri_2.csv",header = FALSE)

y1 <- y1[,1]
y2 <- y2[,1]

dep_str_test0 <- function(x1, x2, L){
  # Wald-type test of equality of autocovariances up to lag L
  # (Gaussian case, eta = 3; see Lund et al. (2009))
  n <- length(x1)
  if(missing(L)){ L = floor(n^(1/3)); }
 
  K <- 30   # truncation point in the Bartlett-type covariance formula
  py1 <- acf(x1, lag.max=K+L, type = "covariance", plot=FALSE)$acf
  py2 <- acf(x2, lag.max=K+L, type = "covariance", plot=FALSE)$acf
  p1 <-  py1 - py2                  # differences of the sample ACVFs
  py <-  as.vector((py1 + py2)/2)   # pooled ACVF estimate under the null
  
## Calculate W matrix: W[i+1,j+1] = sum_k (g(k)g(k-i+j) + g(k+j)g(k-i)),
## Bartlett's formula for the asymptotic covariance of sample ACVFs
  W <- matrix(0, L+1, L+1)
  for(i in 0:L){
    for(j in 0:L){
       idk <- -K:K
       id1 <- idk-i+j
       id2 <- idk + j
       id3 <- idk - i
       idk <- abs(idk)+1   # map lags to indices (the ACVF is even in the lag)
       id1 <- abs(id1)+1 
       id2 <- abs(id2)+1 
       id3 <- abs(id3)+1
       W[i+1, j+1]  <- sum(py[idk]*py[id1] + py[id2]*py[id3]);
    }
  }
  
  p1 <- p1[1:(L+1)]
  tstat <- n/2*t(p1)%*%solve(W)%*%p1   # covariance of the ACVF difference is ~ 2W/n
  pval <- 1- pchisq(tstat, L+1)        # chi-square with L+1 degrees of freedom under the null
return(list(tstat=tstat, pval=pval))
}

dep_str_test0(y1,y2,5)
## $tstat
##          [,1]
## [1,] 41.64508
## 
## $pval
##              [,1]
## [1,] 2.160612e-07

Testing can also be carried out more easily, and in a variety of ways, in the spectral domain by using the results discussed above (see e.g. Lund et al. (2009)). One test can be based on the log-logistic distribution: under the null, the ratios of the two periodograms are asymptotically log-logistic, so that the differences \(D\) of the log-periodograms are logistic, with \(E|D| = \log 4\):

dep_str_test1 <- function(x1, x2){

  n <- length(x1)
  
  sp1 <-  spec.pgram(x1, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
  sp2 <-  spec.pgram(x2, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
  
  D <- log(sp1$spec) - log(sp2$spec)   # asymptotically logistic under the null
  tstat <- mean(abs(D[-length(D)]))    # drop the Nyquist frequency
  # under the null, E|D| = log(4) and Var|D| = pi^2/3 - (log 4)^2 ~ 1.368
  crit <- log(4) + 1.96*sqrt(1.368/(length(D)-1))

return(list(tstat=tstat, crit=crit))  # reject if tstat > crit
}

dep_str_test1(x1,x2)

## $tstat
## [1] 1.404832
## 
## $crit
## [1] 1.652786
# side note: smoothing does not affect the conclusion
# dep_str_test <- function(x1, x2, m){
# 
#   n <- length(x1)
#   if(missing(m)){ m <- 5}
#   
#   sp1 <-  spec.pgram(x1, spans=2*m+1, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
#   sp2 <-  spec.pgram(x2, spans=2*m+1, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE)
#   
#   D <- log(sp1$spec) - log(sp2$spec)
#   tstat <- mean(abs(D))
#   crit <- log(4) + 1.96*sqrt(1.368/(length(D)-1))
# 
# return(list(tstat=tstat, crit=crit))  
# }
# 
# dep_str_test(x1,x2,5)

The next test is based on the likelihood ratio:

dep_str_test2 <- function(x1, x2){

  n <- length(x1)
  
  sp1 <-  spec.pgram(x1, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE,plot=FALSE)
  sp2 <-  spec.pgram(x2, log='no',taper=0,pad=0,fast=FALSE,demean=TRUE,detrend=FALSE,plot=FALSE)
  
  # terms log(2 I1 I2/(I1+I2)^2), which become more negative when the spectra differ
  L <- log(2*(sp1$spec*sp2$spec)/(sp1$spec+sp2$spec)^2)
  tstat <- sum(L[-length(L)])   # drop the Nyquist frequency
  crit <- (length(L)-1) * (-0.614 -  1.96*sqrt(0.759/(length(L)-1)))  

return(list(tstat=tstat, crit=crit))  # reject if tstat < crit
}

dep_str_test2(x1,x2)
## $tstat
## [1] -88.23176
## 
## $crit
## [1] -60.12502

As shown in Lund et al. (2009) through a simulation study, the above spectral-based tests have poor power (for lower order AR and MA models).

Note: Why is the power so poor? Should one smooth and use fewer non-overlapping blocks?

Some final notes

  • Most of the lecture is based on Cowpertwait and Metcalfe (2009), Ch. 9; Cryer and Chan (2008), Ch. 13; and Shumway and Stoffer (2011), Chs. 4, 7, which are introductory textbooks on time series analysis. Several books focusing on spectral analysis are: Priestley (1981a, 1981b) (combined ~900 pages) and Bloomfield (2000).

Questions/Homework

Q0: Supplement your analysis of the time series data from Q1, Lecture I, by spectral analysis if it seems appropriate.

Q1: Verify that the DFT indeed provides an orthogonal transformation of the data. What is the inverse of the transformation? Also, show that \[ \frac{1}{T} \sum_{t=1}^T (x_t - \overline x)^2 = \frac{1}{2}\sum_{k=1}^{T/2-1}(a_k^2 + b_k^2) + a_{T/2}^2. \]

Q2: Explain the idea behind FFT in some detail.

Q3: Establish the basic properties of the DFT discussed above, namely, the basic asymptotic results at fixed frequency and the decorrelation property.

Q4: Derive the optimal bandwidth for the smoothed periodogram. What other ad-hoc bandwidths are used?

Q5: Explain why the Lag Window estimators of the spectral density are approximately smoothed periodograms.

Q6: Justify the approximations given when going from MLE in the time domain to MLE in the spectral domain.

References

Bloomfield, P. 2000. Fourier Analysis of Time Series. Second. Wiley Series in Probability and Statistics: Applied Probability and Statistics. Wiley-Interscience [John Wiley & Sons], New York.

Brockwell, P. J., and R. A. Davis. 2006. Time Series: Theory and Methods. Springer Series in Statistics. Springer, New York.

Cowpertwait, P. S. P., and A. V. Metcalfe. 2009. Introductory Time Series with R. Springer.

Cryer, J. D., and K.-S. Chan. 2008. Time Series Analysis with Applications in R. Springer.

Lund, R., H. Bassily, and B. Vidakovic. 2009. “Testing Equality of Stationary Autocovariances.” Journal of Time Series Analysis 30 (3): 332–48.

Priestley, M. B. 1981a. Spectral Analysis and Time Series. Vol. 1. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], London-New York.

———. 1981b. Spectral Analysis and Time Series. Vol. 2. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], London-New York.

Shumway, R. H., and D. S. Stoffer. 2011. Time Series Analysis and Its Applications. Third. Springer Texts in Statistics. Springer, New York.