What is this all about?

When working with time series data, it is often useful to detect different kinds of anomalous or outlying observations.

It is difficult to rigorously define the term outlier, but the basic idea is an observation that “deviates markedly” from other observations (Hodge and Austin 2004), in particular enough to “arouse suspicions that it was generated by a different mechanism” (Hawkins 1980; Aggarwal 2017). Outliers are sometimes called by other names such as “novelty”, “deviation” or “discord”.

There are two closely related but subtly distinct goals of outlier detection:

  1. To find observations that arise in error, perhaps due to faulty data collection or unintended system changes. In this case, the outliers are not relevant to analysis.
  2. To find observations that are anomalous but meaningful in some way. For instance, a small but unexpected change in an electrocardiogram (ECG) time series may be indicative of a medical condition.

Outliers vs. Change points

We will consider outliers that have only an instantaneous effect, as well as those whose effect decays over time. In the extreme case, if the effect is sustained for the entire series (or an extended portion of it), it is better regarded as a change point. In this lecture we will focus on outliers that are not change points, but there will be some discussion of change points where relevant.

Data for this lecture

We will primarily focus on ECG data. We will also consider simulated data from several ARIMA models with outliers of various types intentionally added. See below for the different types of outliers.

Additionally, we will make use of several R packages:

library(forecast)
library(tsoutliers)
library(jmotif)
library(arules)
library(permute)
library(readr)

Motivating Example 1: ECG data

We use several time series of electrocardiogram (ECG or EKG) data, which measures the electrical activity in the heart. For a more detailed analysis of these series, see Keogh, Lin and Fu (2005). In these types of series, it may or may not be easy to spot the outlier by eye, so we would like an automated procedure, and we will compare the performance of the two algorithms discussed below.

ECG1 <- read_delim("./ECG_data/stdb_308_0.txt", 
    "\t", escape_double = FALSE, col_names = FALSE, 
    trim_ws = TRUE)
ECG2 <- read_delim("./ECG_data/xmitdb_x108_0.txt", 
    "\t", escape_double = FALSE, col_names = FALSE, 
    trim_ws = TRUE)
ECG3 <- read_delim("./ECG_data/mitdb__100_180.txt", 
    "\t", escape_double = FALSE, col_names = FALSE, 
    trim_ws = TRUE)

ecg1 <- ECG1$X2 
ecg2 <- ECG2$X2 
ecg3 <- ECG3$X2 

 
ts.plot(ecg1)

ts.plot(ecg2)

ts.plot(ecg3)

Motivating Example 2: Simulated data

We will begin with several basic ARIMA models, to which we will later intentionally add outliers.

## Several basic stationary and nonstationary time series models
set.seed(12345)

ts1 <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 500)
ts2 <- arima.sim(list(order = c(2,0,0), ar = c(2*(1/1.1)*cos(pi/8),-1/1.1^2) ), n = 500)
ts3 <- arima.sim(list(order = c(0,1,0)), n = 499)
ts4 <- arima.sim(list(order = c(0,0,0)), n = 500)

par(mfrow=c(2,2))
ts.plot(ts1,main="ts1: AR1")
ts.plot(ts2,main="ts2: AR2")
ts.plot(ts3,main="ts3: Random walk")
ts.plot(ts4,main="ts4: White noise")

Approaches to Outlier Detection 1: ARIMA model-based

In this lecture, we take two approaches: the first is described by Chen and Liu (1993) and can be applied to ARIMA processes only. It can be thought of as a model-based approach which attempts to estimate and adjust the effect of an outlier.

The second approach is from Keogh et al. (2005) and is known as Heuristically Ordered Time series using Symbolic Aggregate approXimation, or HOT SAX. This is a heuristic approach which attempts to find subsequences of observations that are maximally different from all other subsequences within a given time series.

Setting for first approach

Consider a general ARIMA process (with zero mean):

\[ X_t = \dfrac{\theta(B)}{\alpha(B)\phi(B)}Z_t,\quad t = 1,...,T,\] where the roots of \(\phi(B)\) (AR polynomial) and \(\theta(B)\) (MA polynomial) are outside the unit circle, and the roots of \(\alpha(B)\) (AR polynomial) are on the unit circle.

Now, consider an analogous time series where “something happens” (an anomalous event) at time \(t = t_1\):

\[ X_t^* = X_t + \omega L(B)\mathbf{1}_{t}(t_1),\] where \(X_t\) is as above, and \(\mathbf{1}_{t}(t_1)\) is the indicator function that equals 1 when \(t = t_1\) and 0 otherwise. The polynomial \(L(B)\), together with the magnitude \(\omega\), describes the dynamic pattern of the outlier's effect on the series. This construction does not capture every possible outlier, but it allows for enough generality to describe many outliers. See Chen and Liu (1993).

Note that this definition can be extended to a series with \(m\) different outliers:

\[ X_t^* = X_t + \sum\limits_{j=1}^m\omega_jL_j(B)\mathbf{1}_{t}(t_j).\]

Outlier classification

One can categorize outliers into four types:

  1. Additive Outlier (AO), where \(L(B) = 1\)

An AO is the most basic outlier. It affects the series with a single instantaneous additive constant at \(t_j\).

  2. Level Shift (LS), where \(L(B) = \frac{1}{1-B}\)

An LS is a true change point in that it will add a constant which persists from \(t_j\) through the rest of the series.

  3. Temporary Change (TC), where \(L(B) = \frac{1}{1-\delta B}\)

A TC is a generalized version of both the AO and LS in that if \(\delta = 0\), then it is an AO, and if \(\delta = 1\), it is an LS. Otherwise, the effect will decay over time.

  4. Innovational Outlier (IO), where \(L(B) = \frac{\theta(B)}{\alpha(B)\phi(B)}\)

An IO is the only outlier considered here that depends directly on the model, so it can produce very different effects. If the time series is stationary, the effect of the IO decays because the coefficients of \(\frac{\theta(B)}{\alpha(B)\phi(B)}\), expanded as a power series in \(B\), go to 0. For a nonstationary series, however, the effect can persist, so it is better thought of as a change point.
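To see this decay concretely, one can inspect the coefficients (\(\psi\)-weights) of \(\frac{\theta(B)}{\alpha(B)\phi(B)}\) directly; a quick sketch using the base-R function ARMAtoMA, with coefficients matching ts1 and ts3 above:

# An IO at t_j adds omega * psi_k to the series at time t_j + k
ARMAtoMA(ar = 0.7, ma = 0, lag.max = 8)  # AR(1): weights decay geometrically to 0
ARMAtoMA(ar = 1, ma = 0, lag.max = 8)    # random walk: weights stay at 1 (a level shift)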

Outlier examples

For illustrative purposes, we’ve added one type of each outlier to the simulated time series.

#Adding outliers to simulated series

outAO <-outliers("AO",100,5)
outLS <-outliers("LS",100,5)
outTC <-outliers("TC",100,15)
outIO <-outliers("IO",100,10)

ts1_AO <- ts1 + outliers.effects(outAO,n =500,weights = T)
ts2_AO <- ts2 + outliers.effects(outAO,n =500,weights = T)
ts3_AO <- ts3 + outliers.effects(outAO,n =500,weights = T)
ts4_AO <- ts4 + outliers.effects(outAO,n =500,weights = T)

par(mfrow=c(2,2))
ts.plot(ts1_AO,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_AO,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_AO,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_AO,main="ts4: White noise",col="blue"); lines(ts4)

ts1_LS <- ts1 + outliers.effects(outLS,n =500,weights = T)
ts2_LS <- ts2 + outliers.effects(outLS,n =500,weights = T)
ts3_LS <- ts3 + outliers.effects(outLS,n =500,weights = T)
ts4_LS <- ts4 + outliers.effects(outLS,n =500,weights = T)

par(mfrow=c(2,2))
ts.plot(ts1_LS,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_LS,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_LS,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_LS,main="ts4: White noise",col="blue"); lines(ts4)

ts1_TC <- ts1 + outliers.effects(outTC,n =500,delta=0.7,weights = T)
ts2_TC <- ts2 + outliers.effects(outTC,n =500,delta=0.7,weights = T)
ts3_TC <- ts3 + outliers.effects(outTC,n =500,delta=0.7,weights = T)
ts4_TC <- ts4 + outliers.effects(outTC,n =500,delta=0.7,weights = T)

par(mfrow=c(2,2))
ts.plot(ts1_TC,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_TC,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_TC,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_TC,main="ts4: White noise",col="blue"); lines(ts4)

ts1_IO <- ts1 + outliers.effects(outIO,n =500,pars = list(arcoefs = c(0.7),macoefs = c(0)),weights = T)
ts2_IO <- ts2 + outliers.effects(outIO,n =500,pars = list(arcoefs = c(2*(1/1.1)*cos(pi/8),-1/1.1^2),macoefs = c(0,0)),weights = T)
ts3_IO <- ts3 + outliers.effects(outIO,n =500,pars = list(arcoefs = c(1),macoefs = c(0)),weights = T)
ts4_IO <- ts4 + outliers.effects(outIO,n =500,pars = list(arcoefs = c(0),macoefs = c(0)),weights = T)

par(mfrow=c(2,2))
ts.plot(ts1_IO,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_IO,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_IO,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_IO,main="ts4: White noise",col="blue"); lines(ts4)

Outlier Detection

Each of the above outlier types will have a contaminating effect on the residuals. Let \(\pi(B) = \dfrac{\alpha(B)\phi(B)}{\theta(B)} = 1 - \sum\pi_iB^i\). If there is only one outlier at time \(t_j\), then the estimated residuals should be the following:

  1. For an AO: \(\hat{e}_t = \omega \pi(B) \mathbf{1}_{t}(t_j) + Z_t\)

  2. For an LS: \(\hat{e}_t = \omega \frac{\pi(B)}{1-B}\mathbf{1}_{t}(t_j) + Z_t\)

  3. For a TC: \(\hat{e}_t = \omega \frac{\pi(B)}{1-\delta B}\mathbf{1}_{t}(t_j) + Z_t\)

  4. For an IO: \(\hat{e}_t = \omega \mathbf{1}_{t}(t_j) + Z_t\)

If one rewrites these equations as

\[\hat{e}_t = \omega x_{\text{type},t} + Z_t,\quad t=t_j,t_j+1,\ldots,T,\] where \(\text{type} = \text{AO, LS, TC, or IO}\), \(x_{\text{type},t_j} = 1\), and, for \(k\geq 1\),

\[\begin{align} & x_{\text{AO},(t_j+k)}=-\pi_k \\ & x_{\text{LS},(t_j+k)}=1-\sum\limits_{l=1}^{k}\pi_l \\ & x_{\text{TC},(t_j+k)} = \delta^k - \sum\limits_{l=1}^{k-1}\delta^{k-l}\pi_l-\pi_k \\ & x_{\text{IO},(t_j+k)}=0 \end{align}\]

Given this formulation, one can find the LSEs of \(\omega\), that is, the effect of the outlier:

\[\begin{align} & \hat{\omega}_{\text{AO},t_j} = \sum\limits_{t=t_j}^{T}\hat{e}_tx_{\text{AO},t} \bigg/\sum\limits_{t=t_j}^{T}x_{\text{AO},t}^2 \\ & \hat{\omega}_{\text{LS},t_j} = \sum\limits_{t=t_j}^{T}\hat{e}_tx_{\text{LS},t}\bigg/\sum\limits_{t=t_j}^{T}x_{\text{LS},t}^2 \\ & \hat{\omega}_{\text{TC},t_j} = \sum\limits_{t=t_j}^{T}\hat{e}_tx_{\text{TC},t}\bigg/\sum\limits_{t=t_j}^{T}x_{\text{TC},t}^2 \\ & \hat{\omega}_{\text{IO},t_j} = \hat{e}_{t_j} \end{align}\]
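As a concrete illustration, here is a minimal sketch of \(\hat{\omega}_{\text{AO},t_j}\) for the contaminated AR(1) series ts1_AO from above, treating \(\phi = 0.7\) as known, so that \(\pi_1 = 0.7\) and \(\pi_k = 0\) for \(k > 1\):

# Residuals from an AR(1) fit to the contaminated series
e  <- as.numeric(residuals(arima(as.numeric(ts1_AO), order = c(1, 0, 0))))
Tn <- length(e); tj <- 100

# The x_AO regressor: 1 at t_j, -pi_k at t_j + k (here only k = 1 is nonzero)
x_AO <- numeric(Tn)
x_AO[tj] <- 1
x_AO[tj + 1] <- -0.7

# Least-squares estimate of the outlier effect; should be close to the injected 5
omega_AO <- sum(e[tj:Tn] * x_AO[tj:Tn]) / sum(x_AO[tj:Tn]^2)
omega_AO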

These estimates are approximately normal, so we standardize them; the standardized effects then act as reasonable test statistics for detecting outliers of each type. The general procedure is to compute the standardized effects at each time point \(t_j\), \(j=1,\ldots,T\), and check whether any exceed some critical value.

\[\begin{align} & \hat{\tau}_{\text{AO},t_j} = \left(\hat{\omega}_{\text{AO},t_j}/\hat{\sigma}_Z\right)\left(\sum\limits_{t=t_j}^{T}x_{\text{AO},t}^2\right)^{1/2} \\ & \hat{\tau}_{\text{LS},t_j} = \left(\hat{\omega}_{\text{LS},t_j}/\hat{\sigma}_Z\right)\left(\sum\limits_{t=t_j}^{T}x_{\text{LS},t}^2\right)^{1/2} \\ & \hat{\tau}_{\text{TC},t_j} = \left(\hat{\omega}_{\text{TC},t_j}/\hat{\sigma}_Z\right)\left(\sum\limits_{t=t_j}^{T}x_{\text{TC},t}^2\right)^{1/2} \\ & \hat{\tau}_{\text{IO},t_j} = \hat{\omega}_{\text{IO},t_j}/\hat{\sigma}_Z \end{align}\]

Note that care must be taken in estimating \(\sigma_Z\): if outliers are present, the usual residual standard deviation will overestimate it. Chen and Liu (1993) suggest several robust alternatives; the default in the tsoutliers package is the median absolute deviation (MAD):

\[\hat{\sigma}_Z = 1.483 \cdot \text{med}|\hat{e}_t-\text{med}(\hat{e}_t)|.\]
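Continuing the sketch above (with e, x_AO, omega_AO, tj and Tn as before), the MAD scale estimate and the resulting standardized AO statistic are one-liners:

# Robust MAD estimate of sigma_Z, then the standardized AO effect
sigma_Z <- 1.483 * median(abs(e - median(e)))
tau_AO  <- (omega_AO / sigma_Z) * sqrt(sum(x_AO[tj:Tn]^2))
tau_AO  # compare against a critical value such as cval = 3.5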

Multiple outliers

Generally, we cannot assume that there is only one outlier in the series. Since each estimated outlier effect \(\hat{\omega}\) takes only one outlier into account, the estimates are no longer unbiased when the residuals are affected by outliers at other time points. To account for this, the suggested procedure is a three-stage algorithm in which both potential outliers and model parameters are estimated and updated with successively more accurate estimates. Stage names are as used by Chen and Liu (1993).

Stage 1: Initial Parameter Estimation and Outlier Detection

First, estimate the model parameters (by maximum likelihood) and obtain the residuals. Then compute the maximum of the standardized outlier effects across all outlier types and time points. If the maximum exceeds the critical value, remove the effect of that outlier and re-compute the maximum standardized outlier effect. Once all significant outlier effects have been removed, repeat all of Stage 1, beginning with re-estimating the parameters. Once no more significant outlier effects are detected, go to Stage 2.

Stage 2: Joint Estimation of Outlier Effects and Model Parameters

Now that \(m\) time points containing potential outliers have been flagged, and the parameter estimates have been improved, the outlier effects can be estimated jointly instead of one-at-a-time. This is done in an analogous way to the one-outlier case, by multiple regression. If the smallest of the \(m\) standardized outlier effects is no longer significant, remove it from the set of \(m\) flagged outliers and begin Stage 2 over. Otherwise, remove the effects of the remaining flagged outliers. Finally, re-estimate model parameters, and if the residual standard error has improved more than a pre-specified tolerance (say, 0.001), repeat all of Stage 2. Otherwise, go to Stage 3.

Stage 3: Detection of Outliers Based on the Final Parameter Estimates

Now that the parameter estimates have been improved, re-compute the residuals, and pass through the parts of Stages 1 and 2 that involve outlier effect removal one last time (i.e. do not update the parameter estimates). The remaining flagged outliers will be the final collection of outliers.
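The tsoutliers package wraps this entire three-stage procedure in the tso() function. A minimal sketch on one of the contaminated series constructed above (types and cval as in the package documentation):

# Full Chen-Liu procedure: fit, detect, and jointly re-estimate
fit_tso <- tso(ts(as.numeric(ts1_AO)), types = c("AO", "LS", "TC", "IO"), cval = 3.5)
fit_tso$outliers  # flagged outliers: type, index, estimated effect, t-statistic
plot(fit_tso)     # original vs. outlier-adjusted series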

Approaches to Outlier Detection 2: HOT SAX

One of the well-known machine learning approaches suggested in the literature is the HOT SAX algorithm (Keogh, Lin, and Fu 2005). In contrast to the method of Chen and Liu (1993), the HOT SAX algorithm does not assume any parametric model for the time series. Instead, it uses the SAX representation, which in effect applies the probability integral transform to map (approximately) standard normal values to discrete uniform letters. After applying that transformation, we compute the distances between subsequences and find the most outlying discord, the subsequence with the farthest distance from its nearest neighbor.


An example of the SAX representation of a time series, from Keogh, Lin, and Fu (2005). The rotated density curve shows the density of the standard normal distribution. In this example, the alphabet size is \(k=3\), so each segment of \(m=4\) observations is mapped to one of the three letters \(a,b,c\).

The HOT SAX algorithm starts by representing a time series as a SAX word. For a time series \(X=\left(X_1,X_2,\cdots,X_T\right)\in\mathbb{R}^T\), the subsequence of length \(p\) starting at position \(j\) is \(S_j^p=\left(X_j,X_{j+1},\ldots,X_{j+p-1}\right)\). Writing the entries of \(S_j^p\) as \(\left(Y_1,Y_2,\ldots,Y_p\right)\), the Piecewise Aggregate Approximation (PAA) of \(S_j^p\) is \[\bar{C}=\left(\bar{C}_1,\bar{C}_2,\ldots,\bar{C}_w\right),\quad \bar{C}_{j'}=\frac{1}{m}\left(Y_{m(j'-1)+1}+Y_{m(j'-1)+2}+\ldots+Y_{mj'}\right)\] for \(j'=1,2,\ldots,w\), where \(w\) is the PAA size and \(m=p/w\) is the number of observations averaged in each PAA coordinate (assume for simplicity that \(p\) is a multiple of \(w\)). The dimension is thus reduced from the window size \(p\) to the PAA size \(w\).
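A tiny base-R sketch of the PAA computation, assuming \(p\) is a multiple of \(w\) (the jmotif package loaded above ships its own PAA and SAX utilities):

# Average each block of m = p/w consecutive values
paa <- function(S, w) {
  m <- length(S) / w
  sapply(1:w, function(j) mean(S[(m * (j - 1) + 1):(m * j)]))
}
paa(c(2, 4, 6, 8, 1, 3, 5, 7), w = 4)  # returns 3 7 2 6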

Next, we obtain the \(k+1\) breakpoints \(B=\left(q_0,q_1,\ldots,q_k\right)\) that divide the standard normal distribution into \(k\) equiprobable regions, i.e. \(q_0=-\infty\), \(q_k=\infty\) and \[P\left(q_j\leq Z\leq q_{j+1}\right)=\frac{1}{k}\] for all \(j=0,1,\ldots,k-1\), where \(Z\sim N(0,1)\). A PAA \(\bar{C}=\left(\bar{C}_1,\bar{C}_2,\ldots,\bar{C}_w\right)\) of length \(w\) is then represented as a SAX word of size \(w\) as follows. Let \(\alpha_j\) denote the \(j\)-th letter of the alphabet, i.e. \[\alpha_1=a,\,\alpha_2=b,\,\alpha_3=c,\ldots\] The mapping from the PAA to the SAX word \(\hat{C}=\left(\hat{C}_1,\hat{C}_2,\ldots,\hat{C}_w\right)\) is \[\hat{C}_i=\alpha_j\quad\text{if}\quad q_{j-1}\leq\bar{C}_i< q_j\] for \(i=1,2,\cdots,w\) and \(j=1,2,\cdots,k\).
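The breakpoints and the letter mapping are easy to reproduce in base R; a small sketch with \(k = 4\):

# k+1 breakpoints giving k equiprobable regions under N(0,1)
k <- 4
breaks <- qnorm(seq(0, 1, length.out = k + 1))  # -Inf, -0.67, 0, 0.67, Inf
C_bar  <- c(-1.2, -0.1, 0.4, 1.5)               # a PAA vector of length w = 4
letters[cut(C_bar, breaks, labels = FALSE)]     # "a" "b" "c" "d"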

Given two subsequences \(S_{i_1}^p=\left(X_{i_1},X_{i_1+1},\ldots,X_{i_1+p-1}\right)\) and \(S_{i_2}^p=\left(X_{i_2},X_{i_2+1},\ldots,X_{i_2+p-1}\right)\), the Euclidean distance between them is \[d\left(S_{i_1}^p,S_{i_2}^p\right)= \left(\sum_{j=0}^{p-1}\left(X_{i_1+j}-X_{i_2+j}\right)^2\right)^{1/2},\] where, for SAX words, \(X_{i_1+j}-X_{i_2+j}\) is interpreted as the difference between alphabet orders, e.g. \(b-a=1,\,c-a=2\). In addition, we say the subsequences \(S_{i_1}^p\) and \(S_{i_2}^p\) are a non-self match if they do not have overlapping indices, i.e. \(\left|i_1-i_2\right|\geq p\). The subsequence \(S\) of length \(p\) is said to be the discord of \(X\) if \(S\) has the largest distance to its nearest non-self match. That is, \[\begin{align*} &\quad \min_{1\leq i\leq T-p}\left\{d\left(S,S_i^p\right);S_i^p\text{ is a non-self match of }S\right\}\\ &\geq \min_{1\leq i\leq T-p}\left\{d\left(S',S_i^p\right);S_i^p\text{ is a non-self match of }S'\right\} \end{align*}\]

for all subsequences \(S'\) of \(X\).
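The jmotif package loaded above provides a HOT SAX implementation; a short sketch, assuming its find_discords_hotsax() interface (a window size and the number of discords to report):

# Top discord of the ECG1 series with window size 6 (as in the analysis below)
disc <- find_discords_hotsax(ecg1, w_size = 6, discords_num = 1)
disc  # position and nearest-non-self-match distance of the discord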

Distance-based method

One problem with the HOT SAX algorithm is that it reports discords in order of significance but does not report p-values. To obtain p-values, we need the distribution of distances between subsequences. Since the discretization step of HOT SAX is mainly for computational convenience, we do not discretize here. If a subsequence is far from its nearest neighbor, then the distance between them should be large relative to the distribution of the minimum distance over the \(T-p\) subsequence pairs.

To express the idea, suppose we have the subsequences \[\left(X_{11},X_{12},\ldots,X_{1p}\right),\left(X_{21},X_{22},\ldots,X_{2p}\right),\ldots,\left(X_{(T-p)1},X_{(T-p)2},\ldots,X_{(T-p)p}\right)\] where \(\left(X_{j1},X_{j2},\ldots,X_{jp}\right)\) is the \(j\)-th subsequence of size \(p\), starting at time index \(j\). If the time series is Gaussian with mean 0 and variance 1 (and the entries being compared are independent), then half the squared distance between the \(i\)-th and \(j\)-th subsequences satisfies \[\begin{align*} K_{ij}&=\frac{1}{2}\left\|\left(X_{i1}-X_{j1},X_{i2}-X_{j2},\cdots,X_{ip}-X_{jp}\right)\right\|_2^2\\ &=\sum_{k=1}^p\frac{\left(X_{ik}-X_{jk}\right)^2}{2}\\ &\sim\chi_p^2 \end{align*}\]

where \(\chi_p^2\) is the chi-square distribution with \(p\) degrees of freedom.
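A quick simulation check of this fact (independent \(N(0,1)\) entries assumed):

# Half the squared distance between two independent N(0,1) p-vectors is chi^2_p
p <- 6
K <- replicate(1e4, sum((rnorm(p) - rnorm(p))^2) / 2)
c(mean(K), p)      # sample mean should be close to the chi^2_p mean, p
c(var(K), 2 * p)   # sample variance should be close to 2p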

To see whether the \(i\)-th subsequence is an outlying discord, we compare its distance to its nearest neighbor with the distribution of the minimum of \(T-p\) chi-square random variables with \(p\) degrees of freedom. Note from Subsection 2.3.4 of Serfling (1980) that the distribution of the \(\lambda\)-th sample quantile \(\hat{\xi}_{\lambda,T}\) from a random sample of size \(T\) is \[\begin{align*} P(\hat{\xi}_{\lambda,T}\leq t)=\sum_{i=m}^T \left(\begin{array}{c}T\\i\end{array}\right) \left\{F(t)\right\}^i\left\{1-F(t)\right\}^{T-i} \end{align*}\]

where \[m=\left\{ \begin{array}{ll} T\lambda&\text{if }T\lambda\text{ is an integer},\\ [T\lambda]+1&\text{if }T\lambda\text{ is not an integer}. \end{array} \right.\] In our case, \(F\) is the \(\chi_p^2\) distribution function, which governs half the squared Euclidean distance between \(p\)-dimensional standard Gaussian vectors. A problem with taking the minimum over all distances is that overlapping subsequences \(S_j^p=\left(X_j,X_{j+1},\cdots,X_{j+p-1}\right)\) and \(S_{j+k}^p=\left(X_{j+k},X_{j+k+1},\cdots,X_{j+k+p-1}\right)\) for \(k=1,2,\cdots,p\) are not independent of each other. Hence, we consider the \((p/T)\)-th sample quantile (i.e. \(\lambda = p/T\)) of the \(\chi_p^2\) distribution, so that the distances among overlapping subsequences are ignored. Note that the HOT SAX method excludes overlapping subsequences explicitly, whereas here we get rid of them naturally by looking at an appropriate sample quantile. Since this method does not use a discretized representation of the time series but uses distances between the original subsequences, we call it the distance-based method.
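Below is a minimal sketch of the distance-based method under the \(N(0,1)\) assumption. It illustrates the idea only (brute-force \(O(T^2)\) distances, so suitable for moderate \(T\); not the exact code used for the figures below); the p-value uses the sample-quantile distribution above with \(F=\chi_p^2\) and level \(\lambda=p/T\):

# Score each subsequence by half its squared distance to its nearest
# non-self match, then assess the largest score via the distribution
# of the (p/T)-th sample quantile of chi^2_p variables.
discord_pvalue <- function(x, p) {
  n_sub <- length(x) - p + 1
  subs  <- t(sapply(1:n_sub, function(j) x[j:(j + p - 1)]))
  K     <- as.matrix(dist(subs))^2 / 2              # K_ij = ||S_i - S_j||^2 / 2
  K[abs(outer(1:n_sub, 1:n_sub, "-")) < p] <- Inf   # ignore overlapping matches
  scores <- apply(K, 1, min)                        # nearest non-self-match distance
  lam <- p / length(x)
  m   <- ceiling(lam * n_sub)                       # = [T*lam] + 1 unless integer
  pval <- pbinom(m - 1, n_sub, pchisq(max(scores), df = p))
  list(location = which.max(scores), pvalue = pval)
}
discord_pvalue(as.numeric(scale(ecg1)), p = 6)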

ECG data analysis

For the ECG data sets, we compare the locations of the outlying discords found by the method of Chen and Liu (1993) (referred to as the tsoutliers method here), HOT SAX, and the distance-based method. Each figure shows the locations of the outlying discords found by the HOT SAX and distance-based methods as a red solid line and an orange dashed line, respectively, and the locations of the two strongest outliers (i.e. the two largest test statistics \(\hat{\tau}\)) found by the tsoutliers method as thick and thin black dashed vertical lines, respectively. Note that Chen and Liu's method detects only the location of an outlier, not an outlying subsequence, so only locations are reported. Here the subsequence size is fixed at \(p=6\), the PAA size at \(w=3\), and the alphabet size at 5.

ECG data analysis results

For the first series, each of the three methods indicated a different segment as the most outlying part. However, the second-strongest outlier detected by tsoutliers does coincide with the HOT SAX result. Since the HOT SAX method does not provide p-values, only the p-value of the distance-based method is reported in the lower left corner of the figure. The test statistics are -7.09 and -5.53 for the two strongest outliers detected by tsoutliers.

Outliers in ECG1 data detected by various methods

For the ECG2 data, the HOT SAX and tsoutliers methods indicate the same outlying discord (the two strongest outliers found by tsoutliers are adjacent), but the distance-based method indicates a different one. The test statistics are -7.04 and -6.85 for the two strongest outliers detected by tsoutliers.

Outliers in ECG2 data detected by various methods

With the ECG3 data, the second-strongest outlier from the tsoutliers method interestingly indicates the same outlying position as the distance-based method, while the HOT SAX method finds a different position for the discord. This does not coincide with our intuition, so further research is suggested. The test statistics are 9.07 and 8.65 for the two strongest outliers detected by tsoutliers.

Outliers in ECG3 data detected by various methods

Check the ARIMA fit

fit1_ecg <- auto.arima(ecg1)
res1_ecg <- residuals(fit1_ecg)
par1_ecg <- coefs2poly(fit1_ecg)
ts.plot(ecg1,main="ECG1 and ARIMA(4,0,3)")
lines(fit1_ecg$residuals+fit1_ecg$x,col="blue",lwd=2,lty=3)

fit2_ecg <- auto.arima(ecg2)
res2_ecg <- residuals(fit2_ecg)
par2_ecg <- coefs2poly(fit2_ecg)
ts.plot(ecg2,main="ECG2 and ARIMA(1,0,2)")
lines(fit2_ecg$residuals+fit2_ecg$x,col="blue",lwd=2,lty=3)

fit3_ecg <- auto.arima(ecg3)
res3_ecg <- residuals(fit3_ecg)
par3_ecg <- coefs2poly(fit3_ecg)
ts.plot(ecg3,main="ECG3 and ARIMA(4,0,2)")
lines(fit3_ecg$residuals+fit3_ecg$x,col="blue",lwd=2,lty=3)

Simulated data analysis

#### Trying to detect multiple outliers
outAOAO <-outliers(c("AO","AO"),c(100,200),c(5,5))
outAOTC <-outliers(c("AO","TC"),c(100,200),c(3,4))
outIOAO <-outliers(c("IO","AO"),c(100,200),c(3,4))
outIOTC <-outliers(c("IO","TC"),c(100,200),c(3,4))
outIOTCAOAO <-outliers(c("IO","TC","AO","AO"),c(100,200,220,300),c(3,4,5,3))


#### Two AOs at t = 100 and 200
ts1_AOAO <- ts1 + rowSums(outliers.effects(outAOAO,n =500,weights = T))
ts2_AOAO <- ts2 + rowSums(outliers.effects(outAOAO,n =500,weights = T))
ts3_AOAO <- ts3 + rowSums(outliers.effects(outAOAO,n =500,weights = T))
ts4_AOAO <- ts4 + rowSums(outliers.effects(outAOAO,n =500,weights = T))

par(mfrow=c(2,2))
ts.plot(ts1_AOAO,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_AOAO,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_AOAO,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_AOAO,main="ts4: White noise",col="blue"); lines(ts4)

fit1_AOAO <- auto.arima(ts1_AOAO)
res1_AOAO <- residuals(fit1_AOAO)
par1_AOAO <- coefs2poly(fit1_AOAO)

fit2_AOAO <- auto.arima(ts2_AOAO)
res2_AOAO <- residuals(fit2_AOAO)
par2_AOAO <- coefs2poly(fit2_AOAO)

fit3_AOAO <- auto.arima(ts3_AOAO)
res3_AOAO <- residuals(fit3_AOAO)
par3_AOAO <- coefs2poly(fit3_AOAO)

fit4_AOAO <- auto.arima(ts4_AOAO)
res4_AOAO <- residuals(fit4_AOAO)
par4_AOAO <- coefs2poly(fit4_AOAO)

locate.outliers(res1_AOAO,par1_AOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind   coefhat     tstat
## 3   IO 201 -5.511296 -5.870335
## 4   IO 487  3.387512  3.608195
## 6   AO 200  4.686799  6.005557
## 8   TC 100  5.697430  6.161842
locate.outliers(res2_AOAO,par2_AOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##    type ind   coefhat     tstat
## 2    IO 101 -7.898855 -7.699915
## 4    IO 201 -6.808289 -6.636815
## 5    AO  99 -3.983213 -7.141504
## 6    AO 100  5.781837 10.366259
## 8    AO 199 -3.367286 -6.037208
## 9    AO 200  4.889383  8.766177
## 12   LS 240 -1.274401 -3.821861
## 13   LS 253 -1.266609 -3.713950
## 14   LS 255 -1.307983 -3.821660
## 15   LS 256 -1.204410 -3.512759
## 16   LS 262 -1.286873 -3.712740
## 17   LS 264 -1.259190 -3.619558
## 18   LS 271 -1.304814 -3.702007
## 19   LS 274 -1.250210 -3.526899
## 20   LS 282 -1.319931 -3.666148
## 21   LS 302 -1.355045 -3.612059
## 22   LS 305 -1.332620 -3.529372
## 23   LS 307 -1.472879 -3.883868
## 24   LS 311 -1.395680 -3.647922
## 25   LS 319 -1.456897 -3.739411
locate.outliers(res3_AOAO,par3_AOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind   coefhat     tstat
## 2   IO 101 -4.349019 -4.082808
## 5   AO  99 -3.301957 -4.383833
## 6   AO 100  4.722901  6.270345
## 7   AO 200  4.624577  6.139806
## 8   AO 201 -3.130910 -4.156744
locate.outliers(res4_AOAO,par4_AOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 1   IO 100 6.503760 6.208938
## 4   AO 200 5.108629 4.936753
#### AO and TC at t = 100 and 200
ts1_AOTC <- ts1 + rowSums(outliers.effects(outAOTC,n =500,pars = list(arcoefs = c(0.7),macoefs = c(0)),weights = T))
ts2_AOTC <- ts2 + rowSums(outliers.effects(outAOTC,n =500,pars = list(arcoefs = c(2*(1/1.1)*cos(pi/8),-1/1.1^2),macoefs = c(0,0)),weights = T))
ts3_AOTC <- ts3 + rowSums(outliers.effects(outAOTC,n =500,pars = list(arcoefs = c(1),macoefs = c(0)),weights = T))
ts4_AOTC <- ts4 + rowSums(outliers.effects(outAOTC,n =500,pars = list(arcoefs = c(0),macoefs = c(0)),weights = T))

par(mfrow=c(2,2))
ts.plot(ts1_AOTC,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_AOTC,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_AOTC,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_AOTC,main="ts4: White noise",col="blue"); lines(ts4)

fit1_AOTC <- auto.arima(ts1_AOTC)
res1_AOTC <- residuals(fit1_AOTC)
par1_AOTC <- coefs2poly(fit1_AOTC)

fit2_AOTC <- auto.arima(ts2_AOTC)
res2_AOTC <- residuals(fit2_AOTC)
par2_AOTC <- coefs2poly(fit2_AOTC)

fit3_AOTC <- auto.arima(ts3_AOTC)
res3_AOTC <- residuals(fit3_AOTC)
par3_AOTC <- coefs2poly(fit3_AOTC)

fit4_AOTC <- auto.arima(ts4_AOTC)
res4_AOTC <- residuals(fit4_AOTC)
par4_AOTC <- coefs2poly(fit4_AOTC)

locate.outliers(res1_AOTC,par1_AOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 2   IO 487 3.427157 3.632869
## 3   TC 100 3.935892 4.264941
locate.outliers(res2_AOTC,par2_AOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##    type ind   coefhat     tstat
## 2    IO 101 -5.674572 -5.634094
## 4    AO  99 -2.774362 -5.497246
## 6    AO 199 -2.462479 -4.879268
## 8    LS 240 -1.303710 -3.539596
## 9    LS 307 -1.458729 -3.514061
## 10   TC 100  5.654358  7.564315
## 12   TC 200  4.036516  5.399990
locate.outliers(res3_AOTC,par3_AOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 1   IO 200 4.307538 4.070686
## 2   AO 100 2.722901 3.639027
locate.outliers(res4_AOTC,par4_AOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 1   IO 100 4.535893 4.559566
## 6   AO 203 3.703661 3.739537
## 7   TC 200 4.071421 5.741692
## 8   TC 201 2.651550 3.739329
## 9   TC 202 3.107387 4.382169
#### IO and AO at t = 100 and 200

ts1_IOAO <- ts1 + rowSums(outliers.effects(outIOAO,n =500,pars = list(arcoefs = c(0.7),macoefs = c(0)),weights = T))
ts2_IOAO <- ts2 + rowSums(outliers.effects(outIOAO,n =500,pars = list(arcoefs = c(2*(1/1.1)*cos(pi/8),-1/1.1^2),macoefs = c(0,0)),weights = T))
ts3_IOAO <- ts3 + rowSums(outliers.effects(outIOAO,n =500,pars = list(arcoefs = c(1),macoefs = c(0)),weights = T))
ts4_IOAO <- ts4 + rowSums(outliers.effects(outIOAO,n =500,pars = list(arcoefs = c(0),macoefs = c(0)),weights = T))

par(mfrow=c(2,2))
ts.plot(ts1_IOAO,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_IOAO,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_IOAO,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_IOAO,main="ts4: White noise",col="blue"); lines(ts4)

fit1_IOAO <- auto.arima(ts1_IOAO)
res1_IOAO <- residuals(fit1_IOAO)
par1_IOAO <- coefs2poly(fit1_IOAO)

fit2_IOAO <- auto.arima(ts2_IOAO)
res2_IOAO <- residuals(fit2_IOAO)
par2_IOAO <- coefs2poly(fit2_IOAO)

fit3_IOAO <- auto.arima(ts3_IOAO)
res3_IOAO <- residuals(fit3_IOAO)
par3_IOAO <- coefs2poly(fit3_IOAO)

fit4_IOAO <- auto.arima(ts4_IOAO)
res4_IOAO <- residuals(fit4_IOAO)
par4_IOAO <- coefs2poly(fit4_IOAO)

locate.outliers(res1_IOAO,par1_IOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind   coefhat     tstat
## 1   IO 201 -4.995493 -5.065672
## 2   AO 100  3.355737  4.115062
## 3   AO 200  3.816299  4.679838
locate.outliers(res2_IOAO,par2_IOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##    type ind    coefhat     tstat
## 2    IO 101 -10.125567 -9.743027
## 5    AO  99  -3.443909 -6.176315
## 6    AO 100   6.156371 11.040850
## 8    AO 199  -2.774893 -4.976499
## 9    AO 200   3.892704  6.981184
## 13   LS 240  -1.277049 -3.716171
## 14   LS 253  -1.270986 -3.616882
## 15   LS 255  -1.304815 -3.700085
## 16   LS 262  -1.292430 -3.619310
## 17   LS 264  -1.258366 -3.511116
## 18   LS 271  -1.303902 -3.591352
## 19   LS 282  -1.322201 -3.565869
## 20   LS 307  -1.470702 -3.767508
## 21   LS 311  -1.394995 -3.542458
## 22   LS 319  -1.453288 -3.624794
## 26   TC 201  -4.247531 -5.226165
locate.outliers(res3_IOAO,par3_IOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind   coefhat     tstat
## 1   IO 101 -5.343007 -5.015952
## 3   AO 100  4.222901  5.606522
## 4   AO 200  3.624577  4.812159
locate.outliers(res4_IOAO,par4_IOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 1   IO 100 4.511957 4.220767
## 4   AO 200 4.168284 3.944863
#### IO and TC at t = 100 and 200
ts1_IOTC <- ts1 + rowSums(outliers.effects(outIOTC,n =500,pars = list(arcoefs = c(0.7),macoefs = c(0)),delta=0.7,weights = T))
ts2_IOTC <- ts2 + rowSums(outliers.effects(outIOTC,n =500,pars = list(arcoefs = c(2*(1/1.1)*cos(pi/8),-1/1.1^2),macoefs = c(0,0)),delta=0.7,weights = T))
ts3_IOTC <- ts3 + rowSums(outliers.effects(outIOTC,n =500,pars = list(arcoefs = c(1),macoefs = c(0)),delta=0.7,weights = T))
ts4_IOTC <- ts4 + rowSums(outliers.effects(outIOTC,n =500,pars = list(arcoefs = c(0),macoefs = c(0)),delta=0.7,weights = T))

par(mfrow=c(2,2))
ts.plot(ts1_IOTC,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_IOTC,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_IOTC,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_IOTC,main="ts4: White noise",col="blue"); lines(ts4)

fit1_IOTC <- auto.arima(ts1_IOTC)
res1_IOTC <- residuals(fit1_IOTC)
par1_IOTC <- coefs2poly(fit1_IOTC)

fit2_IOTC <- auto.arima(ts2_IOTC)
res2_IOTC <- residuals(fit2_IOTC)
par2_IOTC <- coefs2poly(fit2_IOTC)

fit3_IOTC <- auto.arima(ts3_IOTC)
res3_IOTC <- residuals(fit3_IOTC)
par3_IOTC <- coefs2poly(fit3_IOTC)

fit4_IOTC <- auto.arima(ts4_IOTC)
res4_IOTC <- residuals(fit4_IOTC)
par4_IOTC <- coefs2poly(fit4_IOTC)

locate.outliers(res1_IOTC,par1_IOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 1   AO 100 3.355035 4.141901
locate.outliers(res2_IOTC,par2_IOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##    type ind    coefhat      tstat
## 2    IO 101 -10.299976 -10.033647
## 4    AO  99  -3.489905  -6.463779
## 5    AO 100   6.137777  11.367998
## 7    AO 199  -2.439758  -4.518763
## 10   LS 240  -1.284811  -3.657114
## 11   LS 253  -1.279111  -3.561813
## 12   LS 255  -1.303588  -3.617416
## 13   LS 262  -1.302190  -3.569285
## 14   LS 271  -1.304968  -3.519060
## 15   LS 282  -1.326270  -3.503321
## 16   LS 307  -1.467439  -3.685575
## 17   LS 319  -1.452251  -3.553314
## 20   TC 200   4.029129   5.093511
locate.outliers(res3_IOTC,par3_IOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind   coefhat     tstat
## 1   IO 101 -5.375346 -5.056359
## 2   IO 200  4.281211  4.027153
## 3   AO 100  4.222901  5.617685
locate.outliers(res4_IOTC,par4_IOTC,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 1   IO 100 4.535893 4.559566
## 6   AO 203 3.703661 3.739537
## 7   TC 200 4.071421 5.741692
## 8   TC 201 2.651550 3.739329
## 9   TC 202 3.107387 4.382169
#### IO and TC and 2 AO's at 100, 200, 220, 300
ts1_IOTCAOAO <- ts1 + rowSums(outliers.effects(outIOTCAOAO,n =500,pars = list(arcoefs = c(0.7),macoefs = c(0)),delta=0.7,weights = T))
ts2_IOTCAOAO <- ts2 + rowSums(outliers.effects(outIOTCAOAO,n =500,pars = list(arcoefs = c(2*(1/1.1)*cos(pi/8),-1/1.1^2),macoefs = c(0,0)),delta=0.7,weights = T))
ts3_IOTCAOAO <- ts3 + rowSums(outliers.effects(outIOTCAOAO,n =500,pars = list(arcoefs = c(1),macoefs = c(0)),delta=0.7,weights = T))
ts4_IOTCAOAO <- ts4 + rowSums(outliers.effects(outIOTCAOAO,n =500,pars = list(arcoefs = c(0),macoefs = c(0)),delta=0.7,weights = T))

par(mfrow=c(2,2))
ts.plot(ts1_IOTCAOAO,main="ts1: AR1",col="blue"); lines(ts1)
ts.plot(ts2_IOTCAOAO,main="ts2: AR2",col="blue"); lines(ts2)
ts.plot(ts3_IOTCAOAO,main="ts3: Random walk",col="blue"); lines(ts3)
ts.plot(ts4_IOTCAOAO,main="ts4: White noise",col="blue"); lines(ts4)

fit1_IOTCAOAO <- auto.arima(ts1_IOTCAOAO)
res1_IOTCAOAO <- residuals(fit1_IOTCAOAO)
par1_IOTCAOAO <- coefs2poly(fit1_IOTCAOAO)

fit2_IOTCAOAO <- auto.arima(ts2_IOTCAOAO)
res2_IOTCAOAO <- residuals(fit2_IOTCAOAO)
par2_IOTCAOAO <- coefs2poly(fit2_IOTCAOAO)

fit3_IOTCAOAO <- auto.arima(ts3_IOTCAOAO)
res3_IOTCAOAO <- residuals(fit3_IOTCAOAO)
par3_IOTCAOAO <- coefs2poly(fit3_IOTCAOAO)

fit4_IOTCAOAO <- auto.arima(ts4_IOTCAOAO)
res4_IOTCAOAO <- residuals(fit4_IOTCAOAO)
par4_IOTCAOAO <- coefs2poly(fit4_IOTCAOAO)

locate.outliers(res1_IOTCAOAO,par1_IOTCAOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##   type ind  coefhat    tstat
## 2   AO 100 3.353870 4.138985
## 3   AO 220 4.263639 5.261721
locate.outliers(res2_IOTCAOAO,par2_IOTCAOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##    type ind   coefhat     tstat
## 2    IO 101 -9.820925 -9.364260
## 7    AO  99 -3.367920 -5.783043
## 8    AO 100  6.184503 10.619387
## 10   AO 199 -2.406531 -4.132246
## 12   AO 219 -3.195441 -5.486879
## 13   AO 220  4.932427  8.469452
## 16   AO 300  2.961900  5.085867
## 19   LS 240 -1.256472 -3.798595
## 20   LS 253 -1.249809 -3.693280
## 21   LS 255 -1.296503 -3.817491
## 22   LS 256 -1.201418 -3.531118
## 23   LS 262 -1.269885 -3.691505
## 24   LS 264 -1.248983 -3.617255
## 25   LS 271 -1.288909 -3.683747
## 26   LS 274 -1.239983 -3.523461
## 27   LS 282 -1.305346 -3.651152
## 28   LS 299 -1.376111 -3.715723
## 29   LS 301 -1.746905 -4.696609
## 30   LS 305 -1.336865 -3.562901
## 31   LS 307 -1.475463 -3.914892
## 32   LS 311 -1.399847 -3.681041
## 33   LS 314 -1.347445 -3.519074
## 34   LS 319 -1.454594 -3.755020
## 37   TC 200  4.027857  4.786556
## 39   TC 221 -5.460075 -6.488551
locate.outliers(res3_IOTCAOAO,par3_IOTCAOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##    type ind   coefhat     tstat
## 1    IO 101 -5.343007 -5.028660
## 4    IO 221 -5.797052 -5.455993
## 5    AO 100  4.222901  5.620726
## 6    AO 219 -3.192060 -4.248666
## 7    AO 220  5.275832  7.022188
## 8    AO 300  2.899103  3.858737
## 10   LS 200  4.313550  4.059770
locate.outliers(res4_IOTCAOAO,par4_IOTCAOAO,cval=3.5,types=c("IO","AO","LS","TC"))
##    type ind  coefhat    tstat
## 1    IO 100 4.503322 4.515273
## 4    IO 220 4.169368 4.180433
## 7    AO 203 3.633173 3.681499
## 9    TC 200 4.093965 5.822612
## 10   TC 201 2.641917 3.757447
## 11   TC 202 3.170704 4.509510

References

Aggarwal, Charu C. 2017. Outlier Analysis. Springer, Cham. doi:10.1007/978-3-319-47578-3.

Chen, C., and L. Liu. 1993. “Joint Estimation of Model Parameters and Outlier Effects in Time Series.” Journal of the American Statistical Association 88 (421). Taylor & Francis Group: 284–97.

Hawkins, D. M. 1980. Identification of Outliers. Vol. 11. Springer.

Hodge, V. J., and J. Austin. 2004. “A Survey of Outlier Detection Methodologies.” Artificial Intelligence Review 22 (2). Springer: 85–126.

Keogh, E., J. Lin, and A. Fu. 2005. “HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence.” In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM), 8 pp. IEEE.

Serfling, Robert J. 1980. Approximation Theorems of Mathematical Statistics. John Wiley & Sons, Inc., New York.