Recall from the previous lecture the definition of WFT, \[ S_f(u,\xi) = \int_{\mathbb R} f(t) g_{(u,\xi)}(t) dt = \int_{\mathbb R} f(t) e^{-i\xi t} g(t-u) dt \] with atoms \(g_{(u,\xi)}(t) = e^{-i\xi t} g(t-u)\), and the definition of CWT, \[ W_f(u,s) = \int_{\mathbb R} f(t) g_{(u,s)}(t) dt = \int_{\mathbb R} f(t) \frac{1}{\sqrt{s}} \psi\Big( \frac{t-u}{s} \Big) dt \] with atoms \(g_{(u,s)}(t) = \frac{1}{\sqrt{s}} \psi\Big( \frac{t-u}{s} \Big)\). The information in WFT and CWT has lots of redundancy as discussed previously. Could it be that a countable collection \[ \{S_f(u_n,\xi_n)\}\quad \mbox{or}\quad \{W_f(u_n,s_n)\} \] provides as much information as the respective “continuous” collections? Could it be that the collection of atoms \(\{g_{(u_n,\xi_n)}\}\) or \(\{g_{(u_n,s_n)}\}\) forms a suitable, preferably orthogonal, basis?
We will see that this involves difficulties for WFT but also that things look rosier for CWT. In particular, we will discuss the so-called orthogonal wavelet basis and discrete wavelet transform (DWT). DWT will then be illustrated on several applications.
No less importantly, the introduction of wavelets has also provided a multiscale perspective in modeling (when a phenomenon is modeled across different scales).
Several \(\verb!R!\) packages focus on DWT and its statistical applications, including \(\verb!wavethresh!\) (associated with the textbook of Nason (2008)), \(\verb!wavelets!\), \(\verb!waveslim!\), and the previously used \(\verb!Rwave!\). Much of this lecture will use \(\verb!wavethresh!\).
To address the question raised above, the following notion is useful. Let \(\|f\|^2 = \int |f(t)|^2 dt\) and \(\langle f,g \rangle = \int f(t) \overline{g(t)} dt\).
Definition 3.1: A collection of functions \(\{\phi_n\}\) is called a frame if, for some constants \(0<A,B<\infty\) and any \(f\), \[ A \|f\|^2 \leq \sum |\langle f,\phi_n\rangle|^2 \leq B \|f\|^2. \] The constants \(A,B\) are called frame bounds. The case \(A=B\) is referred to as a tight frame. The case \(A>1\) (supposing \(\|\phi_n\|=1\)) is referred to as a redundant frame.
Some known useful facts
Example 3.1 (Windowed Fourier frames): Consider the atoms \(g_{(u,\xi)}(t) = e^{-i\xi t} g(t-u)\) of WFT as above. A natural discretization is to consider \(u = m u_0\), \(\xi = n \xi_0\) and hence the functions \[ h_{(m,n)}(t) = e^{-in\xi_0 t} g(t-mu_0),\quad m,n\in\mathbb{Z}. \] Their Heisenberg boxes are often represented as:
When is this a frame? Necessary and sufficient conditions are available, the form of a dual frame is known, etc.
Balian-Low Theorem (1980’s): If \(\{h_{m,n}\}\) is orthogonal, then \[ \int t^2 |g(t)|^2 dt = \infty\quad \mbox{or}\quad \int w^2 |\widehat g(w)|^2 dw = \infty. \] That is, there are no orthogonal windowed Fourier basis with good time-frequency localization.
Note: The case of the Gaussian window is discussed in detail in e.g. Daubechies (1992), pp. 84-87, and Mallat (1998), pp. 142-143.
Example 3.2 (Wavelet frames): Consider the atoms \(g_{(u,s)}(t) = \frac{1}{\sqrt{s}} \psi\Big( \frac{t-u}{s} \Big)\) of CWT as above. A natural discretization is to consider \(u = nu_0a^j\), \(s = a^j\) and hence the functions \[ h_{(n,j)}(t) = \frac{1}{\sqrt{a^j}} \psi\Big( \frac{t-nu_0a^j}{a^j} \Big)= \frac{1}{\sqrt{a^j}} \psi\Big( \frac{t}{a^j} -nu_0 \Big),\quad n,j\in\mathbb{Z}. \] Moreover, \(a=2\) (dyadic scales \(a^j = 2^j\)) and \(u_0=1\) are often taken. Their Heisenberg boxes are often represented as:
Note: Here, with \(a=2\), large positive \(j\) corresponds to large scales and low frequencies, and large negative \(j\) corresponds to small scales and small frequencies. This is the preferred convention in the applied literature. In mathematics, \(j\) is often replaced by \((-j)\).
Similarly to windowed Fourier frames above, necessary and sufficient conditions are available, the form of a dual frame is known, etc.
Analogue of Balian-Low Theorem: It does not hold. In fact, Y. Meyer (of Meyer (1992) and the Abel prize above) was trying to prove the analogue for WT frames but in the process came up with an orthogonal wavelet basis with good time-frequency localization. I. Daubechies (of Daubechies (1992)) later constructed such orthogonal wavelet bases with compact time supports.
Take \[ \psi(t) = \left\{ \begin{array}{cl} 1, & 0\leq t < \frac{1}{2},\\ -1, & \frac{1}{2}\leq t \leq 1,\\ 0, & \mbox{otherwise}. \end{array} \right. \] Then, \(\{2^{-j/2} \psi(2^{-j}t-n)\}\) is an orthonormal basis, so that \[ f(t) = \sum_{n\in\mathbb{Z}} \sum_{j\in\mathbb{Z}} d_{j,n} 2^{-j/2} \psi(2^{-j}t-n), \] where \[ d_{j,n} = \int f(t) 2^{-j/2} \psi(2^{-j}t-n) dt. \]
In fact, with \[ \phi(t) = \left\{ \begin{array}{cl} 1, & 0\leq t < 1,\\ 0, & \mbox{otherwise}, \end{array} \right. \] one can also write: for any \(J\), \[ f(t) = \sum_{n\in\mathbb{Z}} a_{J,n} 2^{-J/2} \phi(2^{-J}t-n) + \sum_{n\in\mathbb{Z}} \sum_{j=-\infty}^{J} d_{j,n} 2^{-j/2} \psi(2^{-j}t-n), \] where \[ a_{J,n} = \int f(t) 2^{-J/2} \phi(2^{-J}t-n) dt. \] Note also that \[ \frac{1}{\sqrt{2}} \psi(\frac{t}{2}) = \frac{1}{\sqrt{2}} \phi(t) - \frac{1}{\sqrt{2}} \phi(t-1),\quad \frac{1}{\sqrt{2}} \phi(\frac{t}{2}) = \frac{1}{\sqrt{2}} \phi(t) + \frac{1}{\sqrt{2}} \phi(t-1) \] and hence e.g. \[ d_{j+1,p} = \frac{1}{\sqrt{2}} a_{j,2p} - \frac{1}{\sqrt{2}} a_{j,2p+1},\quad a_{j+1,p} = \frac{1}{\sqrt{2}} a_{j,2p} + \frac{1}{\sqrt{2}} a_{j,2p+1}. \]
There are other orthogonal wavelet bases than the Haar bases, many with nicer properties. All of them are similarly characterized by:
so that \(\{2^{-j/2} \psi(2^{-j}t-n), j,n\in\mathbb{Z}\}\) and \(\{2^{-J/2} \phi(2^{-J}t-n),2^{-j/2} \psi(2^{-j}t-n),j\leq J,n\in\mathbb{Z}\}\) are orthogornal bases, \[ f(t) = \sum_{n\in\mathbb{Z}} \sum_{j\in\mathbb{Z}} d_{j,n} 2^{-j/2} \psi(2^{-j}t-n) = \sum_{n\in\mathbb{Z}} a_{J,n} 2^{-J/2} \phi(2^{-J}t-n) + \sum_{n\in\mathbb{Z}} \sum_{j=-\infty}^{J} d_{j,n} 2^{-j/2} \psi(2^{-j}t-n), \] where \[ d_{j,n} = \int f(t) 2^{-j/2} \psi(2^{-j}t-n) dt,\quad a_{J,n} = \int f(t) 2^{-J/2} \phi(2^{-J}t-n) dt. \] The coefficients \(d_{j,n}\) are called detail (wavelet) coefficients at scale \(j\) (rather \(2^j\)) and the coefficients \(a_{J,n}\) are called approximation coefficients at scale \(j\) (rather \(2^j\)). A similar terminology applies to the detail and approximation terms in the expansion of \(f(t)\).
Moreover, there are sequences \(\{g_n\}\) and \(\{h_n\}\) such that \[ \frac{1}{\sqrt{2}} \psi(\frac{t}{2}) = \sum_n g_n \phi(t-n),\quad \frac{1}{\sqrt{2}} \phi(\frac{t}{2}) = \sum_n h_n \phi(t-n). \] The sequences \(\{g_n\}\) and \(\{h_n\}\) are known as conjugate mirror filters (CMFs), and also characterize an orthogonal wavelet basis. We also have \(g_n = (-1)^n \overline{h_{1-n}}\).
An important property of a wavelet is the number of zero moments, that is, \(Q\) such that \[ \int \psi(t) t^q dt = 0,\quad q=0,\ldots,Q-1. \] There are equivalent condition to express this in terms of CMFs.
Some known examples of orthogonal wavelet bases: Meyer; Daubechies; splines; …
Family of wavelets \(_{N}\psi\) with \(N\geq 1\) zero moments, and scaling function \(_{N}\phi\) such that \(\mbox{supp}\{_{N}\psi\} = [-N+1,N]\), \(\mbox{supp}\{_{N}\phi\} = [0,2N-1]\) (minimum sizes to have \(N\) zero moments). The forms of \(_{N}\psi\), \(_{N}\phi\) are NOT known explicitly, but rather characterized through numerical values of the corresponding CMF \(_{N}h_n\), with \(2N\) non-zero \(_{N}h_n\)’s. (The case \(N=1\) corresponds to Haar wavelets.)
These CMFs can be used to compute numerically the corresponding wavelets and scaling functions. Here are the plots of a few of them.
library(wavethresh)
N <- 3
filter.select(N, family="DaubExPhase")
## $H
## [1] 0.33267055 0.80689151 0.45987750 -0.13501102 -0.08544127 0.03522629
##
## $G
## NULL
##
## $name
## [1] "Daub cmpct on ext. phase N=3"
##
## $family
## [1] "DaubExPhase"
##
## $filter.number
## [1] 3
#support(filter.number=N, family="DaubExPhase")
draw(filter.number=N, family="DaubExPhase",scaling.function = TRUE,enhance = FALSE)
draw(filter.number=N, family="DaubExPhase",scaling.function = FALSE,enhance = FALSE)
One of the appealing features of orthogonal wavelets is the availability of fast algorithms that relate wavelet and detail coefficients across various scales. The algorithms are known as (fast) discrete wavelet transform (DWT).
At decomposition: \[ a_{j+1,p} = \sum_n h_{n-2p} a_{j,n} = (a_{j,\cdot} * h^{\vee})_{2p} \] or, in short, \[ a_{j+1} = \downarrow_2 (a_{j} * h^{\vee}), \] where \(*\) is the convolution operation, \(^\vee\) is the time reversion operation, and \(\downarrow_2\) is the downsampling by \(2\) operation (i.e. \((\downarrow_2 x)_n = x_{2n}\)). Similarly, \[ d_{j+1} = \downarrow_2 (a_{j} * g^{\vee}). \]
At reconstruction: \[ a_{j,p} = \sum_n h_{p-2n} a_{j+1,n} + \sum_n g_{p-2n} d_{j+1,n} = (h * \uparrow_2 a_{j+1,\cdot})_{p} + (g * \uparrow_2 d_{j+1,\cdot})_{p} \] or, in short, \[ a_{j} = h * \uparrow_2 a_{j+1} + g * \uparrow_2 d_{j+1}, \] where \(\uparrow_2\) is the upsampling by \(2\) operation (i.e. \((\uparrow_2 x)_n = x_p\) if \(n=2p\) and \(=0\) otherwise).
Decomposition and reconstruction steps as a perect reconstruction filter bank
Approach 1: If data \(x_n\), \(n=1,\ldots,2^p\), is available, set \(a_{0,n} = x_n\), , \(n=1,\ldots,J\), from which the following quantities can be computed, after taking care of “boundary effects” in some way: \[ \begin{array}{ccl} d_{1,n} & : & n = 1,\ldots,2^{p-1}\\ d_{2,n} & : & n = 1,\ldots,2^{p-2}\\ \ldots & & \ldots \\ d_{p,n} & : & n = 1 (=2^{p-p}) \\ a_{p,n}& : & n = 1 \end{array} \] So that \(2^p\) points are transformed into \(2^p\) points through a linear, in fact orthogonal, transformation. This is what is usually plotted as the coefficients of DWT. In fact, the whole theory can be built just for discrete sequences using CMFs. For example, this is the approach taken by Percival and Walden (2000). Computational complexity (!): only \(O(2^p)\).
Approach 2: If the signal is thought as \(f(2^J n)\) for large negative \(J\) and smooth \(f\), note that (assuming \(\int \phi(u)du = 1\)) \[ f(2^J n) \approx \int_{\mathbb R} f(t) \frac{1}{2^J}\phi\Big( \frac{t-2^{J}n}{2^J} \Big) dt = 2^{-J/2} a_{J,n,} \] so that this is what is taken for the initialization of DWT. This is sometimes referred to as a “wavelet crime” in the wavelet literature.
Other approaches are taken as well, especially related to Approach 2.
Example 1: Signal with point singularity:
test.data <- example.1()$y
length(test.data)
## [1] 512
log2(length(test.data))
## [1] 9
plot(test.data,type="l")
wds <- wd(test.data,filter.number=10,family="DaubLeAsymm",type="wavelet",bc="periodic")
plot(wds)
## [1] 6.287696 6.287696 6.287696 6.287696 6.287696 6.287696 6.287696 6.287696
## [9] 6.287696
wds2 <- wd(test.data,filter.number=10,family="DaubLeAsymm",type="station",bc="periodic")
plot(wds2)
## [1] 6.489179 6.489179 6.489179 6.489179 6.489179 6.489179 6.489179 6.489179
## [9] 6.489179
library(Rwave)
isize <- length(test.data)
noctave <- 5
nvoice <- 10
pp <- noctave * nvoice
cwt2 <- cwt(test.data, noctave, nvoice, plot=FALSE)
image(1:isize, seq(0, pp, length = pp), Mod(cwt2), xlab = "Time", ylab = "log(scale)",main = "Wavelet Transform Modulus",col = grey(seq(0, 1, length = 256)))
Example 2: Chirp:
test.chirp <- simchirp()$y
length(test.chirp)
## [1] 1024
log2(length(test.chirp))
## [1] 10
plot(test.chirp, main="Simulated chirp signal",type="l")
chirpwds <- wd(test.chirp,filter.number=8,family="DaubLeAsymm",type="wavelet",bc="periodic")
plot(chirpwds)
## [1] 9.521105 9.521105 9.521105 9.521105 9.521105 9.521105 9.521105
## [8] 9.521105 9.521105 9.521105
chirpwds2 <- wd(test.chirp,filter.number=8,family="DaubLeAsymm",type="station",bc="periodic")
plot(chirpwds2)
## [1] 12.05649 12.05649 12.05649 12.05649 12.05649 12.05649 12.05649
## [8] 12.05649 12.05649 12.05649
isize <- length(test.chirp)
noctave <- 5
nvoice <- 10
pp <- noctave * nvoice
chirpcwt <- cwt(test.chirp, noctave, nvoice, plot=FALSE)
image(1:isize, seq(0, pp, length = pp), Mod(chirpcwt), xlab = "Time", ylab = "log(scale)",main = "Wavelet Transform Modulus",col = grey(seq(0, 1, length = 256)))
The setting is \[ y_t = g\Big( \frac{t}{T} \Big) + \epsilon_t,\quad t=1,\ldots,T, \] where \(\{\epsilon_t\} \sim \mbox{IID}(0,\sigma^2)\), normally distributed (though these can be relaxed with suitable modifications).
Compute DWT of the data, “shrink” the wavelet coefficients and take the inverse DWT for an estimate of \(g\). The hard and soft thresholding rules are based on: \[ \eta_H(d,\lambda) = d 1_{\{ |d|>\lambda\} },\quad \eta_S(d,\lambda) = \mbox{sign}(d) (|d|-\lambda) 1_{\{ |d|>\lambda\} }, \] where \(\lambda\) is a threshold.
Universal thresholding: \(\lambda^u = \sigma \sqrt{2\log T}\) (in practice, \(\sigma\) is estimated); SURE (Stein unbiased risk estimator), Cross validation: see Nason (2008), pp. 96-99.
See e.g. Nason (2008), Sections 6-7 for much more information (density estimation, etc).
# Nason (2008), pp. 90-95
v <- DJ.EX()
x <- (1:1024)/1024 # Define X coordinates too
plot(x, v$bumps, type="l", ylab="Bumps")
ssig <- sd(v$bumps) # Bumps sd
SNR <- 2 # Fix our SNR
sigma <- ssig/SNR
e <- rnorm(1024, mean=0, sd=sigma)
y <- v$bumps + e
plot(x, y, type="l", ylab="Noisy bumps")
#
# Plot wd of bumps
#
xlv <- seq(from=0, to=1.0, by=0.2)
bumpswd <- wd(v$bumps)
plot(bumpswd, main="", sub="", xlabvals=xlv*512, xlabchars=as.character(xlv),xlab="x")
## [1] 73.4135 73.4135 73.4135 73.4135 73.4135 73.4135 73.4135 73.4135
## [9] 73.4135 73.4135
#
# Plot wd of noisy bumps for comparison
#
ywd <- wd(y)
plot(ywd, main="", sub="",xlabvals=xlv*512, xlabchars=as.character(xlv),xlab="x")
## [1] 75.66155 75.66155 75.66155 75.66155 75.66155 75.66155 75.66155
## [8] 75.66155 75.66155 75.66155
FineCoefs <- accessD(ywd, lev=nlevelsWT(ywd)-1)
sigma <- mad(FineCoefs)
utDJ <- sigma*sqrt(2*log(1024))
ywdT <- threshold(ywd, policy="manual", value=utDJ)
ywr <- wr(ywdT)
plot(x, ywr, type="l")
lines(x, v$bumps, lty=2)
ywdcvT <- threshold(ywd, policy="cv", dev=madmad)
ywrcv <- wr(ywdcvT)
plot(x, ywrcv, type="l", xlab="x", ylab="Cross-val.Estimate")
lines(x, v$bumps, lty=2)
The same principle can be used in e.g. image compression:
data(lennon)
# 256 x 256 image
image(lennon,col = grey(seq(0, 1, length = 256)))
lwd <- imwd(lennon, filter.number=6)
length(lwd$w7L2[lwd$w7L2!=0])
## [1] 16384
plot(lwd,col = grey(seq(0, 1, length = 256)))
lwdT <- threshold.imwd(lwd)
length(lwdT$w7L2[lwdT$w7L2!=0])
## [1] 3
plot(lwdT,col = grey(seq(0, 1, length = 256)))
lennon2 <- imwr(lwdT,filter.number=6)
image(lennon2,col = grey(seq(0, 1, length = 256)))
Recall that \(f(t)\) is \(h\)-Holder continuous if \[ |f(t) - f(u)| \simeq |t-u|^h. \] This implies that (as \(j\to-\infty\) or at small scales) \[ |2^{-j/2} d_{j,k} | \simeq 2^{jh}. \] (Why?)
Focus on the interval \(t,u\in[0,1]\) and allow \(h\) to vary with time, i.e. \(h=h(t)\). Let \[ N^j(h) = \# \{k = 1,\ldots,2^{-j}: |2^{-j/2} d_{j,k} | \simeq 2^{jh}\} \] Suppose that (as \(j\to-\infty\) or at small scales) \[ N^j(h) \simeq 2^{-jD(h)}. \] For fixed \(h\), the value \(D(h)\) is related to various fractal dimensions of the sets \(N^j(h)\) and the plot of \(D(h)\) as a function of \(h\) as multifractal spectrum of the signal \(f(t)\).
Now define the so-called structure function \(S_j(q)\) for real \(q\) (usually positive) as \[ S_j(q) = \sum_{k=1}^{2^{-j}} |2^{-j/2} d_{j,k} |^q. \] Then (as \(j\to-\infty\) or at small scales) \[ S_j(q) \simeq 2^{j \tau(q)} \] where \[ \tau(q) = \inf_h \{qh - D(h)\} =: D^*(q). \]
(Why?) The transformation \(D^*(q)\) is known as the Legendre transform of \(D\), and under mild assumptions \[ \tau^*(h) = (D^*)^*(h) = D(h). \] (In fact, under mild assumptions, one can make a converse argument.) In practice, \(\tau(q)\) is estimated from the structure function of the signal wavelet coefficients, and the numeric Legendre transform is computed to get \(D(h)\). The spectrum \(D(h)\) is then used as a measure of smoothness (roughness) of the signal.
For example, Abry et al. (2013) use these measures to identify forgeries/copies of paintings. For example, the following are two paintings of Charlotte Casper, first the original and then a copy produced several weeks later. Source: Machine Learning and Image Processing for Art Investigation Research Group.
Original: