Have you ever watched someone use an Etch-a-Sketch as a kid, while you were still getting the hang of the knobs, and wondered how anyone could draw a curved pen stroke with it? It turns out you don't really need to be an expert on the knobs: with a lot of patience you can approximate any curve to any degree of accuracy using only up-down and left-right motions, in jagged steps, drawing shorter and shorter lines at each stage of the approximation. Mathematically, we can say we are approximating a continuous curve on a finite interval with dyadic step functions. We will cover many topics as I work my way through (doing exercises where needed) David Walnut's book \textit{An Introduction to Wavelet Analysis}: uniform convergence of dyadic step functions to continuous functions on closed and bounded intervals, generalized Fourier series and orthonormal bases, and, most importantly, the fact that the Haar series converges in L^2 and, interestingly enough, also uniformly; the intricate proof of the latter is at the end. I use the theorems and definitions directly from the book, adding a little extra detail here and there where it helped me better understand the proofs or exercises. To begin, we go over the Heine-Cantor theorem: a continuous function on a compact set is uniformly continuous. The proof uses metric spaces, but, less generally, the result shows that a real continuous function defined on a closed and bounded interval [a,b], with a<b and a,b\in\mathbb{R}, is also uniformly continuous.

[Animation: an Etch-a-Sketch drawing the Haar scaling function, Haar wavelets, and an approximation of a continuous curve. © Brendon James Thomson]

\textbf{Heine-Cantor theorem.} If f:\langle X,\rho\rangle\rightarrow \langle Y,\sigma \rangle is continuous, where \langle X,\rho \rangle and \langle Y,\sigma \rangle are metric spaces, and \langle X, \rho \rangle is compact, then f is uniformly continuous, that is, for every \epsilon > 0 there exists a \delta > 0 such that for all x,y \in X with \rho (x,y)<\delta, \sigma(f(x),f(y)) < \epsilon.

The proof of the theorem can be found on Wikipedia and I’ll rewrite it here in my own words.

\textit{Proof.} The proof takes full advantage of the definition of compactness in a metric space: every open covering \mathcal{U} of X has a finite subcovering, i.e., there is a finite collection \{\mathcal{O}_1,\mathcal{O}_2,...,\mathcal{O}_n\} \subset \mathcal{U} such that X = \bigcup\limits_{i=1}^n\mathcal{O}_i.

First, by definition of continuity, f is continuous at each point x\in X, i.e., for every x\in X and every \epsilon >0 there is a \delta_x >0 such that if \rho(x,y) < \delta_x, then \sigma(f(x),f(y)) < \epsilon. Now let \epsilon > 0 be arbitrary. Since f is continuous at every x\in X, we can find a \delta_x (note here that the \delta is indexed by x to denote that it is specific to that x) such that if y\in X is less than \delta_x away from x, \rho(x,y) < \delta_x, then \sigma(f(x),f(y)) < \frac{\epsilon}{2}.

Note that B_{x,\frac{\delta_x}{2}} = \{y : \rho(x,y) < \frac{\delta_x}{2}\} is the ball centered at x of radius \delta_x/2, and these balls form an open covering of X, \bigcup\limits_{x\in X} B_{x,\frac{\delta_x}{2}} = X, because x \in B_{x,\frac{\delta_x}{2}} for every x \in X by the definition of a metric, \rho(x,x) = 0 < \frac{\delta_x}{2}. Since X is compact, there exists a finite subcovering such that X = \bigcup\limits_{i=1}^n B_{x_i,\frac{\delta_{x_i}}{2}}.

Now we can pick the smallest \frac{\delta_{x_i}}{2} because there are only finitely many of them, and denote it by \boldsymbol{\delta} = \text{min}\left\{\frac{\delta_{x_1}}{2},..., \frac{\delta_{x_n}}{2}\right\}. Wikipedia asserts that \boldsymbol{\delta} >0, but that should be obvious: you can't pick a negative number or zero out of a finite set of positive numbers. The emboldened \boldsymbol{\delta} will be the \delta used in the definition of uniform continuity. It is important to note before proceeding that by the definition of a metric, \rho(x,y) \leq \rho(x,z) + \rho(z,y) for all x,y,z \in X, otherwise known as the triangle inequality. I used \rho in this case, but the same applies for \sigma, because these properties hold for arbitrary metrics.

Given any x,y \in X such that \rho(x,y) < \boldsymbol{\delta}, since X = \bigcup\limits_{i=1}^n B_{x_i,\frac{\delta_{x_i}}{2}}, x must be an element of some B_{x_i,\frac{\delta_{x_i}}{2}} for some 1\leq i \leq n. Thus \rho(x_i,x) < \frac{\delta_{x_i}}{2}, and making x our center point between x_i and y, by the triangle inequality:

\rho(x_i,y) \leq \rho(x_i,x)+\rho(x,y) < \frac{\delta_{x_i}}{2} + \boldsymbol{\delta} \leq \frac{\delta_{x_i}}{2} + \frac{\delta_{x_i}}{2} = \delta_{x_i}

where the last inequality follows by the definition of \boldsymbol{\delta}. Note that both \rho(x_i,x),\rho(x_i,y)<\delta_{x_i}, and by the definition of continuity and the way things were set up above for the deltas indexed by x, \sigma(f(x_i),f(x)), \sigma(f(x_i),f(y)) < \frac{\epsilon}{2}. Now with x_i in the role of the center point, a second use of the triangle inequality gives

\epsilon=\frac{\epsilon}{2}+\frac{\epsilon}{2}>\sigma(f(x),f(x_i))+\sigma(f(x_i),f(y)) \geq \sigma(f(x),f(y)). \blacksquare

In particular, thanks to the Heine-Cantor theorem we know that a continuous real function on a closed and bounded interval [a,b] is uniformly continuous. Thus, if f: [0,1] \rightarrow \mathbb{R} is continuous, then f is uniformly continuous by Heine-Cantor. This fact is particularly useful for proving Lemma 5.23 in David Walnut's book \textit{An Introduction to Wavelet Analysis}, which is left there as an exercise.

For reference, and so that the uninitiated reader is not completely lost, I will copy the book's definitions of the infinity norm, or L^\infty norm, and of the L^2 norm; these are the usual definitions.

\textbf{Definition 1.1.} A piecewise continuous function f(x) defined on an interval I is bounded (or L^\infty) on I if there is a number M>0 such that |f(x)|\leq M for all x\in I.
The L^\infty-norm of a function f(x) is defined by 

    \[||f||_\infty = \text{sup}\{|f(x)|:x\in I\}.\]

\textbf{Definition 1.6.} A piecewise continuous function f(x) defined on an interval I is square-integrable (or of class L^2 or simply L^2) on I if the integral

    \[\int_{I} |f(x)|^2 dx\]

is finite (<\infty).
The L^2-norm of a function f(x) is defined by

    \[||f||_2 = \left(\int_{I} |f(x)|^2 dx \right)^{\frac{1}{2}}.\]
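
To make these norms concrete, here is a minimal numerical sketch (my own illustration, not from the book) that approximates both quantities on a grid; the function f, the interval, and the grid size are arbitrary choices.

\begin{verbatim}
import numpy as np

def sup_norm(f, a, b, n=100_000):
    # Approximate ||f||_infty on [a,b] by sampling a fine grid.
    x = np.linspace(a, b, n)
    return np.max(np.abs(f(x)))

def l2_norm(f, a, b, n=100_000):
    # Approximate ||f||_2 on [a,b] by a midpoint Riemann sum of |f|^2.
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return np.sqrt(np.sum(np.abs(f(x))**2) * (b - a) / n)

f = lambda x: np.sin(2 * np.pi * x)
print(sup_norm(f, 0, 1))   # ~1.0
print(l2_norm(f, 0, 1))    # ~0.7071, i.e. 1/sqrt(2)
\end{verbatim}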

I also need to say what a dyadic interval is: all they are is a partitioning of the real line into intervals whose lengths are powers of two. Here's the definition directly from the book:

\textbf{Definition 5.1.} For each pair of integers j,k \in \mathbb{Z}, define the interval I_{j,k} by

    \[I_{j,k} = [2^{-j}k,2^{-j}(k+1)).\]

The collection of all such intervals is called the collection of dyadic subintervals of \mathbb{R}.

\textbf{Lemma 5.2.} Given j_0,k_0,j_1,k_1 \in \mathbb{Z}, with either j_0 \not= j_1 or k_0 \not= k_1, then either

(a) I_{j_1,k_1} \cap I_{j_0,k_0} = \varnothing,

(b) I_{j_1,k_1} \subseteq I_{j_0,k_0}, or I_{j_0,k_0}\subseteq I_{j_1,k_1}.

In the latter two cases, the smaller interval is contained in either the right half or left half of the larger.

\textit{Proof.} This is left as an exercise, but the result should be intuitive. Given two dyadic intervals, one of them is the smaller; you can slide the smaller interval along the real line, and if it lands inside the larger interval at all, it fits snugly inside. So either the smaller interval lies outside the larger one, in which case they are disjoint, or it is contained inside it. Since each finer partition divides every interval in two, and then the two halves in two again, and so on, if a smaller interval is contained in a larger one, it must lie in one of the two halves, the right or the left. \blacksquare

\textbf{Definition 5.3.} Given a dyadic interval at scale j, I_{j,k}, we write I_{j,k}=I_{j,k}^\ell \cup I_{j,k}^r, where I_{j,k}^\ell and I_{j,k}^r are dyadic intervals at scale j+1, to denote the left half and right half of the interval I_{j,k}. In fact, I_{j,k}^\ell = I_{j+1,2k} and I_{j,k}^r = I_{j+1,2k+1}.

\textit{Quick check.} I_{j,k} has midpoint \frac{2k+1}{2^{j+1}}, and the end points can be written as \frac{2k}{2^{j+1}}, \frac{2k+2}{2^{j+1}} and

    \[I_{j,k}=\left[\frac{k}{2^j},\frac{k+1}{2^j} \right) = \left[ \frac{2k}{2^{j+1}}, \frac{2k+1}{2^{j+1}}\right) \cup \left[ \frac{2k+1}{2^{j+1}}, \frac{2k+2}{2^{j+1}}\right) = {I_{j+1,2k}} \cup {I_{j+1,2k+1}} = {I_{j,k}^\ell} \cup {I_{j,k}^r}. \text{ }\blacksquare\]
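
As a computational aside, here is a small sketch of mine (not from the book) that computes the endpoints of I_{j,k} in exact rational arithmetic and checks the left/right split of Definition 5.3.

\begin{verbatim}
from fractions import Fraction

def dyadic_interval(j, k):
    # Endpoints of I_{j,k} = [2^{-j}k, 2^{-j}(k+1)), exact for any integer j.
    h = Fraction(1, 2) ** j   # 2^{-j}
    return k * h, (k + 1) * h

j, k = 3, 5
a, b = dyadic_interval(j, k)
left = dyadic_interval(j + 1, 2 * k)       # I_{j,k}^l = I_{j+1,2k}
right = dyadic_interval(j + 1, 2 * k + 1)  # I_{j,k}^r = I_{j+1,2k+1}
assert left[0] == a and left[1] == right[0] and right[1] == b
print(a, b, right[0])   # 5/8 3/4 11/16: the midpoint splits I_{3,5}
\end{verbatim}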

I should also formally present the definition of a scale j dyadic step function; as you can imagine, it is a step function that is constant on the dyadic intervals:

\textbf{Definition 5.4.} A dyadic step function is a step function f(x) with the property that for some j \in \mathbb{Z}, f(x) is constant on all dyadic intervals I_{j,k}, k \in \mathbb{Z}. We say in this case that f(x) is a scale j dyadic step function.
For any interval I, a dyadic step function on I is a dyadic step function that is supported on I.

We can now proceed to proving the lemma.

\textbf{Lemma 5.23.} Given f(x) continuous on [0,1], and \epsilon > 0, there is a J\in \mathbb{Z}, and a scale J dyadic step function g(x) supported in [0,1]  such that |f(x)-g(x)| < \epsilon, for all x \in [0,1]; that is, ||f-g||_\infty < \epsilon.

\textit{Proof.} Since f: [0,1] \rightarrow \mathbb{R} is continuous, by Heine-Cantor f is uniformly continuous, and so for all \epsilon >0 there is a \delta>0 such that for all x,y \in [0,1], if |x-y|<\delta, then |f(x)-f(y)|<\epsilon. Let J be such that \frac{1}{2^{J}}<\delta. If x and y are in the same dyadic interval I_{J,k}, then |x-y| < \frac{1}{2^{J}}. Now define a scale J dyadic step function g(x) supported on [0,1] as follows: for each 0 \leq k \leq 2^J -1, fix any point x_k in the dyadic interval I_{J,k} and set g(x) = f(x_k) for all x \in I_{J,k}, so that g(x) is constant on the intervals I_{J,k}. By the uniform continuity of f, the same estimate works on every dyadic interval at once: for all x \in I_{J,k}, |x-x_k| < \frac{1}{2^{J}} < \delta implies |f(x) -f(x_k)| =|f(x) - g(x)|< \epsilon, for 0 \leq k \leq 2^{J}-1. \blacksquare

\textit{Second Proof.} Since f: [0,1] \rightarrow \mathbb{R} is continuous, by Heine-Cantor f is uniformly continuous, and so for all \epsilon >0 there is a \delta>0 such that for all x,y \in [0,1], if |x-y|<\delta, then |f(x)-f(y)|<\epsilon. Let J be such that \frac{1}{2^{J+1}}<\delta, and let m_{J,k} = \frac{2k+1}{2^{J+1}} denote the midpoint of the dyadic interval I_{J,k}. If x is in I_{J,k}, then |x-m_{J,k}| \leq \frac{1}{2^{J+1}}. Define a scale J dyadic step function g(x) supported on [0,1] by g(x) = f(m_{J,k}) for all x \in I_{J,k}, so that g(x) is constant on the intervals I_{J,k} for 0 \leq k \leq 2^J -1. Again by the uniform continuity of f, the same estimate holds on every dyadic interval: for all x \in I_{J,k}, |x-m_{J,k}| \leq \frac{1}{2^{J+1}} < \delta implies |f(x) -f(m_{J,k})| =|f(x) - g(x)|< \epsilon, for 0 \leq k \leq 2^{J}-1. \blacksquare

Less formally: we can always find dyadic intervals of length less than \delta, and by the uniform continuity of f, any two points x,y inside such an interval satisfy |x-y|<\delta, which implies |f(x)-f(y)|<\epsilon. Fixing a point x_0 in each dyadic interval I_{J,k}, and letting g (our dyadic step function supported on [0,1]) equal f(x_0) on each respective interval, since |x-x_0| < \frac{1}{2^J} < \delta for all x\in I_{J,k}, we get |f(x)-f(x_0)|<\epsilon; but g(x) = f(x_0) for all x \in I_{J,k}, thus |f(x)-g(x)|=|f(x)-f(x_0)|<\epsilon.
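
The construction in the first proof is easy to play with numerically. Here is a sketch of mine (f, the scales, and the grid are arbitrary choices): g equals f at the left endpoint of each I_{J,k}, and the sup error shrinks as J grows.

\begin{verbatim}
import numpy as np

def dyadic_approx(f, J):
    # The scale-J dyadic step function with g = f(k / 2^J) on I_{J,k}.
    def g(x):
        k = np.clip(np.floor(np.asarray(x) * 2**J), 0, 2**J - 1)
        return f(k / 2**J)
    return g

f = lambda x: np.sin(2 * np.pi * x)
x = np.linspace(0, 1, 100_001)
for J in (2, 4, 6, 8):
    g = dyadic_approx(f, J)
    print(J, np.max(np.abs(f(x) - g(x))))   # sup error, roughly 2*pi / 2^J
\end{verbatim}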

Thus it is possible to approximate a continuous function on [0,1] to any degree of accuracy with dyadic step functions supported on [0,1]. In fact, this is true for any closed and bounded interval [a,b] \subset \mathbb{R}, but [0,1] will have some convenient properties as I will show shortly. We will develop a collection of special functions with special properties, but first the concept of orthogonality and correspondingly, orthonormality, must be defined. As stated in Walnut’s book:

\textbf{Definition 2.45.} A collection of functions \{g_n(x)\}_{n\in \mathbb{N}}, L^2 on an interval I is a (general) orthogonal system on I provided that 

(a) \int_{I} g_n(x)\overline{g_m(x)} dx =0\text{ if } n\not= m, \text{ and}

(b) \int_{I} g_n(x)\overline{g_n(x)}dx = \int_{I} |g_n(x)|^2 dx >0.

The collection \{g_n(x)\}_{n\in \mathbb{N}} is a (general) orthonormal system on I provided that it is an orthogonal system on I and

(b') \int_{I} g_n(x)\overline{g_n(x)}dx = \int_{I} |g_n(x)|^2 dx = 1.

I will only consider real-valued functions, so \overline{g_n(x)}=g_n(x) and the overline notation denoting the complex conjugate can be dropped. Part (c) of Remark 2.46 also states:

… we will use the inner product notation to represent the integrals in Definition 2.45. That is, we write for any functions f(x), and g(x) L^2 on I,

    \[\langle f,g \rangle = \int_{I} f(x)\overline{g(x)}dx.\]

This means in particular that

    \[\langle f,f \rangle = \int_{I} f(x)\overline{f(x)}dx = \int_{I} |f(x)|^2 dx = ||f||_2^2.\]

I will make use of the same convenient notation as well. The question now arises about Generalized Fourier Series, which section 2.3.2 treats splendidly:

\textbf{Definition 2.48.} Given a function f(x), L^2 on an interval I, and an orthonormal system \{g_n(x)\} on I, the (generalized) Fourier coefficients, \{c(n)\} of f(x) with respect to \{g_n(x)\} are defined by

    \[c(n)=\int_{I} f(x)\overline{g_n(x)}dx = \langle f,g_n \rangle.\]

The (generalized) Fourier series of f(x) with respect to \{g_n(x)\} is

    \[f(x) \sim \sum\limits_{n\in \mathbb{N}} \langle f,g_n \rangle g_n(x).\]

and then goes on to determine when the \sim can be replaced with =. First, two useful lemmas are proved.

\textbf{Lemma 2.50.} Let \{g_n(x)\} be an orthonormal system on an interval I. Then for every f(x), L^2 on I, and every N\in \mathbb{N},

    \[\left|\left| f-\sum\limits_{n=1}^N \langle f,g_n \rangle g_n \right|\right|_2^2 = ||f||_2^2-\sum\limits_{n=1}^N |\langle f,g_n\rangle |^2.\]

\textit{Proof.} As stated in the text, the proof takes full advantage of the orthonormality of \{g_n(x)\}:

    \[\int_{I} \left|f(x) -\sum\limits_{n=1}^N \langle f,g_n\rangle g_n(x) \right|^2 dx = \int_{I} \left(f(x) -\sum\limits_{n=1}^N \langle f,g_n\rangle g_n(x) \right)\overline{\left(f(x) -\sum\limits_{n=1}^N \langle f,g_n\rangle g_n(x) \right)} dx\]

    \[= \int_{I} f(x)\overline{f(x)} dx - \sum_{n=1}^N \langle f,g_n\rangle \int_{I} \overline{f(x)}g_n(x) dx - \sum\limits_{n=1}^N \overline{\langle f,g_n \rangle} \int_{I}f(x)\overline{g_n(x)} dx + \sum\limits_{n=1}^N \sum\limits_{m=1}^N \langle f,g_n \rangle \overline{\langle f, g_m \rangle} \int_{I} g_n(x)\overline{g_m(x)}dx\]

but by the orthonormality of \{g_n(x)\}, \int_{I} g_n(x)\overline{g_m(x)} dx = 0 if n \not= m, and =1 if n=m, thus

    \[\sum\limits_{n=1}^N \sum\limits_{m=1}^N \langle f,g_n \rangle \overline{\langle f, g_m \rangle} \int_{I} g_n(x)\overline{g_m(x)}dx = \sum\limits_{n=1}^N \langle f,g_n\rangle \overline{\langle f,g_n \rangle},\]

and since we are only dealing with real valued functions,

    \[\int_{I} \overline{f(x)}g_n(x) dx = \int_{I} f(x)\overline{g_n(x)}dx = \int_{I} f(x)g_n(x) dx = \langle f,g_n \rangle,\]

thus we have

    \[\int_{I} |f(x)|^2 dx - \sum_{n=1}^N |\langle f,g_n\rangle|^2 - \sum\limits_{n=1}^N |\langle f,g_n \rangle|^2 + \sum\limits_{n=1}^N |\langle f,g_n \rangle|^2\]

and the desired result follows,

    \[\int_{I} |f(x)|^2 dx - \sum_{n=1}^N |\langle f,g_n\rangle|^2. \text{ }\blacksquare\]

\textbf{Lemma 2.51.} Let \{g_n(x)\} be an orthonormal system on I. Then for every f(x), L^2 on I, and every finite sequence of numbers \{a(n)\}_{n=1}^N,

    \[\left|\left| f-\sum\limits_{n=1}^N a(n)g_n \right|\right|_2^2 = \left|\left|f-\sum\limits_{n=1}^N\langle f,g_n\rangle g_n \right|\right|_2^2 + \sum\limits_{n=1}^N|a(n)-\langle f,g_n\rangle|^2.\]

\textit{Proof.} Let f(x) be given, and let \{g_n(x)\} be an orthonormal system. Then

    \[\left|\left|f-\sum\limits_{n=1}^N a(n)g_n \right|\right|_2^2 = \int_{I} \left(f(x)-\sum\limits_{n=1}^N a(n)g_n(x)\right)\left(f(x)-\sum\limits_{n=1}^N a(n)g_n(x)\right) dx\]

    \[= \int_{I} |f(x)|^2 dx -2\sum\limits_{n=1}^N a(n)\int_{I} f(x)g_n(x) dx +\sum\limits_{n=1}^N\sum\limits_{m=1}^Na(n)a(m)\int_{I} g_n(x)g_m(x)dx\]

again as in the previous lemma, by the orthonormality of \{g_n(x)\}

    \[= \int_{I} |f(x)|^2 dx -2\sum\limits_{n=1}^N a(n)\langle f,g_n\rangle +\sum\limits_{n=1}^N |a(n)|^2\]

and adding and subtracting \sum\limits_{n=1}^N |\langle f,g_n\rangle|^2

    \[= \int_{I} |f(x)|^2 dx +\sum\limits_{n=1}^N |\langle f,g_n \rangle |^2 -2\sum\limits_{n=1}^N a(n)\langle f,g_n\rangle +\sum\limits_{n=1}^N |a(n)|^2 -\sum\limits_{n=1}^N |\langle f,g_n \rangle |^2\]

    \[= \int_{I} |f(x)|^2 dx -\sum\limits_{n=1}^N |\langle f,g_n \rangle |^2 + \sum\limits_{n=1}^N |a(n)-\langle f,g_n \rangle |^2 = \left|\left|f-\sum\limits_{n=1}^N \langle f,g_n \rangle g_n \right|\right|_2^2 + \sum\limits_{n=1}^N |a(n)-\langle f,g_n \rangle |^2\]

by the previous lemma. \blacksquare
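
Both lemmas are easy to sanity-check numerically. Here is a sketch of mine (not from the text) verifying the identity of Lemma 2.50 for a simple orthonormal system on I=[0,1], the normalized indicators g_n = \sqrt{N_0}\,\chi_{[n/N_0,(n+1)/N_0)}, with integrals replaced by Riemann sums.

\begin{verbatim}
import numpy as np

M = 2**15                        # grid points on I = [0,1]
x = (np.arange(M) + 0.5) / M     # midpoint rule
dx = 1.0 / M
inner = lambda u, v: np.sum(u * v) * dx   # <u,v> for real functions on [0,1]

N0 = 8   # g_n = sqrt(N0) * indicator of [n/N0, (n+1)/N0), n = 0,...,N0-1
g = [np.sqrt(N0) * ((x >= n / N0) & (x < (n + 1) / N0)) for n in range(N0)]

f = np.exp(x)                    # any L^2 function on [0,1]
c = np.array([inner(f, gn) for gn in g])          # <f, g_n>
resid = f - sum(cn * gn for cn, gn in zip(c, g))
print(inner(resid, resid))                # ||f - sum <f,g_n> g_n||_2^2
print(inner(f, f) - np.sum(c**2))         # ||f||_2^2 - sum |<f,g_n>|^2, equal
\end{verbatim}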

The two lemmas, especially 2.51, will now come in handy to show that if f can be approximated arbitrarily well in L^2 by finite linear combinations of the g_n, then it is recovered by its series with coefficients precisely of the form c(n) = \langle f,g_n\rangle.

\textbf{Definition 2.52.} Given a collection of functions \{g_n(x)\}, L^2 on an interval I, the span of \{g_n(x)\}, denoted \text{span}\{g_n(x)\}, is the collection of all finite linear combinations of the elements of \{g_n(x)\}. In other words, f(x) \in \text{span}\{g_n(x)\} if and only if f(x) = \sum\limits_{n=1}^N a(n)g_n(x) for some finite sequence \{a(n)\}_{n=1}^N.

The related notion of the mean-square (or L^2) closure of \text{span}\{g_n(x)\}, denoted \overline{\text{span}}\{g_n(x)\}, is defined as follows: f(x) \in \overline{\text{span}}\{g_n(x)\} if for every \epsilon >0, there is a function g(x) \in \text{span}\{g_n(x)\} such that ||f-g||_2 <\epsilon.

\textbf{Definition 2.56.} Let \{g_n(x)\} be an orthonormal system on I. Then \{g_n(x)\} is complete on I provided that every function f(x), L^2 on I, is in \overline{\text{span}}\{g_n(x)\}. A complete orthonormal system is called an orthonormal basis.

We will see that we can replace an arbitrary function with its Fourier series precisely when the collection of functions is complete. Theorem 2.57 contains several equivalent criteria for determining when an orthonormal system of functions is complete, but I will only prove (a) \Leftrightarrow (b), that is, \{ g_n(x) \} is complete on I if and only if for every f(x), L^2 on I, f(x) = \sum\limits_{n\in \mathbb{N}} \langle f,g_n \rangle g_n(x) in L^2 on I, which follows from Theorem 2.55. (a) \Leftrightarrow (c) will also be useful later, that is, \{ g_n(x) \} is complete on I if and only if every function f(x), C_c^0 on I, is in \overline{\text{span}}\{g_n(x)\}.

\textbf{Theorem 2.55.} Let \{g_n(x)\} be an orthonormal system on an interval I. Then a function f(x), L^2 on I, is in \overline{\text{span}}\{g_n(x)\} if and only if

    \[f(x) = \sum\limits_{n\in\mathbb{N}} \langle f,g_n\rangle g_n(x),\]

in L^2 on I.

\textit{Proof.} (\Longleftarrow) The statement f(x) = \sum\limits_{n\in\mathbb{N}} \langle f,g_n\rangle g_n(x) in L^2 means convergence in the L^2 norm: for every \epsilon>0, there is an N \in \mathbb{N} such that \left|\left|f-\sum\limits_{n=1}^m\langle f,g_n\rangle g_n \right|\right|_2 < \epsilon for every m \geq N, or more succinctly \lim\limits_{N\rightarrow \infty} \left|\left|f -\sum\limits_{n=1}^N \langle f,g_n\rangle g_n \right|\right|_2 =0. The expanded definition turns out to be the more useful one here, because by the definition of the mean-square closure of \text{span}\{g_n(x)\} we need to find a linear combination C(x) \in \text{span}\{g_n(x)\} such that ||f-C||_2 < \epsilon, for every \epsilon >0. That is exactly what we have: for every \epsilon >0 we can find an N>0 such that \left|\left| f-\sum\limits_{n=1}^N\langle f,g_n\rangle g_n \right|\right|_2 < \epsilon, and thus f(x) \in \overline{\text{span}} \{g_n(x)\}.

(\Longrightarrow) Supposing that f(x) \in \overline{\text{span}} \{g_n(x)\}, if \epsilon>0, then by definition there is a finite sequence \{a(n)\}_{n=1}^{N_0} for some N_0 \in \mathbb{N}, such that

    \[\left|\left| f-\sum\limits_{n=1}^{N_0}a(n) g_n \right|\right|_2 <\epsilon.\]

Since |a(n)-\langle f,g_n \rangle |^2 \geq 0, \sum\limits_{n=1}^{N_0} |a(n)-\langle f,g_n \rangle |^2 \geq 0, therefore certainly

    \[\left|\left| f-\sum\limits_{n=1}^{N_0}\langle f,g_n\rangle g_n \right|\right|_2^2 \leq \left|\left| f-\sum\limits_{n=1}^{N_0}\langle f,g_n\rangle g_n \right|\right|_2^2 + \sum\limits_{n=1}^{N_0} |a(n)-\langle f,g_n \rangle |^2\]

but the right hand side of the inequality is precisely the result of Lemma 2.51, and thus

    \[\left|\left| f-\sum\limits_{n=1}^{N_0}\langle f,g_n\rangle g_n \right|\right|_2^2 \leq \left|\left| f-\sum\limits_{n=1}^{N_0}a(n) g_n \right|\right|_2^2 < \epsilon^2.\]

David Walnut now leaves it as an exercise to prove that the result holds for all N \geq N_0; this is a straightforward application of Lemma 2.50. To see that \left\{\left|\left|f-\sum\limits_{n=1}^N \langle f,g_n \rangle g_n \right|\right|_2\right\}_{N\in \mathbb{N}} is a decreasing sequence, recall that by Lemma 2.50, \left|\left| f-\sum\limits_{n=1}^N \langle f,g_n \rangle g_n \right|\right|_2^2 = ||f||_2^2-\sum\limits_{n=1}^N |\langle f,g_n\rangle |^2, where ||f||_2^2 is fixed and |\langle f,g_n \rangle|^2 \geq 0; as N increases, we subtract a nonnegative number each time, and so

    \[\left|\left| f-\sum\limits_{n=1}^{N+1} \langle f,g_n \rangle g_n \right|\right|_2^2 \leq \left|\left| f-\sum\limits_{n=1}^N \langle f,g_n \rangle g_n \right|\right|_2^2\]

from which it follows by taking the square root of both sides, since both sides are nonnegative and the square root function is increasing,

    \[\left|\left| f-\sum\limits_{n=1}^{N+1} \langle f,g_n \rangle g_n \right|\right|_2 \leq \left|\left| f-\sum\limits_{n=1}^N \langle f,g_n \rangle g_n \right|\right|_2\]

and \left\{\left|\left|f-\sum\limits_{n=1}^N \langle f,g_n \rangle g_n \right|\right|_2\right\}_{N\in \mathbb{N}} is a decreasing sequence. We can now replace N_0 above with any N \geq N_0 and the result still holds, and so f(x) = \sum\limits_{n\in\mathbb{N}} \langle f,g_n\rangle g_n(x). \blacksquare

By the definition of completeness, \{g_n(x)\} is complete on I if every function f(x), L^2 on I, is in \overline{\text{span}}\{g_n(x)\}; by Theorem 2.55, a function f(x), L^2 on I, is in \overline{\text{span}}\{g_n(x)\} if and only if f(x) = \sum\limits_{n\in \mathbb{N}} \langle f,g_n\rangle g_n(x) in L^2 on I. Hence \{g_n(x)\} is complete if and only if for every function f(x), L^2 on I, f(x) = \sum\limits_{n\in \mathbb{N}} \langle f,g_n\rangle g_n(x).

The question now is whether we can find a collection of step functions that is an orthonormal basis on [0,1]. As it will turn out, there will be many. Let's begin by defining some functions:

    \[\mathbf{1}(x)=\varphi (x) = \chi_{[0,1)}(x) =\begin{cases} 1 & 0\leq x <1, \\ 0 & \text{otherwise.} \end{cases}\]

    \[\psi (x) = \chi_{[0,1/2)}(x)-\chi_{[1/2,1)}(x) = \begin{cases} 1 & 0\leq x <\frac{1}{2}, \\-1 &\frac{1}{2} \leq x<1, \\ 0 & \text{otherwise.} \end{cases}\]

The dilation and translation operators can now be specified with their definitions:

\textbf{Definition 3.39.} Given a>0, the dilation operator, D_a, defined on functions f(x), L^1 or L^2 on \mathbb{R}, is given by

    \[D_a f(x) = a^{1/2}f(ax).\]

Given b\in \mathbb{R}, the translation operator, T_b, defined on functions f(x), L^1 or L^2 on \mathbb{R}, is given by

    \[T_b f(x) = f(x-b).\]

Given the definition of \psi, we can then consider all dilations of 2^j and translations by k of \psi, for j,k \in \mathbb{Z}, that is

    \[\psi_{j,k}(x) = 2^{j/2}\psi(2^{j}x-k) = D_{2^j}T_k\psi(x).\]

The collection \{\psi_{j,k}(x)\}_{j,k\in\mathbb{Z}} is known as the Haar system on \mathbb{R}. We will show the orthonormality of the system \{\mathbf{1}\}\cup\{\psi_{j,k}: j\geq 0\}_{k\in\mathbb{Z}}. The orthonormality of the Haar system is shown in Theorem 5.13.
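
Before the proof, here is a quick numerical sanity check, a sketch of mine rather than anything from the book: implement \psi_{j,k} = D_{2^j}T_k\psi directly and verify that the Gram matrix of a few members of \{\mathbf{1}\}\cup\{\psi_{j,k}\} on [0,1] is the identity.

\begin{verbatim}
import numpy as np

def psi(x):
    # Haar mother wavelet: 1 on [0,1/2), -1 on [1/2,1), 0 otherwise.
    x = np.asarray(x)
    return np.where((0 <= x) & (x < 0.5), 1.0,
                    np.where((0.5 <= x) & (x < 1.0), -1.0, 0.0))

def psi_jk(j, k, x):
    # psi_{j,k}(x) = 2^{j/2} psi(2^j x - k) = D_{2^j} T_k psi(x).
    return 2**(j / 2) * psi(2**j * np.asarray(x) - k)

M = 2**16
x = (np.arange(M) + 0.5) / M     # midpoints, aligned with dyadic breakpoints
dx = 1.0 / M
system = [np.ones(M)] + [psi_jk(j, k, x) for j in range(3)
                         for k in range(2**j)]
G = np.array([[np.sum(u * v) * dx for v in system] for u in system])
print(np.allclose(G, np.eye(len(system))))   # True: the system is orthonormal
\end{verbatim}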

\textbf{Theorem 5.13.} The Haar system on \mathbb{R} is an orthonormal system on \mathbb{R}.

\textit{Proof.} Fix j\in\mathbb{Z}; orthonormality is first shown within the single scale j. By Lemma 5.2,

    \[I_{j,k} \cap I_{j,k'} = \begin{cases} \varnothing & \text{if }k\not=k', \\ I_{j,k} & \text{if }k=k'. \end{cases}\]

If k\not= k', then \psi_{j,k}(x)\psi_{j,k'}(x) = 0 because the functions are supported on disjoint intervals, hence,

    \[\langle \psi_{j,k},\psi_{j,k'}\rangle = \int_\mathbb{R} \psi_{j,k}(x)\psi_{j,k'}(x)dx = 0.\]

If k=k', then

    \[\langle \psi_{j,k},\psi_{j,k'}\rangle = \int_{I_{j,k}} \psi_{j,k}(x)\psi_{j,k}(x)dx = \int_{I_{j,k}} |\psi_{j,k}(x)|^2dx = \left(2^{j/2}\right)^2\cdot\frac{1}{2^j} =1.\]

To show orthonormality between scales, suppose that j,j' \in\mathbb{Z}, where without loss of generality we may assume j>j', with k,k' \in \mathbb{Z}. Again Lemma 5.2 comes in handy; there are three possibilities:

(i) I_{j',k'} \cap I_{j,k} = \varnothing. As before

    \[\langle \psi_{j,k},\psi_{j',k'}\rangle = \int_\mathbb{R} \psi_{j,k}(x)\psi_{j',k'}(x)dx = 0\]

because the functions are supported on disjoint intervals.

(ii) I_{j,k} \subset I_{j',k'}^\ell . \psi_{j',k'}(x) is the constant 2^{j'/2} on I_{j',k'}^\ell, and so also on I_{j,k} \subset I_{j',k'}^\ell, thus

    \[\langle \psi_{j,k},\psi_{j',k'}\rangle = \int_{I_{j,k}} \psi_{j,k}(x)\psi_{j',k'}(x)dx = 2^{j'/2}\int_{I_{j,k}} \psi_{j,k}(x)dx = 0.\]

(iii) I_{j,k} \subset I_{j',k'}^r . \psi_{j',k'}(x) is the constant -2^{j'/2} on I_{j',k'}^r, and so also on I_{j,k} \subset I_{j',k'}^r, thus

    \[\langle \psi_{j,k},\psi_{j',k'}\rangle = \int_{I_{j,k}} \psi_{j,k}(x)\psi_{j',k'}(x)dx = -2^{j'/2}\int_{I_{j,k}} \psi_{j,k}(x)dx = 0. \text{ } \blacksquare\]

Theorem 5.13 does most of the heavy lifting to prove that the system \{\mathbf{1}\}\cup\{\psi_{j,k}: j\geq 0\}_{k\in\mathbb{Z}} is orthonormal, all that is left to check is that \langle \mathbf{1},\mathbf{1}\rangle =1 and \langle \mathbf{1},\psi_{j,k}\rangle =0, but these are easy to see

    \[\langle \mathbf{1},\mathbf{1}\rangle  = \int_{[0,1)} \mathbf{1}(x)\mathbf{1}(x) dx = \int_{[0,1)} 1 dx = 1\cdot 1 =1\]

and if I_{j,k}\not\subset [0,1), then \mathbf{1}(x)\psi_{j,k}(x) = 0 and clearly \langle \mathbf{1},\psi_{j,k}\rangle =0, but if I_{j,k}\subseteq [0,1), then

    \[\langle \mathbf{1},\psi_{j,k}\rangle = \int_{I_{j,k}} \mathbf{1}(x)\psi_{j,k}(x) dx = \int_{I_{j,k}} \psi_{j,k}(x) dx = 0\]

as well, and  \{\mathbf{1}\}\cup\{\psi_{j,k}: j\geq 0\}_{k\in\mathbb{Z}} is an orthonormal system.

Back to our special interval [0,1]: we can always consider a subset of the latter collection, namely the Haar system on [0,1], or \{\mathbf{1}\}\cup\{\psi_{j,k}: j\geq 0; 0 \leq k \leq 2^j -1\}, which must be orthonormal since it is a subset of \{\mathbf{1}\}\cup\{\psi_{j,k}: j\geq 0\}_{k\in\mathbb{Z}}. To show that the Haar system on [0,1] is complete, the book makes use of the Splitting Lemma, which I quite like, so I will present it here:

\textbf{Lemma 5.16.} (The Splitting Lemma) Let j\in \mathbb{Z}, and let g_j(x) be a scale j dyadic step function. Then g_j(x) can be written as g_j(x) = r_{j-1}(x) + g_{j-1}(x), where r_{j-1} has the form

    \[r_{j-1}(x) = \sum\limits_{k} a_{j-1}(k)\psi_{j-1,k}(x),\]

for some coefficients \{a_{j-1}(k)\}_{k\in\mathbb{Z}}, and g_{j-1}(x) is a scale j-1 dyadic step function.

\textit{Proof.} Since g_j(x) is a scale j dyadic step function, it is constant on the intervals I_{j,k}. Denote the value that g_j(x) takes on the interval I_{j,k} by c_j(k). For each interval I_{j-1,k}, define the scale j-1 step function g_{j-1}(x) on I_{j-1,k} by

    \[g_{j-1}(x) = 2^{j-1}\int_{I_{j-1,k}}g_j(t)dt =2^{j-1}\left[\int_{I_{j,2k}}g_j(t)dt + \int_{I_{j,2k+1}}g_j(t)dt\right]\]

    \[= 2^{j-1}\cdot c_j(2k)\cdot 2^{-j}+2^{j-1}\cdot c_j(2k+1)\cdot 2^{-j}  =\frac{1}{2}\left(c_j(2k)+c_j(2k+1)\right).\]

Thus g_{j-1}(x) is the average of the values that g_j(x) takes on the left and right halves of I_{j-1,k}. Let r_{j-1}(x) = g_j(x) - g_{j-1}(x); since g_{j-1}(x) is in particular a scale j dyadic step function, so is r_{j-1}(x). Fix a dyadic interval I_{j-1,k} and recall that |I_{j-1,k}| = 2^{-(j-1)}. Then

    \[\int_{I_{j-1,k}} r_{j-1}(x) dx = \int_{I_{j-1,k}} g_{j}(x) dx - \int_{I_{j-1,k}} g_{j-1}(x) dx\]

    \[= \int_{I_{j,2k}} g_j(x) dx + \int_{I_{j,2k+1}} g_j(x)dx - \int_{I_{j-1,k}} g_{j-1}(x) dx\]

    \[=2^{-j}c_j(2k) + 2^{-j}c_j(2k+1) - 2^{-(j-1)}\frac{1}{2}\left(c_j(2k)+c_j(2k+1)\right) = 0.\]

Since r_{j-1}(x) is a scale j dyadic step function, it must be constant on the intervals I_{j,k}, and since its integral over I_{j-1,k} is 0, and that interval has left and right halves I_{j,2k} and I_{j,2k+1}, r_{j-1}(x) must take opposite values on the two halves; it is therefore a multiple of \psi_{j-1,k}(x) there, and thus has the above form. \blacksquare
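
In terms of the constant values c_j(k), the Splitting Lemma is one stage of the discrete Haar transform: replace each pair of neighboring values by their average and keep the signed difference as the detail. Here is a minimal sketch of mine, with an arbitrary example g_3, assuming g_j is represented by its values on the intervals I_{j,k}.

\begin{verbatim}
import numpy as np

def split(c_j):
    # g_{j-1} takes the value (c_j(2k) + c_j(2k+1)) / 2 on I_{j-1,k};
    # r_{j-1} = g_j - g_{j-1} is +d(k) on I_{j,2k} and -d(k) on I_{j,2k+1},
    # i.e. a multiple of psi_{j-1,k} on each I_{j-1,k}.
    c = np.asarray(c_j, dtype=float)
    avg = (c[0::2] + c[1::2]) / 2
    d = (c[0::2] - c[1::2]) / 2
    return avg, d

c3 = np.array([4., 2., 5., 7., 1., 1., 0., 2.])  # a scale-3 step function on [0,1]
avg, d = split(c3)
print(avg, d)
# g_j is recovered exactly: c_j(2k) = avg(k) + d(k), c_j(2k+1) = avg(k) - d(k).
assert np.allclose(np.column_stack([avg + d, avg - d]).ravel(), c3)
\end{verbatim}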

We can now prove that the Haar system on [0,1] is a complete orthonormal system. It is convenient to use the equivalent statements of Theorem 2.57, in this case (a) \Leftrightarrow (c); that is, the orthonormal system is complete if and only if every f(x)\in C^0 on [0,1] is in \overline{\text{span}}\left\{\{\mathbf{1}\} \cup \{\psi_{j,k}: j\geq 0; 0\leq k \leq 2^j -1\}\right\}. Given \epsilon >0 and f(x) \in C^0_{[0,1]}, by Lemma 5.23 there exist j>0 and a scale j dyadic step function g_j(x) on [0,1] such that

    \[||f-g_j||_2 = \left(\int_0^1 |f(x)-g_j(x)|^2dx\right)^{1/2} \leq ||f-g_j ||_\infty <\epsilon\]

where the second to last inequality follows because |f(x)-g_j(x)| \leq ||f-g_j ||_\infty by definition of the infinity norm, thus

    \[\left(\int_0^1 |f(x)-g_j(x)|^2dx\right)^{1/2} \leq \left(\int_0^1 ||f-g_j ||_\infty^2dx\right)^{1/2} = ||f-g_j ||_\infty\left(\int_0^1 dx\right)^{1/2} = ||f-g_j ||_\infty.\]

All that is needed is to show that g_j(x) is in the span of the Haar system on [0,1]. The Splitting Lemma lets us write g_j(x) = r_{j-1}(x) + g_{j-1}(x), where r_{j-1}(x) has the form \sum\limits_k a_{j-1}(k)\psi_{j-1,k}(x); since g_{j-1}(x) is a scale j-1 dyadic step function, the process can be repeated, and so on, until we get g_j(x) = \sum\limits_{\ell = 0}^{j-1}r_\ell(x) + g_0(x), where r_\ell (x) = \sum\limits_{k=0}^{2^\ell -1} a_{\ell}(k)\psi_{\ell,k}(x) for some constants a_\ell(k), and g_0(x) is a scale 0 dyadic step function on [0,1], hence constant on [0,1] and thus a multiple of \mathbf{1}. It follows that g_j(x) is in the span of the Haar system on [0,1] and ||f-g_j||_2 <\epsilon, and so the Haar system on [0,1] is a complete orthonormal system. \blacksquare

By the completeness of the orthonormal system \{\mathbf{1}\} \cup \{\psi_{j,k}: j\geq 0; 0\leq k \leq 2^j -1\}, we know that every function f(x) that is L^2 on [0,1] can be replaced with its infinite generalized Fourier series

    \[f(x) = \langle f, \mathbf{1} \rangle \cdot\mathbf{1} + \sum_{j=0}^\infty \sum_{k=0}^{2^j -1}  \langle f, \psi_{j,k} \rangle\cdot \psi_{j,k}\]

in the L^2 sense on [0,1]. This is a very cool concept if you really think about it, because of the counterintuitive notion of building up any continuous function on [0,1] out of discontinuous step functions that sit equally above and below the x-axis; and if we need to build a function that stays above the x-axis, that is when the \mathbf{1} function comes to the rescue and bumps everything up!
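
To watch this convergence happen, here is a short sketch of mine (not from the book) that accumulates the partial Haar sums of an arbitrary f on a grid; the L^2 error falls, roughly halving with each additional scale.

\begin{verbatim}
import numpy as np

def psi_jk(j, k, x):
    # psi_{j,k} = 2^{j/2} (chi of left half - chi of right half) of I_{j,k}.
    t = 2**j * np.asarray(x) - k
    return 2**(j / 2) * (((0 <= t) & (t < 0.5)).astype(float)
                         - ((0.5 <= t) & (t < 1.0)).astype(float))

M = 2**16
x = (np.arange(M) + 0.5) / M
dx = 1.0 / M
f = np.sin(2 * np.pi * x)

partial = np.full(M, np.sum(f) * dx)            # <f, 1> * 1
for j in range(8):
    for k in range(2**j):
        p = psi_jk(j, k, x)
        partial += (np.sum(f * p) * dx) * p     # + <f, psi_{j,k}> psi_{j,k}
    err = np.sqrt(np.sum((f - partial)**2) * dx)
    print(j, err)                               # L^2 error decreases with j
\end{verbatim}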

In closing, one can show that the generalized Fourier series, or Haar series, above is also uniformly convergent to f, in addition to being convergent in the L^2 sense. Before proving this we're going to need the Mean Value Theorem for integrals.

\textbf{Mean Value Theorem for Integrals.}  Let f:[a,b] \rightarrow \mathbb{R} be continuous, then there exists a c \in (a,b) such that

    \[f(c)(b-a)=\int_a^b f(t)dt.\]

\textit{Proof.} Consider F: [a,b]\rightarrow \mathbb{R} where F(x) = \int_a^x f(t)dt; by the first part of the Fundamental Theorem of Calculus, F is continuous on [a,b] and differentiable on (a,b) with F'(x) = f(x) for all x\in (a,b), so the Mean Value Theorem applies to F, and there must exist some c\in(a,b) such that

    \[F'(c) = \frac{F(b)-F(a)}{b-a}.\]

By the second part of the Fundamental Theorem of Calculus \int_a^b f(t)dt = F(b)-F(a), thus since F'(c) =f(c),

    \[f(c)= \frac{1}{b-a}\int_a^b f(t)dt. \text{ } \blacksquare\]

Lastly, some notation is needed: let's denote the mean of a continuous function on [a,b] by \overline{f}_{[a,b]} =\frac{1}{b-a}\int_a^b f(t)dt. In other words, the Mean Value Theorem for Integrals implies there is a c\in (a,b) such that f(c) = \overline{f}_{[a,b]}. Also, the \chi function is handy for condensing notation; for a set I,

    \[\chi_{I}(x) = \begin{cases} 1 & x\in I \\ 0 & \text{otherwise} \end{cases}\]

\textbf{Theorem.} If f:[0,1] \rightarrow \mathbb{R} is a continuous function, then the Haar series

    \[\langle f, \mathbf{1} \rangle \cdot\mathbf{1} + \sum_{j=0}^\infty \sum_{k=0}^{2^j -1}  \langle f, \psi_{j,k} \rangle\cdot \psi_{j,k}\]

is uniformly convergent to f.

\textit{Proof.} The argument will be by induction. First consider only \langle f, \mathbf{1} \rangle \cdot\mathbf{1}: then \langle f, \mathbf{1} \rangle = \int_0^1 f(t)dt = \frac{1}{1-0}\int_0^1 f(t)dt = \overline{f}_{[0,1]} and \mathbf{1} = \chi_{[0,1)}, so \langle f, \mathbf{1} \rangle \cdot\mathbf{1} = \overline{f}_{[0,1]}\cdot\chi_{[0,1)}. Note also that

    \[\psi_{j,k}(x) = 2^{j/2}\psi(2^j x-k) = \begin{cases} 2^{j/2} & \frac{k}{2^j} \leq x < \frac{2k+1}{2^{j+1}}, \\ -2^{j/2} & \frac{2k+1}{2^{j+1}}\leq x < \frac{k+1}{2^j}, \\ 0 & \text{otherwise,} \end{cases}\]

    \[\text{ or }\psi_{j,k}(x) = 2^{j/2}\cdot\chi_{\left[\frac{k}{2^j},\frac{2k+1}{2^{j+1}}\right)}(x) - 2^{j/2}\cdot\chi_{\left[\frac{2k+1}{2^{j+1}},\frac{k+1}{2^j}\right)}(x) = 2^{j/2}\cdot\chi_{I_{j+1,2k}}(x) - 2^{j/2}\cdot\chi_{I_{j+1,2k+1}}(x).\]

Now consider the partial sum up to j=0: \psi_{0,0} = \chi_{[0,1/2)} -\chi_{[1/2,1)} and \langle f, \psi_{0,0} \rangle = \int_0^1 f(t)\psi_{0,0}(t)dt = \int_0^{1/2}f(t)dt-\int_{1/2}^{1}f(t)dt, and together with the first term we get

    \[\left(\int_0^1 f(t)dt\right)\chi_{[0,1)} + \left(\int_0^{1/2}f(t)dt-\int_{1/2}^{1}f(t)dt\right)\left(\chi_{[0,1/2)} -\chi_{[1/2,1)}\right)\]

    \[= \left(\int_0^1 f(t)dt\right)\left(\chi_{[0,1/2)}+\chi_{[1/2,1)}\right) + \left(\int_0^{1/2}f(t)dt-\int_{1/2}^{1}f(t)dt\right)\left(\chi_{[0,1/2)} -\chi_{[1/2,1)}\right)\]

    \[= 2\cdot\left(\int_0^{1/2}f(t)dt\right)\chi_{[0,1/2)} + 2\cdot \left(\int_{1/2}^{1}f(t)dt\right)\chi_{[1/2,1)}\]

    \[= \frac{1}{\frac{1}{2}-0}\cdot\left(\int_0^{1/2}f(t)dt\right)\chi_{[0,1/2)} + \frac{1}{1-\frac{1}{2}}\cdot \left(\int_{1/2}^{1}f(t)dt\right)\chi_{[1/2,1)}\]

    \[= \overline{f}_{[0,1/2]}\chi_{[0,1/2)} + \overline{f}_{[1/2,1]}\chi_{[1/2,1)}.\]

I will also include the sum up to j=1, to illustrate the subtlety of the argument and to make the emerging pattern clearer.

Since \psi_{1,0}= \sqrt{2}\left(\chi_{[0,1/4)}-\chi_{[1/4,1/2)}\right) and \psi_{1,1}= \sqrt{2}\left(\chi_{[1/2,3/4)}-\chi_{[3/4,1)}\right), we get \langle f,\psi_{1,0} \rangle = \sqrt{2}\left(\int_0^{1/4}f(t)dt - \int_{1/4}^{1/2} f(t)dt\right) and \langle f,\psi_{1,1} \rangle = \sqrt{2}\left(\int_{1/2}^{3/4}f(t)dt - \int_{3/4}^{1} f(t)dt\right), thus

    \[2\cdot\left(\int_0^{1/2}f(t)dt\right)\chi_{[0,1/2)} + 2\cdot \left(\int_{1/2}^{1}f(t)dt\right)\chi_{[1/2,1)} + \langle f,\psi_{1,0} \rangle \cdot \psi_{1,0} +\langle f,\psi_{1,1} \rangle\cdot \psi_{1,1}\]

    \[=2\cdot\left(\int_0^{1/2}f(t)dt\right)\chi_{[0,1/2)} + 2\cdot \left(\int_{1/2}^{1}f(t)dt\right)\chi_{[1/2,1)}\]

    \[+2\left(\int_0^{1/4}f(t)dt - \int_{1/4}^{1/2} f(t)dt\right)\left(\chi_{[0,1/4)}-\chi_{[1/4,1/2)}\right)\]

    \[+2\left(\int_{1/2}^{3/4}f(t)dt - \int_{3/4}^{1} f(t)dt\right)\left(\chi_{[1/2,3/4)}-\chi_{[3/4,1)}\right)\]

    \[=4\left(\int_0^{1/4}f(t)dt\right)\chi_{[0,1/4)}+4\left(\int_{1/4}^{1/2} f(t)dt\right)\chi_{[1/4,1/2)}+4\left(\int_{1/2}^{3/4}f(t)dt\right)\chi_{[1/2,3/4)} +4\left(\int_{3/4}^{1} f(t)dt\right)\chi_{[3/4,1)}\]

    \[=\frac{1}{\frac{1}{4}-0}\left(\int_0^{1/4}f(t)dt\right)\chi_{[0,1/4)}+\frac{1}{\frac{1}{2}-\frac{1}{4}}\left(\int_{1/4}^{1/2} f(t)dt\right)\chi_{[1/4,1/2)}\]

    \[+\frac{1}{\frac{3}{4}-\frac{1}{2}}\left(\int_{1/2}^{3/4}f(t)dt\right)\chi_{[1/2,3/4)} +\frac{1}{1-\frac{3}{4}}\left(\int_{3/4}^{1} f(t)dt\right)\chi_{[3/4,1)}\]

    \[=\overline{f}_{[0,1/4]}\chi_{[0,1/4)}+\overline{f}_{[1/4,1/2]}\chi_{[1/4,1/2)}+\overline{f}_{[1/2,3/4]}\chi_{[1/2,3/4)}+\overline{f}_{[3/4,1]}\chi_{[3/4,1)}.\]

Assuming that for some \ell\geq 2 we have \langle f, \mathbf{1} \rangle \cdot\mathbf{1} + \sum_{j=0}^{\ell-1} \sum_{k=0}^{2^j -1}  \langle f, \psi_{j,k} \rangle\cdot \psi_{j,k}=\sum\limits_{k=0}^{2^\ell -1} 2^\ell \left(\int_{\frac{k}{2^\ell}}^\frac{k+1}{2^\ell}f\right)\chi_{\left[\frac{k}{2^\ell},\frac{k+1}{2^\ell}\right)} =\sum\limits_{k=0}^{2^\ell -1} \overline{f}_{\left[\frac{k}{2^\ell},\frac{k+1}{2^\ell}\right]} \chi_{\left[\frac{k}{2^\ell},\frac{k+1}{2^\ell}\right)}, then adding \sum_{k=0}^{2^\ell -1}  \langle f, \psi_{\ell,k} \rangle\cdot \psi_{\ell,k} to both sides yields

    \[\langle f, \mathbf{1} \rangle \cdot\mathbf{1} + \sum_{j=0}^{\ell-1} \sum_{k=0}^{2^j -1}  \langle f, \psi_{j,k} \rangle\cdot \psi_{j,k} +\sum_{k=0}^{2^\ell -1}  \langle f, \psi_{\ell,k} \rangle\cdot \psi_{\ell,k}=\langle f, \mathbf{1} \rangle \cdot\mathbf{1} + \sum_{j=0}^{\ell} \sum_{k=0}^{2^j -1}  \langle f, \psi_{j,k} \rangle\cdot \psi_{j,k}\]

    \[= \sum\limits_{k=0}^{2^\ell -1} 2^\ell \left(\int_{\frac{k}{2^\ell}}^\frac{k+1}{2^\ell}f\right)\chi_{\left[\frac{k}{2^\ell},\frac{k+1}{2^\ell}\right)}+\sum_{k=0}^{2^\ell -1}  \langle f, \psi_{\ell,k} \rangle\cdot \psi_{\ell,k},\]

and

    \[\langle f,\psi_{\ell,k} \rangle = \int_{I_{\ell,k}}2^{\ell/2}\psi(2^\ell t -k) f(t)dt = 2^{\ell/2}\left(\int_{\frac{k}{2^\ell}}^{\frac{2k+1}{2^{\ell+1}}}f(t)dt - \int_{\frac{2k+1}{2^{\ell+1}}}^{\frac{k+1}{2^{\ell}}}f(t)dt\right)\]

    \[\psi_{\ell,k}=2^{\ell/2}\left[ \chi_{\left[\frac{k}{2^\ell},\frac{2k+1}{2^{\ell+1}}\right)}-\chi_{\left[\frac{2k+1}{2^{\ell+1}},\frac{k+1}{2^\ell}\right)}\right],\]

thus

    \[\langle f,\psi_{\ell,k}\rangle \psi_{\ell,k} + 2^\ell\left(\int_{\frac{k}{2^\ell}}^\frac{k+1}{2^\ell}f(t)dt\right)\chi_{\left[\frac{k}{2^\ell},\frac{k+1}{2^\ell}\right)}\]

    \[= 2^\ell\left(\int_{\frac{k}{2^\ell}}^{\frac{2k+1}{2^{\ell+1}}}f(t)dt - \int_{\frac{2k+1}{2^{\ell+1}}}^{\frac{k+1}{2^{\ell}}}f(t)dt\right) \left[ \chi_{\left[\frac{k}{2^\ell},\frac{2k+1}{2^{\ell+1}}\right)}-\chi_{\left[\frac{2k+1}{2^{\ell+1}},\frac{k+1}{2^\ell}\right)}\right]\]

    \[+ 2^\ell\left(\int_{\frac{k}{2^\ell}}^{\frac{2k+1}{2^{\ell+1}}}f(t)dt + \int_{\frac{2k+1}{2^{\ell+1}}}^{\frac{k+1}{2^{\ell}}}f(t)dt\right) \left[ \chi_{\left[\frac{k}{2^\ell},\frac{2k+1}{2^{\ell+1}}\right)}+\chi_{\left[\frac{2k+1}{2^{\ell+1}},\frac{k+1}{2^\ell}\right)}\right]\]

    \[= 2^{\ell+1}\left(\int_{\frac{k}{2^\ell}}^{\frac{2k+1}{2^{\ell+1}}}f(t)dt\right) \chi_{\left[\frac{k}{2^\ell},\frac{2k+1}{2^{\ell+1}}\right)} +2^{\ell+1}\left(\int_{\frac{2k+1}{2^{\ell+1}}}^{\frac{k+1}{2^{\ell}}}f(t)dt\right)\chi_{\left[\frac{2k+1}{2^{\ell+1}},\frac{k+1}{2^\ell}\right)},\]

and so the result holds for \ell +1

    \[\sum_{k=0}^{2^\ell -1} 2^{\ell+1}\left(\int_{\frac{k}{2^\ell}}^{\frac{2k+1}{2^{\ell+1}}}f(t)dt\right) \chi_{\left[\frac{k}{2^\ell},\frac{2k+1}{2^{\ell+1}}\right)} +2^{\ell+1}\left(\int_{\frac{2k+1}{2^{\ell+1}}}^{\frac{k+1}{2^{\ell}}}f(t)dt\right)\chi_{\left[\frac{2k+1}{2^{\ell+1}},\frac{k+1}{2^\ell}\right)}\]

    \[=\sum\limits_{k=0}^{2^{\ell+1} -1} 2^{\ell+1} \left(\int_{\frac{k}{2^{\ell+1}}}^\frac{k+1}{2^{\ell+1}}f\right)\chi_{\left[\frac{k}{2^{\ell+1}},\frac{k+1}{2^{\ell+1}}\right)} = \sum\limits_{k=0}^{2^{\ell+1} -1} \overline{f}_{\left[\frac{k}{2^{\ell+1}},\frac{k+1}{2^{\ell+1}}\right]} \chi_{\left[\frac{k}{2^{\ell+1}},\frac{k+1}{2^{\ell+1}}\right)}.\]

By induction, for all \ell \in \mathbb{N}

    \[\langle f, \mathbf{1} \rangle \cdot\mathbf{1} + \sum_{j=0}^{\ell} \sum_{k=0}^{2^j -1}  \langle f, \psi_{j,k} \rangle\cdot \psi_{j,k} = \sum\limits_{k=0}^{2^{\ell+1} -1} \overline{f}_{\left[\frac{k}{2^{\ell+1}},\frac{k+1}{2^{\ell+1}}\right]} \chi_{\left[\frac{k}{2^{\ell+1}},\frac{k+1}{2^{\ell+1}}\right)}.\]

By the Mean Value Theorem for integrals, there is some c_{\ell +1,k} \in \left(\frac{k}{2^{\ell+1}},\frac{k+1}{2^{\ell+1}}\right) such that f(c_{\ell +1,k}) =\overline{f}_{\left[\frac{k}{2^{\ell+1}},\frac{k+1}{2^{\ell+1}}\right]}. The right-hand side is therefore exactly the kind of dyadic step function constructed in the first proof of Lemma 5.23: on each dyadic interval, its value is f evaluated at a point of that interval. So given \epsilon>0, choose \delta>0 by uniform continuity as in that proof, and then \ell large enough that 2^{-(\ell+1)}<\delta; for every x\in[0,1) the partial sum differs from f(x) by |f(x)-f(c_{\ell+1,k})|<\epsilon, and the Haar series converges uniformly to f. \blacksquare
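
As a final check, here is a sketch of mine (not from the book) verifying the induction identity numerically: the partial Haar sum through scale \ell coincides, up to rounding, with the step function of dyadic averages at scale \ell+1, and its sup distance to f shrinks as \ell grows.

\begin{verbatim}
import numpy as np

def psi_jk(j, k, x):
    t = 2**j * np.asarray(x) - k
    return 2**(j / 2) * (((0 <= t) & (t < 0.5)).astype(float)
                         - ((0.5 <= t) & (t < 1.0)).astype(float))

M = 2**16
x = (np.arange(M) + 0.5) / M
dx = 1.0 / M
f = np.cos(np.pi * x)
ell = 5

S = np.full(M, np.sum(f) * dx)                  # <f, 1> * 1
for j in range(ell + 1):
    for k in range(2**j):
        p = psi_jk(j, k, x)
        S += (np.sum(f * p) * dx) * p

avg = f.reshape(2**(ell + 1), -1).mean(axis=1)  # mean of f on each I_{ell+1,k}
A = np.repeat(avg, M // 2**(ell + 1))

print(np.max(np.abs(S - A)))   # ~1e-15: the partial sum is the dyadic averages
print(np.max(np.abs(S - f)))   # sup error; rerun with larger ell to see it fall
\end{verbatim}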
