8 Convergence concepts
Reference: Chapter 6, Resnick (2005).
Let’s consider a sequence of real numbers, say \(a_n\). Stating that the associated series converges, formally \(\sum_{k=1}^\infty a_k < \infty\), is equivalent to requiring that the tails of the series vanish, i.e. \[ \sum_{k=1}^\infty a_k < \infty \iff \lim_{n \to \infty} \sum_{k=n}^{\infty} a_k = 0 \text{.} \]
8.1 Types of convergence
Definition 8.1 (\(\color{magenta}{\textbf{Pointwise}}\))
A sequence of random variables \(\{X_n\}_{n\ge1}\) is said to converge pointwise to a limit \(X\) if for all \(\omega \in \Omega\): \[
X_n(\omega) \underset{n \to \infty}{\longrightarrow} X(\omega) \iff \lim_{n \to \infty} X_n(\omega) = X(\omega)
\text{.}
\] This kind of definition requires that convergence happen for every \(\omega \in \Omega\).
Example 8.1 Let \(\Omega = \{0,1\}\) and define for each \(\omega\) a sequence of random variables \[ X_n(\omega) = \frac{\omega}{n} \text{.} \] Then \(X_n\) converges pointwise to \(0\); in fact \[ \lim_{n\to \infty} X_n(\omega) = \lim_{n\to \infty} \frac{\omega}{n} = 0 \quad \forall \omega \in \Omega \text{.} \]
Definition 8.2 (\(\color{magenta}{\textbf{Almost Surely}}\))
A sequence of random variables \(\{X_n\}_{n \ge 1}\) is said to be convergent almost surely to a limit \(X\) if: \[
\mathbb{P}\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\} = 1
\text{.}
\] Usually, such kind of convergence is denoted as: \[
X_n(\omega) \overset{\text{a.s.}}{\underset{n \to \infty}{\longrightarrow}} X(\omega)
\text{.}
\] In other terms, almost sure convergence requires the relation to hold for all \(\omega \in \Omega\) except possibly for some \(\omega\)’s that are in \(\Omega\) but whose probability of occurrence is zero.
Example 8.2 Let \(\Omega = [0,1]\) with \(\omega \sim \text{Uniform}[0,1]\). Define the sequence of random variables \[ X_n(\omega) = \mathbf{1}_{[\omega \leq \frac{1}{n}]}(\omega) \text{,} \]
- If \(\omega > 0\), then for sufficiently large \(n\) we have \(\omega > 1/n\), hence \(X_n(\omega)=0\) eventually and \(X_n(\omega)\to 0\).
- If \(\omega = 0\), then \(X_n(0)=1\) for all \(n\), so \(X_n(0) \to 1\).
Thus the pointwise limit is \(0\) for all \(\omega > 0\) and \(1\) for \(\omega=0\). Since the exceptional set \(\{\omega=0\}\) has probability zero under the uniform law, the limit is \(0\) almost surely: \[ X_n \overset{\text{a.s.}}{\underset{n \to \infty}{\longrightarrow}} 0 \text{.} \]
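The behaviour of a single trajectory can be checked by simulation. The following is a minimal Python sketch; the helper name `path`, the seed, and the path length are illustrative choices, not part of the example:

```python
import random

random.seed(0)

def path(omega, n_max):
    """Trajectory X_1(omega), ..., X_{n_max}(omega) for X_n = 1{omega <= 1/n}."""
    return [1 if omega <= 1 / n else 0 for n in range(1, n_max + 1)]

omega = random.random()          # one outcome of Uniform[0,1]; a.s. omega > 0
trajectory = path(omega, 10_000)
print(trajectory[:5], "...", trajectory[-1])
```

For any drawn \(\omega > 0\) the trajectory is eventually constant at \(0\), matching the almost sure limit; only the null event \(\{\omega = 0\}\) would give a path stuck at \(1\).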
Definition 8.3 (\(\color{magenta}{\textbf{In Probability}}\))
A sequence of random variables \(\{X_n\}_{n\ge1}\) is said to converge in probability to a limit \(X\) if, for every fixed \(\epsilon > 0\): \[
\lim_{n \to \infty}\mathbb{P}\{\omega \in \Omega : |X_n(\omega) - X(\omega)|> \epsilon\} = 0
\text{.}
\] Usually, such kind of convergence is denoted as: \[
X_n(\omega) \overset{\text{p}}{\underset{n \to \infty}{\longrightarrow}} X(\omega)
\text{.}
\]
Example 8.3 Let \(X_n \sim \text{Bernoulli}(1/n)\), independent across \(n\); then \(X_n\) converges in probability to zero. In fact, for any fixed \(\epsilon\) with \(0 < \epsilon < 1\), \[ \mathbb{P}(|X_n-0| > \epsilon) = \mathbb{P}(X_n=1) = \frac{1}{n} \underset{n \to \infty}{\longrightarrow} 0 \text{.} \]
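As a sanity check, \(\mathbb{P}(|X_n| > \epsilon) = \mathbb{P}(X_n = 1) = 1/n\) can be estimated by simulation. This is a Python sketch; the function name and the Monte Carlo sizes are arbitrary choices:

```python
import random

random.seed(42)

def tail_prob(n, reps=100_000):
    """Monte Carlo estimate of P(|X_n - 0| > eps) for X_n ~ Bernoulli(1/n)
    and any fixed 0 < eps < 1, i.e. an estimate of P(X_n = 1) = 1/n."""
    hits = sum(1 for _ in range(reps) if random.random() < 1 / n)
    return hits / reps

estimates = {n: tail_prob(n) for n in (10, 100, 1000)}
print(estimates)
```

The estimated probabilities shrink like \(1/n\), as the definition of convergence in probability requires.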
Definition 8.4 (\(\color{magenta}{\textbf{In } L_p}\))
A sequence of random variables \(\{X_n\}_{n\ge1}\) such that: \[
\mathbb{E}\{|X_n|^p\} < \infty, \quad \mathbb{E}\{|X|^p\} < \infty
\text{,}
\] is said to be convergent in \(L_p\), with \(p > 0\), to a random variable \(X\) iff \[
X_n(\omega) \overset{L_p}{\underset{n \to \infty}{\longrightarrow}} X(\omega) \iff \lim_{n \to \infty}\mathbb{E}\{|X_n - X|^p\} = 0
\text{.}
\] Usually, such kind of convergence is denoted as: \[
X_n \underset{n\to\infty}{\overset{L_p}{\longrightarrow}} X
\text{.}
\]
Note that it can be proved that there is no relation between almost sure convergence and \(L_p\) convergence, i.e. one does not imply the other and vice versa. However, convergence in a bigger space, say \(L_q\) with \(q > p\), implies convergence in the smaller space, i.e. \[ X_n \underset{n\to\infty}{\overset{L_q}{\longrightarrow}} X \implies X_n \underset{n\to\infty}{\overset{L_p}{\longrightarrow}} X, \quad 0 < p < q \text{.} \]
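The inclusion just stated follows in one line from Jensen’s inequality applied to the convex map \(x \mapsto x^{q/p}\) (a standard argument, known as Lyapunov’s inequality, sketched here): \[ \mathbb{E}\{|X_n - X|^p\} \le \left(\mathbb{E}\{|X_n - X|^q\}\right)^{p/q} \underset{n\to\infty}{\longrightarrow} 0, \quad 0 < p < q \text{,} \] so \(L_q\) convergence of \(X_n\) forces \(L_p\) convergence as well.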
Example 8.4 Let \(X_n = 1/n\) almost surely; then \(X_n \to 0\) in \(L_p\) for any \(p \ge 1\), since \[ \mathbb{E}\{|X_n - 0|^p\} = \left(\frac{1}{n}\right)^p \underset{n \to \infty}{\longrightarrow} 0 \text{.} \]
Definition 8.5 (\(\color{magenta}{\textbf{In Distribution}}\))
A sequence of random variables \(X_n\) is said to converge in distribution to a random variable \(X\) if the distribution function \(F_{X_n}\) converges to \(F_{X}\), i.e. \[
\lim_{n \to \infty} F_{X_n}(x) = F_{X}(x)
\text{,}
\] for every continuity point \(x\) of \(F_X\). Usually, such kind of convergence is denoted as: \[
X_n(\omega) \overset{\text{d}}{\underset{n \to \infty}{\longrightarrow}} X(\omega)
\text{.}
\]
In other terms, we have convergence in distribution if the distribution of \(X_n\), namely \(F_{X_n}\), converges as \(n \to \infty\) to the distribution of \(X\), namely \(F_{X}\). Note that convergence in distribution does not involve the underlying probability space but only the distribution functions.
Example 8.5 Let \(X_n\) be a sequence of normal random variables, i.e. \[ X_n \sim \mathcal{N}\left(\frac{\mu}{n}, \sigma^2 + \frac{1}{n}\right) \text{.} \] As \(n \to \infty\), the distribution of \(X_n\) collapses to a normal with mean zero, variance \(\sigma^2\), i.e. \[ X_n \overset{\text{d}}{\underset{n \to \infty}{\longrightarrow}} \mathcal{N}\left(0, \sigma^2\right) \text{.} \]
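Since only distribution functions are involved, the convergence in Example 8.5 can be checked numerically by comparing \(F_{X_n}\) with the limit CDF at a fixed point. This is a Python sketch; the values of \(\mu\), \(\sigma^2\) and the evaluation point \(x\) are illustrative choices:

```python
import math

def normal_cdf(x, mean, var):
    """CDF of N(mean, var) expressed through the error function."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

mu, sigma2 = 3.0, 2.0      # illustrative parameter choices
x = 1.0                    # a continuity point of the limit CDF
limit = normal_cdf(x, 0.0, sigma2)
gaps = {n: abs(normal_cdf(x, mu / n, sigma2 + 1 / n) - limit)
        for n in (1, 10, 100, 1000)}
print(gaps)
```

The gap \(|F_{X_n}(x) - F_X(x)|\) shrinks as \(n\) grows, as the mean \(\mu/n\) and variance \(\sigma^2 + 1/n\) approach the limit values \(0\) and \(\sigma^2\).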
8.2 Laws of Large Numbers
There are many versions of laws of large numbers (LLN). In general, a sequence \(\{X_n\}_{n\ge 1}\) is said to satisfy a LLN if: \[ \frac{S_n}{n} = \frac{1}{n}\sum_{i = 1}^{n} X_i \longrightarrow X \text{.} \]
In general, if convergence happens almost surely (Definition 8.2) we speak about strong laws of large numbers (SLLN). Otherwise, if convergence happens in probability we speak about weak laws of large numbers (WLLN). A crucial difference to be noted is that when convergence happens almost surely we are dealing with a limit of a sequence of sets (limit is inside \(\mathbb{P}\)), instead if convergence happens in probability we are dealing with a limit of a sequence of real numbers in \([0,1]\) (limit is outside \(\mathbb{P}\)).
8.2.1 Strong Laws of Large Numbers
Proposition 8.1 (\(\color{magenta}{\textbf{Kolmogorov SLLN}}\))
Let’s consider a sequence of IID random variables \(\{X_n\}_{n \ge 1}\). Then, there exists a constant \(c \in \mathbb{R}\) such that: \[
\frac{1}{n} \sum_{i = 1}^{n} X_i = \frac{S_n}{n} \overset{\text{a.s.}}{\underset{n\to \infty}{\longrightarrow}} c
\text{,}
\] if and only if \(\mathbb{E}\{|X_1|\}< \infty\), in which case \(c = \mathbb{E}\{X_1\}\).
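A single simulated path illustrates the statement. The following Python sketch uses Exponential(1) draws, so \(\mathbb{E}\{X_1\} = 1\); the seed and the path length are arbitrary choices:

```python
import random

random.seed(1)

# One realization of the path n -> S_n / n for IID Exponential(1) draws;
# Kolmogorov's SLLN says this single path converges to E[X_1] = 1.
n_max = 100_000
running_sum = 0.0
sample_means = []
for n in range(1, n_max + 1):
    running_sum += random.expovariate(1.0)
    sample_means.append(running_sum / n)
print(sample_means[-1])
```

Almost sure convergence is a statement about each individual path of the sequence of sample means, which is exactly what the simulation tracks.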
Proposition 8.2 (\(\color{magenta}{\textbf{SLLN without independence}}\))
Let’s consider a sequence of identically distributed random variables \(\{X_n\}_{n \ge 1}\), i.e. \(\mathbb{E}\{X_n\} = \mathbb{E}\{X_1\}\) for all \(n\), such that:
- \(\mathbb{E}\{X_n^2\} < c\), where \(c > 0\) is a constant independent of \(n\).
- \(\mathbb{C}\text{ov}\{X_i, X_j\} = 0 \quad \forall i \neq j\).
Then: \[ \frac{1}{n} \sum_{i = 1}^{n} X_i = \frac{S_n}{n} \overset{\text{a.s.}}{\underset{n\to \infty}{\longrightarrow}} \mathbb{E}\{X_1\} \text{.} \]
Note that the existence of a finite first moment, i.e. \(\mathbb{E}\{|X_1|\} < \infty\), implies that the derivative of the characteristic function at zero exists, i.e. \(\exists\, \phi_{X_1}^{\prime}(0)\). On the other hand, the existence of \(\phi_{X_1}^{\prime}(0)\) does not ensure that the first moment is finite.
8.2.2 Weak Laws of Large Numbers
Let’s repeat a random experiment many times, each time under the same conditions, in such a way that the outcomes of the experiments are IID. Then, each random variable \(X_i\) comes from the same population with an unknown mean \(\mathbb{E}\{X\}\) and variance \(\mathbb{V}\{X\}\). Thanks to the WLLN, by repeating the experiment many times the sample mean converges in probability to the true population mean. Convergence in probability means that: \[ \lim_{n \to \infty}\mathbb{P}\left\{\omega \in \Omega : \left|\frac{1}{n}\sum_{i = 1}^{n}X_i(\omega) - \mathbb{E}\{X(\omega)\} \right|> \epsilon\right\} = 0 \text{.} \]
Proposition 8.3 (\(\color{magenta}{\textbf{WLLN with variances}}\))
Given a sequence of independent and identically distributed random variables \(\{X_n\}_{n \ge 1}\) such that:
- \(\mathbb{E}\{X_1\} = \mu\).
- \(\mathbb{E}\{X_1^2\} < \infty\).
Then: \[ \bar{X}_n = \frac{1}{n} \sum_{i = 1}^{n} X_i = \frac{S_n}{n} \overset{\text{p}}{\underset{n\to \infty}{\longrightarrow}} \mathbb{E}\{X_1\} = \mu \text{.} \]
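The probability in the definition can itself be approximated by replication. The Python sketch below uses Uniform[0,1] draws, so \(\mu = 1/2\); the value of \(\epsilon\), the sample sizes, and the number of replications are arbitrary choices:

```python
import random

random.seed(7)

def dev_prob(n, eps=0.1, reps=2000):
    """Fraction of replications where |sample mean - mu| > eps,
    for n IID Uniform[0,1] draws (mu = 0.5)."""
    mu = 0.5
    bad = 0
    for _ in range(reps):
        mean = sum(random.random() for _ in range(n)) / n
        if abs(mean - mu) > eps:
            bad += 1
    return bad / reps

results = {n: dev_prob(n) for n in (10, 100, 1000)}
print(results)
```

The estimated probability of a deviation larger than \(\epsilon\) shrinks toward zero as \(n\) grows, which is exactly the limit outside \(\mathbb{P}\) in the definition of convergence in probability.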
Proposition 8.4 (\(\color{magenta}{\textbf{Khintchin's WLLN under first moment hypothesis}}\))
Given a sequence of independent and identically distributed random variables \(\{X_n\}_{n \ge 1}\) such that:
- \(\mathbb{E}\{X_1\} < \infty\).
- \(\mathbb{E}\{X_n\} = \mu\).
Then: \[ \bar{X}_n = \frac{1}{n} \sum_{i = 1}^{n} X_i = \frac{S_n}{n} \overset{\text{p}}{\underset{n\to \infty}{\longrightarrow}} \mathbb{E}\{X_1\} = \mu \text{.} \]
Proposition 8.5 (\(\color{magenta}{\textbf{Feller's WLLN without first moment}}\))
Given a sequence of independent and identically distributed random variables \(\{X_n\}_{n \ge 1}\) such that: \[
\lim_{x\to\infty}x\mathbb{P}\{|X_1| > x\} = 0
\text{,}
\] then \[
\bar{X}_n = \frac{1}{n} \sum_{i = 1}^{n} X_i = \frac{S_n}{n} \overset{\text{p}}{\underset{n\to \infty}{\longrightarrow}} \mathbb{E}\{X_1 \mathbb{1}_{[|X_1| \le n]}\}
\text{.}
\] Note that this result makes no assumption of a finite first moment.
8.3 Central Limit Theorem
Theorem 8.1 (\(\color{magenta}{\textbf{Central Limit Theorem (CLT) - IID case}}\))
Let’s consider a sequence of random variables \(X_1, \dots, X_n\), where the \(X_i\) are independent and identically distributed (IID), i.e. \[
\begin{aligned}
X_i \sim \text{IID}(\mu, \sigma^2) {} & \implies \mathbb{E}\{X_i\} = \mathbb{E}\{X_1\} = \mu \\
& \implies \mathbb{V}\{X_i\} = \mathbb{V}\{X_1\} = \sigma^2
\end{aligned}
\] Then, the CLT states that, when the sample is large, the random variable \(S_n\) \[
S_n = \sum_{i=1}^{n} X_i
\text{,}
\] defined by the sum of all the \(X_i\), is normally distributed, i.e. \[
S_n \overset{\text{d}}{\underset{n\to\infty}{\sim}}
\mathcal{N}(\mathbb{E}\{S_n\}, \mathbb{V}\{S_n\})
\text{.}
\] Since the \(X_i\) are IID, the moments of \(S_n\) read explicitly \[
\mathbb{E}\{S_n\} = n \mathbb{E}\{X_1\} = n \mu, \quad \mathbb{V}\{S_n\} = n \mathbb{V}\{X_1\} = n \sigma^2
\text{.}
\] Alternatively, the CLT can be written in terms of the standardized random variable \(Z_n\) \[
Z_n = \frac{S_n - \mathbb{E}\{S_n\}}{\sqrt{\mathbb{V}\{S_n\}}} = \frac{\sum_{i=1}^{n} X_i - n \mu }{\sqrt{n} \, \sigma} \overset{\text{d}}{\underset{n\to\infty}{\sim}}
\mathcal{N}(0, 1)
\text{,}
\] which, for large samples, is distributed as a standard normal.
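The standardization can be checked by simulation. The following Python sketch uses Uniform[0,1] summands, so \(\mu = 1/2\) and \(\sigma^2 = 1/12\); the sample size and replication count are arbitrary choices:

```python
import math
import random

random.seed(123)

# Z_n = (S_n - n*mu) / (sqrt(n)*sigma) for S_n a sum of n IID Uniform[0,1].
n, reps = 200, 10_000
mu, sigma = 0.5, math.sqrt(1 / 12)
z = []
for _ in range(reps):
    s = sum(random.random() for _ in range(n))
    z.append((s - n * mu) / (math.sqrt(n) * sigma))

# For a standard normal, about 95% of the mass lies in [-1.96, 1.96].
inside = sum(1 for v in z if abs(v) <= 1.96) / reps
print(inside)
```

Even though each summand is far from normal, the standardized sums already behave like \(\mathcal{N}(0,1)\) draws at moderate \(n\), which is the content of the CLT.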