Reference: Chapter 6 of Resnick (2005).
Let’s consider a sequence of real numbers, say $\{a_n\}_{n \ge 1}$. Stating that the sequence converges to a limit $a$, formally $\lim_{n \to \infty} a_n = a$, means that from a certain index $N$ onwards the terms stay arbitrarily close to $a$, i.e. $$\forall \varepsilon > 0 \;\; \exists N(\varepsilon) : \quad |a_n - a| < \varepsilon \quad \forall n \ge N(\varepsilon).$$
 Types of convergence
Definition 8.1 (Pointwise convergence)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent pointwise to a limit $X$ if for all $\omega \in \Omega$: $$\lim_{n \to \infty} X_n(\omega) = X(\omega).$$ This kind of definition requires that convergence happens for every $\omega \in \Omega$.
 
Example 8.1 Let $\Omega = [0,1]$ and let’s define for each $\omega \in \Omega$ a sequence of random variables defined as: $$X_n(\omega) = \frac{\omega}{n}.$$ Then for every $\omega$, $X_n$ converges pointwise to 0, in fact $$\lim_{n \to \infty} X_n(\omega) = \lim_{n \to \infty} \frac{\omega}{n} = 0 \quad \forall \omega \in [0,1].$$
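A minimal numerical sketch of Example 8.1 (the specific sequence $X_n(\omega) = \omega/n$ matches the reconstruction above and is an assumption): for each fixed $\omega$ the trajectory $n \mapsto X_n(\omega)$ is an ordinary real sequence shrinking to zero.

```python
import numpy as np

# Pointwise convergence: fix a few outcomes omega and watch each
# real sequence X_n(omega) = omega / n tend to 0 as n grows.
omegas = np.array([0.0, 0.25, 0.5, 1.0])
for n in [1, 10, 100, 1000]:
    print(n, omegas / n)   # every entry shrinks towards 0
```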
 
 
 
 
Definition 8.2 (Almost sure convergence)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent almost surely to a limit $X$ if: $$P\left(\left\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\}\right) = 1.$$ Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{a.s.} X.$$ In other terms, almost sure convergence requires that the pointwise relation holds for all $\omega \in \Omega$ with the exception of some $\omega$’s, belonging to a set $N \subset \Omega$, whose probability of occurrence is zero.
 
Example 8.2 Let $\Omega = [0,1]$ with $P$ the uniform (Lebesgue) law. Define the sequence of random variables $X_n(\omega) = \mathbf{1}_{\{\omega \le 1/n\}}$.
- If $\omega > 0$, then for sufficiently large $n$ we have $1/n < \omega$, hence $X_n(\omega) = 0$ eventually and $X_n(\omega) \to 0$.

- If $\omega = 0$, then $X_n(\omega) = 1$ for all $n$, so $X_n(0) \to 1$.

Thus the pointwise limit is $0$ for all $\omega \in (0,1]$ and $1$ for $\omega = 0$. Since the exceptional set $\{0\}$ has probability zero under the uniform law, the limit is $0$ almost surely: $$X_n \xrightarrow{a.s.} 0.$$
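A short simulation sketch of Example 8.2 (assuming, as reconstructed above, $X_n = \mathbf{1}_{\{\omega \le 1/n\}}$): sampled trajectories hit zero eventually, and the only failing outcome $\omega = 0$ is never drawn under the uniform law.

```python
import numpy as np

rng = np.random.default_rng(0)

# Almost sure convergence: draw outcomes omega ~ Uniform(0, 1) once,
# then follow the deterministic trajectories X_n(omega) = 1{omega <= 1/n}.
w = rng.uniform(size=5)                  # five fixed outcomes, all > 0 a.s.
for n in [1, 2, 5, 50, 500]:
    print(n, (w <= 1 / n).astype(int))   # each trajectory is eventually 0
```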
 
 
 
 
Definition 8.3 (Convergence in probability)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent in probability to a limit $X$ if, for any fixed $\varepsilon > 0$: $$\lim_{n \to \infty} P\left(|X_n - X| > \varepsilon\right) = 0.$$ Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{P} X.$$
 
Example 8.3 Let $X_n \sim \text{Bernoulli}(1/n)$, independent across $n$. Then $X_n$ converges in probability to zero; in fact, for a fixed $\varepsilon \in (0,1)$: $$P\left(|X_n - 0| > \varepsilon\right) = P(X_n = 1) = \frac{1}{n} \xrightarrow{n \to \infty} 0.$$
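A Monte Carlo sketch of Example 8.3 (the Bernoulli$(1/n)$ law is the reconstruction assumed above): the estimated exceedance probability tracks $1/n$ and vanishes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Convergence in probability: P(|X_n| > eps) = P(X_n = 1) = 1/n -> 0.
eps = 0.5
for n in [10, 100, 1000, 10_000]:
    x = rng.binomial(1, 1 / n, size=100_000)   # many copies of X_n
    print(n, (np.abs(x) > eps).mean())         # Monte Carlo estimate of 1/n
```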
 
 
 
 
Definition 8.4 (Convergence in $L^p$)
A sequence of random variables $\{X_n\}_{n \ge 1}$ such that: $$E\left[|X_n|^p\right] < \infty \quad \forall n$$ is said to be convergent in $L^p$, with $p \ge 1$, to a random variable $X$ iff $$\lim_{n \to \infty} E\left[|X_n - X|^p\right] = 0.$$ Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{L^p} X.$$
 
Note that it can be proved that there is no general relation between almost sure convergence and $L^p$ convergence, i.e. one does not imply the other and vice versa. However, convergence in a bigger space, say $L^q$ with $q > p \ge 1$, implies convergence in the smaller space, i.e. $$X_n \xrightarrow{L^q} X \implies X_n \xrightarrow{L^p} X, \qquad 1 \le p < q.$$
Example 8.4 Let $X_n \to X$ almost surely with $|X_n| \le M < \infty$ for all $n$. Then $X_n \to X$ in $L^p$ for any $p \ge 1$, since by dominated convergence $$\lim_{n \to \infty} E\left[|X_n - X|^p\right] = 0.$$
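A sketch connecting $L^p$ convergence back to Example 8.2 (an illustrative assumption, not a claim from the text): the bounded sequence $X_n = \mathbf{1}_{\{\omega \le 1/n\}}$ converges a.s. to 0 and also in every $L^p$, since $E[|X_n - 0|^p] = 1/n$.

```python
import numpy as np

rng = np.random.default_rng(2)

# L^p convergence: Monte Carlo estimate of E|X_n|^p for the bounded
# indicator sequence; the exact value is 1/n for every p >= 1.
w = rng.uniform(size=1_000_000)
for p in [1, 2]:
    for n in [10, 100, 1000]:
        x_n = (w <= 1 / n).astype(float)   # realizations of X_n
        print(p, n, np.mean(x_n ** p))     # approximately 1/n
```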
 
 
 
 
Definition 8.5 (Convergence in distribution)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent in distribution to a random variable $X$ if the distribution function of $X_n$ converges to that of $X$, i.e. $$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$ for all $x$ that are continuity points of $F_X$. Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{d} X.$$
 
In other terms, we have convergence in distribution if the distribution of $X_n$, namely $F_{X_n}$, converges as $n \to \infty$ to the distribution of $X$, namely $F_X$. Note that convergence in distribution is not related to the underlying probability space but involves only the distribution functions.
Example 8.5 Let $X_n$ be a sequence of normal random variables, i.e. $X_n \sim N(1/n, \sigma^2)$. As $n \to \infty$, the distribution of $X_n$ collapses to a normal with mean zero and variance $\sigma^2$, i.e. $$X_n \xrightarrow{d} X \sim N(0, \sigma^2).$$
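A direct numerical sketch of Example 8.5 (the mean $1/n$ is the reconstruction assumed above), checking $F_{X_n}(x) \to F_X(x)$ at a fixed point $x$ with a hand-rolled normal CDF:

```python
import math

def norm_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), written via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Convergence in distribution: F_{X_n}(x) with X_n ~ N(1/n, sigma^2)
# approaches F_X(x) with X ~ N(0, sigma^2) as n grows.
sigma, x = 2.0, 1.0
for n in [1, 10, 100, 1000]:
    print(n, norm_cdf(x, 1 / n, sigma))
print("limit", norm_cdf(x, 0.0, sigma))
```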
 
 
 
 
 Laws of Large Numbers
There are many versions of laws of large numbers (LLN). In general, a sequence $\{X_n\}_{n \ge 1}$ is said to satisfy a LLN if the sample mean converges to the common expectation: $$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \longrightarrow E[X_1],$$ where the mode of convergence qualifies the law.
In general, if convergence happens almost surely (Definition 8.2) we speak of strong laws of large numbers (SLLN); if convergence happens in probability we speak of weak laws of large numbers (WLLN). A crucial difference to be noted is that when convergence happens almost surely we are dealing with the limit of a sequence of events (the limit sits inside the probability, $P(\lim_{n} \bar{X}_n = \mu) = 1$), whereas if convergence happens in probability we are dealing with the limit of a sequence of real numbers in $[0,1]$ (the limit sits outside the probability, $\lim_{n} P(|\bar{X}_n - \mu| > \varepsilon) = 0$).
 
 
 
 Strong Laws of Large Numbers
Proposition 8.1 (Kolmogorov’s SLLN)
Let’s consider a sequence of IID random variables $\{X_n\}_{n \ge 1}$. Then, there exists a constant $c$ such that: $$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{a.s.} c$$ if and only if $E[|X_1|] < \infty$, in which case $c = E[X_1]$.
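A single-trajectory sketch of Proposition 8.1 (the Exponential(1) population, with $E[X_1] = 1$, is an illustrative assumption): along one realized sample path the running mean settles at the constant $c = E[X_1]$.

```python
import numpy as np

rng = np.random.default_rng(3)

# SLLN: one realized trajectory of running sample means of IID
# Exponential(1) draws; the path itself converges to mu = 1.
x = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 1000, 100_000]:
    print(n, running_mean[n - 1])   # drifts towards 1 along the path
```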
 
Proposition 8.2 (SLLN without independence)
Let’s consider a sequence of identically distributed random variables $\{X_n\}_{n \ge 1}$, i.e. $X_n \sim X$ for all $n$, such that:
- $E[X_n^2] \le c < \infty$, where $c$ is a constant independent from $n$.

- $\text{Cov}(X_i, X_j) = 0$ for all $i \ne j$.

Then $\bar{X}_n \xrightarrow{a.s.} E[X]$.
Note that the existence of a finite first moment, i.e. $E[|X|] < \infty$, implies that the derivative of the characteristic function of the random variable exists at zero, with $\varphi_X'(0) = i\,E[X]$. On the other hand, the existence of $\varphi_X'(0)$ does not ensure that the first moment is finite.
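As a one-line sketch of the direction that does hold (assuming $E[|X|] < \infty$, which by dominated convergence justifies exchanging derivative and expectation):

$$\varphi_X'(0) = \frac{d}{dt}\,E\left[e^{itX}\right]\Big|_{t=0} = E\left[iX e^{itX}\right]\Big|_{t=0} = i\,E[X].$$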
 Weak Laws of Large Numbers
Let’s repeat a random experiment many times, each time ensuring the same conditions in such a way that the outcomes of the experiments are IID. Then, each random variable $X_i$ comes from the same population with an unknown mean $\mu$ and variance $\sigma^2$. Thanks to the WLLN, repeating the experiment many times, the sample mean of the experiment converges in probability to the true mean in the population. Convergence in probability means that: $$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0 \quad \forall \varepsilon > 0.$$
Proposition 8.3 (WLLN)
Given a sequence of independent and identically distributed random variables $\{X_n\}_{n \ge 1}$ such that:
- $E[X_i] = \mu < \infty$.

- $\text{Var}(X_i) = \sigma^2 < \infty$.

Then $\bar{X}_n \xrightarrow{P} \mu$.
Proof. Let’s consider the random variable $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Since by assumption the mean and variance are finite, let’s apply the Chebyshev inequality (Equation 5.20), i.e. $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2}.$$ Using a well known scaling property of the variance, let’s simplify it as: $$\text{Var}(\bar{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n} \text{Var}(X_i) = \frac{\sigma^2}{n}.$$ Therefore the Chebyshev inequality becomes $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2}.$$ Taking the limit as $n \to \infty$ proves the convergence in probability, i.e. $$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0 \quad \Longrightarrow \quad \bar{X}_n \xrightarrow{P} \mu.$$
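A quick numerical check of the bound in the proof (the Uniform$(0,1)$ population, with $\mu = 1/2$ and $\sigma^2 = 1/12$, is an illustrative assumption): the Monte Carlo exceedance probability sits below the Chebyshev bound $\sigma^2/(n\varepsilon^2)$ and both vanish as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)

# WLLN via Chebyshev: estimated P(|Xbar_n - mu| > eps) vs the bound
# sigma^2 / (n * eps^2) for IID Uniform(0, 1) draws.
mu, sigma2, eps = 0.5, 1 / 12, 0.05
for n in [10, 100, 1000]:
    xbar = rng.uniform(size=(20_000, n)).mean(axis=1)
    estimate = np.mean(np.abs(xbar - mu) > eps)
    print(n, estimate, sigma2 / (n * eps**2))
```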
 
 
 
 
Proposition 8.4 (Khinchin’s WLLN)
Given a sequence of independent and identically distributed random variables $\{X_n\}_{n \ge 1}$ such that:
- $E[|X_1|] < \infty$.

- $E[X_1] = \mu$.

Then $\bar{X}_n \xrightarrow{P} \mu$. Note that, compared with Proposition 8.3, no finite variance is required.
Proposition 8.5 (Feller’s WLLN)
Given a sequence of independent and identically distributed random variables $\{X_n\}_{n \ge 1}$ such that: $$\lim_{n \to \infty} n\,P(|X_1| > n) = 0,$$ then $$\bar{X}_n - \mu_n \xrightarrow{P} 0, \qquad \mu_n = E\left[X_1 \mathbf{1}_{\{|X_1| \le n\}}\right].$$ Note that this result makes no assumptions about a finite first moment.
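For contrast, a sketch of what can fail without the condition (the standard Cauchy choice is an illustrative assumption): here $n\,P(|X_1| > n) \to 2/\pi \ne 0$, the hypothesis of Proposition 8.5 is violated, and running sample means keep jumping instead of settling.

```python
import numpy as np

rng = np.random.default_rng(5)

# Standard Cauchy draws have no first moment and violate Feller's
# condition; the running sample mean does not stabilize.
x = rng.standard_cauchy(size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [100, 10_000, 100_000]:
    print(n, running_mean[n - 1])   # erratic, no limiting value
```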
 
Proof. Let’s verify that under the assumptions of the SLLN without independence (Proposition 8.2) we will always have convergence in probability, i.e. $$\bar{X}_n \xrightarrow{P} \mu.$$
Using the Chebyshev inequality (Equation 5.20), fix an $\varepsilon > 0$ such that: $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2}.$$ Let’s make the computations explicit, i.e. $$\text{Var}(\bar{X}_n) = \frac{1}{n^2}\left[\sum_{i=1}^{n} \text{Var}(X_i) + \sum_{i \ne j} \text{Cov}(X_i, X_j)\right].$$ By assumption the covariances are zero, $\text{Cov}(X_i, X_j) = 0$ for $i \ne j$. Moreover, since $\text{Var}(X_i) = E[X_i^2] - E[X_i]^2$, it is possible to upper bound the variance with the second moment, namely $\text{Var}(X_i) \le E[X_i^2]$, i.e. $$\text{Var}(\bar{X}_n) \le \frac{1}{n^2}\sum_{i=1}^{n} E[X_i^2].$$ Since by the assumption of the SLLN we have that $E[X_i^2] \le c$, where $c$ is a constant independent from $i$, we can further upper bound the probability by: $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{c}{n\varepsilon^2}.$$ Finally, taking the limit for $n \to \infty$ the bound goes to zero, implying convergence in probability: $$\bar{X}_n \xrightarrow{P} \mu.$$
 
 
 
 
 Central Limit Theorem
Theorem 8.1 (Central Limit Theorem)
Let’s consider a sequence of $n$ random variables, $X_1, \dots, X_n$, where each $X_i$ is independent and identically distributed (IID), i.e. $$X_i \sim (\mu, \sigma^2) \quad \forall i.$$ Then, the CLT states that, when the sample is large, the random variable $$S_n = \sum_{i=1}^{n} X_i,$$ defined by the sum of all the $X_i$, is approximately normally distributed, i.e. $$S_n \approx N\left(n\mu, n\sigma^2\right).$$ Since the $X_i$ are IID, the moments of $S_n$ read explicitly $$E[S_n] = n\mu, \qquad \text{Var}(S_n) = n\sigma^2.$$ Alternatively, the CLT can be written in terms of the standardized random variable $$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1),$$ which on large samples is distributed as a standard normal.
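A simulation sketch of Theorem 8.1 (the Uniform$(0,1)$ population is an illustrative assumption): standardized sums $Z_n$ have mean $\approx 0$, standard deviation $\approx 1$, and tail probabilities matching the standard normal.

```python
import numpy as np

rng = np.random.default_rng(6)

# CLT: standardize sums of IID Uniform(0, 1) draws (mu = 1/2,
# sigma^2 = 1/12) and compare with N(0, 1).
mu, sigma = 0.5, np.sqrt(1 / 12)
n, reps = 1_000, 50_000
s = rng.uniform(size=(reps, n)).sum(axis=1)
z = (s - n * mu) / (sigma * np.sqrt(n))
print(z.mean(), z.std())      # approximately 0 and 1
print(np.mean(z <= 1.96))     # approximately Phi(1.96) = 0.975
```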
 
Resnick, Sidney I. 2005. *A Probability Path*. Birkhauser. https://link.springer.com/book/10.1007/978-0-8176-8409-9.