Reference: Chapter 6 of Resnick (2005).
Let’s consider a sequence of real numbers, say $\{a_n\}_{n \ge 1}$. Stating that the sequence converges to a limit $a$, formally $\lim_{n \to \infty} a_n = a$, means that from a certain index $N$ onwards the terms stay arbitrarily close to $a$, i.e. $$\forall \varepsilon > 0 \;\; \exists N(\varepsilon) : \quad |a_n - a| < \varepsilon \quad \forall n \ge N(\varepsilon).$$
 Types of convergence
Definition 8.1 (Pointwise convergence)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent pointwise to a limit $X$ if for all $\omega \in \Omega$: $$\lim_{n \to \infty} X_n(\omega) = X(\omega).$$ This kind of definition requires that convergence happens for every $\omega \in \Omega$.
 
Example 8.1 Let $\Omega = [0,1]$ and let’s define for each $\omega \in \Omega$ a sequence of random variables defined as: $$X_n(\omega) = \frac{\omega}{n}.$$ Then for every $\omega$, $X_n$ converges pointwise to 0, in fact $$\lim_{n \to \infty} X_n(\omega) = \lim_{n \to \infty} \frac{\omega}{n} = 0 \quad \forall \omega \in [0,1].$$
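A minimal numerical sketch of Example 8.1 (the specific sequence $X_n(\omega) = \omega/n$ matches the reconstruction above and is an assumption): for each fixed $\omega$ the trajectory $n \mapsto X_n(\omega)$ is an ordinary real sequence shrinking to zero.

```python
import numpy as np

# Pointwise convergence: fix a few outcomes omega and watch each
# real sequence X_n(omega) = omega / n tend to 0 as n grows.
omegas = np.array([0.0, 0.25, 0.5, 1.0])
for n in [1, 10, 100, 1000]:
    print(n, omegas / n)   # every entry shrinks towards 0
```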
 
 
 
 
Definition 8.2 (Almost sure convergence)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent almost surely to a limit $X$ if: $$P\left(\left\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\}\right) = 1.$$ Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{a.s.} X.$$ In other terms, almost sure convergence requires that the pointwise relation holds for all $\omega \in \Omega$ with the exception of some $\omega$’s, belonging to a set $N \subset \Omega$, whose probability of occurrence is zero.
 
Example 8.2 Let $\Omega = [0,1]$ with $P$ the uniform (Lebesgue) law. Define the sequence of random variables $X_n(\omega) = \mathbf{1}_{\{\omega \le 1/n\}}$.
- If $\omega > 0$, then for sufficiently large $n$ we have $1/n < \omega$, hence $X_n(\omega) = 0$ eventually and $X_n(\omega) \to 0$.

- If $\omega = 0$, then $X_n(\omega) = 1$ for all $n$, so $X_n(0) \to 1$.

Thus the pointwise limit is $0$ for all $\omega \in (0,1]$ and $1$ for $\omega = 0$. Since the exceptional set $\{0\}$ has probability zero under the uniform law, the limit is $0$ almost surely: $$X_n \xrightarrow{a.s.} 0.$$
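A short simulation sketch of Example 8.2 (assuming, as reconstructed above, $X_n = \mathbf{1}_{\{\omega \le 1/n\}}$): sampled trajectories hit zero eventually, and the only failing outcome $\omega = 0$ is never drawn under the uniform law.

```python
import numpy as np

rng = np.random.default_rng(0)

# Almost sure convergence: draw outcomes omega ~ Uniform(0, 1) once,
# then follow the deterministic trajectories X_n(omega) = 1{omega <= 1/n}.
w = rng.uniform(size=5)                  # five fixed outcomes, all > 0 a.s.
for n in [1, 2, 5, 50, 500]:
    print(n, (w <= 1 / n).astype(int))   # each trajectory is eventually 0
```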
 
 
 
 
Definition 8.3 (Convergence in probability)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent in probability to a limit $X$ if, for any fixed $\varepsilon > 0$: $$\lim_{n \to \infty} P\left(|X_n - X| > \varepsilon\right) = 0.$$ Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{P} X.$$
 
Example 8.3 Let $X_n \sim \text{Bernoulli}(1/n)$, independent across $n$. Then $X_n$ converges in probability to zero; in fact, for a fixed $\varepsilon \in (0,1)$: $$P\left(|X_n - 0| > \varepsilon\right) = P(X_n = 1) = \frac{1}{n} \xrightarrow{n \to \infty} 0.$$
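A Monte Carlo sketch of Example 8.3 (the Bernoulli$(1/n)$ law is the reconstruction assumed above): the estimated exceedance probability tracks $1/n$ and vanishes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Convergence in probability: P(|X_n| > eps) = P(X_n = 1) = 1/n -> 0.
eps = 0.5
for n in [10, 100, 1000, 10_000]:
    x = rng.binomial(1, 1 / n, size=100_000)   # many copies of X_n
    print(n, (np.abs(x) > eps).mean())         # Monte Carlo estimate of 1/n
```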
 
 
 
 
Definition 8.4 (Convergence in $L^p$)
A sequence of random variables $\{X_n\}_{n \ge 1}$ such that: $$E\left[|X_n|^p\right] < \infty \quad \forall n$$ is said to be convergent in $L^p$, with $p \ge 1$, to a random variable $X$ iff $$\lim_{n \to \infty} E\left[|X_n - X|^p\right] = 0.$$ Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{L^p} X.$$
 
Note that it can be proved that there is no general relation between almost sure convergence and $L^p$ convergence, i.e. one does not imply the other and vice versa. However, convergence in a bigger space, say $L^q$ with $q > p \ge 1$, implies convergence in the smaller space, i.e. $$X_n \xrightarrow{L^q} X \implies X_n \xrightarrow{L^p} X, \qquad 1 \le p < q.$$
Example 8.4 Let $X_n \to X$ almost surely with $|X_n| \le M < \infty$ for all $n$. Then $X_n \to X$ in $L^p$ for any $p \ge 1$, since by dominated convergence $$\lim_{n \to \infty} E\left[|X_n - X|^p\right] = 0.$$
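A sketch connecting $L^p$ convergence back to Example 8.2 (an illustrative assumption, not a claim from the text): the bounded sequence $X_n = \mathbf{1}_{\{\omega \le 1/n\}}$ converges a.s. to 0 and also in every $L^p$, since $E[|X_n - 0|^p] = 1/n$.

```python
import numpy as np

rng = np.random.default_rng(2)

# L^p convergence: Monte Carlo estimate of E|X_n|^p for the bounded
# indicator sequence; the exact value is 1/n for every p >= 1.
w = rng.uniform(size=1_000_000)
for p in [1, 2]:
    for n in [10, 100, 1000]:
        x_n = (w <= 1 / n).astype(float)   # realizations of X_n
        print(p, n, np.mean(x_n ** p))     # approximately 1/n
```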
 
 
 
 
Definition 8.5 (Convergence in distribution)
A sequence of random variables $\{X_n\}_{n \ge 1}$ is said to be convergent in distribution to a random variable $X$ if the distribution function of $X_n$ converges to that of $X$, i.e. $$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$ for all $x$ that are continuity points of $F_X$. Usually, such kind of convergence is denoted as: $$X_n \xrightarrow{d} X.$$
 
In other terms, we have convergence in distribution if the distribution of $X_n$, namely $F_{X_n}$, converges as $n \to \infty$ to the distribution of $X$, namely $F_X$. Note that convergence in distribution is not related to the underlying probability space but involves only the distribution functions.
Example 8.5 Let $X_n$ be a sequence of normal random variables, i.e. $X_n \sim N(1/n, \sigma^2)$. As $n \to \infty$, the distribution of $X_n$ collapses to a normal with mean zero and variance $\sigma^2$, i.e. $$X_n \xrightarrow{d} X \sim N(0, \sigma^2).$$
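A direct numerical sketch of Example 8.5 (the mean $1/n$ is the reconstruction assumed above), checking $F_{X_n}(x) \to F_X(x)$ at a fixed point $x$ with a hand-rolled normal CDF:

```python
import math

def norm_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), written via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Convergence in distribution: F_{X_n}(x) with X_n ~ N(1/n, sigma^2)
# approaches F_X(x) with X ~ N(0, sigma^2) as n grows.
sigma, x = 2.0, 1.0
for n in [1, 10, 100, 1000]:
    print(n, norm_cdf(x, 1 / n, sigma))
print("limit", norm_cdf(x, 0.0, sigma))
```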
 
 
 
 
 Laws of Large Numbers
There are many versions of laws of large numbers (LLN). In general, a sequence $\{X_n\}_{n \ge 1}$ is said to satisfy a LLN if the sample mean converges to the common expectation: $$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \longrightarrow E[X_1],$$ where the mode of convergence qualifies the law.
In general, if convergence happens almost surely (Definition 8.2) we speak of strong laws of large numbers (SLLN); if convergence happens in probability we speak of weak laws of large numbers (WLLN). A crucial difference to be noted is that when convergence happens almost surely we are dealing with the limit of a sequence of events (the limit sits inside the probability, $P(\lim_{n} \bar{X}_n = \mu) = 1$), whereas if convergence happens in probability we are dealing with the limit of a sequence of real numbers in $[0,1]$ (the limit sits outside the probability, $\lim_{n} P(|\bar{X}_n - \mu| > \varepsilon) = 0$).
 
 
 
 Strong Laws of Large Numbers
Proposition 8.1 (Kolmogorov’s SLLN)
Let’s consider a sequence of IID random variables $\{X_n\}_{n \ge 1}$. Then, there exists a constant $c$ such that: $$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{a.s.} c$$ if and only if $E[|X_1|] < \infty$, in which case $c = E[X_1]$.
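A single-trajectory sketch of Proposition 8.1 (the Exponential(1) population, with $E[X_1] = 1$, is an illustrative assumption): along one realized sample path the running mean settles at the constant $c = E[X_1]$.

```python
import numpy as np

rng = np.random.default_rng(3)

# SLLN: one realized trajectory of running sample means of IID
# Exponential(1) draws; the path itself converges to mu = 1.
x = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 1000, 100_000]:
    print(n, running_mean[n - 1])   # drifts towards 1 along the path
```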
 
Proposition 8.2 (SLLN without independence)
Let’s consider a sequence of identically distributed random variables $\{X_n\}_{n \ge 1}$, i.e. $X_n \sim X$ for all $n$, such that:
- $E[X_n^2] \le c < \infty$, where $c$ is a constant independent from $n$.

- $\text{Cov}(X_i, X_j) = 0$ for all $i \ne j$.

Then $\bar{X}_n \xrightarrow{a.s.} E[X]$.
Note that the existence of a finite first moment, i.e. $E[|X|] < \infty$, implies that the derivative of the characteristic function of the random variable exists at zero, with $\varphi_X'(0) = i\,E[X]$. On the other hand, the existence of $\varphi_X'(0)$ does not ensure that the first moment is finite.
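As a one-line sketch of the direction that does hold (assuming $E[|X|] < \infty$, which by dominated convergence justifies exchanging derivative and expectation):

$$\varphi_X'(0) = \frac{d}{dt}\,E\left[e^{itX}\right]\Big|_{t=0} = E\left[iX e^{itX}\right]\Big|_{t=0} = i\,E[X].$$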
 Weak Laws of Large Numbers
Let’s repeat a random experiment many times, each time ensuring the same conditions in such a way that the outcomes of the experiments are IID. Then, each random variable $X_i$ comes from the same population with an unknown mean $\mu$ and variance $\sigma^2$. Thanks to the WLLN, repeating the experiment many times, the sample mean of the experiment converges in probability to the true mean in the population. Convergence in probability means that: $$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0 \quad \forall \varepsilon > 0.$$
Proposition 8.3 (WLLN)
Given a sequence of independent and identically distributed random variables $\{X_n\}_{n \ge 1}$ such that:
- $E[X_i] = \mu < \infty$.

- $\text{Var}(X_i) = \sigma^2 < \infty$.

Then $\bar{X}_n \xrightarrow{P} \mu$.
Proof. Let’s consider the random variable $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Since by assumption the mean and variance are finite, let’s apply the Chebyshev inequality (Equation 5.20), i.e. $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2}.$$ Using a well known scaling property of the variance, let’s simplify it as: $$\text{Var}(\bar{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n} \text{Var}(X_i) = \frac{\sigma^2}{n}.$$ Therefore the Chebyshev inequality becomes $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2}.$$ Taking the limit as $n \to \infty$ proves the convergence in probability, i.e. $$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0 \quad \Longrightarrow \quad \bar{X}_n \xrightarrow{P} \mu.$$
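A quick numerical check of the bound in the proof (the Uniform$(0,1)$ population, with $\mu = 1/2$ and $\sigma^2 = 1/12$, is an illustrative assumption): the Monte Carlo exceedance probability sits below the Chebyshev bound $\sigma^2/(n\varepsilon^2)$ and both vanish as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)

# WLLN via Chebyshev: estimated P(|Xbar_n - mu| > eps) vs the bound
# sigma^2 / (n * eps^2) for IID Uniform(0, 1) draws.
mu, sigma2, eps = 0.5, 1 / 12, 0.05
for n in [10, 100, 1000]:
    xbar = rng.uniform(size=(20_000, n)).mean(axis=1)
    estimate = np.mean(np.abs(xbar - mu) > eps)
    print(n, estimate, sigma2 / (n * eps**2))
```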
 
 
 
 
Proposition 8.4 (Khinchin’s WLLN)
Given a sequence of independent and identically distributed random variables $\{X_n\}_{n \ge 1}$ such that:
- $E[|X_1|] < \infty$.

- $E[X_1] = \mu$.

Then $\bar{X}_n \xrightarrow{P} \mu$. Note that, compared with Proposition 8.3, no finite variance is required.
Proposition 8.5 (Feller’s WLLN)
Given a sequence of independent and identically distributed random variables $\{X_n\}_{n \ge 1}$ such that: $$\lim_{n \to \infty} n\,P(|X_1| > n) = 0,$$ then $$\bar{X}_n - \mu_n \xrightarrow{P} 0, \qquad \mu_n = E\left[X_1 \mathbf{1}_{\{|X_1| \le n\}}\right].$$ Note that this result makes no assumptions about a finite first moment.
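For contrast, a sketch of what can fail without the condition (the standard Cauchy choice is an illustrative assumption): here $n\,P(|X_1| > n) \to 2/\pi \ne 0$, the hypothesis of Proposition 8.5 is violated, and running sample means keep jumping instead of settling.

```python
import numpy as np

rng = np.random.default_rng(5)

# Standard Cauchy draws have no first moment and violate Feller's
# condition; the running sample mean does not stabilize.
x = rng.standard_cauchy(size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [100, 10_000, 100_000]:
    print(n, running_mean[n - 1])   # erratic, no limiting value
```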
 
Proof. Let’s verify that under the assumptions of the SLLN without independence (Proposition 8.2) we will always have convergence in probability, i.e. $$\bar{X}_n \xrightarrow{P} \mu.$$
Using the Chebyshev inequality (Equation 5.20), fix an $\varepsilon > 0$ such that: $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2}.$$ Let’s make the computations explicit, i.e. $$\text{Var}(\bar{X}_n) = \frac{1}{n^2}\left[\sum_{i=1}^{n} \text{Var}(X_i) + \sum_{i \ne j} \text{Cov}(X_i, X_j)\right].$$ By assumption the covariances are zero, $\text{Cov}(X_i, X_j) = 0$ for $i \ne j$. Moreover, since $\text{Var}(X_i) = E[X_i^2] - E[X_i]^2$, it is possible to upper bound the variance with the second moment, namely $\text{Var}(X_i) \le E[X_i^2]$, i.e. $$\text{Var}(\bar{X}_n) \le \frac{1}{n^2}\sum_{i=1}^{n} E[X_i^2].$$ Since by the assumption of the SLLN we have that $E[X_i^2] \le c$, where $c$ is a constant independent from $i$, we can further upper bound the probability by: $$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{c}{n\varepsilon^2}.$$ Finally, taking the limit for $n \to \infty$ the bound goes to zero, implying convergence in probability: $$\bar{X}_n \xrightarrow{P} \mu.$$
 
 
 
 
 Central Limit Theorem
Theorem 8.1 (Central Limit Theorem)
Let’s consider a sequence of $n$ random variables, $X_1, \dots, X_n$, where each $X_i$ is independent and identically distributed (IID), i.e. $$X_i \sim (\mu, \sigma^2) \quad \forall i.$$ Then, the CLT states that, when the sample is large, the random variable $$S_n = \sum_{i=1}^{n} X_i,$$ defined by the sum of all the $X_i$, is approximately normally distributed, i.e. $$S_n \approx N\left(n\mu, n\sigma^2\right).$$ Since the $X_i$ are IID, the moments of $S_n$ read explicitly $$E[S_n] = n\mu, \qquad \text{Var}(S_n) = n\sigma^2.$$ Alternatively, the CLT can be written in terms of the standardized random variable $$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1),$$ which on large samples is distributed as a standard normal.
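A simulation sketch of Theorem 8.1 (the Uniform$(0,1)$ population is an illustrative assumption): standardized sums $Z_n$ have mean $\approx 0$, standard deviation $\approx 1$, and tail probabilities matching the standard normal.

```python
import numpy as np

rng = np.random.default_rng(6)

# CLT: standardize sums of IID Uniform(0, 1) draws (mu = 1/2,
# sigma^2 = 1/12) and compare with N(0, 1).
mu, sigma = 0.5, np.sqrt(1 / 12)
n, reps = 1_000, 50_000
s = rng.uniform(size=(reps, n)).sum(axis=1)
z = (s - n * mu) / (sigma * np.sqrt(n))
print(z.mean(), z.std())      # approximately 0 and 1
print(np.mean(z <= 1.96))     # approximately Phi(1.96) = 0.975
```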
 
Resnick, Sidney I. 2005. *A Probability Path*. Birkhauser. https://link.springer.com/book/10.1007/978-0-8176-8409-9.