5  Expectation

Reference: Chapter 5. Resnick ().

The expectation represents a central value of a random variable and has a measure-theoretic counterpart as the Lebesgue-Stieltjes integral of $X$ with respect to a (probability) measure $P$. This kind of integration is defined in steps: integration is first defined for simple functions and then extended to more general random variables.

Let’s define a probability space $(\Omega,\mathcal{B},P)$ and a generic random variable $X$ such that $X:(\Omega,\mathcal{B})\to(\bar{\mathbb{R}},\mathcal{B}(\bar{\mathbb{R}}))$, where $\bar{\mathbb{R}}=[-\infty,\infty]$. Then, the expectation of $X$ is denoted as $$E\{X\}=\int_{\Omega}X\,dP=\int_{\Omega}X(\omega)\,P(d\omega),$$ i.e. the Lebesgue-Stieltjes integral of $X$ with respect to the (probability) measure $P$.

5.1 Simple functions

Let’s start from the definition of the expectation for a restricted class of random variables called simple functions. Generally speaking, a random variable $X(\omega)$ is called simple if it has a finite range.

Formally, let’s consider a probability space $(\Omega,\mathcal{B},P)$ and a $\mathcal{B}/\mathcal{B}(\mathbb{R})$-measurable simple function $X:\Omega\to\mathbb{R}$ defined as follows: $$X(\omega)=\sum_{i=1}^{n}a_i\,\mathbf{1}_{A_i}(\omega), \tag{5.1}$$ where $a_i\in\mathbb{R}$ and the sets $A_i\in\mathcal{B}$ form a disjoint partition of the sample space, i.e. $\bigcup_{i=1}^{n}A_i=\Omega$.

Let’s denote the set of all simple functions on $\Omega$ as $\mathcal{E}$. In this setting, $\mathcal{E}$ is a vector space that satisfies three main properties.

  1. Constant: given a simple function $X\in\mathcal{E}$ and a constant $\alpha\in\mathbb{R}$, then $\alpha X\in\mathcal{E}$. In fact: $$\alpha X=\sum_{i=1}^{n}\alpha a_i\,\mathbf{1}_{A_i}=\sum_{i=1}^{n}a_i'\,\mathbf{1}_{A_i}\in\mathcal{E}, \tag{5.2}$$ where $a_i'=\alpha a_i$.

  2. Linearity: given two simple functions $X,Y\in\mathcal{E}$, then $X+Y\in\mathcal{E}$. In fact: $$X+Y=\sum_{i=1}^{n}a_i\mathbf{1}_{A_i}+\sum_{j=1}^{m}b_j\mathbf{1}_{B_j}=\sum_{i=1}^{n}\sum_{j=1}^{m}(a_i+b_j)\,\mathbf{1}_{A_i}\mathbf{1}_{B_j}=\sum_{i=1}^{n}\sum_{j=1}^{m}(a_i+b_j)\,\mathbf{1}_{A_i\cap B_j}, \tag{5.3}$$ where the sets $\{A_i\cap B_j : 1\le i\le n,\ 1\le j\le m\}$ form a disjoint partition of $\Omega$.

  3. Product: given two simple functions $X,Y\in\mathcal{E}$, then $XY\in\mathcal{E}$. In fact: $$XY=\Big(\sum_{i=1}^{n}a_i\mathbf{1}_{A_i}\Big)\Big(\sum_{j=1}^{m}b_j\mathbf{1}_{B_j}\Big)=\sum_{i=1}^{n}\sum_{j=1}^{m}a_ib_j\,\mathbf{1}_{A_i}\mathbf{1}_{B_j}=\sum_{i=1}^{n}\sum_{j=1}^{m}a_ib_j\,\mathbf{1}_{A_i\cap B_j}. \tag{5.4}$$
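As a numerical sanity check (a sketch in Python; the finite sample space, the two partitions and the helper `eval_simple` are invented for the illustration), the refined-partition identities (5.3) and (5.4) can be verified pointwise:

```python
# A simple function on a finite sample space Omega, stored as
# a list of (value, event) pairs over a disjoint partition.
Omega = {0, 1, 2, 3, 4, 5}

# X = 1*1_{0,1} + 2*1_{2,3,4,5},  Y = 10*1_{0,2,4} + 20*1_{1,3,5}
X = [(1.0, {0, 1}), (2.0, {2, 3, 4, 5})]
Y = [(10.0, {0, 2, 4}), (20.0, {1, 3, 5})]

def eval_simple(F, w):
    """Evaluate a simple function sum_i a_i 1_{A_i} at the outcome w."""
    return sum(a for a, A in F if w in A)

# Closure under sum and product via the refined partition A_i ∩ B_j
X_plus_Y = [(a + b, A & B) for a, A in X for b, B in Y if A & B]
X_times_Y = [(a * b, A & B) for a, A in X for b, B in Y if A & B]

assert all(eval_simple(X_plus_Y, w) == eval_simple(X, w) + eval_simple(Y, w) for w in Omega)
assert all(eval_simple(X_times_Y, w) == eval_simple(X, w) * eval_simple(Y, w) for w in Omega)
```

The intersections $A_i\cap B_j$ are again disjoint, so the sum and the product are themselves simple functions, exactly as claimed.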

5.1.1 Expectation of simple functions

The expectation of a simple function $X$ is defined as: $$E\{X\}=\int_{\Omega}X\,dP=\sum_{i=1}^{n}a_i\,P(A_i), \tag{5.5}$$ where $|a_i|<\infty$.

  1. Expectation of an indicator function: we have that $E\{\mathbf{1}_A(\omega)\}=P(\omega\in A)=P(A)$.

  2. Non-negativity: if $X\ge 0$ and $X\in\mathcal{E}$, then $E\{X\}\ge 0$.

  3. Linearity: the expectation of simple functions is linear, i.e. $E\{\alpha X+\beta Y\}=\alpha E\{X\}+\beta E\{Y\}$.

  4. Monotonicity: the expectation of simple functions is monotone on $\mathcal{E}$, in the sense that if two random variables $X,Y\in\mathcal{E}$ are such that $X\le Y$, then $E\{X\}\le E\{Y\}$.

  5. Continuity: the expectation of simple functions is continuous on $\mathcal{E}$, in the sense that if $X_n,X\in\mathcal{E}$ and either $X_n\uparrow X$ or $X_n\downarrow X$, then $E\{X_n\}\uparrow E\{X\}$ or $E\{X_n\}\downarrow E\{X\}$, respectively.

Proof. Let’s consider two simple functions, i.e. $$X(\omega)=\sum_{i=1}^{n}a_i\mathbf{1}_{A_i}(\omega)\quad\text{and}\quad Y(\omega)=\sum_{j=1}^{m}b_j\mathbf{1}_{B_j}(\omega),$$ and let’s fix $\alpha,\beta\in\mathbb{R}$. Then, by the second property of the vector space $\mathcal{E}$ (5.3), it is possible to write: $$\alpha X+\beta Y=\sum_{i=1}^{n}\sum_{j=1}^{m}(\alpha a_i+\beta b_j)\,\mathbf{1}_{A_i\cap B_j}.$$ Then, taking the expectation on both sides:
$$\begin{aligned}E\{\alpha X+\beta Y\}&=\sum_{i=1}^{n}\sum_{j=1}^{m}(\alpha a_i+\beta b_j)\,P(A_i\cap B_j)\\&=\sum_{i=1}^{n}\alpha a_i\sum_{j=1}^{m}P(A_i\cap B_j)+\sum_{j=1}^{m}\beta b_j\sum_{i=1}^{n}P(A_i\cap B_j).\end{aligned}$$ Fixing $i$, the events $A_i\cap B_j$ for $j=1,\dots,m$ are disjoint, since by definition the $B_j$ are disjoint. Hence, applying $\sigma$-additivity it is possible to write: $$\sum_{j=1}^{m}P(A_i\cap B_j)=P\Big(\bigcup_{j=1}^{m}(A_i\cap B_j)\Big)=P\Big(A_i\cap\Big(\bigcup_{j=1}^{m}B_j\Big)\Big)=P(A_i\cap\Omega)=P(A_i).$$ Therefore, the expectation simplifies to: $$E\{\alpha X+\beta Y\}=\sum_{i=1}^{n}\alpha a_i\,P(A_i)+\sum_{j=1}^{m}\beta b_j\,P(B_j)=\alpha E\{X\}+\beta E\{Y\}.$$
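The linearity property just proved can be checked exactly on a small finite probability space (a sketch; the space, the masses and the two simple functions below are invented for the illustration, and exact rational arithmetic avoids rounding):

```python
from fractions import Fraction

# A finite probability space: P assigns mass to each outcome (invented example).
P = {0: Fraction(1, 6), 1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Simple functions stored as {outcome: value} maps.
X = {0: 1, 1: 1, 2: 5, 3: 5}   # X = 1*1_{0,1} + 5*1_{2,3}
Y = {0: 2, 1: 4, 2: 2, 3: 4}   # Y = 2*1_{0,2} + 4*1_{1,3}

def E(Z):
    """Expectation of a simple function: sum of value * probability, as in (5.5)."""
    return sum(Z[w] * P[w] for w in P)

alpha, beta = 3, -2
lhs = E({w: alpha * X[w] + beta * Y[w] for w in P})
rhs = alpha * E(X) + beta * E(Y)
assert lhs == rhs  # linearity: E{aX + bY} = aE{X} + bE{Y}
```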

5.2 Extension of the definition

Simple functions are the building blocks in the definition of the expectation in terms of the Lebesgue-Stieltjes integral. In fact, a known result, the Measurability theorem, shows that any measurable function can be approximated by a sequence of simple functions.

Theorem 5.1 (Measurability theorem)
Suppose that $X(\omega)\ge 0$ for all $\omega\in\Omega$. Then, $X$ is $\mathcal{B}/\mathcal{B}(\mathbb{R})$-measurable if and only if there exist simple functions $X_n\in\mathcal{E}$ such that $0\le X_n\uparrow X$, i.e. $X=\lim_{n\to\infty}X_n$.
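The standard construction behind this theorem caps $X$ at level $n$ and rounds it down to the dyadic grid $k/2^n$; a minimal sketch (the function name and the evaluation point are invented for the example):

```python
def dyadic_approx(x, n):
    """n-th simple-function approximation of a value x = X(w) >= 0:
    X_n = n on {X >= n}, and k/2^n on {k/2^n <= X < (k+1)/2^n}."""
    if x >= n:
        return float(n)
    k = int(x * 2 ** n)  # floor to the dyadic grid
    return k / 2 ** n

x = 2.718  # an arbitrary value of X(w)
approx = [dyadic_approx(x, n) for n in range(1, 12)]

# The sequence is non-decreasing and converges to x from below.
assert all(a <= b for a, b in zip(approx, approx[1:]))
assert all(a <= x for a in approx)
assert abs(approx[-1] - x) < 2 ** -10
```

Each $X_n$ takes finitely many values, so it is simple, and refining the grid while raising the cap gives the monotone convergence $X_n\uparrow X$.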

5.2.1 Non-negative random variables

We now extend the definition of the expectation to a broader class of random variables. Let’s define $\mathcal{E}^+$ as the set of non-negative simple functions, and define $$\bar{\mathcal{E}}^+=\{X\ge 0:\ X:(\Omega,\mathcal{B})\to(\bar{\mathbb{R}},\mathcal{B}(\bar{\mathbb{R}}))\}$$ to be the set of non-negative and measurable functions with domain $\Omega$.

Given $X\in\bar{\mathcal{E}}^+$, by Theorem 5.1 we can find $X_n\in\mathcal{E}^+$ such that $0\le X_n\uparrow X$. Since the expectation operator preserves monotonicity, the sequence $E\{X_n\}$ is also non-decreasing. Thus, since the limit of a monotone sequence always exists (possibly equal to $+\infty$), we can define $$E\{X\}=\lim_{n\to\infty}E\{X_n\},$$ thus extending the definition of the expectation from $\mathcal{E}$ to $\bar{\mathcal{E}}^+$. Note that $E\{X\}=\infty$ is allowed; in particular, if $P(X=\infty)>0$ then $E\{X\}=\infty$.

5.2.2 Integrable random variables

Finally, we extend the definition of the expectation to all random variables such that $E\{|X|\}<\infty$. For any random variable $X$, let’s define $$X^+=\max(X,0)\quad\text{and}\quad X^-=\max(-X,0).$$ Therefore, $$X^+=X \text{ if } X\ge 0,\qquad X^-=-X \text{ if } X\le 0,$$ and in any case $X^+\ge 0$, $X^-\ge 0$ and $X=X^+-X^-$. Then, we define a new random variable $|X|=X^++X^-$, which is $\mathcal{B}/\mathcal{B}(\mathbb{R})$-measurable if both $X^+$ and $X^-$ are measurable. If at least one among $E\{X^+\}$ and $E\{X^-\}$ is finite, then we define $$E\{X\}=E\{X^+\}-E\{X^-\}$$ and we call $X$ quasi-integrable. Instead, if both $E\{X^+\}<\infty$ and $E\{X^-\}<\infty$, then $E\{|X|\}<\infty$ and we call $X$ integrable. In this case, we write $X\in L^1$, where $L^1$ stands for the set of integrable random variables with finite first moment, i.e. $$L^1=\{X:\Omega\to\mathbb{R}:\ X \text{ is a r.v.},\ E\{|X|\}<\infty\}. \tag{5.6}$$ In general, writing $X\in L^p$ means that $X$ belongs to the set of random variables with finite $p$-th moment, i.e. $$L^p=\{X:\Omega\to\mathbb{R}:\ X \text{ is a r.v.},\ E\{|X|^p\}<\infty\}. \tag{5.7}$$
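The positive-part/negative-part decomposition can be illustrated numerically (a sketch with simulated Gaussian draws; the seed and sample size are arbitrary, and sample means stand in for expectations):

```python
import random

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10_000)]  # draws of a random variable X

x_pos = [max(x, 0.0) for x in xs]    # X+ = max(X, 0)
x_neg = [max(-x, 0.0) for x in xs]   # X- = max(-X, 0)

# Pointwise identities: X = X+ - X- and |X| = X+ + X-
assert all(abs(x - (p - n)) < 1e-12 for x, p, n in zip(xs, x_pos, x_neg))
assert all(abs(abs(x) - (p + n)) < 1e-12 for x, p, n in zip(xs, x_pos, x_neg))

# Sample analogue of E{X} = E{X+} - E{X-}
mean = lambda v: sum(v) / len(v)
assert abs(mean(xs) - (mean(x_pos) - mean(x_neg))) < 1e-12
```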

5.3 General definition

The expectation of a random variable $X$ is its first moment, also called the statistical average. In general, it is denoted as $E\{X\}$. Let’s consider a discrete random variable $X$ with distribution $P(X=x_j)$. Then, the expectation of $X$ is the weighted average of all the possible $m$ states that the random variable can assume, each weighted by its probability of occurrence, i.e. $$E\{X\}=\sum_{j=1}^{m}x_j\,P(X=x_j),$$ which is exactly the expectation of a simple function as in (5.5). In the continuous case, i.e. when $X$ takes values in $\mathbb{R}$ and admits a density function $f_X$, the expectation is computed as an integral, i.e. $$E\{X\}=\int_{-\infty}^{\infty}x\,dF_X=\int_{-\infty}^{\infty}x\,f_X(x)\,dx.$$
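Both formulas can be illustrated numerically (a sketch; the fair-die and exponential examples, the rate and the integration grid are invented for the illustration):

```python
import math

# Discrete case: expectation of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
E_discrete = sum(x * p for x, p in zip(outcomes, probs))
assert abs(E_discrete - 3.5) < 1e-12

# Continuous case: E{X} for an Exponential(rate=2) density f(x) = 2 e^{-2x},
# approximated with a midpoint Riemann sum; the true value is 1/rate = 0.5.
rate, h = 2.0, 1e-4
E_continuous = sum(x * rate * math.exp(-rate * x) * h
                   for x in (k * h + h / 2 for k in range(int(20 / h))))
assert abs(E_continuous - 1 / rate) < 1e-3
```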

Definition 5.1 (Moments and central moments)
For any random variable $X$, let’s define the moment of order $p\ge 1$ as: $$m_p=E\{X^p\}. \tag{5.8}$$ Similarly, for $p\ge 2$ we define the central moment of order $p$ as: $$\mu_p=E\{(X-m_1)^p\}. \tag{5.9}$$

5.3.1 Variance and Covariance

In general, the variance of a random variable in the population is defined as the second central moment (5.9): $$V\{X\}=E\{(X-E\{X\})^2\}=\sigma^2. \tag{5.10}$$

Let’s consider a discrete random variable $X$ with distribution $P(X=x_j)=p_j$. Then the variance of $X$ is the weighted average of the $m$ possible squared deviations from the mean, each weighted by its probability of occurrence, i.e. $$V\{X\}=\sum_{j=1}^{m}(x_j-E\{X\})^2\,p_j.$$ In the continuous case, i.e. when $X$ admits a density function and takes values in $\mathbb{R}$, the variance is computed as: $$V\{X\}=\int_{-\infty}^{\infty}(x-E\{X\})^2\,f_X(x)\,dx.$$

Let’s consider two random variables $X$ and $Y$. Then, in general their covariance is defined as: $$Cv\{X,Y\}=E\{(X-E\{X\})(Y-E\{Y\})\}=\sigma_{XY}. \tag{5.11}$$

In the discrete case, where $X$ and $Y$ have a joint distribution $P(X=x_i,Y=y_j)=p_{ij}$, their covariance is defined as: $$Cv\{X,Y\}=\sum_{i=1}^{m}\sum_{j=1}^{s}(x_i-E\{X\})(y_j-E\{Y\})\,p_{ij}.$$ In the continuous case, if the joint distribution of $X$ and $Y$ admits a density function, the covariance is computed as: $$Cv\{X,Y\}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-E\{X\})(y-E\{Y\})\,f_{X,Y}(x,y)\,dx\,dy.$$
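A small worked example of the discrete covariance formula (the $2\times 2$ joint pmf below is invented), cross-checked against the shortcut formula (5.16):

```python
# Covariance from a small joint pmf.
xs = [0, 1]
ys = [0, 1]
# p[i][j] = P(X = xs[i], Y = ys[j]); positively associated by construction.
p = [[0.4, 0.1],
     [0.1, 0.4]]

EX = sum(xs[i] * p[i][j] for i in range(2) for j in range(2))
EY = sum(ys[j] * p[i][j] for i in range(2) for j in range(2))
cov = sum((xs[i] - EX) * (ys[j] - EY) * p[i][j]
          for i in range(2) for j in range(2))

# Cross-check with the shortcut formula Cv{X,Y} = E{XY} - E{X}E{Y} (5.16)
EXY = sum(xs[i] * ys[j] * p[i][j] for i in range(2) for j in range(2))
assert abs(cov - (EXY - EX * EY)) < 1e-12
```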

Proposition 5.1 (Properties of the variance)
There are several properties connected to the variance.

  1. The variance can be computed in terms of the second and first moments of $X$, i.e. $$V\{X\}=E\{X^2\}-E\{X\}^2=m_2-(m_1)^2. \tag{5.12}$$

  2. The variance is invariant with respect to the addition of a constant $a$, i.e. $$V\{a+X\}=V\{X\}. \tag{5.13}$$

  3. The variance scales quadratically upon multiplication by a constant $a$, i.e. $$V\{aX\}=a^2V\{X\}. \tag{5.14}$$

  4. The variance of the sum of two correlated random variables is computed as: $$V\{X+Y\}=V\{X\}+V\{Y\}+2\,Cv\{X,Y\}. \tag{5.15}$$

  5. The covariance can be expressed as: $$Cv\{X,Y\}=E\{XY\}-E\{X\}E\{Y\}. \tag{5.16}$$

  6. The covariance scales upon multiplication by constants $a$ and $b$, i.e. $$Cv\{aX,bY\}=ab\,Cv\{X,Y\}. \tag{5.17}$$

Proof. Property 1. (5.12) follows easily by developing the definition of the variance, i.e. $$V\{X\}=E\{(X-E\{X\})^2\}=E\{X^2\}+E\{X\}^2-2E\{X\}^2=E\{X^2\}-E\{X\}^2.$$ Property 2. (5.13) follows from the definition, i.e. $$V\{a+X\}=E\{(a+X-E\{a+X\})^2\}=E\{(X-E\{X\})^2\}=V\{X\}.$$ Property 3. (5.14) follows using the expression of the variance in (5.12), i.e. $$V\{aX\}=E\{(aX)^2\}-E\{aX\}^2=a^2E\{X^2\}-a^2E\{X\}^2=a^2\big(E\{X^2\}-E\{X\}^2\big)=a^2V\{X\}.$$ For property 4. (5.15), the variance of the sum of two random variables is: $$\begin{aligned}V\{X+Y\}&=E\{(X+Y-E\{X+Y\})^2\}\\&=E\{([X-E\{X\}]+[Y-E\{Y\}])^2\}\\&=E\{(X-E\{X\})^2\}+E\{(Y-E\{Y\})^2\}+2E\{(X-E\{X\})(Y-E\{Y\})\}\\&=V\{X\}+V\{Y\}+2\,Cv\{X,Y\},\end{aligned}$$ where, in the case in which there is no linear relation between $X$ and $Y$, the covariance is zero, i.e. $Cv\{X,Y\}=0$. Developing the computation of the covariance it is possible to prove property 5. (5.16), i.e. $$\begin{aligned}Cv\{X,Y\}&=E\{(X-E\{X\})(Y-E\{Y\})\}\\&=E\{XY-XE\{Y\}-YE\{X\}+E\{X\}E\{Y\}\}\\&=E\{XY\}-2E\{X\}E\{Y\}+E\{X\}E\{Y\}\\&=E\{XY\}-E\{X\}E\{Y\}.\end{aligned}$$ Finally, using the result in property 5. (5.16), property 6. (5.17) follows easily: $$Cv\{aX,bY\}=E\{aXbY\}-E\{aX\}E\{bY\}=ab\,E\{XY\}-ab\,E\{X\}E\{Y\}=ab\,Cv\{X,Y\}.$$
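These identities also hold exactly for sample moments, which gives a quick numerical check (a sketch with simulated data; the seed, constants and sample size are arbitrary, and sample variance/covariance stand in for the population quantities):

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [x + random.gauss(0, 1) for x in xs]  # correlated with X by construction

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

a, b = 2.0, -3.0
# Sample analogues of (5.13), (5.14), (5.15) and (5.17), up to rounding error
assert abs(var([a + x for x in xs]) - var(xs)) < 1e-9
assert abs(var([a * x for x in xs]) - a ** 2 * var(xs)) < 1e-9
assert abs(var([x + y for x, y in zip(xs, ys)])
           - (var(xs) + var(ys) + 2 * cov(xs, ys))) < 1e-6
assert abs(cov([a * x for x in xs], [b * y for y in ys]) - a * b * cov(xs, ys)) < 1e-6
```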

5.3.2 Skewness

The skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal distribution, negative skew commonly indicates that the tail is on the left side of the distribution, and positive skew indicates that the tail is on the right.

Figure 5.1: A left (red), symmetric (black) and right (green) skewed density functions.

Following the same notation as in Ralph B. D’Agostino and Jr. (), let’s define and denote the population skewness of a random variable $X$ as: $$Sk\{X\}=\sqrt{\beta_1(X)}=E\Bigg\{\Bigg(\frac{X-E\{X\}}{\sqrt{V\{X\}}}\Bigg)^3\Bigg\}=\frac{\mu_3}{\sigma^3}. \tag{5.18}$$

5.3.3 Kurtosis

The kurtosis is a measure of the tailedness of the probability distribution of a real-valued random variable. The standard measure of a distribution’s kurtosis, originating with Karl Pearson, is a scaled version of the fourth moment of the distribution. This number is related to the tails of the distribution: higher kurtosis corresponds to greater extremity of deviations from the mean (outliers). In general, it is common to compare the excess kurtosis of a distribution with respect to the normal distribution (with kurtosis equal to 3). It is possible to distinguish three cases:

  1. A distribution with negative excess kurtosis is called platykurtic and produces fewer outliers than the normal distribution.
  2. A distribution with zero excess kurtosis is called mesokurtic and produces outliers comparably to the normal distribution.
  3. A distribution with positive excess kurtosis is called leptokurtic and produces more outliers than the normal distribution.
Figure 5.2: A mesokurtic (black) distribution and different leptokurtic (red, green, blue) densities.

Let’s define and denote the kurtosis of a random variable $X$ as: $$Kt\{X\}=\beta_2(X)=E\Bigg\{\Bigg(\frac{X-E\{X\}}{\sqrt{V\{X\}}}\Bigg)^4\Bigg\}=\frac{\mu_4}{\sigma^4}, \tag{5.19}$$ or equivalently the excess kurtosis as $Kt\{X\}-3$.
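As a worked example (the Bernoulli parameter is arbitrary), the moment ratios (5.18) and (5.19) can be computed directly from a pmf and compared with the known closed forms for a Bernoulli($p$) variable, $(1-2p)/\sqrt{pq}$ and $3+(1-6pq)/(pq)$:

```python
# Skewness and kurtosis of a Bernoulli(p) variable, computed from the pmf.
p = 0.2
support = [(0, 1 - p), (1, p)]  # (value, probability) pairs

E = sum(x * w for x, w in support)                        # first moment
mu = lambda k: sum((x - E) ** k * w for x, w in support)  # k-th central moment
sigma = mu(2) ** 0.5

skew = mu(3) / sigma ** 3   # (5.18)
kurt = mu(4) / sigma ** 4   # (5.19)

q = 1 - p
assert abs(skew - (1 - 2 * p) / (p * q) ** 0.5) < 1e-12
assert abs(kurt - (3 + (1 - 6 * p * q) / (p * q))) < 1e-12
```

For $p<1/2$ the distribution is right-skewed (here $skew=1.5$) and leptokurtic (here the excess kurtosis is $0.25$).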

5.4 Review of inequalities

Definition 5.2 (Markov Inequality)
Let’s consider a random variable $X\in L^1$ (5.6); then, for all $\lambda>0$, Markov’s inequality states that $$P(|X|\ge\lambda)\le\frac{E\{|X|\}}{\lambda}.$$ Hence, this inequality produces an upper bound for the tail probability $P(|X|\ge\lambda)$ using only the first moment of $X$.
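A quick Monte Carlo illustration of the bound (a sketch; the exponential distribution, seed and threshold are invented for the example, and sample means stand in for expectations):

```python
import random

random.seed(2)
xs = [random.expovariate(1.0) for _ in range(100_000)]  # X >= 0, so |X| = X; E{X} = 1

lam = 3.0
tail = sum(x >= lam for x in xs) / len(xs)   # sample P(X >= lambda)
bound = (sum(xs) / len(xs)) / lam            # sample E{X} / lambda

# Markov: the true tail is e^-3 ≈ 0.0498, well below the bound ≈ 1/3
assert tail <= bound
```

The bound is crude but requires nothing beyond integrability, which is what makes it useful.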

Definition 5.3 (Chebychev Inequality)
Let’s consider a random variable $X\in L^2$ (5.7), i.e. with first and second moment finite, $E\{|X|\}<\infty$ and $E\{X^2\}<\infty$; then, for all $\lambda>0$, the Chebychev inequality states that $$P(|X|\ge\lambda)\le\frac{1}{\lambda^2}E\{X^2\}. \tag{5.20}$$ As with Markov’s inequality, this one also produces an upper bound for the tail probability $P(|X|\ge\lambda)$, but using the second moment of $X$.

Definition 5.4 (Modulus Inequality)
Let’s consider $X\in L^1$ (5.6); then the modulus inequality states that: $$|E\{X\}|\le E\{|X|\}.$$

Definition 5.5 (Holder Inequality)
Let’s consider two numbers $p$ and $q$ such that $$p>1,\quad q>1,\quad \frac{1}{p}+\frac{1}{q}=1,$$ and let’s consider two random variables $X$ and $Y$ such that $E\{|X|^p\}<\infty$ and $E\{|Y|^q\}<\infty$. Then, $$|E\{XY\}|\le E\{|XY|\}\le \big(E\{|X|^p\}\big)^{1/p}\,\big(E\{|Y|^q\}\big)^{1/q}. \tag{5.21}$$

Definition 5.6 (Schwartz Inequality)
Consider two random variables $X,Y\in L^2$, i.e. with finite second moments, $E\{X^2\}<\infty$ and $E\{Y^2\}<\infty$. Then $$|E\{XY\}|\le E\{|XY|\}\le\sqrt{E\{X^2\}\,E\{Y^2\}}. \tag{5.22}$$ Note that this is the special case of the Holder inequality (5.21) with $p=q=2$.
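Note that the sample analogue of (5.22) holds exactly, since it is just the Cauchy-Schwarz inequality for vectors; a quick check on simulated pairs (the distributions, seed and sample size are invented for the example):

```python
import random

random.seed(3)
pairs = [(random.gauss(0, 1), random.gauss(0, 2)) for _ in range(20_000)]

n = len(pairs)
E_XY = sum(x * y for x, y in pairs) / n
E_absXY = sum(abs(x * y) for x, y in pairs) / n
E_X2 = sum(x * x for x, _ in pairs) / n
E_Y2 = sum(y * y for _, y in pairs) / n

# |E{XY}| <= E{|XY|} <= sqrt(E{X^2} E{Y^2}), as in (5.22)
assert abs(E_XY) <= E_absXY <= (E_X2 * E_Y2) ** 0.5
```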

Definition 5.7 (Jensen Inequality)
Let’s consider a convex function $u:\mathbb{R}\to\mathbb{R}$. Suppose that $E\{|X|\}<\infty$ and $E\{|u(X)|\}<\infty$; then $$E\{u(X)\}\ge u(E\{X\}). \tag{5.23}$$ If $u$ is concave, the inequality reverses, i.e. $$E\{u(X)\}\le u(E\{X\}). \tag{5.24}$$
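A worked check of the convex case with $u(x)=x^2$ (the finite distribution below is invented); for this choice of $u$, the gap $E\{u(X)\}-u(E\{X\})$ is exactly the variance of $X$:

```python
# Jensen's inequality on a finite distribution with the convex u(x) = x^2.
support = [(-1.0, 0.3), (0.0, 0.2), (2.0, 0.5)]  # (value, probability) pairs

u = lambda x: x * x
E_X = sum(x * w for x, w in support)
E_uX = sum(u(x) * w for x, w in support)

# Convex u: E{u(X)} >= u(E{X}); here 2.3 >= 0.49, and the gap is V{X}
assert E_uX >= u(E_X)
```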