6  Conditional expectation

Reference: Pietro Rigo (2023)

Theorem 6.1 (\(\color{magenta}{\textbf{Radon–Nikodym}}\))
Consider a measurable space \((\Omega, \mathcal{B})\) and two \(\sigma\)-finite measures (Definition 30.4) \(\mu\) and \(\nu\) such that \(\mu \ll \nu\) (Definition 30.1). Then there exists a measurable function \(X: \Omega \to \mathbb{R}\) such that: \[ \mu(B) = \int_B X \, d\nu \quad \forall B \in \mathcal{B} \]
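On a finite sample space the Radon–Nikodym derivative reduces to a ratio of point masses, which makes the theorem easy to check by hand. The following is a minimal sketch with illustrative measures \(\mu\) and \(\nu\):

```python
from fractions import Fraction
from itertools import chain, combinations

# Finite sample space: here the Radon-Nikodym derivative is just a ratio of point masses.
omega = ["a", "b", "c"]
nu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}  # reference measure
mu = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}  # mu << nu since nu charges every point

# Density X = d(mu)/d(nu), defined pointwise as mu({w}) / nu({w})
X = {w: mu[w] / nu[w] for w in omega}

# Check mu(B) = int_B X d(nu) for every B in the sigma-field (all subsets here)
subsets = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
for B in subsets:
    assert sum(mu[w] for w in B) == sum(X[w] * nu[w] for w in B)
```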

Definition 6.1 (\(\color{magenta}{\textbf{Conditional expectation}}\))
Given a probability space \((\Omega, \mathcal{B}, \mathbb{P})\), let’s consider a sub \(\sigma\)-field of \(\mathcal{B}\), i.e. \(\mathcal{G} \subset \mathcal{B}\), and a random variable \(X: \Omega \rightarrow \mathbb{R}\) with finite expectation, i.e. \(\mathbb{E}\{|X|\} < + \infty\). Then, the conditional expectation of \(X\) given \(\mathcal{G}\) is any random variable \[ Z = \mathbb{E}\{X \mid \mathcal{G}\} \text{,} \] such that:

  1. \(Z\) has finite expectation, i.e. \(\mathbb{E}\{|Z|\} < + \infty\).
  2. \(Z\) is \(\mathcal{G}\)-measurable.
  3. \(\mathbb{E}\{\mathbb{1}_{A} Z \} = \mathbb{E}\{\mathbb{1}_{A}X\}\), \(\forall A \in \mathcal{G}\), namely if \(X\) and \(Z\) are restricted to \(A \in \mathcal{G}\), then their expectation coincides.

A \(\sigma\)-field can be used to describe our state of information: \(\forall A \in \mathcal{G}\) we already know whether the event \(A\) has occurred or not. Therefore, when we collect in \(\mathcal{G}\) the events whose occurrence is known, saying that the random variable \(Z\) is \(\mathcal{G}\)-measurable means that the value of \(Z\) is no longer stochastic once we know the information contained in \(\mathcal{G}\). In this context, one can see the random variable \(Z = \mathbb{E}\{X \mid \mathcal{G}\}\) as the prediction of \(X\), given the information contained in the sub \(\sigma\)-field \(\mathcal{G}\).
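Let's make Definition 6.1 concrete on a finite example. Take a fair die roll with \(X(\omega) = \omega\) and let \(\mathcal{G}\) be generated by the partition \(\{\text{odd}, \text{even}\}\); the die and partition are illustrative choices, not from the text. Then \(Z = \mathbb{E}\{X \mid \mathcal{G}\}\) is the within-cell average, and requirement 3 can be checked directly:

```python
from fractions import Fraction

# Omega = a fair die roll, X(w) = w; we only get to observe the parity of the roll.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}
partition = [(1, 3, 5), (2, 4, 6)]  # generates G: "odd or even" is all we know

# Z = E{X | G} is constant on each cell: the P-weighted average of X over the cell.
Z = {}
for cell in partition:
    avg = sum(X[w] * P[w] for w in cell) / sum(P[w] for w in cell)
    for w in cell:
        Z[w] = avg

# Requirement 3: E{1_A Z} = E{1_A X} for every A in G (unions of cells).
for A in [(), (1, 3, 5), (2, 4, 6), tuple(omega)]:
    assert sum(Z[w] * P[w] for w in A) == sum(X[w] * P[w] for w in A)
```

Here \(Z\) equals 3 on odd outcomes and 4 on even ones: the best guess of the roll given only its parity.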

Definition 6.2 (\(\color{magenta}{\textbf{Predictor}}\))
Consider any \(\mathcal{G}\)-measurable random variable \(Z\). Then \(Z\) can be interpreted as a predictor of another random variable \(X\) under the information contained in the \(\sigma\)-field \(\mathcal{G}\). When we substitute \(X\) with its prediction \(Z\), we make an error given by the difference \(X - Z\). In the special case in which \(\mathbb{E}\{|Z|^2\} < \infty\), and using as error function the mean squared error, i.e.  \[ \mathbb{E}\{\text{error}^2\} = \mathbb{E}\{(X - Z)^2\} \text{,} \] it is possible to prove that the conditional expectation \(\mathbb{E}\{X \mid \mathcal{G}\}\) represents the best predictor of \(X\), in the sense that it minimizes the mean squared error, i.e.  \[ \mathbb{E}\{(X - \mathbb{E}\{X \mid \mathcal{G}\})^2\} = \underset{\small Z \in \mathcal{Z}}{\min}\left[\mathbb{E}\{(X - Z)^2\}\right] \text{.} \] Hence, \(\mathbb{E}\{X \mid \mathcal{G}\}\) is the predictor that minimizes the mean squared error over the class \(\mathcal{Z}\) of \(\mathcal{G}\)-measurable random variables with finite second moment, i.e. \[ \mathbb{E}\{X \mid \mathcal{G}\} = \underset{\small Z \in \mathcal{Z}}{\text{argmin}}\left[\mathbb{E}\{(X - Z)^2\}\right] \text{,} \] where \(\mathcal{Z} = \{Z : Z \text{ is } \mathcal{G}\text{-measurable} \; \text{and} \; \mathbb{E}\{|Z|^2\} < \infty \}\).
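The minimization can be seen numerically. In the die-roll example (an illustrative setup, with \(\mathcal{G}\) generated by parity), every \(\mathcal{G}\)-measurable predictor is a pair \((\alpha, \beta)\): \(\alpha\) on odd outcomes, \(\beta\) on even ones. Comparing the conditional-mean predictor against a grid of alternatives:

```python
from fractions import Fraction

# Die example: X(w) = w, G generated by the parity partition {odd, even}.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}

def mse(alpha, beta):
    """MSE of the G-measurable predictor Z = alpha on odd outcomes, beta on even ones."""
    return sum((X[w] - (alpha if w % 2 else beta)) ** 2 * P[w] for w in omega)

# Conditional means: E{X | odd} = 3, E{X | even} = 4.
best = mse(Fraction(3), Fraction(4))

# No other G-measurable predictor on a grid of candidates does better.
grid = [Fraction(n, 4) for n in range(29)]  # 0, 1/4, ..., 7
assert all(mse(a, b) >= best for a in grid for b in grid)
```

The minimum `best` equals \(8/3\), the average within-cell variance of \(X\).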

6.1 Properties of conditional expectation

Here we state some useful properties of conditional expectation:

  1. Linearity: The conditional expectation is linear for all constants \(a, b \in \mathbb{R}\), i.e.  \[ \mathbb{E}\{ a X + b Y \mid \mathcal{G} \} = a \mathbb{E}\{X \mid \mathcal{G} \} + b \mathbb{E}\{Y \mid \mathcal{G}\} \text{.} \]
  2. Positivity: \(X \ge 0\) implies that \(\mathbb{E}\{X \mid \mathcal{G}\} \ge 0\).
  3. Measurability: If \(Y\) is \(\mathcal{G}\)-measurable, then \(\mathbb{E}\{XY \mid \mathcal{G}\} = Y \mathbb{E}\{X \mid \mathcal{G}\}\). In particular, if \(X\) is \(\mathcal{G}\)-measurable then \(\mathbb{E}\{X \mid \mathcal{G}\} = X\), i.e. \(X\) is not stochastic given the information in \(\mathcal{G}\).
  4. Constant: The conditional expectation of a constant is a constant, i.e.  \[ \mathbb{E}\{a \mid \mathcal{G}\} = a \; \; \forall a \in \mathbb{R} \text{.} \]
  5. Independence: If \(X\) is independent from the \(\sigma\)-field \(\mathcal{G}\), then \(\mathbb{E}\{X \mid \mathcal{G}\} = \mathbb{E}\{X\}\).
  6. Chain rule (tower property): If one considers two sub \(\sigma\)-fields of \(\mathcal{B}\) such that \(\mathcal{G_1} \subset \mathcal{G_2}\), then we can write: \[ \mathbb{E}\{X \mid \mathcal{G_1}\} = \mathbb{E}\{\mathbb{E}\{X \mid \mathcal{G_2}\} \mid \mathcal{G_1}\} \quad \text{whenever} \quad \mathcal{G_1} \subset \mathcal{G_2} \text{.} \tag{6.1}\] Remember that, when using the chain rule, it is mandatory to take the conditional expectation first with respect to the largest \(\sigma\)-field, i.e. the one that contains more information (in this case \(\mathcal{G_2}\)), and then with respect to the smallest one (in this case \(\mathcal{G_1}\)).
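The chain rule can be verified numerically on the die-roll setup (an illustrative example): take \(\mathcal{G}_1\) generated by parity and \(\mathcal{G}_2\) by a finer partition, so \(\mathcal{G}_1 \subset \mathcal{G}_2\):

```python
from fractions import Fraction

omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}

def cond_exp(f, partition):
    """E{f | G} for G generated by a finite partition: average f within each cell."""
    Z = {}
    for cell in partition:
        avg = sum(f[w] * P[w] for w in cell) / sum(P[w] for w in cell)
        for w in cell:
            Z[w] = avg
    return Z

G1 = [(1, 3, 5), (2, 4, 6)]        # coarse information: parity only
G2 = [(1,), (3, 5), (2,), (4, 6)]  # finer partition, so G1 is a sub-sigma-field of G2

# Tower property (Equation 6.1): conditioning first on the larger sigma-field changes nothing.
assert cond_exp(cond_exp(X, G2), G1) == cond_exp(X, G1)
```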

6.2 Conditional probability

Proposition 6.1 (\(\color{magenta}{\textbf{Conditional probability}}\))
Given a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), consider \(\mathcal{G}\) as a sub \(\sigma\)-field of \(\mathcal{F}\), i.e. \(\mathcal{G} \subset \mathcal{F}\). Then the general definition of the conditional probability of an event \(A\), given \(\mathcal{G}\), is \[ \mathbb{P}(A \mid \mathcal{G}) = \mathbb{E}(\mathbb{1}_A \mid \mathcal{G}) \text{.} \tag{6.2}\] The elementary definition, instead, does not condition with respect to a \(\sigma\)-field, but with respect to a single event \(B\). In practice, take an event \(B \in \mathcal{F}\) such that \(0 < \mathbb{P}(B) < 1\); then \(\forall A \in \mathcal{F}\) the conditional probability of \(A\) given \(B\) is defined as: \[ \mathbb{P}(A \mid B) = \frac{\mathbb{P}(A {\color{blue}{\cap}} B)}{\mathbb{P}(B)} \text{,} \quad \mathbb{P}(A \mid B^c) = \frac{\mathbb{P}(A {\color{blue}{\cap}} B^c)}{\mathbb{P}(B^c)} \text{.} \tag{6.3}\]

Proof. The elementary (Equation 6.3) and the general (Equation 6.2) definitions are equivalent. In fact, consider a sub \(\sigma\)-field \(\mathcal{G}\) which provides only the information concerning whether \(\omega\) is in \(B\) or not. A \(\sigma\)-field of this kind has the form \(\mathcal{G}_{B} = \{\Omega, \emptyset, B, B^c\}\). Then, consider a \(\mathcal{G}_{B}\)-measurable function, \(f: \Omega \to \mathbb{R}\), such that: \[ f(\omega) = \begin{cases} \alpha \quad \omega \in B \\ \beta \quad \omega \in B^c \end{cases} \] It remains to find \(\alpha\) and \(\beta\) in the following expression: \[ \mathbb{P}(A \mid \mathcal{G}_B) = \mathbb{E}\{\mathbb{1}_A \mid \mathcal{G}_B\} = \alpha \mathbb{1}_{B} + \beta \mathbb{1}_{B^c} \text{.} \] Note that the joint probability of \(A\) and \(B\) can be obtained as: \[ \begin{aligned} \mathbb{P}(A {\color{blue}{\cap}} B){} & = \mathbb{E}\{\mathbb{1}_A \mathbb{1}_B \} = \mathbb{E}\{\mathbb{E}\{\mathbb{1}_A \mathbb{1}_B \mid \mathcal{G}_B\}\} = \\ & = \mathbb{E}\{\mathbb{E}\{\mathbb{1}_A \mid \mathcal{G}_B\}\mathbb{1}_B \} = \\ & = \mathbb{E}\{\mathbb{P}(A \mid \mathcal{G}_B)\mathbb{1}_B \} = \\ & = \mathbb{E}\{(\alpha \mathbb{1}_{B} + \beta \mathbb{1}_{B^c})\mathbb{1}_B\} = \\ & = \alpha \mathbb{E}\{\mathbb{1}_{B}\} + \beta \mathbb{E}\{\mathbb{1}_{B^c}\mathbb{1}_B \} = \\ & = \alpha \mathbb{P}(B) \end{aligned} \] where the second equality uses the chain rule (Equation 6.1), the third uses the \(\mathcal{G}_B\)-measurability of \(\mathbb{1}_B\), and the last one uses \(\mathbb{1}_{B^c}\mathbb{1}_B = 0\). Hence, we obtain: \[ \mathbb{P}(A {\color{blue}{\cap}} B) = \alpha \ \mathbb{P}(B) \implies \alpha = \frac{\mathbb{P}(A {\color{blue}{\cap}} B)}{\mathbb{P}(B)} \text{.} \] Analogously, for \(\mathbb{P}(A {\color{blue}{\cap}} B^c)\) it is possible to prove that: \[ \mathbb{P}(A {\color{blue}{\cap}} B^c) = \beta \ \mathbb{P}(B^c) \implies \beta = \frac{\mathbb{P}(A {\color{blue}{\cap}} B^c)}{\mathbb{P}(B^c)} \text{.} \] Finally, thanks to this result it is possible to express the conditional probability in the general definition (Equation 6.2) as a linear combination of conditional probabilities defined according to the elementary definition (Equation 6.3), i.e. \[ \mathbb{P}(A \mid \mathcal{G}_{B}) = \mathbb{P}(A \mid B) \mathbb{1}_B + \mathbb{P}(A \mid B^c) \mathbb{1}_{B^c} \text{.} \]
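The proof above can be checked on a small example. With a fair die, take the illustrative events \(B = \{\text{even}\}\) and \(A = \{\omega \ge 4\}\), and compute \(\alpha = \mathbb{P}(A \mid B)\) and \(\beta = \mathbb{P}(A \mid B^c)\):

```python
from fractions import Fraction

# Die roll: B = "even outcome", A = "outcome at least 4" (illustrative choices).
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
A, B = {4, 5, 6}, {2, 4, 6}
Bc = set(omega) - B

def prob(E):
    return sum(P[w] for w in E)

# Elementary definition (Equation 6.3): the alpha and beta of the proof.
alpha = prob(A & B) / prob(B)   # P(A | B)
beta = prob(A & Bc) / prob(Bc)  # P(A | B^c)

# General definition (Equation 6.2): P(A | G_B) = alpha 1_B + beta 1_{B^c}.
p_A_given_GB = {w: (alpha if w in B else beta) for w in omega}

# Averaging the conditional probability recovers P(A) (law of total probability).
assert sum(p_A_given_GB[w] * P[w] for w in omega) == prob(A)
```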

Exercise 6.1 Let’s continue from Exercise 3.1 and suppose we observe \(X(\omega) = \{+1\}\). We then ask ourselves: what is the probability that the next extraction gives \(X(\omega) = \{0\}\)?

Solution 6.1. With 52 cards, the probability of obtaining \(X(\omega) = \{0\}\) is \(\frac{12}{52} \approx 23.08 \%\) (see Solution 3.2), while that of \(X(\omega) = \{+1\}\) is \(\frac{20}{52} \approx 38.46 \%\).

Hence, if we have extracted a card that originates an \(X(\omega) = \{+1\}\), then only 19 cards giving \(\{+1\}\) remain in the deck, while the total number of cards reduces to 51. Thus, the conditional probability of another \(+1\) decreases \[ \mathbb{P}(X(\omega_2) = \{+1\} \mid X(\omega_1) = \{+1\}) = \frac{19}{51} \approx 37.25\% \text{.} \] On the other hand, the conditional probability of a \(0\) increases \[ \mathbb{P}(X(\omega_2) = \{0\} \mid X(\omega_1) = \{+1\}) = \frac{12}{51} \approx 23.53 \% \text{.} \]
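The arithmetic, assuming the counts from Exercise 3.1 (20 cards worth \(+1\) and 12 worth \(0\) in a 52-card deck), is quickly checked with exact fractions:

```python
from fractions import Fraction

# After drawing a +1 card, 51 cards remain, of which 19 still give +1
# and all twelve 0-cards are still in the deck.
p_plus1 = Fraction(19, 51)  # next card gives +1
p_zero = Fraction(12, 51)   # next card gives 0

print(round(float(p_plus1) * 100, 2), round(float(p_zero) * 100, 2))  # 37.25 23.53
```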

Proposition 6.2 (\(\color{magenta}{\textbf{Conditional probability and independent events}}\))
Let \(A\) and \(B\) be two events with \(\mathbb{P}(B) > 0\). Then \[ A \perp B \iff \mathbb{P}(A \mid B) = \mathbb{P}(A) \text{.} \]

Proof. To prove the result in Proposition 6.2 we consider both sides of the expression.

  1. (LHS \(\implies\) RHS) Let’s assume that \(A\) and \(B\) are independent, i.e. \(A \perp B\). Then, by definition (Definition 4.1) under independence holds the decomposition in Equation 4.1, i.e. \[ \mathbb{P}(A {\color{blue}{\cap}} B) = \mathbb{P}(A) \mathbb{P}(B) \implies \mathbb{P}(A \mid B) = \frac{\mathbb{P}(A {\color{blue}{\cap}} B)}{\mathbb{P}(B)} = \mathbb{P}(A) \text{.} \]

  2. (RHS \(\implies\) LHS) Let’s assume that \(\mathbb{P}(A \mid B) = \mathbb{P}(A)\), with \(\mathbb{P}(B) > 0\). Then, by the definition of conditional probability (Equation 6.3), the joint probability is: \[ \mathbb{P}(A {\color{blue}{\cap}} B) = \mathbb{P}(A \mid B) \mathbb{P}(B) = \mathbb{P}(A) \mathbb{P}(B) \text{,} \] which is exactly the definition of independence (Definition 4.1).

Theorem 6.2 (\(\color{magenta}{\textbf{Bayes' Theorem}}\))
Let’s consider a partition \(\{A_1, \dots, A_n\}\) of disjoint events, each \(A_i \subset \Omega\), such that \({\color{red}\sqcup}_{i=1}^n A_i = \Omega\). Given any event \(B \subset \Omega\) with \(\mathbb{P}(B) > 0\), for any \(j \in \{1,2,\dots,n\}\) the conditional probability of the event \(A_j\) given \(B\) is defined as: \[ \mathbb{P}(A_j \mid B) = \frac{\mathbb{P}(B \mid A_j) \mathbb{P}(A_j)}{\sum_{i = 1}^{n} \mathbb{P}(B \mid A_i) \mathbb{P}(A_i)} \text{.} \]
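Bayes' Theorem translates directly into code. As a sketch, the function below computes the posterior over a finite partition; the diagnostic-test numbers are illustrative assumptions, not from the text:

```python
from fractions import Fraction

def bayes(prior, likelihood, j):
    """Posterior P(A_j | B) from priors P(A_i) and likelihoods P(B | A_i) on a partition."""
    evidence = sum(l * p for l, p in zip(likelihood, prior))
    return likelihood[j] * prior[j] / evidence

# Illustrative numbers (assumptions): a diagnostic test with 99% sensitivity,
# a 5% false-positive rate, and a disease prevalence of 1%.
prior = [Fraction(1, 100), Fraction(99, 100)]       # A_1 = sick, A_2 = healthy
likelihood = [Fraction(99, 100), Fraction(5, 100)]  # P(positive test | A_i)

posterior_sick = bayes(prior, likelihood, 0)
print(posterior_sick)  # 1/6
```

Despite the high sensitivity, the posterior probability of being sick given a positive test is only \(1/6\), because the prior \(\mathbb{P}(A_1)\) is small.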

Example 6.1 Let’s consider two random variables \(X(\omega)\) and \(Y(\omega)\) taking values in \(\{0,1\}\). The marginal probabilities are \(\mathbb{P}(X = 0) = 0.6\) and \(\mathbb{P}(Y = 0) = 0.29\). Let’s consider the matrix of joint events and probabilities, i.e.  \[ \begin{pmatrix} [X = 0] {\color{blue}{\cap}} [Y = 0] & [X = 0] {\color{blue}{\cap}} [Y = 1] \\ [X = 1] {\color{blue}{\cap}} [Y = 0] & [X = 1] {\color{blue}{\cap}} [Y = 1] \end{pmatrix} \overset{\mathbb{P}}{\longrightarrow} \begin{pmatrix} 0.17 & 0.43 \\ 0.12 & 0.28 \end{pmatrix} \text{.} \] Then, by definition the conditional probabilities are defined as: \[ \mathbb{P}(X = 0 \mid Y = 0) = \frac{\mathbb{P}(X = 0 {\color{blue}{\cap}} Y = 0)}{\mathbb{P}(Y = 0)} = \frac{0.17}{0.29} \approx 58.62 \% \text{,} \] and \[ \mathbb{P}(X = 0 \mid Y = 1) = \frac{\mathbb{P}(X = 0 {\color{blue}{\cap}} Y = 1)}{\mathbb{P}(Y = 1)} = \frac{0.43}{1-0.29} \approx 60.56 \% \text{.} \] Considering \(Y\) instead: \[ \mathbb{P}(Y = 0 \mid X = 0) = \frac{\mathbb{P}(Y = 0 {\color{blue}{\cap}} X = 0)}{\mathbb{P}(X = 0)} = \frac{0.17}{0.6} \approx 28.33 \% \text{,} \] and \[ \mathbb{P}(Y = 0 \mid X = 1) = \frac{\mathbb{P}(Y = 0 {\color{blue}{\cap}} X = 1)}{\mathbb{P}(X = 1)} = \frac{0.12}{1-0.6} \approx 30 \% \text{.} \] Then, it is possible to express the marginal probability of \(X\) as: \[ \begin{aligned} \mathbb{P}(X = 0) & {} = \mathbb{E}\{\mathbb{P}(X = 0 \mid Y)\} = \\ & = \mathbb{P}(X = 0 \mid Y = 0) \mathbb{P}(Y = 0) + \mathbb{P}(X = 0 \mid Y = 1) \mathbb{P}(Y = 1) = \\ & = 0.5862 \cdot 0.29 + 0.6056 \cdot (1 - 0.29) \approx 60 \% \text{.} \end{aligned} \] Similarly, for \(Y\): \[ \begin{aligned} \mathbb{P}(Y = 0) & {} = \mathbb{E}\{\mathbb{P}(Y = 0 \mid X)\} = \\ & = \mathbb{P}(Y = 0 \mid X = 0) \mathbb{P}(X = 0) + \mathbb{P}(Y = 0 \mid X = 1) \mathbb{P}(X = 1) = \\ & = 0.2833 \cdot 0.6 + 0.30 \cdot (1 - 0.6) \approx 29 \% \text{.} \end{aligned} \]
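The computations of Example 6.1 can be reproduced exactly with fractions, which also confirms that averaging the conditionals recovers the marginals:

```python
from fractions import Fraction

# Joint probabilities of Example 6.1 as exact fractions.
joint = {(0, 0): Fraction(17, 100), (0, 1): Fraction(43, 100),
         (1, 0): Fraction(12, 100), (1, 1): Fraction(28, 100)}

# Marginals recovered by summing over the other variable.
pX = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
pY = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}
assert pX[0] == Fraction(3, 5) and pY[0] == Fraction(29, 100)

# Conditional probabilities P(X = 0 | Y = y).
pX0_given_Y = {y: joint[(0, y)] / pY[y] for y in (0, 1)}

# Law of total probability: averaging the conditionals gives back the marginal.
assert sum(pX0_given_Y[y] * pY[y] for y in (0, 1)) == pX[0]
print(round(float(pX0_given_Y[0]) * 100, 2), round(float(pX0_given_Y[1]) * 100, 2))  # 58.62 60.56
```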

Exercise 6.2 (\(\color{magenta}{\textbf{Monty Hall Problem}}\))
You are on a game show where there are three closed doors: behind one is a car (the prize) and behind the other two are goats.

The rules are simple:

  1. You choose one door (say Door 1).
  2. The host, who knows where the car is, opens one of the other doors, always revealing a goat.
  3. You are then offered the chance to stay with your original choice or switch to the other unopened door.

Question: Is it in your interest to switch doors? (See 21 Blackjack)

Solution 6.2. Before any door is opened, the probability that the car is behind each door is \[ \mathbb{P}(\text{car behind door 1}) = \mathbb{P}(\text{car behind door 2}) = \mathbb{P}(\text{car behind door 3}) = \frac{1}{3} = 33.\bar{3} \% \text{.} \] Suppose you picked Door 1. The host, Monty, opens (say) Door 3, revealing a goat. Now consider the likelihoods of this observation:

  • If the car is behind Door 1: Monty could open either Door 2 or Door 3 with equal probability.

  • If the car is behind Door 2: Monty is forced to open Door 3.

  • If the car is behind Door 3: Monty is forced to open Door 2 (so this case is impossible if Monty opens Door 3).

Apply Bayes’ Rule: \[ \mathbb{P}(\text{car behind door 1} \mid \text{Monty opens door 3}) = \frac{\tfrac{1}{3}\cdot \tfrac{1}{2}}{\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1} = \frac{1/6}{1/6+1/3} = \tfrac{1}{3} = 33.\bar{3} \% \text{.} \] On the other hand, the other door has a probability of winning of \[ \mathbb{P}(\text{car behind door 2} \mid \text{Monty opens door 3}) = \frac{\tfrac{1}{3}\cdot 1}{\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1} = \frac{1/3}{1/6+1/3} = \tfrac{2}{3} = 66.\bar{6} \% \text{.} \] After Monty opens a goat door, the probability that the car is behind your original choice is still \(\tfrac{1}{3} \approx 33\%\), while the probability that it is behind the other unopened door is \(\tfrac{2}{3} \approx 67\%\). Therefore, switching doubles the chances of winning.
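A quick Monte Carlo simulation, assuming the standard rules stated above (the host always opens a goat door different from the pick), confirms the Bayes computation:

```python
import random

def play(switch, rng):
    """One round of Monty Hall; returns True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(42)
n = 100_000
stay = sum(play(False, rng) for _ in range(n)) / n
switch = sum(play(True, rng) for _ in range(n)) / n
print(stay, switch)  # stay ~ 1/3, switch ~ 2/3
```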

6.2.1 Conditional variance

Proposition 6.3 (\(\color{magenta}{\textbf{Conditional variance}}\))
Let’s consider two random variables \(X\) and \(Y\) with finite second moment. Then, the total variance can be decomposed as: \[ \mathbb{V}\{X\} = \mathbb{E}\{\mathbb{V}\{X \mid Y\}\} + \mathbb{V}\{\mathbb{E}\{X \mid Y\}\} \text{,} \] and, symmetrically, \[ \mathbb{V}\{Y\} = \mathbb{E}\{\mathbb{V}\{Y \mid X\}\} + \mathbb{V}\{\mathbb{E}\{Y \mid X\}\} \text{.} \]

Proof. By definition, the variance of a random variable reads: \[ \mathbb{V}\{X\} = \mathbb{E}\{X^2\} - \mathbb{E}\{X\}^2 \text{.} \] Applying the chain rule (Equation 6.1), one can write \[ \mathbb{V}\{X\} = \mathbb{E}\{\mathbb{E}\{X^2 \mid Y\}\} - \mathbb{E}\{\mathbb{E}\{X\mid Y\}\}^2 \text{.} \] Then, adding and subtracting \(\mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\}\) gives \[ \mathbb{V}\{X\} = \mathbb{E}\{\mathbb{E}\{X^2 \mid Y\}\} - \mathbb{E}\{\mathbb{E}\{X\mid Y\}\}^2 {\color{green}{+ \mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\}}} {\color{red}{- \mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\}}} \text{.} \] Grouping the first and fourth terms, and the second and third, yields \[ \mathbb{V}\{X\} = \mathbb{E}\{\mathbb{E}\{X^2 \mid Y\} - {\color{red}{\mathbb{E}\{X \mid Y\}^2}}\} + {\color{green}{\mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\}}} - \mathbb{E}\{\mathbb{E}\{X\mid Y\}\}^2 \text{,} \] which simplifies to \[ \mathbb{V}\{X\} = \mathbb{E}\{\mathbb{V}\{X \mid Y\}\} + \mathbb{V}\{\mathbb{E}\{X \mid Y\}\} \text{.} \]
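The decomposition can be verified exactly on a small discrete joint distribution (the pmf below is an illustrative choice):

```python
from fractions import Fraction

# A small joint pmf for (X, Y) on {0,1} x {0,1} with illustrative numbers.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 8)}
pY = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

def e_given_y(g, y):
    """E{g(X) | Y = y} for the discrete joint pmf above."""
    return sum(g(x) * p for (x, yi), p in joint.items() if yi == y) / pY[y]

# Total variance of X.
EX = sum(x * p for (x, _), p in joint.items())
EX2 = sum(x * x * p for (x, _), p in joint.items())
var_X = EX2 - EX ** 2

# The two pieces of the decomposition.
m = {y: e_given_y(lambda x: x, y) for y in (0, 1)}                  # conditional means
v = {y: e_given_y(lambda x: x * x, y) - m[y] ** 2 for y in (0, 1)}  # conditional variances
e_cond_var = sum(v[y] * pY[y] for y in (0, 1))
var_cond_mean = sum(m[y] ** 2 * pY[y] for y in (0, 1)) - sum(m[y] * pY[y] for y in (0, 1)) ** 2

assert var_X == e_cond_var + var_cond_mean  # V{X} = E{V{X|Y}} + V{E{X|Y}}
```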