6 Conditional expectation
Reference: Pietro Rigo (2023)
Theorem 6.1 (\(\color{magenta}{\textbf{Radon-Nikodym}}\))
Consider a measurable space \((\Omega, \mathcal{B})\) and two measures \(\mu\), \(\nu\) on it such that \(\nu\) is \(\sigma\)-finite (Definition 30.4) and \(\mu \ll \nu\) (Definition 30.1). Then there exists a measurable function \(X: \Omega \to \mathbb{R}\) such that: \[
\mu(B) = \int_B X \, d\nu \quad \forall B \in \mathcal{B}
\]
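On a finite sample space the theorem becomes transparent: every measure is determined by its mass on singletons, so the density \(X = d\mu/d\nu\) is just a pointwise ratio of masses. Below is a minimal sketch in Python; the measures \(\mu\) and \(\nu\) are illustrative choices, not taken from the reference.

```python
from fractions import Fraction

# Finite sample space: every measure is determined by its mass on singletons,
# so the Radon-Nikodym derivative X = dmu/dnu is a pointwise ratio.
omega = ["w1", "w2", "w3", "w4"]
nu = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 8), "w4": Fraction(1, 8)}
mu = {w: Fraction(1, 4) for w in omega}  # uniform measure

# mu << nu holds because nu gives positive mass to every point.
X = {w: mu[w] / nu[w] for w in omega}

# Check mu(B) = integral of X over B with respect to nu, for an arbitrary B.
B = {"w2", "w4"}
assert sum(mu[w] for w in B) == sum(X[w] * nu[w] for w in B)
print(X)  # w1 -> 1/2, w2 -> 1, w3 -> 2, w4 -> 2
```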
Definition 6.1 (\(\color{magenta}{\textbf{Conditional expectation}}\))
Given a probability space \((\Omega, \mathcal{B}, \mathbb{P})\), consider a sub \(\sigma\)-field of \(\mathcal{B}\), i.e. \(\mathcal{G} \subset \mathcal{B}\), and a random variable \(X: \Omega \rightarrow \mathbb{R}\) with finite expectation, \(\mathbb{E}\{|X|\} < + \infty\). Then, the conditional expectation of \(X\) given \(\mathcal{G}\) is any random variable \[
Z = \mathbb{E}\{X \mid \mathcal{G}\}
\text{,}
\] such that:
- \(Z\) has finite expectation, i.e. \(\mathbb{E}\{|Z|\} < + \infty\).
- \(Z\) is \(\mathcal{G}\)-measurable.
- \(\mathbb{E}\{\mathbb{1}_{A} Z \} = \mathbb{E}\{\mathbb{1}_{A}X\}\), \(\forall A \in \mathcal{G}\), namely when \(X\) and \(Z\) are restricted to any \(A \in \mathcal{G}\), their expectations coincide.
A \(\sigma\)-field can be used to describe our state of information: \(\forall A \in \mathcal{G}\) we already know whether the event \(A\) has occurred or not. By requiring \(Z\) to be \(\mathcal{G}\)-measurable, we are saying that the value of \(Z\) is no longer stochastic once we know the information contained in \(\mathcal{G}\). In this context, one can see the random variable \(Z = \mathbb{E}\{X \mid \mathcal{G}\}\) as the prediction of \(X\), given the information contained in the sub \(\sigma\)-field \(\mathcal{G}\).
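When \(\mathcal{G}\) is generated by a finite partition, the three defining properties force \(Z\) to be the block-wise average of \(X\): on each block \(A\) of the partition, \(Z = \mathbb{E}\{\mathbb{1}_A X\} / \mathbb{P}(A)\). A minimal sketch in Python; the fair-die setup is an illustrative assumption.

```python
from fractions import Fraction

# Fair die: Omega = {1,...,6}, X(w) = w, and G generated by the partition
# {even, odd}. Knowing G means knowing only the parity of the outcome.
P = {w: Fraction(1, 6) for w in range(1, 7)}
X = {w: w for w in range(1, 7)}
partition = [{2, 4, 6}, {1, 3, 5}]

# On each block A, Z = E{X | G} is the constant E{1_A X} / P(A).
Z = {}
for A in partition:
    block_avg = sum(X[w] * P[w] for w in A) / sum(P[w] for w in A)
    for w in A:
        Z[w] = block_avg

print(Z)  # Z = 4 on even outcomes, Z = 3 on odd ones

# Defining property: E{1_A Z} = E{1_A X} for every A in G.
for A in partition:
    assert sum(Z[w] * P[w] for w in A) == sum(X[w] * P[w] for w in A)
```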
Definition 6.2 (\(\color{magenta}{\textbf{Predictor}}\))
Consider any \(\mathcal{G}\)-measurable random variable \(Z\). Then \(Z\) can be interpreted as a predictor of another random variable \(X\) under the information contained in the \(\sigma\)-field \(\mathcal{G}\). When we substitute \(X\) with its prediction \(Z\), we make an error given by the difference \(X - Z\). In the special case in which \(X\) has finite second moment, i.e. \(\mathbb{E}\{|X|^2\} < \infty\), and using as error function the mean squared error, i.e. \[
\mathbb{E}\{\text{error}^2\} = \mathbb{E}\{(X - Z)^2\}
\text{,}
\] then it is possible to prove that the conditional expectation \(\mathbb{E}\{X \mid \mathcal{G}\}\) is the best predictor of \(X\), in the sense that it minimizes the mean squared error, i.e. \[
\mathbb{E}\{(X - \mathbb{E}\{X \mid \mathcal{G}\})^2\} = \underset{\small Z \in \mathcal{Z}}{\min}\left[\mathbb{E}\{(X - Z)^2\}\right]
\text{.}
\] Hence, \(\mathbb{E}\{X|\mathcal{G}\}\) is the predictor that minimizes the mean squared error over the class \(\mathcal{Z}\) of \(\mathcal{G}\)-measurable random variables with finite second moment, i.e. \[
Z = \mathbb{E}\{X \mid \mathcal{G}\} = \underset{\small Z \in \mathcal{Z}}{\text{argmin}}\left[\mathbb{E}\{(X - Z)^2\}\right]
\text{,}
\] where \(\mathcal{Z} = \{Z : Z \text{ is } \mathcal{G}\text{-measurable} \; \text{and} \; \mathbb{E}\{|Z|^2\} < \infty \}\).
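A quick Monte Carlo illustration of this optimality; the model \(X = Y + \varepsilon\), with \(\varepsilon\) independent of \(Y\) and mean zero so that \(\mathbb{E}\{X \mid \sigma(Y)\} = Y\), is an assumption made for the sketch.

```python
import random

# Monte Carlo sketch: X = Y + eps with eps independent of Y and mean zero,
# so E{X | sigma(Y)} = Y. Every sigma(Y)-measurable predictor is a function
# g(Y); the conditional expectation attains the smallest mean squared error.
random.seed(0)
n = 200_000
ys = [random.gauss(0, 1) for _ in range(n)]
xs = [y + random.gauss(0, 1) for y in ys]

def mse(g):
    return sum((x - g(y)) ** 2 for x, y in zip(xs, ys)) / n

print(mse(lambda y: y))        # ~1.00  (the conditional expectation)
print(mse(lambda y: 1.5 * y))  # ~1.25  (any other g(Y) does worse)
print(mse(lambda y: y + 0.3))  # ~1.09
```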
6.1 Properties of conditional expectation
Here we state some useful properties of conditional expectation:
- Linearity: The conditional expectation is linear, i.e. for all constants \(a, b \in \mathbb{R}\), \[ \mathbb{E}\{ a X + b Y \mid \mathcal{G} \} = a \mathbb{E}\{X \mid \mathcal{G} \} + b \mathbb{E}\{Y \mid \mathcal{G}\} \text{.} \]
- Positivity: \(X \ge 0\) implies that \(\mathbb{E}\{X \mid \mathcal{G}\} \ge 0\).
- Measurability: If \(Y\) is \(\mathcal{G}\)-measurable, then \(\mathbb{E}\{XY \mid \mathcal{G}\} = Y \mathbb{E}\{X \mid \mathcal{G}\}\). In particular, if \(X\) is \(\mathcal{G}\)-measurable then \(\mathbb{E}\{X \mid \mathcal{G}\} = X\), i.e. \(X\) is no longer stochastic once the information in \(\mathcal{G}\) is known.
- Constant: The conditional expectation of a constant is a constant, i.e. \[ \mathbb{E}\{a \mid \mathcal{G}\} = a \; \; \forall a \in \mathbb{R} \text{.} \]
- Independence: If \(X\) is independent from the \(\sigma\)-field \(\mathcal{G}\), then \(\mathbb{E}\{X \mid \mathcal{G}\} = \mathbb{E}\{X\}\).
- Chain rule: If one considers two sub \(\sigma\)-fields of \(\mathcal{B}\) such that \(\mathcal{G_1} \subset \mathcal{G_2}\), then we can write: \[ \mathbb{E}\{X \mid \mathcal{G_1}\} = \mathbb{E}\{\mathbb{E}\{X \mid \mathcal{G_2}\} \mid \mathcal{G_1}\} \quad \text{whenever} \; \mathcal{G_1} \subset \mathcal{G_2} \text{.} \tag{6.1}\] Remember that, when using the chain rule, it is mandatory to condition first with respect to the larger \(\sigma\)-field, i.e. the one that contains more information (in this case \(\mathcal{G_2}\)), and then with respect to the smaller one (in this case \(\mathcal{G_1}\)).
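These properties are easy to verify numerically when the \(\sigma\)-fields are generated by nested finite partitions. A sketch of the chain rule (6.1), with an arbitrary illustrative choice of \(X\):

```python
from fractions import Fraction

# Check E{X | G1} = E{E{X | G2} | G1} on a finite space, where G1 and G2
# are generated by nested partitions (G1 coarser, G2 finer).
P = {w: Fraction(1, 8) for w in range(8)}
X = {w: Fraction(w ** 2) for w in range(8)}
part2 = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]   # generates G2 (more information)
part1 = [{0, 1, 2, 3}, {4, 5, 6, 7}]       # generates G1 (less information)

def cond_exp(f, partition):
    """E{f | sigma(partition)}: block-wise averages of f."""
    out = {}
    for A in partition:
        block_avg = sum(f[w] * P[w] for w in A) / sum(P[w] for w in A)
        for w in A:
            out[w] = block_avg
    return out

assert cond_exp(X, part1) == cond_exp(cond_exp(X, part2), part1)
```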
6.2 Conditional probability
Proposition 6.1 (\(\color{magenta}{\textbf{Conditional probability}}\))
Given a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), consider \(\mathcal{G}\) as a sub \(\sigma\)-field of \(\mathcal{F}\), i.e. \(\mathcal{G} \subset \mathcal{F}\). Then the general definition of the conditional probability of an event \(A\), given \(\mathcal{G}\), is \[
\mathbb{P}(A \mid \mathcal{G}) = \mathbb{E}(\mathbb{1}_A \mid \mathcal{G})
\text{.}
\tag{6.2}\] The elementary definition, instead, does not condition with respect to a \(\sigma\)-field but with respect to a single event \(B\). In practice, take an event \(B \in \mathcal{F}\) such that \(0 < \mathbb{P}(B) < 1\); then \(\forall A \in \mathcal{F}\) the conditional probability of \(A\) given \(B\) is defined as: \[
\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A {\color{blue}{\cap}} B)}{\mathbb{P}(B)}
\text{,} \quad
\mathbb{P}(A \mid B^c) = \frac{\mathbb{P}(A {\color{blue}{\cap}} B^c)}{\mathbb{P}(B^c)}
\text{.}
\tag{6.3}\]
Exercise 6.1 Let’s continue from Exercise 3.1: suppose we observe \(X(\omega) = \{+1\}\). What is the probability that in the next draw \(X(\omega) = \{0\}\)?
Solution 6.1. With 52 cards, the chance of obtaining \(X(\omega) = \{0\}\) is \(\frac{12}{52} \approx 23.08 \%\) (see Solution 3.2), while that of \(X(\omega) = \{+1\}\) is \(\frac{20}{52} \approx 38.46 \%\).
Hence, if we have drawn a card that gives \(X(\omega) = \{+1\}\), then only 19 cards that give \(\{+1\}\) remain in the deck, while the total number of cards reduces to 51. Thus, the conditional probability of another \(+1\) decreases \[ \mathbb{P}(X(\omega_2) = \{+1\} \mid X(\omega_1) = \{+1\}) = \frac{19}{51} \approx 37.25\% \text{.} \] On the other hand, the conditional probability of a \(0\) increases \[ \mathbb{P}(X(\omega_2) = \{0\} \mid X(\omega_1) = \{+1\}) = \frac{12}{51} \approx 23.53 \% \text{.} \]
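The same numbers can be reproduced with exact arithmetic; the card counts (20 cards worth \(+1\), 12 worth \(0\)) are the ones quoted in the solution above.

```python
from fractions import Fraction

# After drawing one +1 card, 51 cards remain, of which 19 still give +1.
p_plus_given_plus = Fraction(20 - 1, 51)  # 19/51 ~ 37.25%
p_zero_given_plus = Fraction(12, 51)      # 12/51 ~ 23.53%
print(float(p_plus_given_plus), float(p_zero_given_plus))
```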
Proposition 6.2 (\(\color{magenta}{\textbf{Conditional probability and independent events}}\))
Let \(A\) and \(B\) be two events with \(\mathbb{P}(B) > 0\). Then independence can be characterized as \[
A \perp B \iff \mathbb{P}(A \mid B) = \mathbb{P}(A)
\text{.}
\]
Theorem 6.2 (\(\color{magenta}{\textbf{Bayes' Theorem}}\))
Let’s consider a partition \(\{A_1, \dots, A_n\}\) of \(\Omega\) into disjoint events, i.e. \(A_i \subset \Omega\) for each \(i\) and \({\color{red}\sqcup}_{i=1}^n A_i = \Omega\), and any event \(B \subset \Omega\) with probability greater than zero, \(\mathbb{P}(B) > 0\). Then, for any \(j \in \{1,2,\dots,n\}\) the conditional probability of the event \(A_j\) given \(B\) is given by: \[
\mathbb{P}(A_j \mid B) = \frac{\mathbb{P}(B \mid A_j) \mathbb{P}(A_j)}{\sum_{i = 1}^{n} \mathbb{P}(B \mid A_i) \mathbb{P}(A_i)}
\text{.}
\]
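In code, Bayes’ theorem is just a normalization of prior-times-likelihood products. A minimal sketch with illustrative numbers, not taken from the reference:

```python
from fractions import Fraction

# Partition {A_1, A_2, A_3} with priors P(A_j) and likelihoods P(B | A_j).
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

# Denominator: P(B) by the law of total probability.
evidence = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

assert sum(posteriors) == 1
print(posteriors)  # [3/22, 5/11, 9/22]
```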
Example 6.1 Let’s consider two random variables \(X(\omega)\) and \(Y(\omega)\) taking values in \(\{0,1\}\), with marginal probabilities \(\mathbb{P}(X = 0) = 0.6\) and \(\mathbb{P}(Y = 0) = 0.29\). Let’s consider the matrix of joint events and probabilities, i.e. \[ \begin{pmatrix} [X = 0] {\color{blue}{\cap}} [Y = 0] & [X = 0] {\color{blue}{\cap}} [Y = 1] \\ [X = 1] {\color{blue}{\cap}} [Y = 0] & [X = 1] {\color{blue}{\cap}} [Y = 1] \end{pmatrix} \overset{\mathbb{P}}{\longrightarrow} \begin{pmatrix} 0.17 & 0.43 \\ 0.12 & 0.28 \end{pmatrix} \text{.} \] Then, by definition the conditional probabilities are: \[ \mathbb{P}(X = 0 \mid Y = 0) = \frac{\mathbb{P}(X = 0 {\color{blue}{\cap}} Y = 0)}{\mathbb{P}(Y = 0)} = \frac{0.17}{0.29} \approx 58.62 \% \text{,} \] and \[ \mathbb{P}(X = 0 \mid Y = 1) = \frac{\mathbb{P}(X = 0 {\color{blue}{\cap}} Y = 1)}{\mathbb{P}(Y = 1)} = \frac{0.43}{1-0.29} \approx 60.56 \% \text{.} \] Considering \(Y\) instead: \[ \mathbb{P}(Y = 0 \mid X = 0) = \frac{\mathbb{P}(Y = 0 {\color{blue}{\cap}} X = 0)}{\mathbb{P}(X = 0)} = \frac{0.17}{0.6} \approx 28.33 \% \text{,} \] and \[ \mathbb{P}(Y = 0 \mid X = 1) = \frac{\mathbb{P}(Y = 0 {\color{blue}{\cap}} X = 1)}{\mathbb{P}(X = 1)} = \frac{0.12}{1-0.6} = 30 \% \text{.} \] Then, it is possible to recover the marginal probability of \(X\) as: \[ \begin{aligned} \mathbb{P}(X = 0) & {} = \mathbb{E}\{\mathbb{P}(X = 0 \mid Y)\} = \\ & = \mathbb{P}(X = 0 \mid Y = 0) \mathbb{P}(Y = 0) + \mathbb{P}(X = 0 \mid Y = 1) \mathbb{P}(Y = 1) = \\ & = 0.5862 \cdot 0.29 + 0.6056 \cdot (1 - 0.29) \approx 60 \% \text{,} \end{aligned} \] and similarly for \(Y\): \[ \begin{aligned} \mathbb{P}(Y = 0) & {} = \mathbb{E}\{\mathbb{P}(Y = 0 \mid X)\} = \\ & = \mathbb{P}(Y = 0 \mid X = 0) \mathbb{P}(X = 0) + \mathbb{P}(Y = 0 \mid X = 1) \mathbb{P}(X = 1) = \\ & = 0.2833 \cdot 0.6 + 0.30 \cdot (1 - 0.6) \approx 29 \% \text{.} \end{aligned} \]
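The computations of Example 6.1 can be checked directly from the joint matrix:

```python
# Joint probabilities of (X, Y) from Example 6.1.
joint = {(0, 0): 0.17, (0, 1): 0.43, (1, 0): 0.12, (1, 1): 0.28}

p_x0 = joint[(0, 0)] + joint[(0, 1)]        # marginal P(X = 0) = 0.60
p_y0 = joint[(0, 0)] + joint[(1, 0)]        # marginal P(Y = 0) = 0.29

# Conditionals from the definition P(A | B) = P(A and B) / P(B).
p_x0_given_y0 = joint[(0, 0)] / p_y0        # ~0.5862
p_x0_given_y1 = joint[(0, 1)] / (1 - p_y0)  # ~0.6056

# Law of total probability recovers the marginal of X.
total = p_x0_given_y0 * p_y0 + p_x0_given_y1 * (1 - p_y0)
assert abs(total - p_x0) < 1e-12
```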
Exercise 6.2 (\(\color{magenta}{\textbf{Monty Hall Problem}}\))
You are on a game show where there are three closed doors: behind one is a car (the prize) and behind the other two are goats.
The rules are simple:
- You choose one door (say Door 1).
- The host, who knows where the car is, opens one of the other doors, always revealing a goat.
- You are then offered the chance to stay with your original choice or switch to the other unopened door.
Question: Is it in your interest to switch doors? (See 21 Blackjack)
Solution 6.2. Before any door is opened, the probability that the car is behind each door is \[ \mathbb{P}(\text{car behind door 1}) = \mathbb{P}(\text{car behind door 2}) = \mathbb{P}(\text{car behind door 3}) = \frac{1}{3} = 33.\bar{3} \% \text{.} \] Suppose you picked Door 1 and the host opens (say) Door 3, revealing a goat. Consider how the host behaves in each case:
If the car is behind Door 1: Monty could open either Door 2 or Door 3 with equal probability.
If the car is behind Door 2: Monty is forced to open Door 3.
If the car is behind Door 3: Monty is forced to open Door 2 (so this case is impossible if Monty opens Door 3).
Apply Bayes’ Rule: \[ \mathbb{P}(\text{car behind door 1} \mid \text{Monty opens door 3}) = \frac{\tfrac{1}{3}\cdot \tfrac{1}{2}}{\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1} = \frac{1/6}{1/6+1/3} = \tfrac{1}{3} = 33.\bar{3} \% \text{.} \] On the other hand, the other door has a probability of winning of \[ \mathbb{P}(\text{car behind door 2} \mid \text{Monty opens door 3}) = \frac{\tfrac{1}{3}\cdot 1}{\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1} = \frac{1/3}{1/6+1/3} = \tfrac{2}{3} = 66.\bar{6} \% \text{.} \] After Monty opens a goat door, the probability that the car is behind your original choice is still \(33.\bar{3}\%\), while the probability that it is behind the other unopened door is \(66.\bar{6}\%\), exactly double. Therefore, switching doubles the chances of winning.
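The result is easy to confirm by simulation; a minimal Monte Carlo sketch:

```python
import random

# Simulate the game: staying wins ~1/3 of the time, switching wins ~2/3.
random.seed(0)
n = 100_000
stay_wins = switch_wins = 0
for _ in range(n):
    car = random.randrange(3)
    pick = random.randrange(3)
    # The host opens a door that is neither the pick nor the car.
    opened = random.choice([d for d in range(3) if d not in (pick, car)])
    switched = next(d for d in range(3) if d not in (pick, opened))
    stay_wins += (pick == car)
    switch_wins += (switched == car)

print(stay_wins / n, switch_wins / n)  # ~0.333, ~0.667
```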
6.2.1 Conditional variance
Proposition 6.3 (\(\color{magenta}{\textbf{Conditional variance}}\))
Let’s consider two random variables \(X\) and \(Y\) with finite second moment. Then, the total variance can be decomposed as: \[
\mathbb{V}\{X\} = \mathbb{E}\{\mathbb{V}\{X \mid Y\}\} + \mathbb{V}\{\mathbb{E}\{X \mid Y\}\}
\text{,}
\] and, exchanging the roles of the two variables, \[
\mathbb{V}\{Y\} = \mathbb{E}\{\mathbb{V}\{Y \mid X\}\} + \mathbb{V}\{\mathbb{E}\{Y \mid X\}\}
\text{.}
\]