6 Conditional expectation
Reference: Pietro Rigo (2023)
Theorem 6.1 (Radon-Nikodym) Consider a measurable space \((\Omega, \mathcal{B})\) and two sigma-finite measures \(\mu\), \(\nu\) such that \(\mu \ll \nu\) (Definition 30.1). Then there exists a non-negative measurable function \(X: \Omega \to \mathbb{R}\) such that: \[ \mu(B) = \int_B X \, d\nu \quad \forall B \in \mathcal{B} \text{.} \]
Definition 6.1 (Conditional expectation) Consider a probability space \((\Omega, \mathcal{B}, \mathbb{P})\), a sub-sigma-field of \(\mathcal{B}\), i.e. \(\mathcal{G} \subset \mathcal{B}\), and a random variable \(X: \Omega \rightarrow \mathbb{R}\) with finite expectation \(\mathbb{E}\{|X|\} < + \infty\). Then, the conditional expectation of \(X\) given \(\mathcal{G}\) is any random variable \[ Z = \mathbb{E}\{X \mid \mathcal{G}\} \text{,} \] such that:
- \(Z\) has finite expectation, i.e. \(\mathbb{E}\{|Z|\} < + \infty\).
- \(Z\) is \(\mathcal{G}\)-measurable.
- \(\mathbb{E}\{\mathbb{1}_{A} Z \} = \mathbb{E}\{\mathbb{1}_{A}X\}\), \(\forall A \in \mathcal{G}\), namely if \(X\) and \(Z\) are restricted to any \(A \in \mathcal{G}\), then their expectations coincide.
A sigma-field can be used to describe our state of information: for every \(A \in \mathcal{G}\), we already know whether the event \(A\) has occurred or not. Requiring that \(Z\) is \(\mathcal{G}\)-measurable therefore means that the value of \(Z\) is no longer stochastic once we know the information contained in \(\mathcal{G}\). In this context, one can see the random variable \(Z = \mathbb{E}\{X \mid \mathcal{G}\}\) as the prediction of \(X\), given the information contained in the sub-sigma-field \(\mathcal{G}\).
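The three conditions of Definition 6.1 can be checked by hand on a finite sample space. The following is a minimal sketch, assuming a fair six-sided die and a sigma-field \(\mathcal{G}\) generated by the partition into odd and even faces; the die, the partition and all function names are illustrative assumptions, not part of the text. On such a space the conditional expectation is simply the probability-weighted average of \(X\) on each block of the partition.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

# G is generated by the partition {odd faces, even faces} (an assumption).
partition = [{1, 3, 5}, {2, 4, 6}]

def X(w):
    """The random variable: the face shown by the die."""
    return w

def expectation(f, event=None):
    """E{1_event * f}; event=None means the whole sample space."""
    points = omega if event is None else event
    return sum(f(w) * prob[w] for w in points)

def conditional_expectation(f):
    """Z = E{f | G} as a dict omega -> value, constant on each block."""
    z = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        value = expectation(f, block) / p_block
        for w in block:
            z[w] = value
    return z

Z = conditional_expectation(X)

# Every event of G is a union of blocks (including the empty union).
events_in_G = [set().union(*combo)
               for r in range(len(partition) + 1)
               for combo in combinations(partition, r)]

# Third condition of the definition: E{1_A Z} = E{1_A X} for all A in G.
checks = [expectation(lambda w: Z[w], A) == expectation(X, A)
          for A in events_in_G]
print(Z[1], Z[2], all(checks))
```

Here \(Z\) equals 3 on the odd faces and 4 on the even faces, so it is constant on each block (hence \(\mathcal{G}\)-measurable) and matches \(X\) in expectation on every event of \(\mathcal{G}\).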
Definition 6.2 (Predictor) Consider any \(\mathcal{G}\)-measurable random variable \(Z\). Then \(Z\) can be interpreted as a predictor of another random variable \(X\) under the information contained in the sigma-field \(\mathcal{G}\). When we substitute \(X\) with its prediction \(Z\), we make an error given by the difference \(X - Z\). In the special case in which \(\mathbb{E}\{|X|^2\} < \infty\) and \(\mathbb{E}\{|Z|^2\} < \infty\), and using the mean squared error as error function, i.e. \[ \mathbb{E}\{\text{error}^2\} = \mathbb{E}\{(X - Z)^2\} \text{,} \] it is possible to prove that the conditional expectation \(\mathbb{E}\{X \mid \mathcal{G}\}\) is the best predictor of \(X\), in the sense that it minimizes the mean squared error: \[ \mathbb{E}\{(X - \mathbb{E}\{X \mid \mathcal{G}\})^2\} = \underset{\small Z \in \mathcal{Z}}{\min}\left[\mathbb{E}\{(X - Z)^2\}\right] \text{.} \] Equivalently, \(\mathbb{E}\{X \mid \mathcal{G}\}\) is the minimizer of the mean squared error over the class \(\mathcal{Z}\) of \(\mathcal{G}\)-measurable random variables with finite second moment, i.e. \[ \mathbb{E}\{X \mid \mathcal{G}\} = \underset{\small Z \in \mathcal{Z}}{\text{argmin}}\left[\mathbb{E}\{(X - Z)^2\}\right] \text{,} \] where \(\mathcal{Z} = \{Z : Z \text{ is } \mathcal{G}\text{-measurable} \; \text{and} \; \mathbb{E}\{|Z|^2\} < \infty \}\).
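The minimizing property can be illustrated numerically. The sketch below assumes a fair six-sided die with \(X\) equal to the face shown and \(\mathcal{G}\) generated by the odd/even partition (both are illustrative assumptions). Any \(\mathcal{G}\)-measurable predictor is then a pair of constants \((a, b)\), one per block, and a brute-force search over a grid of candidates is compared with the block averages \((3, 4)\), which are the values of \(\mathbb{E}\{X \mid \mathcal{G}\}\) in this setting.

```python
from fractions import Fraction

# Hypothetical setting: fair die, G generated by {odd faces, even faces}.
p = Fraction(1, 6)
odd, even = [1, 3, 5], [2, 4, 6]

def mse(a, b):
    """E{(X - Z)^2} for the predictor Z = a on odd faces, b on even faces."""
    return (sum((w - a) ** 2 * p for w in odd)
            + sum((w - b) ** 2 * p for w in even))

# Candidate G-measurable predictors on a grid of step 1/4 in [0, 7].
grid = [Fraction(k, 4) for k in range(0, 29)]
best = min(((a, b) for a in grid for b in grid), key=lambda ab: mse(*ab))

print("best (a, b):", best)
print("minimal MSE:", mse(*best))
```

The grid search is only a sanity check on a coarse set of candidates, not a proof; the general result follows from the orthogonality of \(X - \mathbb{E}\{X \mid \mathcal{G}\}\) to every element of \(\mathcal{Z}\).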
6.1 Properties of conditional expectation
Here we state some useful properties of conditional expectation:
- Linearity: The conditional expectation is linear for all constants \(a, b \in \mathbb{R}\), i.e. \[ \mathbb{E}\{ a X + b Y \mid \mathcal{G} \} = a \mathbb{E}\{X \mid \mathcal{G} \} + b \mathbb{E}\{Y \mid \mathcal{G}\} \text{.} \]
- Positivity: \(X \ge 0\) implies that \(\mathbb{E}\{X \mid \mathcal{G}\} \ge 0\) almost surely.
- Measurability: If \(Y\) is \(\mathcal{G}\)-measurable, then \(\mathbb{E}\{XY \mid \mathcal{G}\} = Y \mathbb{E}\{X \mid \mathcal{G}\}\). In particular, if \(X\) is \(\mathcal{G}\)-measurable then \(\mathbb{E}\{X \mid \mathcal{G}\} = X\), i.e. \(X\) is already known given the information in \(\mathcal{G}\).
- Constant: The conditional expectation of a constant is a constant, i.e. \[ \mathbb{E}\{a \mid \mathcal{G}\} = a \; \; \forall a \in \mathbb{R} \text{.} \]
- Independence: If \(X\) is independent of the sigma-field \(\mathcal{G}\), then \(\mathbb{E}\{X \mid \mathcal{G}\} = \mathbb{E}\{X\}\).
- Chain rule (tower property): If one considers two sub-sigma-fields of \(\mathcal{B}\) such that \(\mathcal{G_1} \subset \mathcal{G_2}\), then: \[ \mathbb{E}\{X \mid \mathcal{G_1}\} = \mathbb{E}\{\mathbb{E}\{X \mid \mathcal{G_2}\} \mid \mathcal{G_1}\} \text{.} \tag{6.1}\] Note that in Equation 6.1 the inner conditional expectation is taken with respect to the larger sigma-field, i.e. the one that contains more information (in this case \(\mathcal{G_2}\)), and the outer one with respect to the smaller one (in this case \(\mathcal{G_1}\)). The result is always the conditional expectation given the smaller sigma-field.
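The chain rule can be verified exactly on a small example. The sketch below assumes a loaded six-sided die with weights \(w/21\) and two nested partitions generating \(\mathcal{G_1} \subset \mathcal{G_2}\); the weights, the partitions and the helper `cond_exp` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical loaded die: P({w}) = w / 21.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(w, 21) for w in omega}

coarse = [{1, 2, 3}, {4, 5, 6}]        # generates G1
fine = [{1}, {2, 3}, {4, 5}, {6}]      # generates G2, with G1 contained in G2

def cond_exp(values, partition):
    """E{values | sigma(partition)} as a dict omega -> value."""
    out = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        avg = sum(values[w] * prob[w] for w in block) / p_block
        for w in block:
            out[w] = avg
    return out

X = {w: w * w for w in omega}

direct = cond_exp(X, coarse)                   # E{X | G1}
tower = cond_exp(cond_exp(X, fine), coarse)    # E{E{X | G2} | G1}
# Conditioning on G2 afterwards changes nothing, because E{X | G1}
# is already G2-measurable.
reverse = cond_exp(cond_exp(X, coarse), fine)  # E{E{X | G1} | G2}

print(direct == tower, direct == reverse)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities can be tested with `==` rather than up to a floating-point tolerance.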
6.2 Conditional probability
Proposition 6.1 (Conditional probability) Given a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), consider a sub-sigma-field \(\mathcal{G}\) of \(\mathcal{F}\), i.e. \(\mathcal{G} \subset \mathcal{F}\). Then the general definition of the conditional probability of an event \(A \in \mathcal{F}\), given \(\mathcal{G}\), is \[ \mathbb{P}(A \mid \mathcal{G}) = \mathbb{E}\{\mathbb{1}_A \mid \mathcal{G}\} \text{.} \tag{6.2}\] The elementary definition, instead, does not condition on a sigma-field but on a single event \(B\). In practice, take an event \(B \in \mathcal{F}\) such that \(0 < \mathbb{P}(B) < 1\); then \(\forall A \in \mathcal{F}\) the conditional probabilities of \(A\) given \(B\) and given \(B^c\) are defined as: \[ \mathbb{P}(A \mid B) = \frac{\mathbb{P}(A {\color{blue}{\cap}} B)}{\mathbb{P}(B)} \text{,} \quad \mathbb{P}(A \mid B^c) = \frac{\mathbb{P}(A {\color{blue}{\cap}} B^c)}{\mathbb{P}(B^c)} \text{.} \tag{6.3}\]
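The two definitions agree when \(\mathcal{G}\) is the sigma-field generated by a single event \(B\): on \(B\) the random variable \(\mathbb{P}(A \mid \mathcal{G})\) takes the value \(\mathbb{P}(A \mid B)\), and on \(B^c\) it takes the value \(\mathbb{P}(A \mid B^c)\). A minimal sketch, assuming a fair die with \(B\) the even faces and \(A = \{4, 5, 6\}\) (all hypothetical choices):

```python
from fractions import Fraction

# Hypothetical setting: fair die, B = even faces, A = {4, 5, 6}.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
B = {2, 4, 6}
B_c = set(omega) - B
A = {4, 5, 6}

def P(event):
    """Probability of an event (a subset of omega)."""
    return sum(prob[w] for w in event)

def indicator(w):
    """Indicator function of the event A."""
    return 1 if w in A else 0

# General definition: P(A | G) = E{1_A | G}, with G generated by B,
# computed as the weighted average of the indicator on each block.
P_A_given_G = {}
for block in (B, B_c):
    value = sum(indicator(w) * prob[w] for w in block) / P(block)
    for w in block:
        P_A_given_G[w] = value

# Elementary definition on each block.
p_A_given_B = P(A & B) / P(B)
p_A_given_Bc = P(A & B_c) / P(B_c)

print(P_A_given_G[2], p_A_given_B)    # value on B
print(P_A_given_G[1], p_A_given_Bc)   # value on the complement of B
```

Averaging \(\mathbb{P}(A \mid \mathcal{G})\) over \(\Omega\) returns \(\mathbb{P}(A)\), which is the third condition of Definition 6.1 applied to \(\Omega \in \mathcal{G}\).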
Exercise 6.1 Continuing from Exercise 3.1, suppose that we observe \(X(\omega) = \{+1\}\). What is the probability that the next draw gives \(X(\omega) = \{0\}\)?
Solution 6.1. With a full deck of 52 cards, the probability of obtaining \(X(\omega) = \{0\}\) is \(\frac{12}{52} \approx 23.08 \%\) (see Solution 3.2), while the probability of \(X(\omega) = \{+1\}\) is \(\frac{20}{52} \approx 38.46 \%\).
Hence, if we have drawn a card that produces \(X(\omega) = \{+1\}\), then only 19 cards that give \(\{+1\}\) remain in the deck, while the total number of cards reduces to 51. Thus, the conditional probability of drawing another \(+1\) decreases: \[ \mathbb{P}(X(\omega_2) = \{+1\} \mid X(\omega_1) = \{+1\}) = \frac{19}{51} \approx 37.25\% \text{.} \] On the other hand, the conditional probability of a 0 increases: \[ \mathbb{P}(X(\omega_2) = \{0\} \mid X(\omega_1) = \{+1\}) = \frac{12}{51} \approx 23.53 \% \text{.} \]
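These values can be recovered from the elementary definition in Equation 6.3 by enumerating all ordered pairs of distinct cards. The sketch below uses only the counts quoted in the solution (20 cards giving \(+1\) and 12 giving \(0\) out of 52); the remaining 20 cards are marked `None` because their value is not needed here, which is an assumption about the deck layout rather than something taken from Exercise 3.1.

```python
from fractions import Fraction
from itertools import permutations

# Counts quoted in the solution; None marks the remaining cards,
# whose value plays no role in this computation (an assumption).
deck = [+1] * 20 + [0] * 12 + [None] * 20

# All ordered draws of two distinct cards, each equally likely.
pairs = list(permutations(range(len(deck)), 2))

def prob(event):
    """Probability of an event defined on (first value, second value)."""
    hits = sum(1 for i, j in pairs if event(deck[i], deck[j]))
    return Fraction(hits, len(pairs))

p_first_plus = prob(lambda first, second: first == 1)
p_plus_given_plus = prob(lambda f, s: f == 1 and s == 1) / p_first_plus
p_zero_given_plus = prob(lambda f, s: f == 1 and s == 0) / p_first_plus

print(p_plus_given_plus, float(p_plus_given_plus))
print(p_zero_given_plus, float(p_zero_given_plus))
```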
Proposition 6.2 (Conditional probability and independent events) Let \(A\) and \(B\) be two events with \(\mathbb{P}(B) > 0\). Then \(A\) and \(B\) are independent if and only if conditioning on \(B\) does not change the probability of \(A\): \[ A \perp B \iff \mathbb{P}(A \mid B) = \mathbb{P}(A) \text{.} \]
Theorem 6.2 (Bayes’ Theorem) Let’s consider a partition of \(\Omega\) into disjoint events \(\{A_1, \dots, A_n\}\), with each \(A_i \subset \Omega\), \(\mathbb{P}(A_i) > 0\), and such that \({\color{red}\sqcup}_{i=1}^n A_i = \Omega\). Given any event \(B \subset \Omega\) with probability greater than zero, \(\mathbb{P}(B) > 0\), for any \(j \in \{1,2,\dots,n\}\) the conditional probability of the event \(A_j\) given \(B\) is given by: \[ \mathbb{P}(A_j \mid B) = \frac{\mathbb{P}(B \mid A_j) \mathbb{P}(A_j)}{\sum_{i = 1}^{n} \mathbb{P}(B \mid A_i) \mathbb{P}(A_i)} \text{.} \]
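As a minimal sketch, the formula translates into a short function that takes the prior probabilities \(\mathbb{P}(A_i)\) and the likelihoods \(\mathbb{P}(B \mid A_i)\) and returns the posteriors \(\mathbb{P}(A_i \mid B)\). The function name `bayes_posterior` and the numerical inputs are hypothetical, chosen only to illustrate the computation.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(A_i | B)] from priors P(A_i) and likelihoods P(B | A_i)."""
    # Denominator of Bayes' formula: P(B) by the law of total probability.
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    if evidence == 0:
        raise ValueError("P(B) must be strictly positive")
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Hypothetical partition with three events.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

posterior = bayes_posterior(priors, likelihoods)
print(posterior, sum(posterior))
```

The posteriors always sum to one, since the denominator is exactly the sum of the numerators over the partition.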
Example 6.1 Let’s consider two random variables \(X(\omega)\) and \(Y(\omega)\) taking values in \(\{0,1\}\), with marginal probabilities \(\mathbb{P}(X = 0) = 0.6\) and \(\mathbb{P}(Y = 0) = 0.29\). Let’s consider the matrix of joint events and probabilities, i.e. \[ \begin{pmatrix} [X = 0] {\color{blue}{\cap}} [Y = 0] & [X = 0] {\color{blue}{\cap}} [Y = 1] \\ [X = 1] {\color{blue}{\cap}} [Y = 0] & [X = 1] {\color{blue}{\cap}} [Y = 1] \end{pmatrix} \overset{\mathbb{P}}{\longrightarrow} \begin{pmatrix} 0.17 & 0.43 \\ 0.12 & 0.28 \end{pmatrix} \text{.} \] Then, by definition, the conditional probabilities are: \[ \mathbb{P}(X = 0 \mid Y = 0) = \frac{\mathbb{P}(X = 0 {\color{blue}{\cap}} Y = 0)}{\mathbb{P}(Y = 0)} = \frac{0.17}{0.29} \approx 58.62 \% \text{,} \] and \[ \mathbb{P}(X = 0 \mid Y = 1) = \frac{\mathbb{P}(X = 0 {\color{blue}{\cap}} Y = 1)}{\mathbb{P}(Y = 1)} = \frac{0.43}{1-0.29} \approx 60.56 \% \text{.} \] Considering \(Y\) instead: \[ \mathbb{P}(Y = 0 \mid X = 0) = \frac{\mathbb{P}(Y = 0 {\color{blue}{\cap}} X = 0)}{\mathbb{P}(X = 0)} = \frac{0.17}{0.6} \approx 28.33 \% \text{,} \] and \[ \mathbb{P}(Y = 0 \mid X = 1) = \frac{\mathbb{P}(Y = 0 {\color{blue}{\cap}} X = 1)}{\mathbb{P}(X = 1)} = \frac{0.12}{1-0.6} = 30 \% \text{.} \] Then, it is possible to express the marginal probability of \(X\) as: \[ \begin{aligned} \mathbb{P}(X = 0) & = \mathbb{E}\{\mathbb{P}(X = 0 \mid Y)\} \\ & = \mathbb{P}(X = 0 \mid Y = 0) \mathbb{P}(Y = 0) + \mathbb{P}(X = 0 \mid Y = 1) \mathbb{P}(Y = 1) \\ & = 0.5862 \cdot 0.29 + 0.6056 \cdot (1 - 0.29) \approx 60 \% \text{.} \end{aligned} \] And similarly for \(Y\): \[ \begin{aligned} \mathbb{P}(Y = 0) & = \mathbb{E}\{\mathbb{P}(Y = 0 \mid X)\} \\ & = \mathbb{P}(Y = 0 \mid X = 0) \mathbb{P}(X = 0) + \mathbb{P}(Y = 0 \mid X = 1) \mathbb{P}(X = 1) \\ & = 0.2833 \cdot 0.6 + 0.30 \cdot (1 - 0.6) \approx 29 \% \text{.} \end{aligned} \]
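The arithmetic of this example can be reproduced directly from the joint probability matrix. The following sketch stores the four joint probabilities as exact fractions and derives the marginals, the conditional probabilities and the two total-probability identities; the dictionary layout and variable names are illustrative choices.

```python
from fractions import Fraction

# Joint probabilities P(X = x, Y = y), keyed by (x, y), from the matrix above.
joint = {(0, 0): Fraction(17, 100), (0, 1): Fraction(43, 100),
         (1, 0): Fraction(12, 100), (1, 1): Fraction(28, 100)}

# Marginals: sum the joint probabilities over the other variable.
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Conditional probabilities, keyed as (value, conditioning value).
x_given_y = {(x, y): joint[x, y] / p_y[y] for (x, y) in joint}
y_given_x = {(y, x): joint[x, y] / p_x[x] for (x, y) in joint}

# Law of total probability: P(X = 0) = E{P(X = 0 | Y)}, and likewise for Y.
total_x0 = sum(x_given_y[0, y] * p_y[y] for y in (0, 1))
total_y0 = sum(y_given_x[0, x] * p_x[x] for x in (0, 1))

print(float(x_given_y[0, 0]), float(x_given_y[0, 1]))
print(float(y_given_x[0, 0]), float(y_given_x[0, 1]))
print(total_x0, total_y0)
```

Since \(\mathbb{P}(X = 0 \mid Y = 0) \neq \mathbb{P}(X = 0)\), Proposition 6.2 implies that the events \([X = 0]\) and \([Y = 0]\) are not independent in this example.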
Exercise 6.2 You are on a game show where there are three closed doors: behind one is a car (the prize) and behind the other two are goats.
The rules are simple:
- You choose one door (say Door 1).
- The host, who knows where the car is, opens one of the other doors, always revealing a goat.
- You are then offered the chance to stay with your original choice or switch to the other unopened door.
Question: Is it in your interest to switch doors? (See 21 Blackjack)
Solution 6.2. Before any door is opened, the probability that the car is behind each door is \[ \mathbb{P}(\text{car behind door 1}) = \mathbb{P}(\text{car behind door 2}) = \mathbb{P}(\text{car behind door 3}) = \frac{1}{3} = 33.\bar{3} \% \text{.} \] Suppose you picked Door 1 and the host (Monty) opens, say, Door 3, revealing a goat. Conditional on the position of the car, Monty behaves as follows:
- If the car is behind Door 1: Monty could open either Door 2 or Door 3 with equal probability.
- If the car is behind Door 2: Monty is forced to open Door 3.
- If the car is behind Door 3: Monty is forced to open Door 2 (so this case is impossible if Monty opens Door 3).
Apply Bayes’ rule: \[ \mathbb{P}(\text{car behind door 1} \mid \text{Monty opens door 3}) = \frac{\tfrac{1}{3}\cdot \tfrac{1}{2}}{\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1} = \frac{1/6}{1/6+1/3} = \tfrac{1}{3} = 33.\bar{3} \% \text{.} \] On the other hand, the other door has probability of winning equal to \[ \mathbb{P}(\text{car behind door 2} \mid \text{Monty opens door 3}) = \frac{\tfrac{1}{3}\cdot 1}{\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1} = \frac{1/3}{1/6+1/3} = \tfrac{2}{3} = 66.\bar{6} \% \text{.} \] After Monty opens a goat door, the probability that the car is behind your original choice is still \(\tfrac{1}{3} \approx 33.3\%\), while the probability that it is behind the other unopened door is \(\tfrac{2}{3} \approx 66.7\%\), exactly double. Therefore, switching doubles the chances of winning.
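The result can also be checked empirically with a small Monte Carlo simulation. The sketch below assumes the rules stated in the exercise (the host always opens a goat door different from the player's pick, choosing at random when two are available); the function `play`, the seed and the number of rounds are arbitrary illustrative choices.

```python
import random

def play(switch, rng):
    """Simulate one round; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a goat door different from the player's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 20_000
stay = sum(play(False, rng) for _ in range(n)) / n
swap = sum(play(True, rng) for _ in range(n)) / n

print(f"win rate when staying:   {stay:.3f}")
print(f"win rate when switching: {swap:.3f}")
```

With this many rounds the two frequencies should fall close to \(1/3\) and \(2/3\) respectively, up to sampling noise.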
6.2.1 Conditional variance
Proposition 6.3 (Conditional variance) Let’s consider two random variables \(X\) and \(Y\) with finite second moments. Then, the total variance of \(X\) can be decomposed as: \[ \mathbb{V}\{X\} = \mathbb{E}\{\mathbb{V}\{X \mid Y\}\} + \mathbb{V}\{\mathbb{E}\{X \mid Y\}\} \text{,} \] and, symmetrically, the total variance of \(Y\) as: \[ \mathbb{V}\{Y\} = \mathbb{E}\{\mathbb{V}\{Y \mid X\}\} + \mathbb{V}\{\mathbb{E}\{Y \mid X\}\} \text{.} \]
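The decomposition can be verified exactly on the joint distribution of Example 6.1. The following is a minimal sketch using exact fractions; the helper names `cond_mean_x` and `cond_var_x`, and the labels "within" and "between" for the two terms, are illustrative and do not come from the text.

```python
from fractions import Fraction

# Joint probabilities P(X = x, Y = y), keyed by (x, y), from Example 6.1.
joint = {(0, 0): Fraction(17, 100), (0, 1): Fraction(43, 100),
         (1, 0): Fraction(12, 100), (1, 1): Fraction(28, 100)}
values = (0, 1)
p_y = {y: sum(joint[x, y] for x in values) for y in values}

def cond_mean_x(y):
    """E{X | Y = y}."""
    return sum(x * joint[x, y] for x in values) / p_y[y]

def cond_var_x(y):
    """V{X | Y = y}."""
    m = cond_mean_x(y)
    return sum((x - m) ** 2 * joint[x, y] for x in values) / p_y[y]

# Unconditional mean and variance of X.
mean_x = sum(x * p for (x, _), p in joint.items())
var_x = sum((x - mean_x) ** 2 * p for (x, _), p in joint.items())

# E{V{X | Y}} and V{E{X | Y}}; the mean of E{X | Y} equals E{X}.
within = sum(cond_var_x(y) * p_y[y] for y in values)
between = sum((cond_mean_x(y) - mean_x) ** 2 * p_y[y] for y in values)

print(var_x, within + between)
```

In this example almost all of the variance of \(X\) sits in the first term, because the conditional means \(\mathbb{E}\{X \mid Y = 0\}\) and \(\mathbb{E}\{X \mid Y = 1\}\) are nearly equal.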