6  Conditional expectation

Reference: Pietro Rigo

Theorem 6.1 (Radon–Nikodym)
Consider a measurable space $(\Omega,\mathcal{B})$ and two measures $\mu$, $\nu$ on it such that $\mu$ and $\nu$ are $\sigma$-finite and $\mu \ll \nu$, i.e. $\mu$ is absolutely continuous with respect to $\nu$. Then there exists a measurable function $X:\Omega \to [0,+\infty)$ such that:
$$\mu(A) = \int_A X \, d\nu \qquad \forall A \in \mathcal{B}.$$
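As a concrete finite illustration (a sketch under assumed toy measures, not part of the original notes): on a finite $\Omega$ with $\nu$ the counting measure, the Radon–Nikodym derivative $X = d\mu/d\nu$ is simply the pointwise ratio of the masses, and the defining identity can be verified directly.

```python
from fractions import Fraction

# Finite Ω = {1,...,5}; ν is the counting measure, μ assigns weights to the points.
omega = [1, 2, 3, 4, 5]
nu = {w: Fraction(1) for w in omega}          # dominating measure
mu = {w: Fraction(w, 15) for w in omega}      # μ ≪ ν, total mass 1 (toy choice)

# Radon–Nikodym derivative X = dμ/dν: pointwise ratio of the masses.
X = {w: mu[w] / nu[w] for w in omega}

# Check μ(A) = ∫_A X dν = Σ_{ω ∈ A} X(ω) ν({ω}) on some set A.
A = {2, 4, 5}
print(sum(mu[w] for w in A) == sum(X[w] * nu[w] for w in A))   # True
```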

Definition 6.1 (Conditional expectation)
Given a probability space $(\Omega,\mathcal{B},P)$, let’s consider a sub $\sigma$-field of $\mathcal{B}$, i.e. $\mathcal{G} \subseteq \mathcal{B}$, and a random variable $X:\Omega \to \mathbb{R}$ with finite expectation, $\mathbb{E}\{|X|\} < +\infty$. Then, the conditional expectation of $X$ given $\mathcal{G}$ is any random variable $Z = \mathbb{E}\{X \mid \mathcal{G}\}$ such that:

  1. $Z$ has finite expectation, i.e. $\mathbb{E}\{|Z|\} < +\infty$.
  2. $Z$ is $\mathcal{G}$-measurable.
  3. $\mathbb{E}\{\mathbb{1}_A Z\} = \mathbb{E}\{\mathbb{1}_A X\}$ for every $A \in \mathcal{G}$, namely if $X$ and $Z$ are restricted to any $A \in \mathcal{G}$, then their expectations coincide.

A $\sigma$-field can be used to describe our state of information: for every $A \in \mathcal{G}$ we already know whether the event $A$ has occurred or not. Therefore, by collecting in $\mathcal{G}$ the events whose occurrence is already known and requiring $Z$ to be $\mathcal{G}$-measurable, we are saying that the value of $Z$ is no longer stochastic once we know the information contained in $\mathcal{G}$. In this context, one can see the random variable $Z = \mathbb{E}\{X \mid \mathcal{G}\}$ as the prediction of $X$, given the information contained in the sub $\sigma$-field $\mathcal{G}$.
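To make the interpretation concrete, here is a minimal simulation sketch (the setup and variable names are illustrative assumptions, not part of the original notes): the sub $\sigma$-field $\mathcal{G}$ is generated by a coin flip `C`, and $\mathbb{E}\{X \mid \mathcal{G}\}$ is approximated by averaging $X$ within each atom of $\mathcal{G}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# G is generated by a coin flip C; X depends on C plus independent noise.
n = 100_000
C = rng.integers(0, 2, size=n)          # the information available in G
X = 2.0 * C + rng.normal(size=n)        # the target random variable

# E{X | G} is constant on each atom {C=0}, {C=1}: approximate it by group means.
Z = np.where(C == 1, X[C == 1].mean(), X[C == 0].mean())

# Defining property: E{1_A Z} = E{1_A X} for every A in G, e.g. A = {C=1}.
A = (C == 1)
print(np.mean(A * Z), np.mean(A * X))   # both close to E{1_A X} ≈ 1.0
```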

Definition 6.2 (Predictor)
Consider any $\mathcal{G}$-measurable random variable $Z$. Then $Z$ can be interpreted as a predictor of another random variable $X$ under the information contained in the $\sigma$-field $\mathcal{G}$. When we substitute $X$ with its prediction $Z$, we make an error given by the difference $X - Z$. In the special case in which $\mathbb{E}\{|X|^2\} < \infty$ and using the mean squared error as error function, i.e. $\mathbb{E}\{\text{error}^2\} = \mathbb{E}\{(X - Z)^2\}$, it is possible to prove that the conditional expectation $\mathbb{E}\{X \mid \mathcal{G}\}$ represents the best predictor of $X$, in the sense that it minimizes the mean squared error:
$$\mathbb{E}\{(X - \mathbb{E}\{X \mid \mathcal{G}\})^2\} = \min_{Z \in \mathcal{Z}} \mathbb{E}\{(X - Z)^2\}.$$
Hence, $\mathbb{E}\{X \mid \mathcal{G}\}$ is the best predictor in the sense that it minimizes the mean squared error over the class $\mathcal{Z}$ of $\mathcal{G}$-measurable random variables with finite second moment, i.e.
$$\mathbb{E}\{X \mid \mathcal{G}\} = \arg\min_{Z \in \mathcal{Z}} \mathbb{E}\{(X - Z)^2\}, \qquad \mathcal{Z} = \{Z : Z \text{ is } \mathcal{G}\text{-measurable and } \mathbb{E}\{|Z|^2\} < \infty\}.$$
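The minimization property can be checked numerically. Below is a minimal sketch (the setup, variable names and numbers are illustrative assumptions): $\mathcal{G}$ is generated by a die roll `D`, $X = D^2$ plus noise, and the group-mean estimate of $\mathbb{E}\{X \mid \mathcal{G}\}$ is compared with two other $\mathcal{G}$-measurable predictors.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200_000
D = rng.integers(1, 7, size=n)            # the information in G: a die roll
X = D**2 + rng.normal(size=n)             # the variable to be predicted

# E{X | G} is approximated by the sample mean of X within each value of D.
group_means = {v: X[D == v].mean() for v in range(1, 7)}
best = np.vectorize(group_means.get)(D)

candidates = {
    "E{X|G}": best,
    "constant E{X}": np.full(n, X.mean()),   # a (trivially) G-measurable predictor
    "linear 7*D - 10": 7.0 * D - 10.0,       # another function of D
}
for name, Z in candidates.items():
    print(name, np.mean((X - Z) ** 2))       # E{X|G} attains the smallest MSE
```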

6.1 Properties of conditional expectation

Here we state some useful properties of conditional expectation:

  1. Linearity: the conditional expectation is linear, i.e. for all constants $a, b \in \mathbb{R}$, $\mathbb{E}\{aX + bY \mid \mathcal{G}\} = a\,\mathbb{E}\{X \mid \mathcal{G}\} + b\,\mathbb{E}\{Y \mid \mathcal{G}\}$.
  2. Positivity: $X \ge 0$ implies that $\mathbb{E}\{X \mid \mathcal{G}\} \ge 0$.
  3. Measurability: if $Y$ is $\mathcal{G}$-measurable, then $\mathbb{E}\{XY \mid \mathcal{G}\} = Y\,\mathbb{E}\{X \mid \mathcal{G}\}$. In particular, if $X$ is $\mathcal{G}$-measurable then $\mathbb{E}\{X \mid \mathcal{G}\} = X$, i.e. $X$ is not stochastic given the information in $\mathcal{G}$.
  4. Constant: the conditional expectation of a constant is the constant itself, i.e. $\mathbb{E}\{a \mid \mathcal{G}\} = a$ for all $a \in \mathbb{R}$.
  5. Independence: if $X$ is independent of the $\sigma$-field $\mathcal{G}$, then $\mathbb{E}\{X \mid \mathcal{G}\} = \mathbb{E}\{X\}$.
  6. Chain rule: if one considers two sub $\sigma$-fields of $\mathcal{B}$ such that $\mathcal{G}_1 \subseteq \mathcal{G}_2$, then we can write:
     $$\mathbb{E}\{X \mid \mathcal{G}_1\} = \mathbb{E}\{\mathbb{E}\{X \mid \mathcal{G}_2\} \mid \mathcal{G}_1\}, \qquad \mathcal{G}_1 \subseteq \mathcal{G}_2. \tag{6.1}$$
     Remember that, when using the chain rule, the conditional expectation must be taken first with respect to the larger $\sigma$-field, i.e. the one containing more information (here $\mathcal{G}_2$), and then with respect to the smaller one (here $\mathcal{G}_1$); see the numerical sketch after this list.
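As a rough numerical check of the chain rule (a sketch with made-up variables, not part of the original notes): let $\mathcal{G}_2$ be generated by a die roll `D` and $\mathcal{G}_1 \subseteq \mathcal{G}_2$ by the coarser indicator `C = 1{D ≥ 4}`; conditional expectations are approximated by group means.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 300_000
D = rng.integers(1, 7, size=n)            # a die roll: generates G2
C = (D >= 4).astype(int)                  # coarser "high/low" indicator: generates G1 ⊆ G2
X = D + rng.normal(size=n)

def cond_mean(X, labels):
    """Approximate E{X | sigma(labels)} by the sample mean of X within each label."""
    out = np.empty_like(X)
    for v in np.unique(labels):
        out[labels == v] = X[labels == v].mean()
    return out

lhs = cond_mean(X, C)                     # E{X | G1}
rhs = cond_mean(cond_mean(X, D), C)       # E{ E{X | G2} | G1 }
print(np.max(np.abs(lhs - rhs)))          # ≈ 0: nested group means satisfy the identity
```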

6.2 Conditional probability

Proposition 6.1 (Conditional probability)
Given a probability space $(\Omega,\mathcal{F},P)$, consider $\mathcal{G}$ a sub $\sigma$-field of $\mathcal{F}$, i.e. $\mathcal{G} \subseteq \mathcal{F}$. Then the general definition of the conditional probability of an event $A$, given $\mathcal{G}$, is
$$P(A \mid \mathcal{G}) = \mathbb{E}\{\mathbb{1}_A \mid \mathcal{G}\}. \tag{6.2}$$
The elementary definition, instead, does not condition on a $\sigma$-field but on a single event $B$. In practice, take an event $B \in \mathcal{F}$ such that $0 < P(B) < 1$; then for every $A \in \mathcal{F}$ the conditional probability of $A$ given $B$ is defined as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(A \mid B^c) = \frac{P(A \cap B^c)}{P(B^c)}. \tag{6.3}$$

Proof. The elementary (6.3) and the general (6.2) definitions are equivalent. In fact, consider a sub $\sigma$-field $\mathcal{G}$ which provides only the information concerning whether $\omega$ is in $B$ or not. A $\sigma$-field of this kind has the form $\mathcal{G}_B = \{\Omega, \emptyset, B, B^c\}$. Any $\mathcal{G}_B$-measurable function $f:\Omega \to \mathbb{R}$ must be of the form
$$f(\omega) = \begin{cases} \alpha & \omega \in B \\ \beta & \omega \in B^c \end{cases}$$
It remains to find $\alpha$ and $\beta$ in the expression
$$P(A \mid \mathcal{G}_B) = \mathbb{E}\{\mathbb{1}_A \mid \mathcal{G}_B\} = \alpha\, \mathbb{1}_B + \beta\, \mathbb{1}_{B^c}.$$
Note that the joint probability of $A$ and $B$ can be obtained as:
$$\begin{aligned}
P(A \cap B) = \mathbb{E}\{\mathbb{1}_A \mathbb{1}_B\} &= \mathbb{E}\{\mathbb{E}\{\mathbb{1}_A \mathbb{1}_B \mid \mathcal{G}_B\}\} \\
&= \mathbb{E}\{\mathbb{E}\{\mathbb{1}_A \mid \mathcal{G}_B\}\, \mathbb{1}_B\} \\
&= \mathbb{E}\{P(A \mid \mathcal{G}_B)\, \mathbb{1}_B\} \\
&= \mathbb{E}\{(\alpha\, \mathbb{1}_B + \beta\, \mathbb{1}_{B^c})\, \mathbb{1}_B\} \\
&= \alpha\, \mathbb{E}\{\mathbb{1}_B\} + \beta\, \mathbb{E}\{\mathbb{1}_{B^c} \mathbb{1}_B\} \\
&= \alpha\, P(B),
\end{aligned}$$
where the second equality uses the fact that $\mathbb{1}_B$ is $\mathcal{G}_B$-measurable and the last one uses $\mathbb{1}_{B^c}\mathbb{1}_B = 0$. Hence we obtain:
$$P(A \cap B) = \alpha\, P(B) \;\Longrightarrow\; \alpha = \frac{P(A \cap B)}{P(B)}.$$
Working equivalently with $P(A \cap B^c)$ it is possible to prove that:
$$P(A \cap B^c) = \beta\, P(B^c) \;\Longrightarrow\; \beta = \frac{P(A \cap B^c)}{P(B^c)}.$$
Finally, thanks to this result, the conditional probability in the general definition (6.2) can be expressed as a linear combination of conditional probabilities defined according to the elementary definition (6.3), i.e.
$$P(A \mid \mathcal{G}_B) = P(A \mid B)\, \mathbb{1}_B + P(A \mid B^c)\, \mathbb{1}_{B^c}.$$
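The equivalence can be checked on a toy example (a sketch with an assumed fair die, not part of the original notes): take $A$ = “the roll is even” and $B$ = “the roll is at most 3”, build $P(A \mid \mathcal{G}_B)$ from the elementary formula, and verify the defining property of the conditional expectation.

```python
from fractions import Fraction

# One fair die: Ω = {1,...,6}, A = "even", B = "at most 3".
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
A = {2, 4, 6}
B = {1, 2, 3}
Bc = set(omega) - B

def prob(E):
    return sum(P[w] for w in E)

# Elementary definition (6.3).
P_A_given_B  = prob(A & B)  / prob(B)    # 1/3
P_A_given_Bc = prob(A & Bc) / prob(Bc)   # 2/3

# General definition (6.2): P(A | G_B)(ω) = α 1_B(ω) + β 1_{B^c}(ω).
def P_A_given_GB(w):
    return P_A_given_B if w in B else P_A_given_Bc

# Defining property of conditional expectation: E{1_A 1_B} = E{P(A|G_B) 1_B}.
lhs = prob(A & B)
rhs = sum(P[w] * P_A_given_GB(w) for w in B)
print(P_A_given_B, P_A_given_Bc, lhs == rhs)   # 1/3 2/3 True
```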

Exercise 6.1 Let’s continue from the card-counting exercise and say that we observe $X(\omega) = +1$ on the first extraction. We then ask ourselves: what is the probability that the next extraction gives $X(\omega) = 0$?

Solution 6.1. With a full deck of 52 cards, the probability of obtaining $X(\omega) = 0$ is $\frac{12}{52} \approx 23.08\%$, while that of $X(\omega) = +1$ is $\frac{20}{52} \approx 38.46\%$.

Hence, if we have extracted a card that yields $X(\omega) = +1$, only 19 cards giving $+1$ remain in the deck, while the total number of cards drops to 51. Thus, the conditional probability of another $+1$ decreases, $P(X(\omega_2) = +1 \mid X(\omega_1) = +1) = \frac{19}{51} \approx 37.25\%$, while the conditional probability of a $0$ increases, $P(X(\omega_2) = 0 \mid X(\omega_1) = +1) = \frac{12}{51} \approx 23.53\%$.
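The same numbers can be reproduced with exact fractions (a sketch assuming the deck composition used in the solution: 20 cards counting $+1$, 12 counting $0$, 20 counting $-1$).

```python
from fractions import Fraction

# Deck composition assumed from the solution: 20 cards count +1, 12 count 0, 20 count -1.
deck = {+1: 20, 0: 12, -1: 20}
total = sum(deck.values())               # 52

# Unconditional probabilities on the first extraction.
print({v: float(Fraction(n, total)) for v, n in deck.items()})

# Condition on having drawn a +1: one +1 card and one card overall leave the deck.
p_plus_given_plus = Fraction(deck[+1] - 1, total - 1)   # 19/51 ≈ 37.25%
p_zero_given_plus = Fraction(deck[0], total - 1)        # 12/51 ≈ 23.53%
print(float(p_plus_given_plus), float(p_zero_given_plus))
```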

Proposition 6.2 (Conditional probability and independent events)
If $A$ and $B$ are two events with $P(B) > 0$, then $A \perp B \iff P(A \mid B) = P(A)$.

Proof. To prove the result in Proposition 6.2, we consider both directions of the equivalence.

  1. (LHS $\Rightarrow$ RHS) Let’s assume that $A$ and $B$ are independent, i.e. $A \perp B$. Then, by the definition of independence, $P(A \cap B) = P(A)\,P(B)$, and hence by the elementary definition (6.3):
     $$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A).$$

  2. (RHS $\Rightarrow$ LHS) Let’s assume that $P(A \mid B) = P(A)$ with $P(B) > 0$. Then, by the definition of conditional probability (6.3), the joint probability is
     $$P(A \cap B) = P(A \mid B)\,P(B) = P(A)\,P(B),$$
     which is exactly the definition of independence; a quick numerical check is sketched below.
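A quick sanity check of the equivalence with two assumed fair dice (an illustrative sketch): $A$ depends only on the first die and $B$ only on the second, so they are independent and $P(A \mid B) = P(A)$.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: A = "first die is even", B = "second die is at most 2".
omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(omega))

A = {w for w in omega if w[0] % 2 == 0}
B = {w for w in omega if w[1] <= 2}

p = lambda E: len(E) * P
print(p(A & B) == p(A) * p(B))      # True: A and B are independent
print(p(A & B) / p(B) == p(A))      # True: equivalently, P(A|B) = P(A)
```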

Theorem 6.2 (Bayes’ Theorem)
Let’s consider a partition of $\Omega$ into disjoint events $\{A_1, \dots, A_n\}$, each $A_i \subseteq \Omega$ with $P(A_i) > 0$ and such that $\bigcup_{i=1}^n A_i = \Omega$. Then, given any event $B \subseteq \Omega$ with probability greater than zero, $P(B) > 0$, for any $j \in \{1, 2, \dots, n\}$ the conditional probability of the event $A_j$ given $B$ is:
$$P(A_j \mid B) = \frac{P(B \mid A_j)\, P(A_j)}{\sum_{i=1}^n P(B \mid A_i)\, P(A_i)}.$$
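The formula translates directly into code. The following is a minimal sketch (the function name and the numbers are illustrative assumptions): the posterior over a finite partition is the normalized product of priors and likelihoods.

```python
def bayes_posterior(prior, likelihood):
    """Posterior P(A_j | B) over a finite partition, given priors P(A_j)
    and likelihoods P(B | A_j), as in Theorem 6.2."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))   # P(B)
    return [p * l / evidence for p, l in zip(prior, likelihood)]

# Hypothetical three-event partition.
prior = [0.5, 0.3, 0.2]          # P(A_1), P(A_2), P(A_3)
likelihood = [0.1, 0.4, 0.8]     # P(B | A_1), P(B | A_2), P(B | A_3)
print(bayes_posterior(prior, likelihood))   # posterior probabilities, summing to 1
```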

Example 6.1 Let’s consider two random variables $X(\omega)$ and $Y(\omega)$ taking values in $\{0,1\}$, with marginal probabilities $P(X=0)=0.6$ and $P(Y=0)=0.29$. Let’s consider the matrix of joint events and the corresponding joint probabilities, i.e.
$$\begin{pmatrix} [X=0]\cap[Y=0] & [X=0]\cap[Y=1] \\ [X=1]\cap[Y=0] & [X=1]\cap[Y=1] \end{pmatrix} \;\longrightarrow\; P = \begin{pmatrix} 0.17 & 0.43 \\ 0.12 & 0.28 \end{pmatrix}.$$
Then, by definition (6.3), the conditional probabilities are:
$$P(X=0 \mid Y=0) = \frac{P(X=0 \cap Y=0)}{P(Y=0)} = \frac{0.17}{0.29} \approx 58.62\%,$$
and
$$P(X=0 \mid Y=1) = \frac{P(X=0 \cap Y=1)}{P(Y=1)} = \frac{0.43}{1-0.29} \approx 60.56\%.$$
Considering $Y$ instead:
$$P(Y=0 \mid X=0) = \frac{P(Y=0 \cap X=0)}{P(X=0)} = \frac{0.17}{0.6} \approx 28.33\%,$$
and
$$P(Y=0 \mid X=1) = \frac{P(Y=0 \cap X=1)}{P(X=1)} = \frac{0.12}{1-0.6} = 30\%.$$
Then, it is possible to recover the marginal probability of $X$ as:
$$\begin{aligned}
P(X=0) = \mathbb{E}\{P(X=0 \mid Y)\} &= P(X=0 \mid Y=0)\,P(Y=0) + P(X=0 \mid Y=1)\,P(Y=1) \\
&\approx 0.5862 \cdot 0.29 + 0.6056 \cdot (1-0.29) \approx 60\%,
\end{aligned}$$
and similarly for $Y$:
$$\begin{aligned}
P(Y=0) = \mathbb{E}\{P(Y=0 \mid X)\} &= P(Y=0 \mid X=0)\,P(X=0) + P(Y=0 \mid X=1)\,P(X=1) \\
&\approx 0.2833 \cdot 0.6 + 0.30 \cdot (1-0.6) = 29\%.
\end{aligned}$$
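The computations of the example can be reproduced from the joint matrix alone (a short numerical sketch; the array names are illustrative).

```python
import numpy as np

# Joint probability matrix from Example 6.1: rows index X ∈ {0,1}, columns Y ∈ {0,1}.
joint = np.array([[0.17, 0.43],
                  [0.12, 0.28]])

p_x = joint.sum(axis=1)                  # marginals of X: [0.6, 0.4]
p_y = joint.sum(axis=0)                  # marginals of Y: [0.29, 0.71]

cond_x_given_y = joint / p_y             # P(X=x | Y=y): each column sums to 1
cond_y_given_x = joint / p_x[:, None]    # P(Y=y | X=x): each row sums to 1

print(cond_x_given_y[0])                 # ≈ [0.5862, 0.6056]
print(cond_y_given_x[:, 0])              # ≈ [0.2833, 0.30]
print(cond_x_given_y[0] @ p_y)           # law of total probability: ≈ 0.6 = P(X=0)
```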

Exercise 6.2 (Monty Hall Problem)
You are on a game show where there are three closed doors: behind one is a car (the prize) and behind the other two are goats.

The rules are simple:

  1. You choose one door (say Door 1).
  2. The host, who knows where the car is, opens one of the other doors, always revealing a goat.
  3. You are then offered the chance to stay with your original choice or switch to the other unopened door.

Question: is it in your interest to switch doors? (See 21 Blackjack.)

Solution 6.2. Before any door is opened, the probability that the car is behind each door is
$$P(\text{car behind door 1}) = P(\text{car behind door 2}) = P(\text{car behind door 3}) = \tfrac{1}{3} = 33.\overline{3}\%.$$
Suppose you picked Door 1 and the host (Monty) opens, say, Door 3, revealing a goat. Now the conditional probabilities are:

  • If the car is behind Door 1: Monty could open either Door 2 or Door 3 with equal probability.

  • If the car is behind Door 2: Monty is forced to open Door 3.

  • If the car is behind Door 3: Monty is forced to open Door 2 (so this case is impossible if Monty opens Door 3).

Applying Bayes’ rule:
$$P(\text{car behind door 1} \mid \text{Monty opens door 3}) = \frac{\frac{1}{3}\cdot\frac{1}{2}}{\frac{1}{3}\cdot\frac{1}{2} + \frac{1}{3}\cdot 1} = \frac{1/6}{1/6 + 1/3} = \frac{1}{3} = 33.\overline{3}\%.$$
On the other hand, the other unopened door has a probability of winning of
$$P(\text{car behind door 2} \mid \text{Monty opens door 3}) = \frac{\frac{1}{3}\cdot 1}{\frac{1}{3}\cdot\frac{1}{2} + \frac{1}{3}\cdot 1} = \frac{1/3}{1/6 + 1/3} = \frac{2}{3} = 66.\overline{6}\%.$$
After Monty opens a goat door, the probability that the car is behind your original choice is still $\frac{1}{3}$, while the probability that it is behind the other unopened door is $\frac{2}{3}$. Therefore, switching doubles the chances of winning.
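The Bayes computation agrees with a direct Monte Carlo sketch of the game (illustrative code, assuming the host always reveals a goat and chooses at random between the two goat doors when the car is behind your pick).

```python
import random

random.seed(0)

def monty_hall(switch, trials=100_000):
    """Monte Carlo estimate of the winning probability with or without switching."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # The host opens a door that is neither the pick nor the car.
        opened = next(d for d in random.sample(range(3), 3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=False))   # ≈ 1/3
print(monty_hall(switch=True))    # ≈ 2/3
```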

6.2.1 Conditional variance

Proposition 6.3 (Conditional variance)
Let’s consider two random variables $X$ and $Y$ with finite second moment. Then the total variance can be decomposed as:
$$\mathbb{V}\{X\} = \mathbb{E}\{\mathbb{V}\{X \mid Y\}\} + \mathbb{V}\{\mathbb{E}\{X \mid Y\}\}, \qquad \mathbb{V}\{Y\} = \mathbb{E}\{\mathbb{V}\{Y \mid X\}\} + \mathbb{V}\{\mathbb{E}\{Y \mid X\}\}.$$

Proof. By definition, the variance of a random variable reads $\mathbb{V}\{X\} = \mathbb{E}\{X^2\} - \mathbb{E}\{X\}^2$. Applying the chain rule (6.1) one can write
$$\mathbb{V}\{X\} = \mathbb{E}\{\mathbb{E}\{X^2 \mid Y\}\} - \mathbb{E}\{\mathbb{E}\{X \mid Y\}\}^2.$$
Then, adding and subtracting $\mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\}$ gives
$$\mathbb{V}\{X\} = \mathbb{E}\{\mathbb{E}\{X^2 \mid Y\}\} - \mathbb{E}\{\mathbb{E}\{X \mid Y\}\}^2 + \mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\} - \mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\}.$$
Grouping the first and fourth terms and the second and third, one obtains
$$\mathbb{V}\{X\} = \mathbb{E}\{\mathbb{E}\{X^2 \mid Y\} - \mathbb{E}\{X \mid Y\}^2\} + \mathbb{E}\{\mathbb{E}\{X \mid Y\}^2\} - \mathbb{E}\{\mathbb{E}\{X \mid Y\}\}^2,$$
which simplifies to $\mathbb{V}\{X\} = \mathbb{E}\{\mathbb{V}\{X \mid Y\}\} + \mathbb{V}\{\mathbb{E}\{X \mid Y\}\}$.
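A numerical sketch of the decomposition (illustrative setup: $Y$ is a die roll and $X = Y$ plus Gaussian noise; conditional moments are approximated by group statistics):

```python
import numpy as np

rng = np.random.default_rng(3)

# Y is a die roll, X = Y + noise: check V{X} = E{V{X|Y}} + V{E{X|Y}} numerically.
n = 500_000
Y = rng.integers(1, 7, size=n)
X = Y + rng.normal(scale=2.0, size=n)

cond_mean = np.array([X[Y == v].mean() for v in range(1, 7)])   # E{X | Y=v}
cond_var  = np.array([X[Y == v].var()  for v in range(1, 7)])   # V{X | Y=v}
p = np.array([np.mean(Y == v) for v in range(1, 7)])            # P(Y=v)

total = X.var()
decomposed = p @ cond_var + p @ (cond_mean - p @ cond_mean) ** 2
print(total, decomposed)      # both ≈ Var(noise) + Var(die) = 4 + 35/12 ≈ 6.92
```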