6  Conditional expectation

Theorem 6.1 (Radon-Nikodym)
Consider a measurable space $(\Omega,\mathcal{B})$ and two $\sigma$-finite measures $\mu$, $\nu$ such that $\mu\ll\nu$, i.e. $\mu$ is absolutely continuous with respect to $\nu$. Then there exists a non-negative measurable function $X:\Omega\to\mathbb{R}$ such that:
$$\mu(A)=\int_A X\,d\nu \qquad \forall A\in\mathcal{B}.$$
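On a finite space the content of the theorem becomes very concrete: the density $X=\frac{d\mu}{d\nu}$ is simply the ratio of the point masses wherever $\nu$ puts positive mass. The following Python sketch illustrates this on a toy example (the two measures below are invented for the illustration and are not part of the notes).

```python
from fractions import Fraction

# Toy illustration of the Radon-Nikodym theorem on Omega = {0, 1, 2}.
# Both measures are invented for this sketch; since nu charges every point,
# mu << nu holds and the density X = dmu/dnu is the ratio of point masses.
nu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
mu = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

X = {w: mu[w] / nu[w] for w in nu}   # Radon-Nikodym derivative dmu/dnu

def mu_from_density(A):
    """mu(A) computed as the integral of X over A with respect to nu."""
    return sum(X[w] * nu[w] for w in A)

for A in [{0}, {1, 2}, {0, 1, 2}]:
    assert mu_from_density(A) == sum(mu[w] for w in A)
    print(f"mu({A}) = {mu_from_density(A)}")
```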

Definition 6.1 (Conditional expectation)
Given a probability space $(\Omega,\mathcal{B},\mathbb{P})$, consider $\mathcal{G}$ a sub $\sigma$-field of $\mathcal{B}$, i.e. $\mathcal{G}\subset\mathcal{B}$, and a random variable $X:\Omega\to\mathbb{R}$ with finite expectation, $\mathbb{E}\{|X|\}<+\infty$. We call conditional expectation of $X$ given $\mathcal{G}$ any random variable $Y=\mathbb{E}\{X\mid\mathcal{G}\}$ such that:

  1. $Y$ has finite expectation, i.e. $\mathbb{E}\{|Y|\}<+\infty$.
  2. $Y$ is $\mathcal{G}$-measurable.
  3. $\mathbb{E}\{\mathbf{1}_A Y\}=\mathbb{E}\{\mathbf{1}_A X\}$ for every $A\in\mathcal{G}$, namely the expectations of $X$ and $Y$ coincide when restricted to any event $A\in\mathcal{G}$.

A $\sigma$-field can be used to describe our state of information: for every $A\in\mathcal{G}$ we already know whether the event $A$ has occurred or not. By collecting in $\mathcal{G}$ the events whose occurrence is already known, the requirement that $Y$ is $\mathcal{G}$-measurable says that the value of $Y$ is no longer stochastic once we know the information contained in $\mathcal{G}$. Moreover, the random variable $Y=\mathbb{E}\{X\mid\mathcal{G}\}$ represents a prediction of the random variable $X$ given the information contained in the sub $\sigma$-field $\mathcal{G}$.

Definition 6.2 (Predictor)
Consider any $\mathcal{G}$-measurable random variable $Z$. Then $Z$ can be interpreted as a predictor of another random variable $X$ under the information contained in the $\sigma$-field $\mathcal{G}$. When we substitute $X$ with its prediction $Z$, we make an error given by the difference $X-Z$. In the special case in which $\mathbb{E}\{|Z|^2\}<\infty$, we can take as error measure the mean squared error, i.e.
$$\mathbb{E}\{\text{error}^2\}=\mathbb{E}\{(X-Z)^2\}.$$
We say that the conditional expectation $\mathbb{E}\{X\mid\mathcal{G}\}$ is the best predictor in the sense that:
$$\mathbb{E}\big\{(X-\mathbb{E}\{X\mid\mathcal{G}\})^2\big\}=\min_{Z\in\mathcal{Z}}\mathbb{E}\{(X-Z)^2\}.$$
Hence, $\mathbb{E}\{X\mid\mathcal{G}\}$ is the predictor that minimizes the mean squared error over the class $\mathcal{Z}$ of $\mathcal{G}$-measurable random variables with finite second moment, formally
$$\mathcal{Z}=\big\{Z:\; Z\ \text{is}\ \mathcal{G}\text{-measurable and}\ \mathbb{E}\{|Z|^2\}<\infty\big\}.$$
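To see the optimality property at work, the following Monte Carlo sketch (the coarse signal $W$, the coefficients and the sample size are assumptions made for the illustration) compares the empirical conditional expectation with two other $\mathcal{G}$-measurable predictors, where $\mathcal{G}=\sigma(W)$.

```python
import numpy as np

# Monte Carlo sketch: G = sigma(W), so every G-measurable predictor is a
# function of W.  The conditional expectation E{X|W} should attain the
# smallest mean squared error among such predictors.
rng = np.random.default_rng(0)
n = 200_000
W = rng.integers(0, 3, size=n)            # observable signal with values 0, 1, 2
X = 2.0 * W + rng.normal(0.0, 1.0, n)     # target variable, so E{X|W} = 2W

def mse(pred):
    return np.mean((X - pred) ** 2)

cond_exp = np.array([X[W == w].mean() for w in range(3)])[W]   # empirical E{X|W}
print("MSE of E{X|W}:          ", mse(cond_exp))               # ~ 1 (noise variance)
print("MSE of predictor 2W + 1:", mse(2.0 * W + 1.0))          # ~ 2
print("MSE of constant E{X}:   ", mse(np.full(n, X.mean())))   # largest of the three
```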

6.1 Properties of conditional expectation

Here we state some useful properties of conditional expectation:

  1. Linearity: $\mathbb{E}\{aX+bY\mid\mathcal{G}\}=a\,\mathbb{E}\{X\mid\mathcal{G}\}+b\,\mathbb{E}\{Y\mid\mathcal{G}\}$ for all constants $a,b\in\mathbb{R}$.
  2. Positivity: $X\geq 0\;\Rightarrow\;\mathbb{E}\{X\mid\mathcal{G}\}\geq 0$.
  3. Measurability: if $Y$ is $\mathcal{G}$-measurable, then $\mathbb{E}\{XY\mid\mathcal{G}\}=Y\,\mathbb{E}\{X\mid\mathcal{G}\}$.
  4. Constants: $\mathbb{E}\{a\mid\mathcal{G}\}=a$ for every $a\in\mathbb{R}$. More generally, if $X$ is $\mathcal{G}$-measurable then $\mathbb{E}\{X\mid\mathcal{G}\}=X$, i.e. it is no longer stochastic given $\mathcal{G}$.
  5. Independence: if $X$ is independent of the $\sigma$-field $\mathcal{G}$, then $\mathbb{E}\{X\mid\mathcal{G}\}=\mathbb{E}\{X\}$.
  6. Chain rule (tower property): consider two sub $\sigma$-fields of $\mathcal{B}$ such that $\mathcal{G}_1\subset\mathcal{G}_2$; then $$\mathbb{E}\{X\mid\mathcal{G}_1\}=\mathbb{E}\big\{\mathbb{E}\{X\mid\mathcal{G}_2\}\,\big|\,\mathcal{G}_1\big\}.$$ Remember that, when using this property, the inner conditional expectation is taken with respect to the larger $\sigma$-field, i.e. the one containing more information (here $\mathcal{G}_2$), and the outer one with respect to the smaller $\sigma$-field (here $\mathcal{G}_1$); a numerical check is sketched right after this list.
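The following sketch (the variables $W_1$, $W_2$ and the model for $X$ are assumptions made for the illustration) checks the tower property empirically with $\mathcal{G}_1=\sigma(W_1)$ and $\mathcal{G}_2=\sigma(W_1,W_2)$.

```python
import numpy as np

# Tower property check: G1 = sigma(W1) is coarser than G2 = sigma(W1, W2),
# and on each atom of G1 the quantities E{X|G1} and E{E{X|G2}|G1} coincide.
rng = np.random.default_rng(1)
n = 500_000
W1 = rng.integers(0, 2, size=n)          # coarse information
W2 = rng.integers(0, 2, size=n)          # finer information
X = W1 + W2 + rng.normal(0.0, 1.0, n)

# E{X|G2}: average of X within each (W1, W2) cell
inner = np.zeros(n)
for a in range(2):
    for b in range(2):
        cell = (W1 == a) & (W2 == b)
        inner[cell] = X[cell].mean()

# Conditioning both X and the inner expectation on G1 only gives the same values
for a in range(2):
    cell = W1 == a
    print(f"W1={a}:  E{{X|G1}} = {X[cell].mean():.4f}   "
          f"E{{E{{X|G2}}|G1}} = {inner[cell].mean():.4f}")
```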

6.2 Conditional probability

Definition 6.3 (Conditional probability)
Given a probability space $(\Omega,\mathcal{F},\mathbb{P})$, consider $\mathcal{G}$ a sub $\sigma$-field of $\mathcal{F}$, i.e. $\mathcal{G}\subset\mathcal{F}$. The general definition of the conditional probability of an event $A$ given $\mathcal{G}$ is:
$$\mathbb{P}(A\mid\mathcal{G})=\mathbb{E}\{\mathbf{1}_A\mid\mathcal{G}\}. \tag{6.1}$$
The elementary definition, instead, does not condition on a $\sigma$-field but on a single event $B$. In practice, take an event $B\in\mathcal{F}$ such that $0<\mathbb{P}(B)<1$; then for every $A\in\mathcal{F}$ the conditional probability of $A$ given $B$ is defined as:
$$\mathbb{P}(A\mid B)=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)},\qquad \mathbb{P}(A\mid B^c)=\frac{\mathbb{P}(A\cap B^c)}{\mathbb{P}(B^c)}. \tag{6.2}$$

The elementary (6.2) and the general (6.1) definitions are equivalent. Indeed, consider a sub $\sigma$-field $\mathcal{G}_B$ which provides only the information on whether $\omega$ is in $B$ or not. A $\sigma$-field of this kind has the form $\mathcal{G}_B=\{\Omega,\emptyset,B,B^c\}$. Any $\mathcal{G}_B$-measurable function $f:\Omega\to\mathbb{R}$ is of the form
$$f(\omega)=\begin{cases}\alpha & \omega\in B\\ \beta & \omega\in B^c\end{cases}$$
It remains to find $\alpha$ and $\beta$ in the expression
$$\mathbb{P}(A\mid\mathcal{G}_B)=\mathbb{E}\{\mathbf{1}_A\mid\mathcal{G}_B\}=\alpha\,\mathbf{1}_B+\beta\,\mathbf{1}_{B^c}.$$
Note that the joint probability of $A$ and $B$ can be obtained as
$$\begin{aligned}\mathbb{P}(A\cap B)&=\mathbb{E}\{\mathbf{1}_A\mathbf{1}_B\}=\mathbb{E}\big\{\mathbb{E}\{\mathbf{1}_A\mathbf{1}_B\mid\mathcal{G}_B\}\big\}=\mathbb{E}\big\{\mathbb{E}\{\mathbf{1}_A\mid\mathcal{G}_B\}\,\mathbf{1}_B\big\}\\
&=\mathbb{E}\{\mathbb{P}(A\mid\mathcal{G}_B)\,\mathbf{1}_B\}\\
&=\mathbb{E}\{(\alpha\,\mathbf{1}_B+\beta\,\mathbf{1}_{B^c})\,\mathbf{1}_B\}\\
&=\alpha\,\mathbb{E}\{\mathbf{1}_B\}+\beta\,\mathbb{E}\{\mathbf{1}_{B^c}\mathbf{1}_B\}\\
&=\alpha\,\mathbb{P}(B),\end{aligned}$$
where we used the measurability property to take $\mathbf{1}_B$ out of the inner conditional expectation and the fact that $\mathbf{1}_{B^c}\mathbf{1}_B=0$. Hence we obtain
$$\mathbb{P}(A\cap B)=\alpha\,\mathbb{P}(B)\;\Rightarrow\;\alpha=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}.$$
Analogously, for $\mathbb{P}(A\cap B^c)$ it is possible to prove that
$$\mathbb{P}(A\cap B^c)=\beta\,\mathbb{P}(B^c)\;\Rightarrow\;\beta=\frac{\mathbb{P}(A\cap B^c)}{\mathbb{P}(B^c)}.$$

Finally, the conditional probability in the general sense (6.1) can be written as a combination of conditional probabilities in the elementary sense (6.2), i.e.
$$\mathbb{P}(A\mid\mathcal{G}_B)=\mathbb{P}(A\mid B)\,\mathbf{1}_B+\mathbb{P}(A\mid B^c)\,\mathbf{1}_{B^c}.$$
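The following sketch (the choice of $\Omega=[0,1]$ with uniform $\mathbb{P}$ and the events $A$, $B$ is an arbitrary assumption for the illustration) verifies numerically that the $\mathcal{G}_B$-measurable random variable $\mathbb{P}(A\mid B)\,\mathbf{1}_B+\mathbb{P}(A\mid B^c)\,\mathbf{1}_{B^c}$ satisfies the defining property of $\mathbb{E}\{\mathbf{1}_A\mid\mathcal{G}_B\}$ on both atoms of $\mathcal{G}_B$.

```python
import numpy as np

# Omega = [0,1] with the uniform probability; A = {omega < 0.5}, B = {omega < 0.3}.
rng = np.random.default_rng(2)
omega = rng.uniform(size=1_000_000)
A = omega < 0.5
B = omega < 0.3

p_A_given_B  = (A & B).mean() / B.mean()            # elementary definition on B
p_A_given_Bc = (A & ~B).mean() / (~B).mean()        # elementary definition on B^c
cond_prob = np.where(B, p_A_given_B, p_A_given_Bc)  # P(A|G_B) as a random variable

# Property 3 of the conditional expectation, checked on the atoms B and B^c:
print((cond_prob * B).mean(),  (A & B).mean())      # both ~ P(A ∩ B)   = 0.3
print((cond_prob * ~B).mean(), (A & ~B).mean())     # both ~ P(A ∩ B^c) = 0.2
```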

Example 6.1 Let us continue from the earlier card-drawing example: suppose we observe $X(\omega)=+1$ and ask ourselves what the probability is that in the next extraction $X(\omega)=0$. With 52 cards, the chance of obtaining $X(\omega)=0$ is $\frac{12}{52}=\frac{3}{13}\approx 23.08\%$. Given that the extracted card produced $X(\omega)=+1$, the probability, conditional on the first extraction being a $+1$ card, that the next extraction gives $0$ is $\frac{12}{51}\approx 23.53\%$. Let us now investigate the chance that the next extraction gives $X(\omega)=+1$ given that the previous one was $+1$: the unconditional probability is $\frac{20}{52}\approx 38.46\%$, while the conditional probability is $\frac{19}{51}\approx 37.25\%$.
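These counts can be reproduced with a few lines of Python. The deck composition below (20 cards mapped to $+1$, 12 to $0$, 20 to $-1$, as in a Hi-Lo style count) is inferred from the probabilities quoted in the example rather than stated explicitly in the notes.

```python
from fractions import Fraction

# Deck composition inferred from the example: 20 cards worth +1, 12 worth 0, 20 worth -1.
deck = {+1: 20, 0: 12, -1: 20}
total = sum(deck.values())                            # 52 cards

p0_uncond = Fraction(deck[0], total)                  # P(X = 0)  = 12/52
p1_uncond = Fraction(deck[+1], total)                 # P(X = +1) = 20/52

# After removing one +1 card, 51 cards remain
p0_given_plus1 = Fraction(deck[0], total - 1)         # 12/51
p1_given_plus1 = Fraction(deck[+1] - 1, total - 1)    # 19/51

for name, p in [("P(X=0)", p0_uncond), ("P(X=0 | first=+1)", p0_given_plus1),
                ("P(X=+1)", p1_uncond), ("P(X=+1 | first=+1)", p1_given_plus1)]:
    print(f"{name:20s} = {str(p):6s} ≈ {float(p):.2%}")   # fractions print in lowest terms
```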

Example 6.2 Let us consider two random variables $X(\omega)$ and $Y(\omega)$ taking values in $\{0,1\}$, with marginal probabilities $\mathbb{P}(X=0)=0.6$ and $\mathbb{P}(Y=0)=0.29$. Consider the matrix of joint events and the corresponding probabilities, i.e.
$$\begin{pmatrix}[X=0]\cap[Y=0] & [X=0]\cap[Y=1]\\ [X=1]\cap[Y=0] & [X=1]\cap[Y=1]\end{pmatrix}\;\overset{\mathbb{P}}{\longrightarrow}\;\begin{pmatrix}0.17 & 0.43\\ 0.12 & 0.28\end{pmatrix}$$
Then, by definition, the conditional probabilities are:
$$\mathbb{P}(X=0\mid Y=0)=\frac{\mathbb{P}(X=0\cap Y=0)}{\mathbb{P}(Y=0)}=\frac{0.17}{0.29}\approx 58.62\%$$
and
$$\mathbb{P}(X=0\mid Y=1)=\frac{\mathbb{P}(X=0\cap Y=1)}{\mathbb{P}(Y=1)}=\frac{0.43}{1-0.29}\approx 60.56\%.$$
Considering $Y$ instead:
$$\mathbb{P}(Y=0\mid X=0)=\frac{\mathbb{P}(Y=0\cap X=0)}{\mathbb{P}(X=0)}=\frac{0.17}{0.6}\approx 28.33\%$$
and
$$\mathbb{P}(Y=0\mid X=1)=\frac{\mathbb{P}(Y=0\cap X=1)}{\mathbb{P}(X=1)}=\frac{0.12}{1-0.6}=30\%.$$
Then it is possible to recover the marginal probability of $X$ as:
$$\begin{aligned}\mathbb{P}(X=0)&=\mathbb{E}\{\mathbb{P}(X=0\mid Y)\}\\
&=\mathbb{P}(X=0\mid Y=0)\,\mathbb{P}(Y=0)+\mathbb{P}(X=0\mid Y=1)\,\mathbb{P}(Y=1)\\
&=0.5862\cdot 0.29+0.6056\cdot(1-0.29)\approx 60\%\end{aligned}$$
and similarly for $Y$:
$$\begin{aligned}\mathbb{P}(Y=0)&=\mathbb{E}\{\mathbb{P}(Y=0\mid X)\}\\
&=\mathbb{P}(Y=0\mid X=0)\,\mathbb{P}(X=0)+\mathbb{P}(Y=0\mid X=1)\,\mathbb{P}(X=1)\\
&=0.2833\cdot 0.6+0.30\cdot(1-0.6)\approx 29\%.\end{aligned}$$
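The same computations can be carried out directly from the joint probability matrix; the sketch below reproduces the numbers of this example.

```python
import numpy as np

# Joint probability matrix of Example 6.2 (rows: X = 0, 1; columns: Y = 0, 1)
joint = np.array([[0.17, 0.43],
                  [0.12, 0.28]])

p_x = joint.sum(axis=1)                 # marginals of X: [0.6, 0.4]
p_y = joint.sum(axis=0)                 # marginals of Y: [0.29, 0.71]

cond_x_given_y = joint / p_y            # P(X=i | Y=j): each column sums to 1
cond_y_given_x = joint / p_x[:, None]   # P(Y=j | X=i): each row sums to 1

print("P(X=0 | Y=0) =", cond_x_given_y[0, 0])   # ~ 0.5862
print("P(X=0 | Y=1) =", cond_x_given_y[0, 1])   # ~ 0.6056
print("P(Y=0 | X=0) =", cond_y_given_x[0, 0])   # ~ 0.2833
print("P(Y=0 | X=1) =", cond_y_given_x[0, 1])   # = 0.30

# Law of total probability: averaging the conditionals recovers the marginals
print("P(X=0) =", cond_x_given_y[0] @ p_y)      # = 0.6
print("P(Y=0) =", cond_y_given_x[:, 0] @ p_x)   # = 0.29
```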