7  Moment generating functions

Last modified

June 4, 2026

Reference: Pietro Rigo (2023), Chapter 9. Resnick (2005).

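Definition 7.1 (Moment Generating Function) Consider a one-dimensional random variable \(X\); then the moment generating function is defined, for \(t \in \mathbb{R}\), as: \[ \psi_{X}(t) = \mathbb{E}\{e^{tX}\} \text{,} \] where the expectation may be infinite for some values of \(t\).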

Proposition 7.1 (Moment generating function and sequence of moments) Consider a random variable \(X\) whose moment generating function is finite in a neighbourhood of zero, i.e. there exists \(\epsilon > 0\) such that \[ \psi_{X}(t) = \mathbb{E}\{e^{tX}\} < \infty \quad \forall t \in (-\epsilon, \epsilon) \text{.} \] Then all moments of \(X\) are finite, \(\mathbb{E}\{|X|^n\} < \infty\) for all \(n\), and the sequence of moments \(\mathbb{E}\{X^n\}\), \(n = 1, 2, \dots\), uniquely determines the distribution of \(X\).

In particular, by Proposition 7.1, if \(Y\) is another random variable such that \(\mathbb{E}\{X^n\} = \mathbb{E}\{Y^n\}\) for all \(n\), then \(X\) and \(Y\) have the same distribution. Note that the condition involves the moments themselves and not the absolute moments: \(X\) and \(-X\) always share the same absolute moments, yet their distributions differ in general.
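The link between the moment generating function and the moments can be checked numerically: when \(\psi_X\) is finite around zero, its derivatives at zero give the moments, \(\psi_X^{(n)}(0) = \mathbb{E}\{X^n\}\). The following is a minimal sketch, assuming \(X\) is exponentially distributed with rate \(\lambda\), so that \(\psi_X(t) = \lambda / (\lambda - t)\) for \(t < \lambda\) and \(\mathbb{E}\{X^n\} = n!/\lambda^n\). The function names and the step size `h` are illustrative choices, not part of the references.

```python
import math

def mgf_exponential(t, rate):
    """Closed-form MGF of an Exponential(rate) variable; finite only for t < rate."""
    if t >= rate:
        raise ValueError("the MGF is infinite for t >= rate")
    return rate / (rate - t)

def first_two_moments_from_mgf(mgf, h=1e-3):
    """Approximate E[X] and E[X^2] by central finite differences of the MGF at 0."""
    first = (mgf(h) - mgf(-h)) / (2 * h)
    second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2
    return first, second

rate = 2.0  # assumed rate, chosen only for illustration
m1, m2 = first_two_moments_from_mgf(lambda t: mgf_exponential(t, rate))

exact_m1 = 1 / rate                     # E[X]   = 1! / rate
exact_m2 = math.factorial(2) / rate**2  # E[X^2] = 2! / rate^2

print("first moment :", m1, "exact:", exact_m1)
print("second moment:", m2, "exact:", exact_m2)
```

The finite differences only evaluate the moment generating function at \(t \in \{-h, 0, h\}\), which is why finiteness in a neighbourhood of zero is all that is needed.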

7.1 Characteristic functions

Definition 7.2 (Characteristic function) Consider an \(n\)-dimensional random vector \(\mathbf{X}\); then the characteristic function is defined \(\forall \mathbf{t} \in \mathbb{R}^{n}\) as: \[ \phi_{\mathbf{X}}(\mathbf{t}) = \mathbb{E}\{ e^{i \mathbf{t}^T \mathbf{X}}\} \quad \text{where} \quad \mathbf{t}^T \mathbf{X} = \begin{pmatrix}t_1 & t_2 & \dots & t_n\end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \sum_{j=1}^{n} t_j X_j \text{.} \]

Unlike the moment generating function, the characteristic function always exists for a real-valued argument, because \(|e^{i \mathbf{t}^T \mathbf{X}}| = 1\) and the expectation is therefore always finite. The characteristic function uniquely determines the probability distribution of the corresponding random vector \(\mathbf{X}\): two random vectors have the same distribution if and only if their characteristic functions are equal. It follows that we can always work with characteristic functions to prove that two distributions are equal or that a sequence of distributions converges to another distribution. Formally, in the one-dimensional case, \[ F_{X} = F_{Y} \iff \phi_{X}(t) = \phi_{Y}(t) \text{,} \quad \forall t \in \mathbb{R} \text{.} \]
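To illustrate that the characteristic function is always well defined and identifies the distribution, one can estimate it from simulated data and compare it with a known closed form. The following is a minimal sketch, assuming \(X \sim \mathcal{N}(0, 1)\), whose characteristic function is \(e^{-t^2/2}\). The helper `empirical_cf`, the seed and the sample size are illustrative assumptions.

```python
import cmath
import math
import random

def empirical_cf(samples, t):
    """Sample average of exp(i t x): a Monte Carlo estimate of E[exp(i t X)]."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

for t in (0.0, 0.5, 1.0, 2.0):
    estimate = empirical_cf(samples, t)
    exact = math.exp(-t * t / 2)  # characteristic function of N(0, 1)
    print(f"t={t}: estimate={estimate:.4f}, exact={exact:.4f}")
```

Each summand has modulus one, so the estimate never exceeds one in modulus, whatever the tails of the sampled distribution.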

Here, we list some properties considering the random variable case, i.e. \(n = 1\), with \(t \in \mathbb{R}\).

  1. Independence: The characteristic function of the sum of \(n\) independent random variables, namely \(S_n = \sum_{i = 1}^{n} X_i\), is equal to the product of the individual characteristic functions, i.e. \[ X_1,\dots,X_n \text{ independent} \implies \phi_{S_n}(t) = \prod_{i = 1 }^{n}\phi_{X_i}(t) \text{,} \tag{7.1}\] for all \(t \in \mathbb{R}\).

  2. Existence of the \(j\)-th moment: If the \(j\)-th absolute moment of the random variable is finite, then the characteristic function is \(j\) times continuously differentiable, i.e. \(\phi_{X} \in C^{(j)}\). Formally, \[ \mathbb{E}\{|X|^r\} < \infty \implies \phi_{X} \in C^{(r)} \quad \text{and} \quad \phi_{X}^{(r)} (t) = \mathbb{E}\{(i X)^{r} e^{itX}\} \quad r = 1,2,\dots, j \text{.} \] In particular, \(\phi_{X}^{(r)}(0) = i^r \mathbb{E}\{X^r\}\). For even orders, the converse also holds at the origin: \[ \mathbb{E}\{|X|^r\} < \infty \iff \phi_{X}^{(r)}(0) \text{ exists and is finite} \quad r = 2, 4, \dots, j \text{.} \]

  3. Inversion theorem: The characteristic function \(\phi_X\) uniquely determines the distribution function \(F_X\) of a random variable \(X\). For all continuity points \(a < b\) of \(F_X\), \[ F_X(b) - F_X(a) = \frac{1}{2 \pi i} \lim_{c \to \infty} \int_{-c}^{c} \frac{e^{-ita} - e^{-itb}}{t} \phi_X(t) dt \text{,} \] and, if \(\phi_X\) is absolutely integrable, \(X\) has a density function obtained as: \[ f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \phi_X(t)dt \text{.} \]

  4. Convergence in distribution: A sequence of random variables \(X_n\) converges in distribution to a random variable \(X\) if and only if the characteristic function of \(X_n\) converges pointwise, as \(n \to \infty\), to the characteristic function of \(X\), i.e. \[ X_n \overset{\text{d}}{\underset{n \to \infty}{\longrightarrow}} X \iff \phi_{X}(t) = \lim_{n \to \infty} \phi_{X_n}(t) \quad \forall t \in \mathbb{R} \text{.} \]

  5. Scaling and centering: Given \(Y = {\color{red}{a}} + {\color{blue}{b}} X\), the effect of scaling and centering on the characteristic function is such that: \[ \phi_{Y}(t) = \mathbb{E}\{e^{itY}\} = \mathbb{E}\{e^{it({\color{red}{a}} + {\color{blue}{b}} X)} \} = e^{it{\color{red}{a}}} \mathbb{E}\{e^{i(t{\color{blue}{b}}) X}\} = e^{it{\color{red}{a}}} \phi_{X}({\color{blue}{b}}t) \quad \forall {\color{red}{a}}, {\color{blue}{b}}, t \in \mathbb{R} \text{.} \]

  6. Weak Law of Large Numbers: Consider a sequence of independent and identically distributed random variables such that the first moment of \(X_1\) is finite, i.e. \(\mathbb{E}\{|X_1|\} < \infty\) and \(\mathbb{E}\{X_1\} = \mu\), then the sample mean converges in probability to the constant \(\mu\), i.e. \[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i \underset{n\to\infty}{\overset{\text{p}}{\longrightarrow}} \mu \text{.} \]
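Properties 3 and 5 in the list above can be checked together numerically. The following is a minimal sketch, assuming \(X \sim \mathcal{N}(0, 1)\) with \(\phi_X(t) = e^{-t^2/2}\) and \(Y = 1 + 2X\), so that \(Y \sim \mathcal{N}(1, 4)\). The characteristic function of \(Y\) is built with the scaling and centering rule, and its density is then recovered from the inversion integral, approximated with a trapezoidal rule on a truncated interval. The function names, the cutoff and the number of steps are illustrative assumptions.

```python
import cmath
import math

def cf_standard_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2)

def cf_affine(cf, a, b):
    """Property 5: characteristic function of Y = a + b X, built from that of X."""
    return lambda t: cmath.exp(1j * t * a) * cf(b * t)

def density_from_cf(cf, x, cutoff=10.0, steps=2000):
    """Trapezoidal approximation of (1 / 2 pi) * integral of exp(-i t x) cf(t)
    over the truncated interval [-cutoff, cutoff] (property 3)."""
    h = 2 * cutoff / steps
    total = 0j
    for k in range(steps + 1):
        t = -cutoff + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * cmath.exp(-1j * t * x) * cf(t)
    return (total * h / (2 * math.pi)).real

def normal_pdf(x, mean, sd):
    """Closed-form density of N(mean, sd^2), used only as a reference value."""
    z = (x - mean) / sd
    return math.exp(-z * z / 2) / (sd * math.sqrt(2 * math.pi))

cf_y = cf_affine(cf_standard_normal, a=1.0, b=2.0)  # Y = 1 + 2 X ~ N(1, 4)
for x in (-1.0, 1.0, 3.0):
    print(x, density_from_cf(cf_y, x), normal_pdf(x, 1.0, 2.0))
```

Truncating the integral is harmless here because \(|\phi_Y(t)| = e^{-2t^2}\) is negligible outside the chosen interval. For characteristic functions with slow decay, the cutoff would need more care.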

Proof. To prove property 6, we compute the characteristic function of the sample mean \(\bar{X}_n\). By independence (Equation 7.1), scaling (property 5) and identical distribution, \[ \phi_{\bar{X}_n}(t) = \mathbb{E}\left\{\exp\left(\frac{it}{n}\sum_{i=1}^{n} X_i \right) \right\} \overset{\small\text{IID}}{=} \left[\phi_{X_1}\biggl(\frac{t}{n}\biggr)\right]^n \text{.} \] Since \(\mathbb{E}\{|X_1|\} < \infty\), property 2 ensures that \(\phi_{X_1}\) is differentiable with \(\phi_{X_1}^{\prime}(0) = i\mu\). A first-order Taylor expansion (Equation 29.1) of \(\phi_{X_1}\) around zero, evaluated at \(x = \frac{t}{n}\), gives \[ \begin{aligned} \phi_{\bar{X}_n}(t) & {} = \left(\phi_{X_1}(0) + \frac{t}{n} \phi_{X_1}^{\prime}(0) + o\biggl(\frac{t}{n}\biggr) \right)^n = \\ & = \left(1 + \frac{t \phi_{X_1}^{\prime}(0) + n \, o(\frac{t}{n})}{n}\right)^n \text{,} \end{aligned} \] where we used \(\phi_{X_1}(0) = 1\). For fixed \(t\), \(n \, o(\frac{t}{n}) \to 0\) as \(n \to \infty\), so the numerator converges to \(t \phi_{X_1}^{\prime}(0)\). Recalling the well-known result, which also holds for complex sequences, \[ a_n \underset{n \to\infty}{\longrightarrow} a \implies \left(1 + \frac{a_n}{n}\right)^n \underset{n \to\infty}{\longrightarrow} e^a \text{,} \] and applying it with \(a_n = t \phi_{X_1}^{\prime}(0) + n \, o(\frac{t}{n})\), one obtains \[ \lim_{n\to\infty} \phi_{\bar{X}_n}(t) = e^{t \phi_{X_1}^{\prime}(0)} = e^{it \mu} \quad \forall t \in \mathbb{R} \text{.} \] Since \(e^{it \mu} = \phi_{\mu}(t)\), where \(\phi_{\mu}\) is the characteristic function of a random variable that is almost surely equal to \(\mu\), property 4 implies that the sample mean converges in distribution to the constant \(\mu\). Finally, convergence in distribution to a constant implies convergence in probability, which completes the proof.
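The key step of the proof, \(\left[\phi_{X_1}(t/n)\right]^n \to e^{it\mu}\), can be observed numerically. The following is a minimal sketch, assuming \(X_1\) is exponentially distributed with rate \(\lambda\), so that \(\phi_{X_1}(t) = \lambda / (\lambda - it)\) and \(\mu = 1/\lambda\). The function names and the chosen values of \(t\), \(\lambda\) and \(n\) are illustrative assumptions.

```python
import cmath

def cf_exponential(t, rate):
    """Characteristic function of an Exponential(rate) random variable."""
    return rate / (rate - 1j * t)

def cf_sample_mean(t, n, rate):
    """Characteristic function of the mean of n IID Exponential(rate) variables."""
    return cf_exponential(t / n, rate) ** n

rate = 2.0       # assumed rate, chosen only for illustration
mu = 1.0 / rate  # mean of Exponential(rate)
t = 1.0
limit = cmath.exp(1j * t * mu)  # characteristic function of the constant mu

sizes = (1, 10, 100, 10000)
errors = [abs(cf_sample_mean(t, n, rate) - limit) for n in sizes]
for n, err in zip(sizes, errors):
    print(f"n={n}: |phi_mean(t) - exp(i t mu)| = {err:.2e}")
```

In this example the distance to the limit shrinks as \(n\) grows, in line with the convergence used in the proof. The sketch checks a single value of \(t\), whereas the theorem requires convergence for every \(t \in \mathbb{R}\).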