9  Introduction

9.1 Population and Sample

A population refers to the entire group of individuals or instances about whom we hope to learn. It encompasses all possible subjects or observations that meet a given set of criteria. The population is the complete set of items of interest to the researcher, and it can be finite (e.g. the students in a particular school) or infinite (e.g. the outcomes of repeatedly rolling a die). The population size is the number of distinct elements it contains, and the population includes every individual or observation of interest.

A sample is a subset of the population that is used to represent it. Since studying an entire population is often impractical due to constraints such as time, cost, and accessibility, samples provide a manageable and efficient way to gather data and make inferences about the population. It is important that the sample is representative of the population of interest to allow for valid inferences. In particular, one should distinguish between a random sample, e.g. a randomly selected group of 5th-year students used to make inferences about all 5th-year students in a school, and a convenience sample, e.g. a class of 5th-year students who are easily accessible to the researcher but who may not be representative of all 5th-year students in the school.
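To make the distinction concrete, here is a minimal Python simulation (the population of exam scores, the sizes, and the seed are invented for illustration): a random sample tracks the population mean, while a convenience sample of the most accessible students can be systematically off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: exam scores of 1,000 fifth-year students,
# sorted so that the most accessible classes happen to score highest.
population = np.sort(rng.normal(loc=60, scale=10, size=1000))[::-1]

# Random sample: every student has the same chance of selection.
random_sample = rng.choice(population, size=50, replace=False)

# Convenience sample: only the 50 most accessible students.
convenience_sample = population[:50]

print(f"population mean:         {population.mean():.2f}")
print(f"random sample mean:      {random_sample.mean():.2f}")    # close to the population mean
print(f"convenience sample mean: {convenience_sample.mean():.2f}")  # systematically too high
```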

| Aspect | Population | Sample |
|---|---|---|
| Definition | Entire group of interest | Subset of the population |
| Size | Large, potentially infinite | Small, manageable |
| Data collection | Often impractical to study directly | Practical and feasible |
| Purpose | To understand the whole group | To make inferences about the population |

Figure 9.1: Population vs sample.

9.2 Estimators

Let’s consider a statistical model depending on some unknown parameter $\theta$ contained in the parameter space $\Theta$, i.e. $\{P_\theta : \theta \in \Theta\}$. Then, given an observed sample from the statistical model, $X_n = (x_1, \dots, x_n)$, an estimator is a function that maps the sample space to the set of possible estimates. Formally, since $X_n$ is a collection of random variables, any estimator of $\theta$, being a function of the sample, is itself a random variable, i.e. $$X_n \;\longmapsto\; \hat{\theta}(X_n),$$ where we input a sample $X_n$ into the estimator function $\hat{\theta}(\cdot)$ and obtain the estimate $\hat{\theta}(X_n)$. When we condition on a particular value of the sample, we obtain a point estimate of the true $\theta$, i.e. $\hat{\theta}(X_n = (x_1, \dots, x_n)) = \hat{\theta}$, which is a number (or a vector).
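As a small sketch of this mapping (assuming, purely for illustration, a normal model with unit variance; the name theta_hat is ours), the following Python snippet treats the estimator as a function of the sample and shows how conditioning on one observed sample yields a point estimate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model: X_i ~ N(theta, 1) with unknown theta.
theta_true = 2.0
sample = rng.normal(loc=theta_true, scale=1.0, size=30)  # one realization of X_n

# The estimator is a function of the sample; here, the sample mean.
def theta_hat(x):
    return x.mean()

# Conditioning on the observed sample gives a point estimate (a number).
print(f"point estimate: {theta_hat(sample):.3f}")

# Over repeated samples the estimator itself is a random variable.
estimates = [theta_hat(rng.normal(theta_true, 1.0, size=30)) for _ in range(5)]
print("five realizations of the estimator:", np.round(estimates, 3))
```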

Since the estimator is a random variable itself, one can define some metrics to compare different estimators of the same parameter. First, let’s consider the bias, the distance between the expected value of the estimator and the parameter being estimated, i.e. $$\operatorname{Bias}\{\hat{\theta}(X_n)\} = E\{\hat{\theta}(X_n)\} - \theta.$$ We distinguish between two kinds of estimators:

  • Biased, when $\operatorname{Bias}\{\hat{\theta}(X_n)\} \neq 0$.
  • Unbiased, when $\operatorname{Bias}\{\hat{\theta}(X_n)\} = 0$.

The variance indicates how far the estimates are, on average, from their expected value, i.e. $$V\{\hat{\theta}(X_n)\} = E\big\{\big(\hat{\theta}(X_n) - E\{\hat{\theta}(X_n)\}\big)^2\big\}.$$ Finally, the Mean Squared Error (MSE) of an estimator is $$\operatorname{MSE}\{\hat{\theta}(X_n)\} = \operatorname{Bias}\{\hat{\theta}(X_n)\}^2 + V\{\hat{\theta}(X_n)\},$$ so for an unbiased estimator the mean squared error equals the variance.
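A Monte Carlo sketch of these metrics (the sample size, variance, and replication count are arbitrary choices) compares the biased $1/n$ and unbiased $1/(n-1)$ estimators of a normal variance, estimating bias, variance, and MSE empirically:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 100_000  # true variance, sample size, replications

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
biased   = samples.var(axis=1, ddof=0)  # divides by n
unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1

for name, est in [("1/n", biased), ("1/(n-1)", unbiased)]:
    bias = est.mean() - sigma2
    var = est.var()
    mse = bias**2 + var
    print(f"{name:8s} bias={bias:+.3f}  var={var:.3f}  mse={mse:.3f}")
```

In this normal example the biased version typically shows the smaller MSE, a first glimpse of the bias–variance trade-off.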

9.2.1 Properties

As for some desirable properties of an estimator, one has:

Consistency (weak): $\hat{\theta}(X_n) \xrightarrow[n \to \infty]{P} \theta$.

Efficiency: among unbiased estimators, one with minimal variance is called (finite-sample) efficient; an estimator is asymptotically efficient if it attains the Cramér–Rao lower bound (CRLB) in the limit.

Asymptotic normality: for some normalizing sequence $a_n$, $$a_n\big(\hat{\theta}(X_n) - \theta\big) \xrightarrow[n \to \infty]{d} N\big(0, \sigma^2(\theta)\big).$$
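Both consistency and asymptotic normality can be checked by simulation; here is a rough sketch with Bernoulli samples, the sample mean as estimator, and $a_n = \sqrt{n}$ (all sizes and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.3  # Bernoulli success probability; the sample mean estimates it

# Consistency: the estimates concentrate around theta as n grows.
for n in (10, 100, 10_000):
    est = rng.binomial(1, theta, size=(2_000, n)).mean(axis=1)
    print(f"n={n:6d}  mean={est.mean():.4f}  sd={est.std():.4f}")

# Asymptotic normality with a_n = sqrt(n): sqrt(n) * (theta_hat - theta)
# should have standard deviation close to sqrt(theta * (1 - theta)).
n = 10_000
z = np.sqrt(n) * (rng.binomial(1, theta, size=(2_000, n)).mean(axis=1) - theta)
print(f"sd of sqrt(n)(theta_hat - theta): {z.std():.3f} "
      f"(theory: {np.sqrt(theta * (1 - theta)):.3f})")
```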

9.2.2 Sufficiency and Completeness

Theorem 9.1 Fisher–Neyman factorization. A statistic $T = T(X_n)$ is sufficient for $\theta$ iff the joint likelihood factors as $$f(X_n \mid \theta) = g(T(X_n) \mid \theta)\, h(X_n).$$
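As a quick illustration of the factorization (a standard example, not from the original text): for $X_1, \dots, X_n$ i.i.d. Bernoulli$(p)$, the joint likelihood is $$f(x_1, \dots, x_n \mid p) = p^{\sum_i x_i} (1 - p)^{n - \sum_i x_i},$$ which factors with $T = \sum_i x_i$, $g(T \mid p) = p^{T}(1 - p)^{n - T}$, and $h(x_1, \dots, x_n) = 1$, so the number of successes $T$ is sufficient for $p$.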

Theorem 9.2 Rao–Blackwell. If $T$ is sufficient and $\tilde{\tau}$ is any estimator, then $\hat{\tau} = E[\tilde{\tau} \mid T]$ has $\operatorname{Var}(\hat{\tau}) \leq \operatorname{Var}(\tilde{\tau})$ and the same mean.
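A classic worked example (again not from the original text): let $X_1, \dots, X_n$ be i.i.d. Poisson$(\lambda)$ and suppose we want to estimate $\tau(\lambda) = e^{-\lambda} = P(X_1 = 0)$. The naive estimator $\tilde{\tau} = \mathbf{1}\{X_1 = 0\}$ is unbiased but noisy; conditioning on the sufficient statistic $T = \sum_i X_i$ gives $$\hat{\tau} = E[\tilde{\tau} \mid T] = P(X_1 = 0 \mid T) = \left(1 - \tfrac{1}{n}\right)^{T},$$ since given $T = t$ one has $X_1 \sim \text{Binomial}(t, 1/n)$. The improved estimator has the same mean and smaller variance.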

A statistic $T$ is complete if $E_\theta[g(T)] = 0$ for all $\theta$ implies $g(T) = 0$ almost surely.
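For example (a standard argument): if $T = \sum_i X_i$ with $X_1, \dots, X_n$ i.i.d. Bernoulli$(p)$, then $E_p[g(T)] = \sum_{t=0}^{n} g(t) \binom{n}{t} p^{t} (1 - p)^{n - t}$; dividing by $(1 - p)^n$ gives a polynomial in $p/(1 - p)$ with coefficients $g(t)\binom{n}{t}$, so if the expectation vanishes for all $p \in (0, 1)$, every $g(t)$ must be zero and $T$ is complete.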

Theorem 9.3 Lehmann–Scheffé. If $T$ is complete and sufficient and $\hat{\tau} = E[\tilde{\tau} \mid T]$ is unbiased for $\tau(\theta)$, then $\hat{\tau}$ is the unique UMVU (uniformly minimum-variance unbiased) estimator of $\tau(\theta)$.

9.2.3 Methods of construction

  • Method of Moments (MoM): equate the sample moments $m_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$ to the corresponding population moments and solve for the parameters (see the sketch after this list).

  • Maximum Likelihood (MLE): maximize $\ell(\theta) = \sum_{i=1}^{n} \log f(X_i; \theta)$. Properties: invariance ($g(\hat{\theta})$ estimates $g(\theta)$), consistency, and asymptotic normality $N(\theta, I(\theta)^{-1}/n)$ under regularity conditions.

  • Rao–Blackwellization: Improve any unbiased estimator by conditioning on a sufficient statistic; with completeness, obtain UMVU (Lehmann–Scheffé).
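As a sketch of the first two constructions on a model where they give different answers (the uniform model, sizes, and seed are illustrative assumptions), consider $X_i \sim U(0, \theta)$: MoM matches the first moment $E[X] = \theta/2$, while the likelihood $\theta^{-n} \mathbf{1}\{\max_i x_i \leq \theta\}$ is maximized at the sample maximum.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 5.0
x = rng.uniform(0.0, theta, size=200)

# Method of Moments: E[X] = theta / 2, so match the first sample moment.
theta_mom = 2.0 * x.mean()

# Maximum Likelihood: the likelihood theta**(-n) * 1{max x_i <= theta}
# is maximized at the sample maximum.
theta_mle = x.max()

print(f"true theta:   {theta}")
print(f"MoM estimate: {theta_mom:.3f}")
print(f"MLE estimate: {theta_mle:.3f}")
```

The MLE here is biased downward ($E[\max_i X_i] = \tfrac{n}{n+1}\theta$) but has much smaller variance than the MoM estimator for large $n$.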