9  Introduction

9.1 Population and Sample

A population refers to the entire group of individuals or instances about whom we hope to learn. It encompasses all possible subjects or observations that meet a given set of criteria. The population is the complete set of items of interest to the researcher, and it can be finite (e.g. the students in a particular school) or infinite (e.g. the outcomes of repeatedly rolling a die). The population size is the number of distinct elements it contains, and the population includes every individual or observation of interest.

A sample is a subset of the population that is used to represent it. Since studying an entire population is often impractical due to constraints such as time, cost, and accessibility, samples provide a manageable and efficient way to gather data and make inferences about the population. It is important that the sample is representative of the population of interest to allow for valid inferences. In particular, one should distinguish between a random sample, e.g. a randomly selected group of 5th-year students used to make inferences about all 5th-year students in a school, and a convenience sample, e.g. a class of 5th-year students who are easily accessible to the researcher but who may not be representative of all 5th-year students in the school.
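To make the distinction concrete, here is a minimal Python simulation (the population of exam scores, the sizes, and the seed are invented for illustration): a random sample tracks the population mean, while a convenience sample of the most accessible students can be systematically off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: exam scores of 1,000 fifth-year students,
# sorted so that the most accessible classes happen to score highest.
population = np.sort(rng.normal(loc=60, scale=10, size=1000))[::-1]

# Random sample: every student has the same chance of selection.
random_sample = rng.choice(population, size=50, replace=False)

# Convenience sample: only the 50 most accessible students.
convenience_sample = population[:50]

print(f"population mean:         {population.mean():.2f}")
print(f"random sample mean:      {random_sample.mean():.2f}")    # close to the population mean
print(f"convenience sample mean: {convenience_sample.mean():.2f}")  # systematically too high
```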

| Aspect | Population | Sample |
|---|---|---|
| Definition | Entire group of interest | Subset of the population |
| Size | Large, potentially infinite | Small, manageable |
| Data collection | Often impractical to study directly | Practical and feasible |
| Purpose | To understand the whole group | To make inferences about the population |

Figure 9.1: Population vs sample.

9.2 Estimators

Let’s consider a statistical model depending on some unknown parameter $\theta$ contained in the parameter space $\Theta$, i.e. $\{P_\theta : \theta \in \Theta\}$. Then, given an observed sample from the statistical model, $X_n = (x_1, \dots, x_n)$, an estimator is a function that maps the sample space to the set of possible estimates. Formally, since $X_n$ is a collection of random variables, any estimator of $\theta$, being a function of the sample, is itself a random variable, i.e. $$X_n \;\longmapsto\; \hat{\theta}(X_n),$$ where we input a sample $X_n$ into the estimator function $\hat{\theta}(\cdot)$ and obtain the estimate $\hat{\theta}(X_n)$. When we condition on a particular value of the sample, we obtain a point estimate of the true $\theta$, i.e. $\hat{\theta}(X_n = (x_1, \dots, x_n)) = \hat{\theta}$, which is a number (or a vector).
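As a small sketch of this mapping (assuming, purely for illustration, a normal model with unit variance; the name theta_hat is ours), the following Python snippet treats the estimator as a function of the sample and shows how conditioning on one observed sample yields a point estimate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model: X_i ~ N(theta, 1) with unknown theta.
theta_true = 2.0
sample = rng.normal(loc=theta_true, scale=1.0, size=30)  # one realization of X_n

# The estimator is a function of the sample; here, the sample mean.
def theta_hat(x):
    return x.mean()

# Conditioning on the observed sample gives a point estimate (a number).
print(f"point estimate: {theta_hat(sample):.3f}")

# Over repeated samples the estimator itself is a random variable.
estimates = [theta_hat(rng.normal(theta_true, 1.0, size=30)) for _ in range(5)]
print("five realizations of the estimator:", np.round(estimates, 3))
```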

Since the estimator is a random variable itself, one can define some metrics to compare different estimators of the same parameter. First, let’s consider the bias, the distance between the expected value of the estimator and the parameter being estimated, i.e. $$\operatorname{Bias}\{\hat{\theta}(X_n)\} = E\{\hat{\theta}(X_n)\} - \theta.$$ We distinguish between two kinds of estimators:

  • Biased, when $\operatorname{Bias}\{\hat{\theta}(X_n)\} \neq 0$.
  • Unbiased, when $\operatorname{Bias}\{\hat{\theta}(X_n)\} = 0$.

The variance indicates how far the estimates are, on average, from their expected value, i.e. $$V\{\hat{\theta}(X_n)\} = E\big\{\big(\hat{\theta}(X_n) - E\{\hat{\theta}(X_n)\}\big)^2\big\}.$$ Finally, the Mean Squared Error (MSE) of an estimator is $$\operatorname{MSE}\{\hat{\theta}(X_n)\} = \operatorname{Bias}\{\hat{\theta}(X_n)\}^2 + V\{\hat{\theta}(X_n)\},$$ so for an unbiased estimator the mean squared error equals the variance.
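A Monte Carlo sketch of these metrics (the sample size, variance, and replication count are arbitrary choices) compares the biased $1/n$ and unbiased $1/(n-1)$ estimators of a normal variance, estimating bias, variance, and MSE empirically:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 100_000  # true variance, sample size, replications

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
biased   = samples.var(axis=1, ddof=0)  # divides by n
unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1

for name, est in [("1/n", biased), ("1/(n-1)", unbiased)]:
    bias = est.mean() - sigma2
    var = est.var()
    mse = bias**2 + var
    print(f"{name:8s} bias={bias:+.3f}  var={var:.3f}  mse={mse:.3f}")
```

In this normal example the biased version typically shows the smaller MSE, a first glimpse of the bias–variance trade-off.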

9.2.1 Properties

As for some desirable properties of an estimator, one has:

Consistency (weak): $\hat{\theta}(X_n) \xrightarrow[n \to \infty]{P} \theta$.

Efficiency: among unbiased estimators, one with minimal variance is called (finite-sample) efficient; an estimator is asymptotically efficient if it attains the Cramér–Rao lower bound (CRLB) in the limit.

Asymptotic normality: for some normalizing sequence $a_n$, $$a_n\big(\hat{\theta}(X_n) - \theta\big) \xrightarrow[n \to \infty]{d} N\big(0, \sigma^2(\theta)\big).$$
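Both consistency and asymptotic normality can be checked by simulation; here is a rough sketch with Bernoulli samples, the sample mean as estimator, and $a_n = \sqrt{n}$ (all sizes and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.3  # Bernoulli success probability; the sample mean estimates it

# Consistency: the estimates concentrate around theta as n grows.
for n in (10, 100, 10_000):
    est = rng.binomial(1, theta, size=(2_000, n)).mean(axis=1)
    print(f"n={n:6d}  mean={est.mean():.4f}  sd={est.std():.4f}")

# Asymptotic normality with a_n = sqrt(n): sqrt(n) * (theta_hat - theta)
# should have standard deviation close to sqrt(theta * (1 - theta)).
n = 10_000
z = np.sqrt(n) * (rng.binomial(1, theta, size=(2_000, n)).mean(axis=1) - theta)
print(f"sd of sqrt(n)(theta_hat - theta): {z.std():.3f} "
      f"(theory: {np.sqrt(theta * (1 - theta)):.3f})")
```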

9.2.2 Sufficiency and Completeness

Theorem 9.1 Fisher–Neyman factorization. A statistic $T = T(X_n)$ is sufficient for $\theta$ iff the joint likelihood factors as $$f(X_n \mid \theta) = g(T(X_n) \mid \theta)\, h(X_n).$$
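As a quick illustration of the factorization (a standard example, not from the original text): for $X_1, \dots, X_n$ i.i.d. Bernoulli$(p)$, the joint likelihood is $$f(x_1, \dots, x_n \mid p) = p^{\sum_i x_i} (1 - p)^{n - \sum_i x_i},$$ which factors with $T = \sum_i x_i$, $g(T \mid p) = p^{T}(1 - p)^{n - T}$, and $h(x_1, \dots, x_n) = 1$, so the number of successes $T$ is sufficient for $p$.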

Theorem 9.2 Rao–Blackwell. If $T$ is sufficient and $\tilde{\tau}$ is any estimator, then $\hat{\tau} = E[\tilde{\tau} \mid T]$ has $\operatorname{Var}(\hat{\tau}) \leq \operatorname{Var}(\tilde{\tau})$ and the same mean.
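A classic worked example (again not from the original text): let $X_1, \dots, X_n$ be i.i.d. Poisson$(\lambda)$ and suppose we want to estimate $\tau(\lambda) = e^{-\lambda} = P(X_1 = 0)$. The naive estimator $\tilde{\tau} = \mathbf{1}\{X_1 = 0\}$ is unbiased but noisy; conditioning on the sufficient statistic $T = \sum_i X_i$ gives $$\hat{\tau} = E[\tilde{\tau} \mid T] = P(X_1 = 0 \mid T) = \left(1 - \tfrac{1}{n}\right)^{T},$$ since given $T = t$ one has $X_1 \sim \text{Binomial}(t, 1/n)$. The improved estimator has the same mean and smaller variance.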

A statistic $T$ is complete if $E_\theta[g(T)] = 0$ for all $\theta$ implies $g(T) = 0$ almost surely.
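For example (a standard argument): if $T = \sum_i X_i$ with $X_1, \dots, X_n$ i.i.d. Bernoulli$(p)$, then $E_p[g(T)] = \sum_{t=0}^{n} g(t) \binom{n}{t} p^{t} (1 - p)^{n - t}$; dividing by $(1 - p)^n$ gives a polynomial in $p/(1 - p)$ with coefficients $g(t)\binom{n}{t}$, so if the expectation vanishes for all $p \in (0, 1)$, every $g(t)$ must be zero and $T$ is complete.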

Theorem 9.3 Lehmann–Scheffé. If $T$ is complete and sufficient and $\hat{\tau} = E[\tilde{\tau} \mid T]$ is unbiased for $\tau(\theta)$, then $\hat{\tau}$ is the unique UMVU (uniformly minimum-variance unbiased) estimator of $\tau(\theta)$.

9.2.3 Methods of construction

  • Method of Moments (MoM): equate the sample moments $m_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$ to the corresponding population moments and solve for the parameters (see the sketch after this list).

  • Maximum Likelihood (MLE): maximize $\ell(\theta) = \sum_{i=1}^{n} \log f(X_i; \theta)$. Properties: invariance ($g(\hat{\theta})$ estimates $g(\theta)$), consistency, and asymptotic normality $N(\theta, I(\theta)^{-1}/n)$ under regularity conditions.

  • Rao–Blackwellization: Improve any unbiased estimator by conditioning on a sufficient statistic; with completeness, obtain UMVU (Lehmann–Scheffé).
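As a sketch of the first two constructions on a model where they give different answers (the uniform model, sizes, and seed are illustrative assumptions), consider $X_i \sim U(0, \theta)$: MoM matches the first moment $E[X] = \theta/2$, while the likelihood $\theta^{-n} \mathbf{1}\{\max_i x_i \leq \theta\}$ is maximized at the sample maximum.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 5.0
x = rng.uniform(0.0, theta, size=200)

# Method of Moments: E[X] = theta / 2, so match the first sample moment.
theta_mom = 2.0 * x.mean()

# Maximum Likelihood: the likelihood theta**(-n) * 1{max x_i <= theta}
# is maximized at the sample maximum.
theta_mle = x.max()

print(f"true theta:   {theta}")
print(f"MoM estimate: {theta_mom:.3f}")
print(f"MLE estimate: {theta_mle:.3f}")
```

The MLE here is biased downward ($E[\max_i X_i] = \tfrac{n}{n+1}\theta$) but has much smaller variance than the MoM estimator for large $n$.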