Chapter 1 Statistical Models, Goals, And Performance Criteria
1.1 Data, Models, Parameters and Statistics
Definition of Statistical Model
- Random experiment with sample space $\Omega$.
- Random vector $X=(X_1, \ldots, X_n)$ defined on $\Omega$.
- $\omega \in \Omega$: outcome of experiment
- $X(\omega)$: data observations
- Probability distribution of $X$
- $\mathcal{X}$: Sample Space $= \{\text{outcomes } x\}$
- $\mathcal{F}_X$: Sigma-field of measurable events.
- $P(\cdot)$ defined on $(\mathcal{X}, \mathcal{F}_X)$
- Statistical Model
- $\mathcal{P} = \{\text{family of distributions}\}$
Review of STAT609
- Sigma-Field $\mathcal{F}$: A collection $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field if it satisfies the following conditions:
- $(\emptyset = \bar{\Omega}) \in \mathcal{F}$.
- If $A_1 \in \mathcal{F}, A_2\in\mathcal{F}, \ldots$, then $\cup_{i=1}^\infty A_i\in\mathcal{F}$; that is, $\mathcal{F}$ is closed under countable unions.
- If $A\in\mathcal{F}$, then $\bar{A}\in\mathcal{F}$; that is, $\mathcal{F}$ is closed under complementation.
Definition of Parameters / Parametrization
- Parameter $\theta$ identifies/specifies a distribution in $\mathcal{P}$.
- $\mathcal{P} = \{P_\theta : \theta\in\Theta\}$
- $\Theta = \{\theta\}$, the Parameter Space
- Definition (Identifiability): the parametrization is identifiable if $\theta_1\neq\theta_2 \Rightarrow P_{\theta_1} \neq P_{\theta_2}$.
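For example (a standard illustration of non-identifiability): suppose $X \sim N(\mu_1+\mu_2,\, 1)$ with parameter $\theta = (\mu_1, \mu_2)$. Then $\theta_1 = (0,1) \neq (1,0) = \theta_2$, yet $P_{\theta_1} = P_{\theta_2} = N(1,1)$, so $\theta$ is not identifiable, although the component $\mu_1 + \mu_2$ is.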
Example: One-Sample Model
Given conditions:
- $X_1, \ldots, X_n$ i.i.d. with distribution function $F(\cdot)$
- Probability Model: $\mathcal{P} = \{\text{distribution functions } F(\cdot)\}$
- Measurement Error Model:
$X_i = \mu + \epsilon_i, \quad i=1,2,\ldots, n$
$\mu$ is a constant parameter (e.g., real-valued, positive); $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are i.i.d. with distribution function $G(\cdot)$. Note that $G$ does not depend on $\mu$.
It follows that $X_1, \ldots, X_n$ are i.i.d. with distribution function $F(x) = G(x-\mu)$, and $\mathcal{P} = \{(\mu, G): \mu\in\mathbb{R}, G\in\mathcal{G}\}$. Depending on the class $\mathcal{G}$, this one-sample model divides into the following cases (a simulation sketch follows the list):
- Parametric Model: Gaussian measurement errors. $\{\epsilon_j\}$ are i.i.d. $N(0, \sigma^2)$, with $\sigma^2>0$, but the exact value of $\sigma$ is unknown.
- Semi-Parametric Model: Symmetric measurement-error distributions with mean $\mu$. $\{\epsilon_j\}$ are i.i.d. with distribution function $G(\cdot)$, where $G\in\mathcal{G}$, the class of symmetric distributions with mean 0.
- Non-Parametric Model: $X_1, \ldots, X_n$ are i.i.d. with distribution function $G(\cdot)$, where $G\in\mathcal{G}$, the class of all distributions on the sample space $\mathcal{X}$ (with center $\mu$).
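A minimal simulation sketch of the measurement error model (the values of $\mu$ and the scales, and the choice of Gaussian vs. Laplace errors, are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 100, 2.5  # hypothetical sample size and true parameter

# Parametric case: Gaussian errors, G = N(0, sigma^2)
x_param = mu + rng.normal(loc=0.0, scale=1.0, size=n)

# Semi-parametric case: any symmetric, mean-zero G, e.g. Laplace
x_semi = mu + rng.laplace(loc=0.0, scale=1.0, size=n)

# In both cases X_i ~ F(x) = G(x - mu), so the sample mean estimates mu
print(x_param.mean(), x_semi.mean())
```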
Example: Two-Sample Model
Given conditions:
- $X_1, \ldots, X_n$ i.i.d. with distribution function $F(\cdot)$
- $Y_1, \ldots, Y_m$ i.i.d. with distribution function $G(\cdot)$
- Probability Model: $\mathcal{P} = \{(F,G): F\in\mathcal{F}, G\in\mathcal{G}\}$. Specific cases relate $\mathcal{F}$ and $\mathcal{G}$.
- Shift Model with parameter $\delta$
- $\{X_i\}$ i.i.d. $X\sim F(\cdot)$, response under Treatment A.
- $\{Y_i\}$ i.i.d. $Y\sim G(\cdot)$, response under Treatment B.
- $Y=X+\delta$, i.e., $G(v) = F(v-\delta)$
- $\delta$ is the difference in response under Treatment B relative to Treatment A; $\delta$ does not depend on the baseline response $X$ (i.e., on Treatment A).
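A brief simulation sketch of the shift model (the sample sizes, baseline $F$, and $\delta$ are hypothetical; the difference of sample means is one natural estimator, not the only one):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, delta = 50, 60, 1.5  # hypothetical sample sizes and true shift

x = rng.normal(size=n)          # responses under Treatment A, X ~ F
y = rng.normal(size=m) + delta  # responses under Treatment B, Y = X + delta

# Under G(v) = F(v - delta), the difference in sample means estimates delta
delta_hat = y.mean() - x.mean()
print(delta_hat)
```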
Modeling Issues
- Non-uniqueness of parametrization
- Varying complexity of equivalent parametrizations
- Possible non-identifiability of parameters
- Parameters “of interest” vs. “nuisance” parameters
- A vector parametrization that is unidentifiable may have identifiable components.
- Data-based model selection: How does using the data to select among models affect statistical inference?
- Data-based sampling procedures: How does the protocol for collecting data observations affect statistical inference?
Regular Models
- Notations:
- $\theta$: A parameter specifying a probability distribution $P_\theta$.
- $F(\cdot \lvert\theta)$: Distribution function of $P_\theta$
- $E_\theta[\cdot]$: Expectation under the assumption $X\sim P_\theta$. For a measurable function $g(X)$, \(E_\theta[g(X)] = \int_\mathcal{X}g(x)dF(x\lvert\theta)\)
- $p(x\lvert\theta) = p(x; \theta)$: density or probability-mass function of $X$
- Assumptions:
- Either all of the $P_\theta$ are continuous with densities $p(x\lvert\theta)$, or all of the $P_\theta$ are discrete with pmf's $p(x\lvert\theta)$.
- The set $\{x: p(x\lvert\theta) > 0\}$ is the same for all $\theta \in \Theta$; that is, the support is independent of $\theta$.
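A standard example of a family violating the second assumption (added here for illustration): the uniform family $U(0,\theta)$ has density \(p(x\lvert\theta) = \frac{1}{\theta}\mathbf{1}\{0 < x < \theta\}, \quad \theta > 0,\) so $\{x: p(x\lvert\theta) > 0\} = (0, \theta)$ depends on $\theta$, and the model is not regular.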
Regression Models
Given:
$n$ cases $i=1,2,\ldots, n$
- 1 Response (dependent) variable \(y_i, \,i=1,2,\ldots, n\)
- $p$ Explanatory (independent) variables \(x_i = (x_{i,1}, \ldots, x_{i,p})^T, \,i=1,2,\ldots n\)
Goal of Regression Analysis:
- Extract/exploit relationship between $y_i$ and $x_i$.
Steps for fitting a model (a code sketch follows this list):
- Propose a model in terms of
- Response variable $Y$
- Explanatory variables $X_1, \ldots, X_p$
- Assumptions about the distribution of $\epsilon$ over the cases.
- Specify/define a criterion for judging different estimators.
- Characterize the best estimator and apply it to the given data.
- Check the assumptions in (1).
- If necessary modify model and/or assumptions and go to (1).
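A compact sketch of steps (1)-(4) for a normal-linear model fit by least squares (the data-generating values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3

# (1) Propose a model: y = X beta + eps, with eps_i i.i.d. N(0, sigma^2)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p explanatory variables
beta_true = np.array([1.0, 2.0, -0.5, 0.3])                 # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.8, size=n)

# (2)-(3) Criterion: least squares; under Gauss-Markov assumptions the
# best linear unbiased estimator is the OLS solution
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# (4) Check the assumptions in (1): residuals should resemble
# mean-zero, constant-variance noise
resid = y - X @ beta_hat
print(beta_hat, resid.mean(), resid.std())
```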
Specifying Assumptions in (1) for Residual Distribution:
- Gauss-Markov: zero mean, constant variance, uncorrelated.
- Normal-linear models: $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s.
- Generalized Gauss-Markov: zero mean, and general covariance matrix (possibly correlated, possibly heteroscedastic).
- Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto, contaminated normal: a fraction $(1 - \delta)$ of the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s and the remaining fraction $\delta$ follows some contamination distribution).
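A sketch of how contaminated-normal errors might be generated (the contamination fraction and the inflated-variance contamination distribution are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, delta, sigma = 1000, 0.1, 1.0  # hypothetical: 10% contamination

# Fraction (1 - delta) of errors are N(0, sigma^2); fraction delta come
# from a contamination distribution (here: a normal with inflated variance)
contaminated = rng.random(n) < delta
eps = np.where(contaminated,
               rng.normal(scale=10 * sigma, size=n),
               rng.normal(scale=sigma, size=n))
print(eps.std())  # noticeably larger than sigma
```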
Time Series Models
Example: Measurement Model with Autoregressive Errors
Model:
- $X_1, X_2, \ldots, X_n$ are $n$ successive measurements of a physical constant $\mu$
- $X_i = \mu + e_i, \,i=1,2,\ldots,n$
- $e_i = \beta e_{i-1} + \epsilon_i, \,i=1,2,\ldots,n$, with $e_0=0$ (so $e_1 = \epsilon_1$), where the $\epsilon_i$ are i.i.d. with density $f(\cdot)$.
Note:
- The $e_i$ are not i.i.d. but dependent.
- The $X_i$ are dependent as well: substituting $e_{i-1} = X_{i-1} - \mu$ into $X_i = \mu + \beta e_{i-1} + \epsilon_i$ gives
\(\begin{aligned} &X_i = \mu(1-\beta) + \beta X_{i-1} + \epsilon_i, \quad i=2,\ldots, n \\ &X_1 = \mu + \epsilon_1 \end{aligned}\)
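A simulation sketch of this model ($\mu$, $\beta$, and the Gaussian choice for $f$ are hypothetical), showing the recursion in both the $e_i$ and the $X_i$ form:

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu, beta = 500, 3.0, 0.6  # hypothetical constant and AR coefficient

eps = rng.normal(size=n)  # i.i.d. errors with density f

# Autoregressive errors: e_1 = eps_1, e_i = beta * e_{i-1} + eps_i
e = np.empty(n)
e[0] = eps[0]
for i in range(1, n):
    e[i] = beta * e[i - 1] + eps[i]

x = mu + e  # equivalently X_i = mu*(1 - beta) + beta*X_{i-1} + eps_i
print(x.mean())  # the X_i are dependent but centered near mu
```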
Remarks
- How do we prove that a given parametrization is identifiable?
- How do we determine whether a given model is regular?