Chapter 1 Statistical Models, Goals, And Performance Criteria
1.1 Data, Models, Parameters and Statistics
Definition of Statistical Model
- Random experiment with sample space $\Omega$.
- Random vector $X=(X_1, \ldots, X_n)$ defined on $\Omega$.
- $\omega \in \Omega$: outcome of experiment
- $X(\omega)$: data observations
- Probability distribution of $X$
- $\mathcal{X}$: Sample Space $= \{\text{outcomes } x\}$
- $\mathcal{F}_X$: Sigma-field of measurable events.
- $P(\cdot)$ defined on $(\mathcal{X}, \mathcal{F}_X)$
- Statistical Model
- $\mathcal{P} = \{\text{family of distributions}\}$
Review of STAT609
- Sigma-Field $\mathcal{F}$: A collection $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field if it satisfies the following conditions:
- $(\emptyset = \bar{\Omega}) \in \mathcal{F}$.
- If $A_1 \in \mathcal{F}, A_2\in\mathcal{F}, \ldots$, then $\cup_{i=1}^\infty A_i\in\mathcal{F}$; that is, $\mathcal{F}$ is closed under countable unions.
- If $A\in\mathcal{F}$, then $\bar{A}\in\mathcal{F}$; that is, $\mathcal{F}$ is closed under complementation.
Definition of Parameters / Parametrization
- Parameter $\theta$ identifies/specifies a distribution in $\mathcal{P}$.
- $\mathcal{P} = \{P_\theta : \theta\in\Theta\}$
- $\Theta = \{\theta\}$, the Parameter Space
- Definition (Identifiability): the parametrization is identifiable if $\theta_1\neq\theta_2 \Rightarrow P_{\theta_1} \neq P_{\theta_2}$.
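For example (a standard illustration of non-identifiability): suppose $X \sim N(\mu_1+\mu_2,\, 1)$ with parameter $\theta = (\mu_1, \mu_2)$. Then $\theta_1 = (0,1) \neq (1,0) = \theta_2$, yet $P_{\theta_1} = P_{\theta_2} = N(1,1)$, so $\theta$ is not identifiable, although the component $\mu_1 + \mu_2$ is.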
Example: One-Sample Model
Given conditions:
- $X_1, \ldots, X_n$ i.i.d. with distribution function $F(\cdot)$
- Probability Model: $\mathcal{P} = \{\text{distribution functions } F(\cdot)\}$
- Measurement Error Model:
$X_i = \mu + \epsilon_i, \quad i=1,2,\ldots, n$
$\mu$ is a constant parameter (e.g., real-valued, positive); $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are i.i.d. with distribution function $G(\cdot)$. Note that $G$ does not depend on $\mu$.
It follows that $X_1, \ldots, X_n$ are i.i.d. with distribution function $F(x) = G(x-\mu)$, and $\mathcal{P} = \{(\mu, G): \mu\in\mathbb{R}, G\in\mathcal{G}\}$. Depending on the class $\mathcal{G}$, this one-sample model divides into the following cases (a simulation sketch follows the list):
- Parametric Model: Gaussian measurement errors. $\{\epsilon_j\}$ are i.i.d. $N(0, \sigma^2)$, with $\sigma^2>0$, but the exact value of $\sigma$ is unknown.
- Semi-Parametric Model: Symmetric measurement-error distributions with mean $\mu$. $\{\epsilon_j\}$ are i.i.d. with distribution function $G(\cdot)$, where $G\in\mathcal{G}$, the class of symmetric distributions with mean 0.
- Non-Parametric Model: $X_1, \ldots, X_n$ are i.i.d. with distribution function $G(\cdot)$, where $G\in\mathcal{G}$, the class of all distributions on the sample space $\mathcal{X}$ (with center $\mu$).
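A minimal simulation sketch of the measurement error model (the values of $\mu$ and the scales, and the choice of Gaussian vs. Laplace errors, are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 100, 2.5  # hypothetical sample size and true parameter

# Parametric case: Gaussian errors, G = N(0, sigma^2)
x_param = mu + rng.normal(loc=0.0, scale=1.0, size=n)

# Semi-parametric case: any symmetric, mean-zero G, e.g. Laplace
x_semi = mu + rng.laplace(loc=0.0, scale=1.0, size=n)

# In both cases X_i ~ F(x) = G(x - mu), so the sample mean estimates mu
print(x_param.mean(), x_semi.mean())
```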
Example: Two-Sample Model
Given conditions:
- $X_1, \ldots, X_n$ i.i.d. with distribution function $F(\cdot)$
- $Y_1, \ldots, Y_m$ i.i.d. with distribution function $G(\cdot)$
- Probability Model: $\mathcal{P} = \{(F,G): F\in\mathcal{F}, G\in\mathcal{G}\}$. Specific cases relate $\mathcal{F}$ and $\mathcal{G}$.
- Shift Model with parameter $\delta$
- $\{X_i\}$ i.i.d. $X\sim F(\cdot)$, response under Treatment A.
- $\{Y_i\}$ i.i.d. $Y\sim G(\cdot)$, response under Treatment B.
- $Y=X+\delta$, i.e., $G(v) = F(v-\delta)$
- $\delta$ is the difference in response under Treatment B relative to Treatment A; $\delta$ does not depend on the baseline response $X$ (i.e., on Treatment A).
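A brief simulation sketch of the shift model (the sample sizes, baseline $F$, and $\delta$ are hypothetical; the difference of sample means is one natural estimator, not the only one):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, delta = 50, 60, 1.5  # hypothetical sample sizes and true shift

x = rng.normal(size=n)          # responses under Treatment A, X ~ F
y = rng.normal(size=m) + delta  # responses under Treatment B, Y = X + delta

# Under G(v) = F(v - delta), the difference in sample means estimates delta
delta_hat = y.mean() - x.mean()
print(delta_hat)
```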
Modeling Issues
- Non-uniqueness of parametrization
- Varying complexity of equivalent parametrizations
- Possible non-identifiability of parameters
- Parameters “of interest” vs. “nuisance” parameters
- A vector parametrization that is unidentifiable may have identifiable components.
- Data-based model selection: How does using the data to select among models affect statistical inference?
- Data-based sampling procedures: How does the protocol for collecting data observations affect statistical inference?
Regular Models
- Notations:
- $\theta$: A parameter specifying a probability distribution $P_\theta$.
- $F(\cdot \lvert\theta)$: Distribution function of $P_\theta$
- $E_\theta[\cdot]$: Expectation under the assumption $X\sim P_\theta$. For a measurable function $g(X)$, \(E_\theta[g(X)] = \int_\mathcal{X}g(x)dF(x\lvert\theta)\)
- $p(x\lvert\theta) = p(x; \theta)$: density or probability-mass function of $X$
- Assumptions:
- Either all of the $P_\theta$ are continuous with densities $p(x\lvert\theta)$, or all of the $P_\theta$ are discrete with pmf's $p(x\lvert\theta)$.
- The set $\{x: p(x\lvert\theta) > 0\}$ is the same for all $\theta \in \Theta$; that is, the support is independent of $\theta$.
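A standard example of a family violating the second assumption (added here for illustration): the uniform family $U(0,\theta)$ has density \(p(x\lvert\theta) = \frac{1}{\theta}\mathbf{1}\{0 < x < \theta\}, \quad \theta > 0,\) so $\{x: p(x\lvert\theta) > 0\} = (0, \theta)$ depends on $\theta$, and the model is not regular.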
Regression Models
Given:
$n$ cases $i=1,2,\ldots, n$
- 1 Response (dependent) variable \(y_i, \,i=1,2,\ldots, n\)
- $p$ Explanatory (independent) variables \(x_i = (x_{i,1}, \ldots, x_{i,p})^T, \,i=1,2,\ldots n\)
Goal of Regression Analysis:
- Extract/exploit relationship between $y_i$ and $x_i$.
Steps for fitting a model (a code sketch follows this list):
- Propose a model in terms of
- Response variable $Y$
- Explanatory variables $X_1, \ldots, X_p$
- Assumptions about the distribution of $\epsilon$ over the cases.
- Specify/define a criterion for judging different estimators.
- Characterize the best estimator and apply it to the given data.
- Check the assumptions in (1).
- If necessary modify model and/or assumptions and go to (1).
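A compact sketch of steps (1)-(4) for a normal-linear model fit by least squares (the data-generating values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3

# (1) Propose a model: y = X beta + eps, with eps_i i.i.d. N(0, sigma^2)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p explanatory variables
beta_true = np.array([1.0, 2.0, -0.5, 0.3])                 # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.8, size=n)

# (2)-(3) Criterion: least squares; under Gauss-Markov assumptions the
# best linear unbiased estimator is the OLS solution
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# (4) Check the assumptions in (1): residuals should resemble
# mean-zero, constant-variance noise
resid = y - X @ beta_hat
print(beta_hat, resid.mean(), resid.std())
```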
Specifying Assumptions in (1) for Residual Distribution:
- Gauss-Markov: zero mean, constant variance, uncorrelated.
- Normal-linear models: $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s.
- Generalized Gauss-Markov: zero mean, and general covariance matrix (possibly correlated, possibly heteroscedastic).
- Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto, contaminated normal: a fraction $(1 - \delta)$ of the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s and the remaining fraction $\delta$ follows some contamination distribution).
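A sketch of how contaminated-normal errors might be generated (the contamination fraction and the inflated-variance contamination distribution are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, delta, sigma = 1000, 0.1, 1.0  # hypothetical: 10% contamination

# Fraction (1 - delta) of errors are N(0, sigma^2); fraction delta come
# from a contamination distribution (here: a normal with inflated variance)
contaminated = rng.random(n) < delta
eps = np.where(contaminated,
               rng.normal(scale=10 * sigma, size=n),
               rng.normal(scale=sigma, size=n))
print(eps.std())  # noticeably larger than sigma
```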
Time Series Models
Example: Measurement Model with Autoregressive Errors
Model:
- $X_1, X_2, \ldots, X_n$ are $n$ successive measurements of a physical constant $\mu$
- $X_i = \mu + e_i, \,i=1,2,\ldots,n$
- $e_i = \beta e_{i-1} + \epsilon_i, \,i=1,2,\ldots,n$, with $e_0=0$ (so $e_1 = \epsilon_1$), where the $\epsilon_i$ are i.i.d. with density $f(\cdot)$.
Note:
- The $e_i$ are not i.i.d. but dependent.
- The $X_i$ are dependent as well: substituting $e_{i-1} = X_{i-1} - \mu$ into $X_i = \mu + \beta e_{i-1} + \epsilon_i$ gives
\(\begin{aligned} &X_i = \mu(1-\beta) + \beta X_{i-1} + \epsilon_i, \quad i=2,\ldots, n \\ &X_1 = \mu + \epsilon_1 \end{aligned}\)
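A simulation sketch of this model ($\mu$, $\beta$, and the Gaussian choice for $f$ are hypothetical), showing the recursion in both the $e_i$ and the $X_i$ form:

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu, beta = 500, 3.0, 0.6  # hypothetical constant and AR coefficient

eps = rng.normal(size=n)  # i.i.d. errors with density f

# Autoregressive errors: e_1 = eps_1, e_i = beta * e_{i-1} + eps_i
e = np.empty(n)
e[0] = eps[0]
for i in range(1, n):
    e[i] = beta * e[i - 1] + eps[i]

x = mu + e  # equivalently X_i = mu*(1 - beta) + beta*X_{i-1} + eps_i
print(x.mean())  # the X_i are dependent but centered near mu
```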
Remarks
- How do we prove that a given parametrization is identifiable?
- How do we determine whether a given model is regular?