Chapter 1 Statistical Models, Goals, And Performance Criteria
1.3 The Decision Theoretic Framework
- Basic elements of a decision problem
- Estimation
Estimating a real parameter $\theta\in\Theta$ using data $X$ with conditional distribution $P_\theta$.
- Testing
\[\begin{aligned} &H_0: P_\theta \in P_0 \\ &H_1: P_\theta \notin P_0 \end{aligned}\]
Given data $X\sim P_\theta$, choose between the two hypotheses (decide whether to accept or reject $H_0$).
- Ranking
Rank a collection of items from best to worst. Examples:
- Products evaluated by a consumer interest group
- Sports betting (horse race, team tournament)
- Prediction
Predict response variable $Y$ given explanatory variables $Z=(Z_1, Z_2, \ldots, Z_d)$
- If the joint distribution of $(Z,Y)$ is known, use $\mu(Z) = E[Y\lvert Z]$
- With data $\{(z_i,y_i), i=1,2,\ldots, n\}$, estimate $\mu(Z)$. If $\mu(Z) = g(\beta, Z)$, then use $\hat{\mu}(Z) = g(\hat{\beta}, Z)$.
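The plug-in predictor $\hat{\mu}(Z) = g(\hat{\beta}, Z)$ can be sketched in code. This is a minimal illustration assuming a linear form $g(\beta, Z) = \beta_0 + \beta_1 Z$ with one explanatory variable; the simulated data and coefficient values are hypothetical.

```python
import numpy as np

# Hypothetical data: n observations of (z_i, y_i), one explanatory variable.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
y = 2.0 + 3.0 * z + rng.normal(scale=0.5, size=n)  # true beta = (2, 3)

# Assume mu(Z) = g(beta, Z) = beta_0 + beta_1 * Z; estimate beta by least squares.
Z_design = np.column_stack([np.ones(n), z])
beta_hat, *_ = np.linalg.lstsq(Z_design, y, rcond=None)

def mu_hat(z_new):
    """Plug-in predictor: mu_hat(Z) = g(beta_hat, Z)."""
    return beta_hat[0] + beta_hat[1] * z_new

print(beta_hat)     # close to (2, 3)
print(mu_hat(1.0))  # close to E[Y | Z = 1] = 5
```

With the joint distribution unknown, $\hat{\mu}$ substitutes the estimated $\hat{\beta}$ for $\beta$ in the assumed regression function.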
- $\Theta = \{\theta\}$: The “State Space”
$\theta =$ state of nature (the unknown, uncertain element of the problem)
- $\mathcal{A} = \{a\}$: The “Action Space”
$a =$ action taken by the statistician
- $\mathcal{L}(\theta, a)$: The “Loss Function”
- $\mathcal{L}(\theta, a) =$ loss incurred when the state is $\theta$ and action $a$ is taken.
- $\mathcal{L}: \Theta \times\mathcal{A}\rightarrow\mathcal{R}$
- Additional Elements of a Statistical Decision Problem
- $X \sim P_\theta$: Random Variable (Statistical Observation)
- Conditional distribution of $X$ given $\theta$.
- Sample space $\mathcal{X} = \{x\}$
- Density/pmf of the conditional distribution:
\[f(x\lvert\theta) \quad \text{or}\quad f_X(x\lvert\theta)\]
- $\delta(X)$: A “Decision Procedure”
- Observe data $X = x$ and take action $a\in\mathcal{A}$
- $\delta(\cdot):\mathcal{X}\rightarrow \mathcal{A}$
- $D$: Decision Space (class of decision procedures)
- $D = \{\text{decision procedures } \delta: \mathcal{X}\rightarrow\mathcal{A}\}$
- $R(\theta, \delta)$: Risk Function (performance measure of $\delta(\cdot)$ given $\theta$)
- $R(\theta, \delta) = E_X[\mathcal{L}(\theta, \delta(X))\lvert\theta]$
- Expectation of loss incurred by decision procedure $\delta(X)$ when $\theta$ is true.
- For a no-data problem (no $X$), $R(\theta, a) = \mathcal{L}(\theta, a)$
- Additional Elements of a Bayesian Decision Problem
- $\theta\sim\pi$: Prior Distribution for parameter $\theta\in\Theta$
- $r(\pi, \delta)$: Bayes Risk of $\delta$ given prior distribution $\pi$
- $r(\pi, \delta) = E_{\theta^*}[R(\theta^*, \delta)]$, taking the expectation with respect to $\theta^*\sim\pi$
- Bayes rule $\delta^*$: Decision procedure that minimizes the Bayes risk: $r(\pi, \delta^*) = \min_{\delta\in D} r(\pi, \delta)$
Example of Statistical Decision Problems
- Statistical Estimation Problem
- Given:
- $X\sim P_\theta = N(\theta,1), \quad -\infty<\theta<\infty$
- $\mathcal{A} = \Theta = \mathcal{R}$
- Squared-error Loss:
\[\mathcal{L}(\theta, a) = (a-\theta)^2\]
- Decision procedure: for a finite constant $c: 0<c\leq 1$,
\[\delta_c(X) = cX\]
- Risk function:
\[\begin{aligned} R(\theta, \delta_c) &= E_X[(\delta_c(X) - \theta)^2 \lvert \theta] \\ &= Var(\delta_c(X)\lvert\theta) + [E_X[\delta_c(X)\lvert \theta] - \theta]^2 \\ &= c^2 +(c-1)^2\theta^2 \end{aligned}\]
- Special cases:
- $\delta_1(X) = X:\quad R(\theta, \delta_1) =1$ (independent of $\theta$)
- $\delta_0(X) \equiv 0:\quad R(\theta, \delta_0) = \theta^2$ (zero at $\theta = 0$, unbounded in $\theta$)
- $\delta_{0.5}(X) = \frac{X}{2}: \quad R(\theta, \delta_{0.5}) = \frac{1}{4}(1+\theta^2)$
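The closed-form risk $R(\theta, \delta_c) = c^2 + (c-1)^2\theta^2$ can be checked by simulation. A minimal sketch, assuming the setup above ($X\sim N(\theta,1)$, squared-error loss); the particular values of $\theta$ and $c$ are illustrative.

```python
import numpy as np

def risk_closed_form(theta, c):
    """R(theta, delta_c) = c^2 + (c - 1)^2 * theta^2 for X ~ N(theta, 1)."""
    return c**2 + (c - 1) ** 2 * theta**2

def risk_monte_carlo(theta, c, n_sim=200_000, seed=0):
    """Approximate E[(cX - theta)^2 | theta] by simulation."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n_sim)
    return np.mean((c * x - theta) ** 2)

theta, c = 2.0, 0.5
print(risk_closed_form(theta, c))   # 0.25 + 0.25 * 4 = 1.25
print(risk_monte_carlo(theta, c))   # close to 1.25
```

Comparing the curves over $\theta$ reproduces the special cases: $\delta_1$ has constant risk $1$, while shrinkage ($c<1$) trades lower risk near $\theta=0$ for unbounded risk as $\lvert\theta\rvert$ grows.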
- Mean-Squared Error: Estimation Risk (Squared-Error Loss)
- $X\sim P_\theta, \theta\in\Theta$
- Parameter of interest: $v(\theta)$ (some function of $\theta$)
- Action Space: $\mathcal{A} = \{ v=v(\theta), \theta\in\Theta \}$
- Decision procedure/estimator: $\hat{v}(X):\mathcal{X}\rightarrow\mathcal{A}$
- Squared Error Loss: $\mathcal{L}(\theta, a) = [a - v(\theta)]^2$
- Risk equals Mean-Squared Error:
\[\begin{aligned} R(\theta, \hat{v}(X)) &= E[\mathcal{L}(\theta, \hat{v}(X)) \lvert \theta] \\ &= E[(\hat{v}(X) - v(\theta))^2\lvert \theta] = MSE(\hat{v}) \end{aligned}\]
- Proposition 1.3.1 For an estimator $\hat{v}(X)$ of $v(\theta)$, the mean-squared error is
\[MSE(\hat{v}) = Var[\hat{v}(X)\lvert \theta] + [Bias(\hat{v} \lvert \theta)]^2\]
where $Bias(\hat{v}\lvert\theta) = E[\hat{v}(X)\lvert \theta] - v(\theta)$.
Definition: $\hat{v}$ is Unbiased if $Bias(\hat{v} \lvert \theta) = 0$ for all $\theta\in\Theta$
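Proposition 1.3.1 can be verified numerically. A sketch under assumed values: a hypothetical shrinkage estimator $\hat{v} = c\bar{X}$ of $v(\theta) = \theta$ for i.i.d. $N(\theta, \sigma^2)$ data, for which $Bias = (c-1)\theta$ and $Var = c^2\sigma^2/n$.

```python
import numpy as np

# Monte Carlo check of MSE(v_hat) = Var + Bias^2 for the shrinkage estimator
# v_hat = c * Xbar of v(theta) = theta, with X_1..X_n i.i.d. N(theta, sigma^2).
rng = np.random.default_rng(1)
theta, sigma, n, c = 1.5, 2.0, 10, 0.8
n_sim = 100_000

x = rng.normal(theta, sigma, size=(n_sim, n))
v_hat = c * x.mean(axis=1)

mse = np.mean((v_hat - theta) ** 2)
var = np.var(v_hat)                 # simulated Var[v_hat | theta]
bias = np.mean(v_hat) - theta       # simulated Bias(v_hat | theta)

print(mse)            # matches var + bias**2
print(var + bias**2)  # theory: c^2 sigma^2/n + (c-1)^2 theta^2 = 0.256 + 0.09
```

The estimator is biased for $c \neq 1$, so the squared-bias term contributes to the MSE alongside the variance.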
- Statistical Testing Problem (Two-Sample Problem)
- Given:
- $X_1, \ldots, X_m$ i.i.d. $\mathcal{N}(\mu, \sigma^2)$, response under control treatment
- $Y_1, \ldots, Y_n$ i.i.d. $\mathcal{N}(\mu + \Delta, \sigma^2)$, response under test treatment
- $\mu \in R, \sigma^2\in R_+$ unknown
- $\Delta\in R$ is the unknown treatment effect
- Let $P(X,Y\lvert\mu, \Delta, \sigma^2)$ denote the joint distribution of $X = (X_1, \ldots, X_m)$ and $Y=(Y_1, \ldots, Y_n)$
- Define two hypotheses:
- $H_0: P\in\{ P:\Delta=0 \} = \{P_\theta, \theta \in \Theta_0 \}$
- $H_1: P\in \{P:\Delta\neq0\} = \{ P_\theta, \theta \notin \Theta_0 \}$
- $\mathcal{A} = \{0,1\}$ with $0$ corresponding to accepting $H_0$ and $1$ to rejecting $H_0$.
- Construct a decision rule that rejects $H_0$ if the estimate of $\Delta$ is significantly different from zero, e.g.,
\[\delta(X,Y) = \begin{cases} 0\quad \text{if} \quad\left\lvert\frac{\hat{\Delta}}{\hat{\sigma}}\right\rvert <c \quad (\text{critical value}) \\ 1\quad \text{if} \quad\left\lvert\frac{\hat{\Delta}}{\hat{\sigma}}\right\rvert \geq c \end{cases}\]
- $\hat{\Delta} = \bar{Y} - \bar{X}$ (difference in sample means)
- $\hat{\sigma}$: an estimate of $\sigma$
- Zero-one Loss function
\[L(\theta, a) = \begin{cases} 0\quad \text{if} \quad \theta\in\Theta_a\quad (\text{correct action}) \\ 1\quad \text{if} \quad \theta\notin\Theta_a\quad (\text{wrong action}) \end{cases}\]
- Risk function: taken as the measure of performance of the decision rule $\delta(X,Y)$.
\[\begin{aligned} R(\theta, \delta) &= E[L(\theta, \delta(X,Y))\lvert\theta] \\ &= L(\theta, 0)P_\theta(\delta(X,Y) = 0) + L(\theta, 1)P_\theta(\delta(X,Y) = 1) \\ &= P_\theta(\delta(X,Y) = 1), \quad \text{if}\quad \theta\in\Theta_0 \\ &= P_\theta(\delta(X,Y) = 0), \quad \text{if}\quad \theta\notin\Theta_0 \end{aligned}\]
- Terminology of Statistical Testing
- Critical Region of a test $\delta(\cdot)$:
\[C = \{x: \delta(x) = 1\}\]
- Type I Error
$\delta(X)$ rejects $H_0$ when $H_0$ is true.
- Type II Error
$\delta(X)$ accepts $H_0$ when $H_0$ is false.
- Neyman-Pearson framework
Constrained optimization of risks:
Minimize: P(Type II Error) Subject to: P(Type I Error) $\leq \alpha$ (“significance level”)
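The Type I error probability of the rule above can be estimated by simulation. A minimal sketch under assumed settings: equal sample sizes $m = n = 50$, a pooled variance estimate for $\hat{\sigma}$, and critical value $c = 1.96$ (roughly $\alpha = 0.05$); all numeric choices are illustrative.

```python
import numpy as np

# Monte Carlo estimate of P(Type I error) for the rule
# delta(X, Y) = 1 iff |Delta_hat / se_hat| >= c, under H0: Delta = 0.
rng = np.random.default_rng(2)
m = n = 50
mu, sigma, c = 0.0, 1.0, 1.96
n_sim = 50_000

x = rng.normal(mu, sigma, size=(n_sim, m))
y = rng.normal(mu, sigma, size=(n_sim, n))  # Delta = 0, so H0 is true

delta_hat = y.mean(axis=1) - x.mean(axis=1)
# Pooled variance estimate and the standard error of Delta_hat.
s2 = ((m - 1) * x.var(axis=1, ddof=1) + (n - 1) * y.var(axis=1, ddof=1)) / (m + n - 2)
se = np.sqrt(s2 * (1 / m + 1 / n))

reject = np.abs(delta_hat / se) >= c
print(reject.mean())  # near 0.05, the target significance level
```

Repeating the simulation with $\Delta \neq 0$ would estimate the Type II error probability, the quantity minimized in the Neyman-Pearson framework.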
- Interval Estimation and Confidence Bounds
- Value-at-Risk (VAR)
- Let $X_1, X_2, \ldots$ be the change in value of an asset over independent fixed holding periods and suppose they are i.i.d. $X \sim P_\theta$ for some fixed $\theta\in\Theta$.
- For $\alpha = 0.05$, say, define $VAR_{\alpha}$ (the level-$\alpha$ Value-at-Risk) by $P_\theta(X \leq -VAR_{\alpha}) = \alpha$
- Consider estimating the VAR of $X_{n+1}$ given $X=(X_1, \ldots, X_n)$: determine an estimator $\hat{VAR}(X)$ such that
\[P_{\theta}(X_{n+1} \leq -\hat{VAR}(X)) \leq \alpha\]
for all $\theta\in\Theta$.
- The outcome $X_{n+1}$ exceeds $VAR_{\alpha}$ to the downside with probability no greater than $\alpha (= 0.05)$
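One simple candidate (not the only one) is the nonparametric plug-in: take $\hat{VAR}(X)$ as the negated empirical $\alpha$-quantile of the observed changes in value. A sketch assuming i.i.d. data; the normal distribution used to simulate the sample is purely illustrative.

```python
import numpy as np

# Nonparametric VaR estimate: the empirical alpha-quantile of past
# changes in value, negated, so that P(X <= -var_hat) is about alpha.
rng = np.random.default_rng(3)
alpha = 0.05
n = 5_000
x = rng.normal(loc=0.0, scale=1.0, size=n)  # simulated changes in asset value

var_hat = -np.quantile(x, alpha)

print(var_hat)  # near 1.645 for N(0, 1), since P(X <= -1.645) = 0.05
```

Guaranteeing $P_\theta(X_{n+1} \leq -\hat{VAR}(X)) \leq \alpha$ for all $\theta$ (rather than approximately) is exactly the lower-bound estimation problem treated next.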
- Lower-Bound Estimation
- $X\sim P_\theta, \theta \in \Theta$
- Parameter of interest: $v(\theta)$
- Action Space: $\mathcal{A} = \{ v=v(\theta), \theta\in\Theta \}$
- Estimator: $\hat{v}(X): \mathcal{X}\rightarrow\mathcal{A}$
- Objective: bounding $v(\theta)$ from below
- Lower-Bound Estimator: $\hat{v}(X)$ is good if
- $P_\theta(\hat{v}(X) \leq v(\theta))$ is high
- $P_\theta(\hat{v}(X) > v(\theta))$ is low $\Rightarrow$ Define the loss function:
- $L(\theta, a) = 1$ if $a>v(\theta)$; zero otherwise.
- Risk function under zero-one loss $L(\theta, a)$:
\[R(\theta, \hat{v}(X)) = E[L(\theta, \hat{v}(X))\lvert \theta] = P_\theta(\hat{v}(X) > v(\theta))\]
- The Lower-Bound Estimator $\hat{v}(X)$ has Confidence Level $(1-\alpha)$ if
\[P_\theta(\hat{v}(X) \leq v(\theta)) \geq 1-\alpha,\]for all $\theta \in \Theta$.
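A standard instance is the lower confidence bound for a normal mean with known variance, $\hat{v}(X) = \bar{X} - z_{1-\alpha}\,\sigma/\sqrt{n}$. The sketch below checks its coverage by simulation; the parameter values and the hard-coded $z_{0.95} = 1.645$ are assumptions for illustration.

```python
import numpy as np

# Coverage check for the lower confidence bound
# v_hat(X) = Xbar - z_{1-alpha} * sigma / sqrt(n) for v(theta) = theta,
# with X_1..X_n i.i.d. N(theta, sigma^2), sigma known, alpha = 0.05.
rng = np.random.default_rng(4)
theta, sigma, n, z = 0.7, 1.0, 25, 1.645
n_sim = 100_000

# Simulate Xbar directly: Xbar ~ N(theta, sigma^2 / n).
xbar = rng.normal(theta, sigma / np.sqrt(n), size=n_sim)
v_hat = xbar - z * sigma / np.sqrt(n)

coverage = np.mean(v_hat <= theta)  # estimates P_theta(v_hat <= v(theta))
print(coverage)  # near 0.95, so the risk P(v_hat > theta) is near alpha
```

Here the zero-one risk $P_\theta(\hat{v}(X) > v(\theta))$ equals $\alpha$ exactly for every $\theta$, so the confidence level $(1-\alpha)$ holds uniformly over $\Theta$.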
- Interval (Lower and Upper Bound) Estimation
- $X \sim P_\theta, \theta\in \Theta$
- Parameter of interest: $v(\theta)$
- Define $\mathcal{V} = \{v = v(\theta), \theta\in\Theta\}$
- Objective: Interval estimation of $v(\theta)$
- Action Space: $\mathcal{A} = \{a = [a_{lower}, a_{upper}]: a_{lower}, a_{upper}\in\mathcal{V}\}$
- Estimator: \(\begin{aligned} &\hat{v}(X): \mathcal{X}\rightarrow\mathcal{A} \\ &\hat{v}(X) = [\hat{v}_{LOWER}(X), \hat{v}_{UPPER}(X)] \end{aligned}\)
- Interval Estimator: $\hat{v}(X)$ is good if
- $P_\theta(\hat{v}_{LOWER}(X)\leq v(\theta) \leq \hat{v}_{UPPER}(X))$ is high
- $P_\theta(\hat{v}_{LOWER}(X) > v(\theta) \,\text{or}\, \hat{v}_{UPPER}(X) < v(\theta))$ is low
Note that $\theta$ is non-random here: these probabilities refer to the randomness of $X$. Treating $\theta$ itself as random requires a Bayesian model.
- Define the loss function \(\begin{aligned} L(\theta, (a_{lower}, a_{upper})) &= 1, \;\text{if}\; a_{lower}>v(\theta) \;\text{or}\; a_{upper} < v(\theta) \\ &= 0, \;\text{otherwise} \end{aligned}\)
- Risk function under zero-one loss $L(\theta, a)$: \(\begin{aligned} R(\theta, \hat{v}(X)) &= E[L(\theta, \hat{v}(X)) \lvert \theta] \\ &= P_\theta(\hat{v}_{LOWER}(X) > v(\theta) \;\text{or}\; \hat{v}_{UPPER}(X) < v(\theta)) \\ &= 1 - P_{\theta}(\hat{v}_{LOWER}(X) \leq v(\theta) \leq \hat{v}_{UPPER}(X)) \end{aligned}\)
- The Interval Estimator $\hat{v}(X)$ has Confidence Level $(1-\alpha)$ if \(P_\theta(\hat{v}_{LOWER}(X) \leq v(\theta) \leq \hat{v}_{UPPER}(X)) \geq 1-\alpha \;\text{for all}\;\theta\in\Theta.\) Equivalently: \(R(\theta, \hat{v}(X)) \leq \alpha \;\text{for all}\;\theta\in\Theta.\)
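The two-sided analogue of the previous construction is the familiar interval $\bar{X} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$ for a normal mean with known variance. A simulation sketch of its coverage, with illustrative parameter values and $z_{0.975} = 1.96$ assumed:

```python
import numpy as np

# Coverage check for [Xbar - z*sigma/sqrt(n), Xbar + z*sigma/sqrt(n)]
# as an interval estimator of theta, sigma known, alpha = 0.05.
rng = np.random.default_rng(5)
theta, sigma, n, z = -0.3, 2.0, 16, 1.96
n_sim = 100_000

xbar = rng.normal(theta, sigma / np.sqrt(n), size=n_sim)  # Xbar ~ N(theta, sigma^2/n)
half = z * sigma / np.sqrt(n)
lower, upper = xbar - half, xbar + half

covered = (lower <= theta) & (theta <= upper)
print(covered.mean())      # near 1 - alpha = 0.95
print(1 - covered.mean())  # zero-one risk R(theta, v_hat), near alpha
```

The non-coverage rate is exactly the zero-one risk, matching the equivalence $R(\theta, \hat{v}(X)) \leq \alpha$ above.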
- Choosing Among Decision Procedures
- Admissible/Inadmissible: A decision procedure $\delta(\cdot)$ is inadmissible if $\exists\,\delta'$ such that $R(\theta, \delta') \leq R(\theta, \delta)$ for all $\theta \in\Theta$, with strict inequality for some $\theta$.
- Objectives:
- Restrict $\mathcal{D}$ to exclude inadmissible decision procedures.
- Characterize “Complete Class” (all admissible procedures).
- Formalize ‘best’ choice amongst all admissible procedures.
- Approaches to Decision Selection
- Two risk functions based on global criteria
- Bayes Risk
- Basic Elements of Decision Problem
- $X\sim P_\theta$: R.V.
- $\delta(X)$: Decision Procedure
- $\mathcal{D}$: Decision Space
- $R(\theta, \delta)$: Risk Function
- Additional Elements of Bayesian Decision Problem
- $\theta \sim \pi$: Prior Distribution for parameter $\theta\in\Theta$
- $r(\pi, \delta)$: Bayes Risk of $\delta$ given prior distribution $\pi$
- Bayes rule $\delta^*$: Decision procedure that minimizes the Bayes risk
- Computation
- Discrete priors \(r(\pi, \delta) = \sum_\theta \pi(\theta)R(\theta, \delta)\)
- Continuous priors \(r(\pi, \delta) = \int_\Theta \pi(\theta)R(\theta, \delta)d\theta\)
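For a discrete prior the Bayes risk is a finite weighted sum, which makes the comparison of procedures concrete. A sketch reusing the earlier estimation example $\delta_c(X) = cX$ with $R(\theta, \delta_c) = c^2 + (c-1)^2\theta^2$; the two-point prior on $\{0, 2\}$ is hypothetical.

```python
import numpy as np

# Bayes risk r(pi, delta_c) = sum_theta pi(theta) R(theta, delta_c)
# for delta_c(X) = cX, X ~ N(theta, 1), squared-error loss.
thetas = np.array([0.0, 2.0])  # support of the (hypothetical) prior
pi = np.array([0.5, 0.5])      # prior probabilities

def bayes_risk(c):
    risks = c**2 + (c - 1) ** 2 * thetas**2  # R(theta, delta_c) at each theta
    return np.sum(pi * risks)                # r(pi, delta_c)

# Within this family, r(pi, delta_c) = c^2 + (c-1)^2 E[theta^2] is minimized
# at c* = E[theta^2] / (1 + E[theta^2]).
e_th2 = np.sum(pi * thetas**2)   # = 2
c_star = e_th2 / (1 + e_th2)     # = 2/3
print(bayes_risk(1.0))    # 1.0
print(bayes_risk(c_star)) # 2/3, smaller than for any other c in the family
```

Note that $c^*$ minimizes only over the linear family $\{\delta_c\}$; the full Bayes rule comes from the posterior analysis mentioned below.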
- Identifying Bayes Procedures
- Posterior analysis specifies Bayes rules directly
- Apply the Posterior Distribution of $\theta$ given $X$ to minimize the posterior risk.
- Maximum Risk (minimax approach)
- Minimax Criterion
- Prefer $\delta$ to $\delta’$ if \(\sup_{\theta\in\Theta}R(\theta, \delta) < \sup_{\theta\in\Theta} R(\theta, \delta')\)
- A procedure $\delta^*$ is called minimax if \(\sup_{\theta\in\Theta}R(\theta, \delta^*) = \inf_{\delta\in\mathcal{D}}\sup_{\theta\in\Theta}R(\theta, \delta)\)
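The minimax criterion can be illustrated within the family $\delta_c(X) = cX$ from the estimation example, where $R(\theta, \delta_c) = c^2 + (c-1)^2\theta^2$. Over an unbounded $\Theta$ the sup is infinite for every $c \neq 1$, so the sketch below uses a hypothetical bounded parameter set $\Theta = [-3, 3]$.

```python
import numpy as np

# Worst-case (sup over theta) risk of delta_c(X) = cX on Theta = [-3, 3],
# using R(theta, delta_c) = c^2 + (c - 1)^2 * theta^2.
thetas = np.linspace(-3, 3, 601)  # grid over the bounded parameter set

def sup_risk(c):
    return np.max(c**2 + (c - 1) ** 2 * thetas**2)

for c in [1.0, 0.9, 0.5]:
    print(c, sup_risk(c))
# c = 1.0: sup risk 1.0 (constant risk)
# c = 0.9: sup risk 0.81 + 0.01 * 9 = 0.90  (beats delta_1 on this Theta)
# c = 0.5: sup risk 0.25 + 0.25 * 9 = 2.50
```

On this bounded set, minimizing $c^2 + (1-c)^2\cdot 9$ over $c$ gives $c = 9/10$, so within this family the minimax choice shrinks slightly toward zero; on $\Theta = \mathcal{R}$ the unbiased $\delta_1$ is the minimax member of the family.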