Chapter 1: Statistical Models, Goals, and Performance Criteria

1.3 The Decision Theoretic Framework

  • Basic elements of a decision problem
    • Estimation
      Estimating a real parameter $\theta\in\Theta$ using data $X$ with conditional distribution $P_\theta$.
    • Testing
      Given data $X\sim P_\theta$, choosing between two hypotheses (deciding whether to accept or reject $H_0$)

      \[\begin{aligned} &H_0: P_\theta \in \mathcal{P}_0 \\ &H_1: P_\theta \notin \mathcal{P}_0 \end{aligned}\]
    • Ranking
      Rank a collection of items from best to worst.
      Examples:
      • Products evaluated by consumer interest group
      • Sports betting (horse race, team tournament)
    • Prediction
      Predict the response variable $Y$ given explanatory variables $Z=(Z_1, Z_2, \ldots, Z_d)$
      • If the joint distribution of $(Z,Y)$ is known, use $\mu(Z) = E[Y\lvert Z]$
      • With data $\{(z_i,y_i), i=1,2,\ldots, n\}$, estimate $\mu(Z)$. If $\mu(Z) = g(\beta, Z)$, then use $\hat{\mu}(Z) = g(\hat{\beta}, Z)$ (see the sketch below).
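
A minimal sketch of this plug-in approach, assuming a linear form $g(\beta, Z) = \beta_0 + \beta_1 Z$ fit by ordinary least squares; the data and coefficients below are simulated and purely illustrative:

```python
import numpy as np

# Sketch: plug-in prediction mu_hat(Z) = g(beta_hat, Z) with an assumed
# linear model g(beta, Z) = beta_0 + beta_1 * Z, fit by least squares.
rng = np.random.default_rng(0)
z = rng.uniform(0, 1, size=100)
y = 2.0 + 3.0 * z + rng.normal(0, 0.5, size=100)   # simulated (z_i, y_i)

A = np.column_stack([np.ones_like(z), z])          # design matrix [1, z]
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares estimate

def mu_hat(z_new):
    """Plug-in predictor mu_hat(Z) = g(beta_hat, Z)."""
    return beta_hat[0] + beta_hat[1] * z_new

print("beta_hat:", beta_hat, " prediction at z = 0.5:", mu_hat(0.5))
```
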
    • $\Theta = \{\theta\}$: The “State Space”
      $\theta =$ state of nature (the unknown, uncertain element of the problem)
    • $\mathcal{A} = \{a\}$: The “Action Space”
      $a=$ action taken by the statistician
    • $\mathcal{L}(\theta, a)$: The “Loss Function”
      • $\mathcal{L}(\theta, a)=$ loss incurred when the state is $\theta$ and action $a$ is taken.
      • $\mathcal{L}: \Theta \times\mathcal{A}\rightarrow\mathbb{R}$
  • Additional Elements of a Statistical Decision Problem
    • $X \sim P_\theta$: Random Variable (Statistical Observation)
      • Conditional distribution of $X$ given $\theta$.
      • Sample space $\mathcal{X} = \{x\}$
      • Density/pmf function of conditional distribution:

        \[f(x\lvert\theta) \quad \text{or} \quad f_X(x\lvert\theta)\]
    • $\delta(X)$: A “Decision Procedure”
      • Observe data $X = x$ and take action $a\in\mathcal{A}$
      • $\delta(\cdot):\mathcal{X}\rightarrow \mathcal{A}$
    • $D$: Decision Space (class of decision procedures)
      • $D = \{\text{decision procedures } \delta: \mathcal{X}\rightarrow\mathcal{A}\}$
    • $R(\theta, \delta)$: Risk Function (performance measure of $\delta(\cdot)$ given $\theta$)
      • $R(\theta, \delta) = E_X[\mathcal{L}(\theta, \delta(X))\lvert\theta]$
      • Expectation of loss incurred by decision procedure $\delta(X)$ when $\theta$ is true.
      • For a no-data problem (no $X$), $R(\theta, a) = \mathcal{L}(\theta, a)$
  • Additional Elements of a Bayesian Decision Problem
    • $\theta\sim\pi$: Prior Distribution for parameter $\theta\in\Theta$
    • $r(\pi, \delta)$: Bayes Risk of $\delta$ given prior distribution $\pi$
      • $r(\pi, \delta) = E_{\theta^*}[R(\theta^*, \delta)]$, taking the expectation with respect to $\theta^*\sim\pi$
    • Bayes rule $\delta^*$: Decision procedure that minimizes the Bayes risk: $r(\pi, \delta^*) = \min_{\delta\in D} r(\pi, \delta)$
Examples of Statistical Decision Problems
  • Statistical Estimation Problem
    • Given:
      • $X\sim P_\theta = N(\theta,1), \quad -\infty<\theta<\infty$
      • $\mathcal{A} = \Theta = \mathbb{R}$
      • Squared-error Loss:

        \[\mathcal{L}(\theta, a) = (a-\theta)^2\]
      • Decision procedure: for a constant $c$ with $0\leq c\leq 1$,

        \[\delta_c(X) = cX\]
      • Risk function:

        \[\begin{aligned} R(\theta, \delta_c) &= E_X[(\delta_c(X) - \theta)^2 \lvert \theta] \\ &= Var(\delta_c(X)\lvert\theta) + [E_X[\delta_c(X)\lvert \theta] - \theta]^2 \\ &= c^2 +(c-1)^2\theta^2 \end{aligned}\]
      • Special cases (verified by simulation after this example):
        • $\delta_1(X) = X:\quad R(\theta, \delta_1) = 1$ (independent of $\theta$)
        • $\delta_0(X) \equiv 0:\quad R(\theta, \delta_0) = \theta^2$ (zero at $\theta = 0$, unbounded in $\theta$)
        • $\delta_{0.5}(X) = \frac{X}{2}: \quad R(\theta, \delta_{0.5}) = \frac{1}{4}(1+\theta^2)$
      • Mean-Squared Error: Estimation Risk (Squared-Error Loss)
        • $X\sim P_\theta, \theta\in\Theta$
        • Parameter of interest: $v(\theta)$ (some function of $\theta$)
        • Action Space: $\mathcal{A} = \{ v=v(\theta): \theta\in\Theta \}$
        • Decision procedure/estimator: $\hat{v}(X):\mathcal{X}\rightarrow\mathcal{A}$
        • Squared Error Loss: $\mathcal{L}(\theta, a) = [a - v(\theta)]^2$
        • Risk equal to Mean-Squared Error:

          \[\begin{aligned} R(\theta, \hat{v}(X)) &= E[L(\theta, \hat{v}(X)) \lvert \theta] \\ &= E[(\hat{v}(X) - v(\theta))^2\lvert \theta] = MSE(\hat{v}) \end{aligned}\]
      • Proposition 1.3.1 For an estimator $\hat{v}(X)$ of $v(\theta)$, the mean-squared error is

        \[MSE(\hat{v}) = Var[\hat{v}(X)\lvert \theta] + [Bias(\hat{v} \lvert \theta)]^2\]

        where $Bias(\hat{v}\lvert\theta) = E[\hat{v}(X)\lvert \theta] - v(\theta)$

        Definition: $\hat{v}$ is Unbiased if $Bias(\hat{v} \lvert \theta) = 0$ for all $\theta\in\Theta$
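
One way to check the risk formula $R(\theta, \delta_c) = c^2 + (c-1)^2\theta^2$ and the decomposition in Proposition 1.3.1 numerically; $\theta = 2$, $c = 0.5$, and the replication count are illustrative choices:

```python
import numpy as np

# Monte Carlo check of R(theta, delta_c) = c^2 + (c-1)^2 * theta^2 for
# delta_c(X) = c*X, X ~ N(theta, 1), and of MSE = Var + Bias^2.
rng = np.random.default_rng(1)
theta, c, B = 2.0, 0.5, 200_000

x = rng.normal(theta, 1.0, size=B)             # B independent draws of X
est = c * x                                    # delta_c(X)

mse = np.mean((est - theta) ** 2)              # simulated risk (MSE)
var = np.var(est)                              # Var[delta_c(X) | theta]
bias_sq = (np.mean(est) - theta) ** 2          # Bias^2 = ((c-1) * theta)^2

print(f"simulated MSE : {mse:.4f}")
print(f"var + bias^2  : {var + bias_sq:.4f}")
print(f"closed form   : {c**2 + (c - 1)**2 * theta**2:.4f}")
```
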

  • Statistical Testing Problem (Two-Sample Problem)
    • Given:
      • $X_1, \ldots, X_m$ i.i.d. $\mathcal{N}(\mu, \sigma^2)$, response under control treatment
      • $Y_1, \ldots, Y_n$ i.i.d. $\mathcal{N}(\mu + \Delta, \sigma^2)$, response under test treatment
        • $\mu \in \mathbb{R}, \sigma^2\in \mathbb{R}_+$ unknown
        • $\Delta\in \mathbb{R}$ is the unknown treatment effect
    • Let $P(X,Y\lvert\mu, \Delta, \sigma^2)$ denote the joint distribution of $X = (X_1, \ldots, X_m)$ and $Y=(Y_1, \ldots, Y_n)$
    • Define two hypotheses:
      • $H_0: P\in\{ P:\Delta=0 \} = \{P_\theta: \theta \in \Theta_0 \}$
      • $H_1: P\in \{P:\Delta\neq0\} = \{ P_\theta: \theta \notin \Theta_0 \}$
    • $\mathcal{A} = \{0,1\}$, with $0$ corresponding to accepting $H_0$ and $1$ to rejecting $H_0$.
    • Construct a decision rule that rejects $H_0$ if the estimate of $\Delta$ is significantly different from zero, e.g.,
      $\hat{\Delta} = \bar{Y} - \bar{X}$ (difference in sample means)
      $\hat{\sigma}$: an estimate of $\sigma$

      \[\delta(X,Y) = \begin{cases} 0 & \text{if } \left\lvert\frac{\hat{\Delta}}{\hat{\sigma}}\right\rvert < c \quad (\text{critical value}) \\ 1 & \text{if } \left\lvert\frac{\hat{\Delta}}{\hat{\sigma}}\right\rvert \geq c \end{cases}\]
    • Zero-one Loss function

      \[L(\theta, a) = \begin{cases} 0 & \text{if } \theta\in\Theta_a \quad (\text{correct action}) \\ 1 & \text{if } \theta\notin\Theta_a \quad (\text{wrong action}) \end{cases}\]
    • Risk function: taken as the measure of performance of the decision rule $\delta(X,Y)$.

      \[\begin{aligned} R(\theta, \delta) &= E[L(\theta, \delta(X,Y)) \lvert \theta] \\ &= L(\theta, 0)P_\theta(\delta(X,Y) = 0) + L(\theta, 1)P_\theta(\delta(X,Y) = 1) \\ &= \begin{cases} P_\theta(\delta(X,Y) = 1) & \text{if } \theta\in\Theta_0 \\ P_\theta(\delta(X,Y) = 0) & \text{if } \theta\notin\Theta_0 \end{cases} \end{aligned}\]
    • Terminology of Statistical Testing
      • Critical Region of a test $\delta(\cdot)$

        \[C = \{x: \delta(x) = 1\}\]
      • Type I Error
        $\delta(X)$ rejects $H_0$ when $H_0$ is true.
      • Type II Error
        $\delta(X)$ accepts $H_0$ when $H_0$ is false.
      • Neyman-Pearson framework
        Constrained optimization of risks:
        Minimize P(Type II Error) subject to P(Type I Error) $\leq \alpha$ (the “significance level”). A simulation sketch of the two-sample rule follows this block.
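
A simulation sketch of the two-sample rule above. Here $\hat{\sigma}$ is taken to be the pooled standard error of $\hat{\Delta}$ and $c = 1.96$, both illustrative assumptions rather than choices made in the notes; under $\Delta = 0$ the rejection rate approximates P(Type I Error):

```python
import numpy as np

# Simulation sketch of the decision rule delta(X, Y): reject H0 (Delta = 0)
# when |Delta_hat / sigma_hat| >= c.  Illustrative assumptions: sigma_hat is
# the pooled standard error of Delta_hat, and c = 1.96.
rng = np.random.default_rng(2)
m, n, c = 30, 40, 1.96

def decide(x, y):
    delta_hat = y.mean() - x.mean()                 # Delta_hat = Ybar - Xbar
    s2 = ((m - 1) * x.var(ddof=1) + (n - 1) * y.var(ddof=1)) / (m + n - 2)
    se = np.sqrt(s2 * (1 / m + 1 / n))              # sigma_hat for Delta_hat
    return 1 if abs(delta_hat / se) >= c else 0     # 1 = reject H0

# Risk at theta in Theta_0 (Delta = 0) is P_theta(delta(X, Y) = 1),
# i.e. the Type I error probability.
rejections = [decide(rng.normal(0, 1, m), rng.normal(0, 1, n))
              for _ in range(20_000)]
print("simulated P(Type I Error):", np.mean(rejections))   # ~ 0.05
```
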
  • Interval Estimation and Confidence Bounds
    • Value-at-Risk (VAR)
      • Let $X_1, X_2, \ldots$ be the change in value of an asset over independent fixed holding periods and suppose they are i.i.d. $X \sim P_\theta$ for some fixed $\theta\in\Theta$.
      • For $\alpha = 0.05$, say, define $VAR_{\alpha}$ (the level-$\alpha$ Value-at-Risk) by $P_\theta(X \leq -VAR_{\alpha}) = \alpha$
      • Consider estimating the VAR of $X_{n+1}$ given $X=(X_1, \ldots, X_n)$.
        Determine an estimator $\widehat{VAR}(X)$ such that

        \[P_{\theta}(X_{n+1} \leq -\widehat{VAR}(X)) \leq \alpha\]

        for all $\theta\in\Theta$.

      • The outcome $X_{n+1}$ exceeds $\widehat{VAR}(X)$ to the downside with probability no greater than $\alpha\ (= 0.05)$. (An empirical-quantile sketch follows.)
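
A sketch of one common estimator, the empirical $\alpha$-quantile of the observed changes; this only controls the exceedance probability approximately, and the simulated data are illustrative:

```python
import numpy as np

# Sketch: estimate VAR_alpha by the empirical alpha-quantile of observed
# value changes, so that the fraction of outcomes below -VAR_hat is ~ alpha.
# The N(0, 1) data are illustrative; real P&L data would replace them.
rng = np.random.default_rng(3)
alpha = 0.05
x = rng.normal(0.0, 1.0, size=1_000)        # simulated changes X_1..X_n

var_hat = -np.quantile(x, alpha)            # VAR_hat(X) from the sample
print(f"estimated VAR_{alpha}: {var_hat:.3f}")
print("fraction of outcomes <= -VAR_hat:", np.mean(x <= -var_hat))
```
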
    • Lower-Bound Estimation
      • $X\sim P_\theta, \theta \in \Theta$
      • Parameter of interest: $v(\theta)$
      • Action Space: $\mathcal{A} = \{ v=v(\theta): \theta\in\Theta \}$
      • Estimator: $\hat{v}(X): \mathcal{X}\rightarrow\mathcal{A}$
      • Objective: bounding $v(\theta)$ from below
      • Lower-Bound Estimator: $\hat{v}(X)$ is good if
        • $P_\theta(\hat{v}(X) \leq v(\theta))$ is high
        • $P_\theta(\hat{v}(X) > v(\theta))$ is low $\Rightarrow$ define the loss function
        • $L(\theta, a) = 1$ if $a>v(\theta)$; zero otherwise.
      • Risk function under zero-one loss $L(\theta, a)$:

        \[R(\theta, \hat{v}(X)) = E[L(\theta, \hat{v}(X))\lvert \theta] = P_\theta(\hat{v}(X) > v(\theta))\]
      • The Lower-Bound Estimator $\hat{v}(X)$ has Confidence Level $(1-\alpha)$ if

        \[P_\theta(\hat{v}(X) \leq v(\theta)) \geq 1-\alpha,\]

        for all $\theta \in \Theta$.
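
A sketch of a level-$0.95$ lower-bound estimator in the $N(\theta, 1)$ setting with $v(\theta) = \theta$, namely $\hat{v}(X) = \bar{X} - z_{1-\alpha}/\sqrt{n}$ (an illustrative construction), with its coverage checked by simulation:

```python
import numpy as np
from scipy.stats import norm

# Sketch: level-0.95 lower-bound estimator for v(theta) = theta with
# X_1..X_n i.i.d. N(theta, 1): v_hat(X) = Xbar - z_{1-alpha} / sqrt(n).
# theta, n, and the replication count are illustrative.
rng = np.random.default_rng(4)
theta, n, alpha, B = 1.0, 25, 0.05, 20_000
z = norm.ppf(1 - alpha)                        # z_{0.95}

covered = 0
for _ in range(B):
    xbar = rng.normal(theta, 1.0, size=n).mean()
    v_hat = xbar - z / np.sqrt(n)              # lower confidence bound
    covered += (v_hat <= theta)                # event {v_hat <= v(theta)}
print("simulated P(v_hat <= v(theta)):", covered / B)   # ~ 0.95
```
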

    • Interval (Lower and Upper Bound) Estimation
      • $X \sim P_\theta, \theta\in \Theta$
      • Parameter of interest: $v(\theta)$
      • Define $\mathcal{V} = \{v = v(\theta), \theta\in\Theta\}$
      • Objective: Interval estimation of $v(\theta)$
      • Action Space: $\mathcal{A} = \{ a = [a_{lower}, a_{upper}]: a_{lower}, a_{upper}\in\mathcal{V} \}$
      • Estimator: \(\begin{aligned} &\hat{v}(X): \mathcal{X}\rightarrow\mathcal{A} \\ &\hat{v}(X) = [\hat{v}_{lower}(X), \hat{v}_{upper}(X)] \end{aligned}\)
      • Interval Estimator: $\hat{v}(X)$ is good if
        • $P_\theta(\hat{v}_{lower}(X)\leq v(\theta) \leq \hat{v}_{upper}(X))$ is high
        • $P_\theta(\hat{v}_{lower}(X) > v(\theta) \;\text{or}\; \hat{v}_{upper}(X) < v(\theta))$ is low

        Note that here $\theta$ is non-random; treating $\theta$ as random would require a Bayesian model.

      • Define the loss function \(\begin{aligned} L(\theta, (a_{lower}, a_{upper})) &= 1, \;\text{if } a_{lower}>v(\theta) \;\text{or}\; a_{upper} < v(\theta) \\ &= 0, \;\text{otherwise} \end{aligned}\)
      • Risk function under zero-one loss $L(\theta, a)$: \(\begin{aligned} R(\theta, \hat{v}(X)) &= E[L(\theta, \hat{v}(X)) \lvert \theta] \\ &= P_\theta(\hat{v}_{lower}(X) > v(\theta) \;\text{or}\; \hat{v}_{upper}(X) < v(\theta)) \\ &= 1 - P_{\theta}(\hat{v}_{lower}(X) \leq v(\theta) \leq \hat{v}_{upper}(X)) \end{aligned}\)
      • The Interval Estimator $\hat{v}(X)$ has Confidence Level $(1-\alpha)$ if \(P_\theta(\hat{v}_{lower}(X) \leq v(\theta) \leq \hat{v}_{upper}(X)) \geq 1-\alpha \;\text{for all}\; \theta\in\Theta\). Equivalently: \(R(\theta, \hat{v}(X)) \leq \alpha \;\text{for all}\; \theta\in\Theta\). (A coverage simulation follows below.)
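
A coverage simulation for the two-sided interval $[\bar{X} - z_{1-\alpha/2}/\sqrt{n},\; \bar{X} + z_{1-\alpha/2}/\sqrt{n}]$ in the $N(\theta,1)$ setting (an illustrative choice), confirming $R(\theta, \hat{v}(X)) \approx \alpha$:

```python
import numpy as np
from scipy.stats import norm

# Sketch: two-sided level-0.95 interval for v(theta) = theta with
# X_1..X_n i.i.d. N(theta, 1), checking R(theta, v_hat) = 1 - coverage.
rng = np.random.default_rng(5)
theta, n, alpha, B = 1.0, 25, 0.05, 20_000
z = norm.ppf(1 - alpha / 2)                    # z_{0.975}

misses = 0
for _ in range(B):
    xbar = rng.normal(theta, 1.0, size=n).mean()
    lo, hi = xbar - z / np.sqrt(n), xbar + z / np.sqrt(n)
    misses += not (lo <= theta <= hi)          # zero-one loss incurred
print("simulated risk R(theta, v_hat):", misses / B)    # ~ alpha = 0.05
```
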
    • Choosing Among Decision Procedures
      • Admissible/Inadmissible: A decision procedure $\delta(\cdot)$ is inadmissible if $\exists\,\delta'$ such that $R(\theta, \delta') \leq R(\theta, \delta)$ for all $\theta \in\Theta$, with strict inequality for some $\theta$; it is admissible otherwise.
      • Objectives:
        • Restrict $\mathcal{D}$ to exclude inadmissible decision procedures.
        • Characterize “Complete Class” (all admissible procedures).
        • Formalize ‘best’ choice amongst all admissible procedures.
    • Approaches to Decision Selection
      • Two risk functions based on global criteria
        • Bayes Risk
          • Elements
            • Basic Elements of Decision Problem
              • $X\sim P_\theta$: R.V.
              • $\delta(X)$: Decision Procedure
              • $\mathcal{D}$: Decision Space
              • $R(\theta, \delta)$: Risk Function
            • Additional Elements of Bayesian Decision Problem
              • $\theta \sim \pi$: Prior Distribution for parameter $\theta\in\Theta$
              • $r(\pi, \delta)$: Bayes Risk of $\delta$ given prior distribution $\pi$
              • Bayes rule $\delta^*$: Decision procedure that minimizes the Bayes risk
          • Computation
            • Discrete priors \(r(\pi, \delta) = \sum_\theta \pi(\theta)R(\theta, \delta)\)
            • Continuous priors \(r(\pi, \delta) = \int_\Theta \pi(\theta)R(\theta, \delta)d\theta\)
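
A numerical sketch using the earlier estimation example: for $\delta_c(X) = cX$ with $R(\theta, \delta_c) = c^2 + (c-1)^2\theta^2$ and an illustrative discrete prior $\pi$, the Bayes risk is a weighted sum of risks, minimized over this family at $c = E_\pi[\theta^2]/(1 + E_\pi[\theta^2])$:

```python
import numpy as np

# Sketch: Bayes risk of delta_c(X) = c*X under an illustrative discrete
# prior pi on theta, using R(theta, delta_c) = c^2 + (c-1)^2 * theta^2.
thetas = np.array([-1.0, 0.0, 1.0])
pi = np.array([0.25, 0.50, 0.25])              # prior probabilities pi(theta)

def bayes_risk(c):
    risks = c**2 + (c - 1) ** 2 * thetas**2    # R(theta, delta_c)
    return np.sum(pi * risks)                  # r(pi, delta_c)

# Minimize over the delta_c family by grid search and compare with the
# closed form c* = E[theta^2] / (1 + E[theta^2]) = 0.5 / 1.5 = 1/3.
cs = np.linspace(0, 1, 1001)
c_best = cs[np.argmin([bayes_risk(c) for c in cs])]
print("grid minimizer:", c_best, " closed form:", 0.5 / 1.5)
```
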
          • Identifying Bayes Procedures
            • Posterior analysis specifies Bayes rules directly
            • Use the posterior distribution of $\theta$ given $X$ to minimize the risk a posteriori.
        • Maximum Risk (minimax approach)
          • Minimax Criterion
            • Prefer $\delta$ to $\delta’$ if \(\sup_{\theta\in\Theta}R(\theta, \delta) < \sup_{\theta\in\Theta} R(\theta, \delta')\)
            • A procedure $\delta^*$ is called minimax if \(\sup_{\theta\in\Theta}R(\theta, \delta^*) = \inf_{\delta\in\mathcal{D}}\sup_{\theta\in\Theta}R(\theta, \delta)\) (a worst-case comparison for the $\delta_c$ family follows).
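
A worst-case comparison for the $\delta_c(X) = cX$ family from the estimation example: since $R(\theta, \delta_c) = c^2 + (c-1)^2\theta^2$ is unbounded in $\theta$ whenever $c < 1$, only $\delta_1(X) = X$ has finite maximum risk, making it the minimax choice within this family; the grid below merely illustrates the blow-up:

```python
import numpy as np

# Sketch: worst-case (maximum) risk of delta_c(X) = c*X over a finite grid
# standing in for Theta = R.  For c < 1 the sup is actually infinite; the
# grid values only illustrate the divergence.
thetas = np.linspace(-100, 100, 2001)
for c in (0.0, 0.5, 0.9, 1.0):
    sup_risk = np.max(c**2 + (c - 1) ** 2 * thetas**2)
    print(f"c = {c:3.1f}: sup-risk on grid = {sup_risk:.2f}")
```
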