Statistical decision theory

Work in progress, initially just copying over from Wikipedia article: Admissible decision rule

Define sets \(\Theta\), \({\mathcal {X}}\), and \({\mathcal {A}}\), where

  • \(\Theta\) are the states of nature,

  • \({\mathcal {X}}\) the possible observations, and

  • \({\mathcal {A}}\) the actions that may be taken.

An observation \(x\in {\mathcal {X}}\) is distributed as \(F(x\mid \theta )\) and therefore provides evidence about the state of nature \(\theta \in \Theta\).

A decision rule is a function \(\delta :{{\mathcal {X}}}\rightarrow {{\mathcal {A}}}\), where upon observing \(x\in {\mathcal {X}}\), we choose to take action \(\delta (x)\in {\mathcal {A}}\).

Also define a loss function \(L:\Theta \times {\mathcal {A}}\rightarrow {\mathbb {R}}\), which specifies the loss we would incur by taking action \(a\in {\mathcal {A}}\) when the true state of nature is \(\theta \in \Theta\). Usually we will take this action after observing data \(x\in {\mathcal {X}}\), so that the loss will be \(L(\theta ,\delta (x))\). (It is possible though unconventional to recast the following definitions in terms of a utility function, which is the negative of the loss.)

Define the risk function as the expectation \(R(\theta ,\delta )=\operatorname {E}_{{F(x\mid \theta )}}[{L(\theta ,\delta (x))]}.\,\!\)

Whether a decision rule \(\delta\,\!\) has low risk depends on the true state of nature \(\theta\). A decision rule \(\delta ^{*}\) dominates a decision rule \(\delta\) if and only if \(R(\theta ,\delta ^{*})\leq R(\theta ,\delta )\) for all \(\theta\), and the inequality is strict for some \(\theta\).

Bayes rules:

Let \(\pi (\theta )\) be a probability distribution on the states of nature. From a Bayesian point of view, we would regard it as a prior distribution. That is, it is our believed probability distribution on the states of nature, prior to observing data. For a frequentist, it is merely a function on \(\Theta\) with no such special interpretation. The Bayes risk of the decision rule \(\delta\) with respect to \(\pi (\theta )\) is the expectation

(32)\[\begin{equation} r(\pi ,\delta )=\operatorname {E}_{{\pi (\theta )}}[R(\theta ,\delta )]. \end{equation}\]

A decision rule \(\delta\) that minimizes \(r(\pi ,\delta )\) is called a Bayes rule with respect to \(\pi (\theta )\). There may be more than one such Bayes rule.

Generalized Bayes rules:

In the Bayesian approach to decision theory, the observed \(x\) is considered fixed. Whereas the frequentist approach (i.e., risk) averages over possible samples \(x\in {\mathcal {X}}\) the Bayesian would fix the observed sample \(x\) and average over hypotheses \(\theta \in \Theta\). Thus, the Bayesian approach is to consider for our observed \(x\) the expected loss.

(33)\[\begin{equation} \rho (\pi ,\delta \mid x)=\operatorname {E}_{{\pi (\theta \mid x)}}[L(\theta ,\delta (x))] \end{equation}\]

where the expectation is over the posterior of \(\theta\) given \(x\) (obtained from \(\pi (\theta )\) and \(F(x\mid \theta )\) using Bayes’ theorem).

Having made explicit the expected loss for each given \(x\) separately, we can define a decision rule \(\delta\) by specifying for each \(x\) an action \(\delta (x)\) that minimizes the expected loss. This is known as a generalized Bayes rule with respect to \(\pi (\theta )\). There may be more than one generalized Bayes rule, since there may be multiple choices of \(\delta (x)\) that achieve the same expected loss.

According to the complete class theorems, under mild conditions every admissible rule is a (generalized) Bayes rule (with respect to \textit{some} prior \(\pi (\theta )\) —- possibly an improper one -— that favors distributions \(\theta\) where that rule achieves low risk). Thus, in frequentist decision theory it is sufficient to consider only (generalized) Bayes rules.