One of the main differences between topics in probability and topics in statistics is that in statistics we have some task in mind. In probability, a probability model \(p_X(X \mid \theta)\) is itself the object of study; in statistics, we usually want to do something with it.

The first task that we will consider is estimation: given a dataset \(\{x_i\}_{i=1}^N\), assumed to have been drawn from \(X_i \sim p_X(X \mid \theta^*)\), estimate the true but unknown value \(\theta^*\).


An estimator \(\hat{\theta}(x_1, \dots, x_N)\) is a function of the data that aims to estimate the true, unknown value \(\theta^*\), assuming that the data were drawn from \(X_i \sim p_X(X \mid \theta^*)\).

There are many concrete estimators for different quantities, but the definition above is deliberately abstract: it says what is meant by an estimator in general. It is useful to think of an estimator as a procedure that you apply to the data; you can then ask about the properties of that procedure.
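To make the "procedure" view concrete, here is a minimal sketch assuming NumPy and, purely for illustration, a Gaussian model with unit variance and a made-up true value \(\theta^* = 2\). The sample mean and the sample median are two different estimators of the same location parameter; each is simply a function that maps a dataset to a number.

```python
import numpy as np

# Two different estimators of the location parameter of a Gaussian.
# Each one is just a function mapping a dataset (x_1, ..., x_N) to a number.
def sample_mean(x):
    """Estimator: the arithmetic mean of the data."""
    return float(np.mean(x))

def sample_median(x):
    """Estimator: the median of the data."""
    return float(np.median(x))

# Simulate a dataset from p_X(X | theta*) with an ASSUMED true value theta* = 2.0.
# In practice theta* is unknown; we fix it here only so we can generate data.
rng = np.random.default_rng(seed=0)
theta_star = 2.0
x = rng.normal(loc=theta_star, scale=1.0, size=100)

# Apply each procedure to the same data to obtain two (different) estimates.
for estimator in (sample_mean, sample_median):
    print(estimator.__name__, estimator(x))
```

Both procedures target the same \(\theta^*\) but generally return different numbers for the same data, which is exactly why it makes sense to compare the properties of procedures.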


These closely related terms have slightly different meanings:

  • The estimand refers to the parameter \(\theta\) being estimated.

  • The estimator refers to the function or procedure \(\hat{\theta}(x_1, \dots, x_N)\).

  • The specific value that an estimator returns for a specific dataset is known as the estimate.

We already introduced two estimators when studying Transformation properties of the likelihood and posterior:

  • The maximum likelihood estimator: \(\hat{\theta}_\textrm{MLE} := \textrm{argmax}_\theta p(X=x \mid \theta)\)

  • The maximum a posteriori estimator: \(\hat{\theta}_\textrm{MAP} := \textrm{argmax}_\theta p(\theta \mid X=x)\)

Note that both of these estimators are defined as procedures that you apply once you have specific data.
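As an illustration of both definitions, here is a minimal NumPy sketch that assumes (for illustration only) a Bernoulli model, made-up coin-flip data, and a Beta(2, 2) prior. The argmax is taken over a grid of parameter values, a crude but transparent stand-in for a proper optimizer.

```python
import numpy as np

# Hypothetical data: N = 10 coin flips with k = 7 heads (1 = heads).
x = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
k, N = int(x.sum()), x.size

# Grid over the parameter space Theta = (0, 1); endpoints excluded to avoid log(0).
theta_grid = np.linspace(0.001, 0.999, 999)

def log_likelihood(theta):
    """log p(X = x | theta) for i.i.d. Bernoulli(theta) data."""
    return k * np.log(theta) + (N - k) * np.log(1 - theta)

def log_prior(theta, a=2.0, b=2.0):
    """Unnormalized log density of an ASSUMED Beta(a, b) prior."""
    return (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)

# MLE: the grid point that maximizes the likelihood.
theta_mle = theta_grid[np.argmax(log_likelihood(theta_grid))]

# MAP: the grid point that maximizes the posterior. The posterior is proportional
# to likelihood * prior; the normalizing constant p(X = x) does not move the argmax.
theta_map = theta_grid[np.argmax(log_likelihood(theta_grid) + log_prior(theta_grid))]

print(theta_mle, theta_map)
```

For this particular model the maxima are also available in closed form, \(\hat{\theta}_\textrm{MLE} = k/N = 0.7\) and \(\hat{\theta}_\textrm{MAP} = (k + a - 1)/(N + a + b - 2) = 8/12\), so the grid results can be checked against them. Note that the prior pulls the MAP estimate toward \(1/2\).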


The estimate \(\hat{\theta}(X_1, \dots, X_N)\) depends on the random variables \(X_i\), so it is itself a random variable (unlike the parameter \(\theta\)). The estimate is often denoted simply \(\hat{\theta}\), with the dependence on the data left implicit. Subscripts are often used to indicate which estimator is being used, e.g., the maximum likelihood estimator \(\hat{\theta}_\textrm{MLE}\) and the maximum a posteriori estimator \(\hat{\theta}_\textrm{MAP}\).
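One way to see that \(\hat{\theta}\) is a random variable is to repeat the experiment many times. The sketch below assumes NumPy, a Gaussian model with known \(\sigma = 1\), a made-up \(\theta^* = 2\), and \(N = 25\). It draws many independent datasets from the same \(p_X(X \mid \theta^*)\) and applies the sample mean to each one.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
theta_star, sigma, N = 2.0, 1.0, 25   # ASSUMED true value and model settings
n_datasets = 10_000

# Draw many independent datasets {x_i}_{i=1}^N from the same p_X(X | theta*),
# one dataset per row.
data = rng.normal(loc=theta_star, scale=sigma, size=(n_datasets, N))

# Apply the same estimator (the sample mean) to every dataset.
estimates = data.mean(axis=1)

# theta* is a fixed number, but the estimates fluctuate from dataset to dataset.
print("mean of estimates:", estimates.mean())  # should be close to theta* = 2.0
print("std of estimates: ", estimates.std())   # should be close to sigma / sqrt(N) = 0.2
```

The spread of `estimates` is a picture of the sampling distribution of \(\hat{\theta}\); for the sample mean of Gaussian data its standard deviation is \(\sigma/\sqrt{N}\).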


It is often useful to consider two straw man estimators:

  • A constant estimator: \(\hat{\theta}_\textrm{const} = \theta_0\) for \(\theta_0 \in \Theta\)

  • A random estimator: \(\hat{\theta}_\textrm{random} =\) some random value for \(\theta\), independent of the data

Neither of these is a useful estimator, but their obvious properties make them helpful for clarifying your thinking.
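Those obvious properties can be checked by simulation. This is a minimal sketch, assuming NumPy, a Gaussian model with unit variance, a made-up \(\theta^* = 2\), \(\theta_0 = 0\) for the constant estimator, and (an arbitrary choice) a Uniform(−5, 5) draw for the random estimator. The sample mean is included for comparison.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
theta_star, N, n_datasets = 2.0, 25, 5_000   # ASSUMED settings
theta_0 = 0.0                                # arbitrary fixed value in Theta

def const_estimator(x):
    """Ignores the data and always returns theta_0."""
    return theta_0

def random_estimator(x):
    """Ignores the data; returns a random draw (ASSUMED Uniform(-5, 5))."""
    return rng.uniform(-5.0, 5.0)

def sample_mean(x):
    """For comparison: an estimator that actually uses the data."""
    return float(np.mean(x))

results = {}
for est in (const_estimator, random_estimator, sample_mean):
    # Apply the estimator to many independent datasets drawn from N(theta*, 1).
    estimates = np.array([est(rng.normal(theta_star, 1.0, size=N))
                          for _ in range(n_datasets)])
    results[est.__name__] = estimates
    print(f"{est.__name__:>16}: mean={estimates.mean():+.3f}  std={estimates.std():.3f}")
```

The constant estimator has zero spread but is off by \(\theta^* - \theta_0\) whenever \(\theta_0 \neq \theta^*\) (and is perfect in the lucky case \(\theta_0 = \theta^*\)). The random estimator's spread is set entirely by the distribution it draws from and does not shrink as \(N\) grows. The sample mean, by contrast, is centered near \(\theta^*\) with a spread of roughly \(1/\sqrt{N}\).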