4. Bayes' Estimators


The Method

Suppose again that we have an observable random variable X for an experiment that takes values in a set S. Suppose also that the distribution of X depends on a parameter a taking values in a parameter space A. As before, we will denote the density function of X at x by f(x | a).

In Bayesian analysis, we treat the parameter vector a as a random variable, with a given density function h(a), a in A. The corresponding distribution is called the prior distribution of a and is intended to reflect our knowledge (if any) of the parameter vector, before we gather data.

We then use Bayes' theorem, named for Thomas Bayes, to compute the conditional density function of a given X = x in S:

h(a | x) = f(x | a)h(a) / g(x), for a in A and x in S

where g is the (marginal) density function of X. Recall that for fixed x in S, g(x) can be obtained by integrating (in the continuous case) or summing (in the discrete case) f(x | a)h(a) over a in A. Equivalently, g(x) is simply the normalizing constant for f(x | a)h(a) as a function of a. The conditional distribution of a given X = x is called the posterior distribution, and is an updated distribution, given the information in the data.

If a is a real parameter, the conditional expected value E(a | X) is the Bayes' estimator of a. Recall that E(a | X) is a function of X and, among all functions of X, is closest to a in the mean square sense.
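
As a minimal sketch of the method, the following Python code applies Bayes' theorem on a finite parameter grid and then computes the posterior mean. The function names, the three-point grid, and the single 0/1 observation are illustrative assumptions, not part of the original text.

```python
def posterior_on_grid(likelihood, prior, grid, x):
    """Discrete form of Bayes' theorem: h(a | x) = f(x | a) h(a) / g(x)."""
    # Unnormalized posterior weights f(x | a) h(a) for each candidate a.
    weights = [likelihood(x, a) * prior(a) for a in grid]
    g_x = sum(weights)  # normalizing constant g(x)
    return [w / g_x for w in weights]


def posterior_mean(grid, posterior):
    """Bayes' estimate E(a | X = x) under the discrete posterior."""
    return sum(a * h for a, h in zip(grid, posterior))


# Hypothetical example: a is the success probability of a single 0/1 trial,
# with three candidate values that are equally likely a priori.
def likelihood(x, a):
    return a if x == 1 else 1 - a


def prior(a):
    return 1 / 3


grid = [0.25, 0.5, 0.75]
post = posterior_on_grid(likelihood, prior, grid, x=1)
estimate = posterior_mean(grid, post)
```

Observing a success shifts weight towards the larger candidate values, so the estimate moves above the prior mean of 1 / 2.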

Conjugate Families

In many important special cases, we can find a parametric family of distributions with the following property: If the prior distribution of a belongs to the family then so does the posterior distribution of a given X = x. The family is said to be conjugate for the distribution of X. Conjugate families are nice from a computational point of view, since we can often compute the posterior distribution through a simple formula involving the parameters of the family, without having to use Bayes' theorem directly.

The Bernoulli Distribution

Suppose that we have a coin with an unknown probability p of heads. We toss the coin n times and record the outcome vector I = (I1, I2, ..., In). For a given p, these variables form a random sample from the Bernoulli distribution with parameter p. Let Xn = I1 + I2 + ··· + In denote the number of heads.

Suppose now that we give p a prior beta distribution with parameters a and b, where a and b are chosen to reflect our initial information about the coin. For example, if we know nothing about the coin, we might let a = b = 1, so that the prior distribution of p is uniform on (0, 1). On the other hand, if we believe that the coin is biased towards heads with p about 2 / 3, we might let a = 4, b = 2 (so that the mean of the prior distribution is 2 / 3).

Mathematical Exercise 1. Show that the posterior distribution of p given I is beta with parameters a + Xn, b + (n - Xn).

Exercise 1 shows that the beta distribution is conjugate for the Bernoulli distribution. Note also that for the posterior distribution, the first beta parameter is increased by the number of heads and the second beta parameter is increased by the number of tails.
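
The update rule can be sketched in a few lines of Python. The function name and the toss data below are hypothetical; only the rule itself (add heads to the first parameter, tails to the second) comes from the text.

```python
def beta_posterior(a, b, tosses):
    """Return the posterior beta parameters after observing 0/1 tosses."""
    heads = sum(tosses)            # Xn, the number of heads
    tails = len(tosses) - heads    # n - Xn, the number of tails
    return a + heads, b + tails


# Hypothetical data: 10 tosses with 7 heads, starting from the uniform prior.
tosses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
post_a, post_b = beta_posterior(1, 1, tosses)   # (8, 4)

# Updating one toss at a time gives the same result as a single batch update.
seq_a, seq_b = 1, 1
for toss in tosses:
    seq_a, seq_b = beta_posterior(seq_a, seq_b, [toss])
```

Because the posterior stays in the beta family, it can serve as the prior for the next batch of tosses, which is why the sequential and batch updates agree.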

Simulation Exercise 2. In the beta coin experiment, set n = 10, p = 0.7, and set a = b = 1 (the uniform prior). Run the simulation 100 times, updating after each run. Note the shape and location of the posterior density on each run.

Mathematical Exercise 3. Show that Bayes' estimator of p is Un = (Xn + a) / (n + a + b).
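
As a numerical sanity check (not a proof), the closed form in Exercise 3 can be compared with the posterior mean computed directly from Bayes' theorem. The sketch below assumes a midpoint Riemann sum for the integrals; the function names and the sample values are illustrative.

```python
def bayes_estimate(x, n, a, b):
    """Closed-form Bayes' estimate (x + a) / (n + a + b)."""
    return (x + a) / (n + a + b)


def posterior_mean_numeric(x, n, a, b, steps=20000):
    """Posterior mean from Bayes' theorem, using a midpoint Riemann sum."""
    num = den = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        # f(x | p) h(p), up to constants that cancel in the ratio.
        w = p ** (x + a - 1) * (1 - p) ** (n - x + b - 1)
        num += p * w
        den += w
    return num / den


# Hypothetical data: 7 heads in 10 tosses, with prior parameters a = 4, b = 2.
closed = bayes_estimate(7, 10, 4, 2)            # 11 / 16 = 0.6875
numeric = posterior_mean_numeric(7, 10, 4, 2)
```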

Simulation Exercise 4. In the beta coin experiment, set n = 20, p = 0.3, and set a = 4, b = 2. Run the simulation 100 times, updating after each run. Note the estimate of p and the shape and location of the posterior density on each run.

Mathematical Exercise 5. Show that bias(Un | p) = (a - pa - pb) / (n + a + b) and hence that Un is asymptotically unbiased.

Note in Exercise 5 that we cannot choose a and b to make Un unbiased, since such a choice would involve the true value of p, which we do not know.

Simulation Exercise 6. In the beta coin experiment, vary the parameters and note the change in the bias. Now set n = 20, p = 0.8, a = 2, and b = 6. Run the simulation 1000 times, updating every 10 runs. Note the estimate of p and the shape and location of the posterior density on each update. Note the apparent convergence of the empirical bias to the true bias.
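
For readers without access to the beta coin experiment, a rough stand-in for Exercise 6 can be written in Python. This is only a sketch: the function name, the fixed seed, and the use of the standard library generator are assumptions, and it reports the empirical bias without drawing the posterior density.

```python
import random


def simulate_bias(n, p, a, b, runs, seed=0):
    """Empirical bias of Un = (Xn + a) / (n + a + b) over repeated experiments."""
    rng = random.Random(seed)
    total_error = 0.0
    for _run in range(runs):
        # Toss the coin n times and count the heads.
        heads = sum(1 for _toss in range(n) if rng.random() < p)
        total_error += (heads + a) / (n + a + b) - p
    return total_error / runs


n, p, a, b = 20, 0.8, 2, 6
true_bias = (a - p * a - p * b) / (n + a + b)   # formula from Exercise 5
empirical_bias = simulate_bias(n, p, a, b, runs=1000)
```

With these settings the prior mean is 1 / 4, far below p = 0.8, so the estimator is pulled downward and the bias is negative.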

Mathematical Exercise 7. Show that the mean square error of Un is as follows, and hence that Un is consistent:

MSE(Un | p) = [p(n - 2a^2 - 2ab) + p^2(-n + a^2 + b^2 + 2ab) + a^2] / (n + a + b)^2.
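
The formula can be checked numerically by averaging the squared error over the binomial distribution of Xn. The sketch below is one way to do this in Python; the function names and test values are illustrative assumptions.

```python
from math import comb


def mse_closed_form(n, p, a, b):
    """Mean square error of Un from the formula in Exercise 7."""
    numerator = (p * (n - 2 * a**2 - 2 * a * b)
                 + p**2 * (-n + a**2 + b**2 + 2 * a * b)
                 + a**2)
    return numerator / (n + a + b) ** 2


def mse_by_summation(n, p, a, b):
    """E[(Un - p)^2], summing over the binomial distribution of Xn."""
    total = 0.0
    for x in range(n + 1):
        pmf = comb(n, x) * p**x * (1 - p) ** (n - x)
        u = (x + a) / (n + a + b)
        total += pmf * (u - p) ** 2
    return total


closed = mse_closed_form(10, 0.7, 1, 1)
summed = mse_by_summation(10, 0.7, 1, 1)
```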

Simulation Exercise 8. In the beta coin experiment, vary the parameters and note the change in the mean square error. Now set n = 10, p = 0.7, a = 1, and b = 1. Run the simulation 1000 times, updating every 10 runs. Note the estimate of p and the shape and location of the posterior density on each update. Note the apparent convergence of the empirical mean square error to the true mean square error.

Interestingly, we can choose a and b so that Un has mean square error that is independent of p:

Mathematical Exercise 9. Show that if a = b = n^(1/2) / 2 then MSE(Un | p) = n / [4(n + n^(1/2))^2] for any p.
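
A quick numerical illustration of this property, written as variance plus squared bias, is sketched below. The function name and the particular values of p are assumptions chosen for the example.

```python
from math import sqrt


def mse_bayes(n, p, a, b):
    """MSE of Un = (Xn + a) / (n + a + b) as variance plus squared bias."""
    variance = n * p * (1 - p) / (n + a + b) ** 2
    bias = (a - p * (a + b)) / (n + a + b)
    return variance + bias**2


n = 36
a = b = sqrt(n) / 2                          # equals 3 when n = 36
constant = n / (4 * (n + sqrt(n)) ** 2)      # 36 / 7056 = 1 / 196

# The mean square error should be the same for every p.
values = [mse_bayes(n, p, a, b) for p in (0.1, 0.3, 0.5, 0.8)]
```

With this choice the p and p^2 terms in the numerator cancel, leaving only the constant a^2 = n / 4.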

Simulation Exercise 10. In the beta coin experiment, set n = 36 and a = b = 3. Vary p and note that the mean square error does not change. Now set p = 0.8 and run the simulation 1000 times, updating every 10 runs. Note the estimate of p and the shape and location of the posterior density on each update. Note the apparent convergence of the empirical bias and mean square error to the true values.

Recall that the sample mean Mn = Xn / n (the proportion of heads) is both the method of moments estimator and the maximum likelihood estimator of p, and has mean square error MSE(Mn | p) = p(1 - p) / n.

Mathematical Exercise 11. Sketch the graphs of MSE(Un | p) in Exercise 9 and MSE(Mn | p), as functions of p, on the same set of axes.
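
Before sketching, it can help to tabulate the two mean square errors and locate where the curves cross. The Python sketch below assumes the constant-MSE choice a = b = n^(1/2) / 2; the function names and the grid of p values are illustrative.

```python
from math import sqrt


def mse_sample_mean(n, p):
    """MSE of the sample mean Mn: p(1 - p) / n."""
    return p * (1 - p) / n


def mse_bayes_constant(n):
    """Constant MSE of Un when a = b = n^(1/2) / 2."""
    return n / (4 * (n + sqrt(n)) ** 2)


def crossover_points(n):
    """Values of p where the two mean square errors are equal."""
    c = n * mse_bayes_constant(n)      # solve p(1 - p) = c
    root = sqrt(1 - 4 * c)
    return (1 - root) / 2, (1 + root) / 2


n = 36
low, high = crossover_points(n)
table = [(p, mse_sample_mean(n, p), mse_bayes_constant(n))
         for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
```

For n = 36 the curves cross at roughly p = 0.24 and p = 0.76. Between those values the Bayes' estimator has the smaller mean square error, and outside them the sample mean does.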

Suppose now that the coin is either fair or two-headed, but we don't know which. We give p the prior distribution with density function given as follows, where a in (0, 1) is chosen to reflect our prior belief that the coin is two-headed.

h(1) = a, h(1 / 2) = 1 - a.

Mathematical Exercise 12. Show that the posterior distribution of p given I is as follows. Interpret the result.

  1. h(1 | I) = a / [a + (1 - a)(1 / 2)^n] if Xn = n.
  2. h(1 | I) = 0 if Xn < n.
  3. h(1 / 2 | I) = 1 - h(1 | I).

Mathematical Exercise 13. Show that the Bayes' estimator of p is

Un = pn if Xn = n, Un = 1 / 2 if Xn < n,

where pn = [a + (1 - a)(1 / 2)^(n + 1)] / [a + (1 - a)(1 / 2)^n].
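
The two-point case is small enough to compute directly. The following Python sketch evaluates the posterior probability of a two-headed coin and the resulting posterior mean; the function names and the prior value a = 0.5 are hypothetical.

```python
def prob_two_headed(a, tosses):
    """Posterior probability h(1 | I) that the coin is two-headed."""
    n = len(tosses)
    if sum(tosses) < n:
        return 0.0          # a single tail rules out the two-headed coin
    return a / (a + (1 - a) * 0.5**n)


def bayes_estimate(a, tosses):
    """Posterior mean of p: 1 * h(1 | I) + (1 / 2) * h(1 / 2 | I)."""
    h1 = prob_two_headed(a, tosses)
    return h1 + 0.5 * (1 - h1)


a = 0.5                                     # hypothetical prior probability
all_heads = bayes_estimate(a, [1, 1, 1])    # pn with n = 3, which is 17 / 18
one_tail = bayes_estimate(a, [1, 0, 1])     # exactly 1 / 2

# The closed form for pn with n = 3 should agree with all_heads.
p3 = (a + (1 - a) * 0.5**4) / (a + (1 - a) * 0.5**3)
```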

Mathematical Exercise 14. Show that

  1. E(Un | p = 1) = pn.
  2. E(Un | p = 1 / 2) = (1 / 2)^n pn + (1 / 2)[1 - (1 / 2)^n].
  3. Un is asymptotically unbiased.

Mathematical Exercise 15. Show that

  1. MSE(Un | p = 1) = (pn - 1)^2.
  2. MSE(Un | p = 1 / 2) = (1 / 2)^n (pn - 1 / 2)^2.
  3. Un is consistent.

The Poisson Distribution

Suppose that X = (X1, X2, ..., Xn) is a random sample of size n from the Poisson distribution with parameter a. Moreover, suppose that a has a prior gamma distribution with shape parameter k and scale parameter b. Let

Yn = X1 + X2 + ··· + Xn.

Mathematical Exercise 16. Show that the posterior distribution of a given X is gamma with shape parameter k + Yn and scale parameter b / (nb + 1).

It follows that the gamma distribution is conjugate to the Poisson distribution.

Mathematical Exercise 17. Show that the Bayes' estimator of a is Vn = (k + Yn)b / (nb + 1).
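
The gamma update and the resulting estimator can be sketched as follows. The function names, prior values, and count data are illustrative assumptions; the update rule itself is the one stated in Exercise 16.

```python
def gamma_posterior(k, b, counts):
    """Posterior (shape, scale) of the Poisson parameter given the counts."""
    n = len(counts)
    y = sum(counts)                  # Yn, the total count
    return k + y, b / (n * b + 1)


def bayes_estimate(k, b, counts):
    """Posterior mean, which is shape times scale for a gamma distribution."""
    shape, scale = gamma_posterior(k, b, counts)
    return shape * scale


# Hypothetical data: five Poisson counts, with prior shape 2 and scale 1.5.
counts = [3, 0, 2, 4, 1]
shape, scale = gamma_posterior(2, 1.5, counts)
estimate = bayes_estimate(2, 1.5, counts)
```

Here the estimate is pulled slightly away from the sample mean of 2 towards the prior mean kb = 3.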

Mathematical Exercise 18. Show that bias(Vn | a) = (kb - a) / (nb + 1) and hence that Vn is asymptotically unbiased.

Note that, as before, we cannot choose k and b to make Vn unbiased.

Mathematical Exercise 19. Show that the mean square error of Vn is as follows, and hence that Vn is consistent:

MSE(Vn | a) = [(nb^2 - 2kb)a + a^2 + k^2 b^2] / (nb + 1)^2.
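
As a numerical check of this formula, the expected squared error can be summed over the distribution of Yn. The sketch below relies on the standard fact that a sum of n independent Poisson variables with parameter a is Poisson with parameter na, and it truncates the infinite sum at an assumed cutoff; the function names and test values are illustrative.

```python
from math import exp


def mse_closed_form(n, a, k, b):
    """Mean square error of Vn from the formula in Exercise 19."""
    return ((n * b**2 - 2 * k * b) * a + a**2 + k**2 * b**2) / (n * b + 1) ** 2


def mse_by_summation(n, a, k, b, max_y=400):
    """E[(Vn - a)^2], using Yn ~ Poisson(n a) and truncating the sum at max_y."""
    rate = n * a
    pmf = exp(-rate)                 # P(Yn = 0)
    total = 0.0
    for y in range(max_y + 1):
        v = (k + y) * b / (n * b + 1)
        total += pmf * (v - a) ** 2
        pmf *= rate / (y + 1)        # P(Yn = y + 1) from P(Yn = y)
    return total


closed = mse_closed_form(n=5, a=2.0, k=2, b=1.5)     # 23.5 / 72.25
summed = mse_by_summation(n=5, a=2.0, k=2, b=1.5)
```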

The Normal Distribution

Suppose that X = (X1, X2, ..., Xn) is a random sample of size n from the normal distribution with mean µ and variance d^2, where µ is unknown and d^2 is known. Moreover, suppose that µ has a prior normal distribution with mean a and variance b^2, both known of course. Let

Yn = (X1 + X2 + ··· + Xn).

Mathematical Exercise 20. Show that the posterior distribution of µ given X is normal with mean and variance given below.

  1. E(µ | X) = (Yn b^2 + a d^2) / (d^2 + n b^2)
  2. var(µ | X) = d^2 b^2 / (d^2 + n b^2)

Therefore, the normal distribution is conjugate for the normal distribution with unknown mean and known variance. Moreover, it follows that the Bayes' estimator of µ is

Un = (Yn b^2 + a d^2) / (d^2 + n b^2).
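
The normal update can be sketched in Python as below. The function name, the data, and the prior values are hypothetical; the arguments b2 and d2 stand for the prior variance and the known sampling variance. The second computation uses the equivalent precision-weighted form as a cross-check.

```python
def normal_posterior(xs, a, b2, d2):
    """Posterior mean and variance of mu for prior N(a, b2) and known variance d2."""
    n = len(xs)
    y = sum(xs)                      # Yn, the sum of the observations
    mean = (y * b2 + a * d2) / (d2 + n * b2)
    var = d2 * b2 / (d2 + n * b2)
    return mean, var


xs = [4.8, 5.6, 5.1, 4.4]            # hypothetical observations
mean, var = normal_posterior(xs, a=0.0, b2=9.0, d2=1.0)

# Equivalent form: precisions (reciprocal variances) add, and the posterior
# mean is the precision-weighted average of the prior mean and the data.
n = len(xs)
precision = 1 / 9.0 + n / 1.0
mean_alt = (0.0 / 9.0 + sum(xs) / 1.0) / precision
var_alt = 1 / precision
```

The estimate lies between the prior mean a and the sample mean, and moves towards the sample mean as n grows or as the prior variance increases.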

Mathematical Exercise 21. Show that bias(Un | µ) = d^2 (a - µ) / (d^2 + n b^2) and hence that Un is asymptotically unbiased.

Mathematical Exercise 22. Show that MSE(Un | µ) = [n d^2 b^4 + d^4 (a - µ)^2] / (d^2 + n b^2)^2 and hence that Un is consistent.
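
A simulation can illustrate the mean square error formula, though it does not replace the derivation. The Python sketch below assumes a fixed seed and hypothetical parameter values, and compares the empirical mean square error with the formula in Exercise 22.

```python
import random


def bayes_estimate_normal(xs, a, b2, d2):
    """Bayes' estimate of mu for prior N(a, b2) and known variance d2."""
    n = len(xs)
    return (sum(xs) * b2 + a * d2) / (d2 + n * b2)


def simulate_mse(mu, a, b2, d2, n, runs, seed=0):
    """Empirical mean square error of the estimator at a fixed true mean mu."""
    rng = random.Random(seed)
    sd = d2**0.5
    total = 0.0
    for _run in range(runs):
        xs = [rng.gauss(mu, sd) for _obs in range(n)]
        total += (bayes_estimate_normal(xs, a, b2, d2) - mu) ** 2
    return total / runs


mu, a, b2, d2, n = 1.0, 0.0, 4.0, 1.0, 10
true_mse = (n * d2 * b2**2 + d2**2 * (a - mu) ** 2) / (d2 + n * b2) ** 2
empirical_mse = simulate_mse(mu, a, b2, d2, n, runs=20000)
```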