Suppose again that we have an observable random variable X for an experiment, taking values in a set S. Suppose also that the distribution of X depends on a parameter a taking values in a parameter space A. As before, we will denote the density function of X at x by f(x | a).
In Bayesian analysis, we treat the parameter vector a as a random variable, with a given density function h(a), a in A. The corresponding distribution is called the prior distribution of a and is intended to reflect our knowledge (if any) of the parameter vector, before we gather data.
We then use Bayes' theorem, named for Thomas Bayes, to compute the conditional density function of a given X = x in S:
h(a | x) = f(x | a)h(a) / g(x), for a in A and x in S
where g is the (marginal) density function of X. Recall that for fixed x in S, g(x) can be obtained by integrating (in the continuous case) or summing (in the discrete case) f(x | a)h(a) over a in A. Equivalently, g(x) is simply the normalizing constant for f(x | a)h(a) as a function of a. The conditional distribution of a given X = x is called the posterior distribution, and is an updated distribution, given the information in the data.
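The normalization step can be illustrated numerically. The following is a minimal sketch, not part of the original text: it approximates the posterior on a finite grid of parameter values, and the example data (7 successes in 10 trials, uniform prior), the grid size, and the function name `grid_posterior` are all assumptions chosen for illustration.

```python
import math

def grid_posterior(likelihood, prior, grid):
    """Approximate posterior weights h(a | x) on a finite grid of parameter values."""
    # Unnormalized posterior f(x | a) h(a) at each grid point.
    unnormalized = [likelihood(a) * prior(a) for a in grid]
    # g(x) plays the role of the normalizing constant (a sum over the grid).
    g = sum(unnormalized)
    return [u / g for u in unnormalized]

# Hypothetical data: x = 7 successes in n = 10 trials, success probability a.
n, x = 10, 7
grid = [(i + 0.5) / 200 for i in range(200)]  # midpoints of 200 cells in (0, 1)

def likelihood(a):
    return math.comb(n, x) * a**x * (1 - a) ** (n - x)

def prior(a):
    return 1.0  # uniform prior density on (0, 1)

posterior = grid_posterior(likelihood, prior, grid)
posterior_mean = sum(a * w for a, w in zip(grid, posterior))
print(round(posterior_mean, 4))
```

Because g(x) is only a normalizing constant, the binomial coefficient in the likelihood cancels and could be dropped without changing the result.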
If a is a real parameter, the conditional expected value E(a | X) is the Bayes' estimator of a. Recall that E(a | X) is a function of X and, among all functions of X, is closest to a in the mean square sense.
In many important special cases, we can find a parametric family of distributions with the following property: If the prior distribution of a belongs to the family then so does the posterior distribution of a given X = x. The family is said to be conjugate for the distribution of X. Conjugate families are nice from a computational point of view, since we can often compute the posterior distribution through a simple formula involving the parameters of the family, without having to use Bayes' theorem directly.
Suppose that we have a coin with an unknown probability p of heads. We toss the coin n times and record the outcome vector I = (I1, I2, ..., In). For a given p, these variables form a random sample from the Bernoulli distribution with parameter p. Let Xn = I1 + I2 + ··· + In denote the number of heads.
Suppose now that we give p a prior beta distribution with parameters a and b, where a and b are chosen to reflect our initial information about the coin. For example, if we know nothing about the coin, we might let a = b = 1, so that the prior distribution of p is uniform on (0, 1). On the other hand, if we believe that the coin is biased towards heads with p about 2 / 3, we might let a = 4, b = 2 (so that the mean of the prior distribution is 2 / 3).
1. Show that the posterior distribution of p given I is beta with parameters a + Xn, b + (n - Xn).
Exercise 1 shows that the beta distribution is conjugate for the Bernoulli distribution. Note also that for the posterior distribution, the first beta parameter is increased by the number of heads and the second beta parameter is increased by the number of tails.
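The update rule just described can be sketched in a few lines. This is an illustrative sketch only; the function name `beta_update` and the outcome vector are hypothetical. It also checks that updating one toss at a time gives the same posterior as updating with all tosses at once.

```python
def beta_update(a, b, tosses):
    """Posterior beta parameters after observing a list of 0/1 outcomes (1 = heads)."""
    heads = sum(tosses)
    tails = len(tosses) - heads
    # First parameter grows by the number of heads, second by the number of tails.
    return a + heads, b + tails

# Hypothetical outcome vector: 7 heads in 10 tosses, uniform prior a = b = 1.
tosses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
post_a, post_b = beta_update(1, 1, tosses)

# Sequential updating, one toss at a time, starting from the same prior.
state = (1, 1)
for toss in tosses:
    state = beta_update(state[0], state[1], [toss])

print(post_a, post_b, state)
```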
2. In the beta coin experiment, set n = 10, p = 0.7, and set a = b = 1 (the uniform prior). Run the simulation 100 times, updating after each run. Note the shape and location of the posterior density on each run.
3. Show that Bayes' estimator of p is Un = (Xn + a) / (n + a + b).
4. In the beta coin experiment, set n = 20, p = 0.3, and set a = 4, b = 2. Run the simulation 100 times, updating after each run. Note the estimate of p and the shape and location of the posterior density on each run.
5. Show that bias(Un | p) = (a - pa - pb) / (n + a + b) and hence that Un is asymptotically unbiased.
Note in Exercise 5 that we cannot choose a and b to make Un unbiased, since such a choice would involve the true value of p, which we do not know.
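The bias formula can be checked by simulation. The sketch below is an assumption-laden illustration rather than the applet used in the exercises: it draws Bernoulli samples with Python's `random` module, uses a fixed seed, and takes the parameter values n = 20, p = 0.8, a = 2, b = 6 as an example.

```python
import random

def bayes_estimate(x, n, a, b):
    """Posterior mean of p under a beta(a, b) prior, given x heads in n tosses."""
    return (x + a) / (n + a + b)

def true_bias(p, n, a, b):
    """Closed-form bias E(Un | p) - p."""
    return (a - p * a - p * b) / (n + a + b)

def empirical_bias(p, n, a, b, runs, rng):
    """Average of Un - p over simulated runs of n tosses each."""
    total = 0.0
    for _ in range(runs):
        x = sum(rng.random() < p for _ in range(n))  # number of heads
        total += bayes_estimate(x, n, a, b) - p
    return total / runs

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n, p, a, b = 20, 0.8, 2, 6
emp = empirical_bias(p, n, a, b, 20000, rng)
exact = true_bias(p, n, a, b)
print(emp, exact)
```

With these values the prior mean is 1/4, far below p = 0.8, so the estimator is pulled downward and the bias is negative.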
6. In the beta coin experiment, vary the parameters and note the change in the bias. Now set n = 20, p = 0.8, a = 2, and b = 6. Run the simulation 1000 times, updating every 10 runs. Note the estimate of p and the shape and location of the posterior density on each update. Note the apparent convergence of the empirical bias to the true bias.
7. Show that the mean square error of Un is as follows, and hence that Un is consistent:
MSE(Un | p) = [p(n - 2a^2 - 2ab) + p^2(-n + a^2 + b^2 + 2ab) + a^2] / (n + a + b)^2.
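As a sanity check on the closed form in Exercise 7, the mean square error can also be computed exactly by summing over the binomial distribution of Xn. This is a sketch under the assumption that the formula's exponents are squares; the function names are hypothetical.

```python
import math

def mse_formula(p, n, a, b):
    """Closed-form MSE of Un = (Xn + a) / (n + a + b); a**2 is the square of a."""
    numerator = (p * (n - 2 * a**2 - 2 * a * b)
                 + p**2 * (-n + a**2 + b**2 + 2 * a * b)
                 + a**2)
    return numerator / (n + a + b) ** 2

def mse_exact(p, n, a, b):
    """E[(Un - p)^2 | p], summing over the binomial distribution of Xn."""
    total = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * p**x * (1 - p) ** (n - x)
        u = (x + a) / (n + a + b)
        total += prob * (u - p) ** 2
    return total

for p in (0.1, 0.3, 0.7, 0.9):
    print(p, mse_formula(p, 10, 1, 1), mse_exact(p, 10, 1, 1))
```

The agreement reflects the decomposition of mean square error into variance plus squared bias: np(1 - p) / (n + a + b)^2 plus the square of the bias from Exercise 5.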
8. In the beta coin experiment, vary the parameters and note the change in the mean square error. Now set n = 10, p = 0.7, a = 1, and b = 1. Run the simulation 1000 times, updating every 10 runs. Note the estimate of p and the shape and location of the posterior density on each update. Note the apparent convergence of the empirical mean square error to the true mean square error.
Interestingly, we can choose a and b so that Un has mean square error that is independent of p:
9. Show that if a = b = n^(1/2) / 2 then MSE(Un | p) = n / [4(n + n^(1/2))^2] for any p.
10. In the beta coin experiment, set n = 36 and a = b = 3. Vary p and note that the mean square error does not change. Now set p = 0.8 and run the simulation 1000 times, updating every 10 runs. Note the estimate of p and the shape and location of the posterior density on each update. Note the apparent convergence of the empirical bias and mean square error to the true values.
Recall that the sample mean Mn = Xn / n (the proportion of heads) is both the method of moments estimator and the maximum likelihood estimator of p, and has mean square error MSE(Mn | p) = p(1 - p) / n.
11. Sketch the graphs of MSE(Un | p) in Exercise 9 and MSE(Mn | p), as functions of p, on the same set of axes.
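To accompany the sketch, the two curves can be compared numerically. The code below assumes the comparison intended is between the sample mean and the Bayes' estimator with a = b = n^(1/2) / 2, whose mean square error is constant in p; the choice n = 36 and the function names are illustrative assumptions.

```python
import math

def mse_sample_mean(p, n):
    """MSE of Mn = Xn / n, which is p(1 - p) / n."""
    return p * (1 - p) / n

def mse_constant(n):
    """MSE of Un when a = b = sqrt(n) / 2; it does not depend on p."""
    return n / (4 * (n + math.sqrt(n)) ** 2)

def crossover_points(n):
    """Values of p where the two mean square errors are equal."""
    c = mse_constant(n)
    # Solve p(1 - p) / n = c, a quadratic in p that is symmetric about 1/2.
    root = math.sqrt(1 - 4 * n * c)
    return (1 - root) / 2, (1 + root) / 2

n = 36
lo, hi = crossover_points(n)
print(lo, hi, mse_constant(n))
```

Between the two crossover points the Bayes' estimator has the smaller mean square error; for p close to 0 or 1 the sample mean does better.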
Suppose now that the coin is either fair or two-headed, but we don't know which. We give p the prior distribution with density function given as follows, where a in (0, 1) is chosen to reflect our prior knowledge of the probability of heads.
h(1) = a, h(1 / 2) = 1 - a.
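12. Show that the posterior distribution of p given I is as follows:
h(1 | I) = a / [a + (1 - a)(1 / 2)^n] and h(1 / 2 | I) = (1 - a)(1 / 2)^n / [a + (1 - a)(1 / 2)^n] if Xn = n; h(1 | I) = 0 and h(1 / 2 | I) = 1 if Xn < n.
Interpret the result.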
13. Show that the Bayes' estimator of p is
Un = pn if Xn = n, Un = 1 / 2 if Xn < n,
where pn = [a + (1 - a)(1 / 2)^(n + 1)] / [a + (1 - a)(1 / 2)^n].
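The two-point prior makes Bayes' theorem easy to apply directly. The following sketch (function names and the values a = 0.3, n = 5 are hypothetical) computes the posterior probabilities of p = 1 and p = 1 / 2 from the likelihood of the observed sequence, and compares the resulting posterior mean with the expression for pn.

```python
def two_point_posterior(a, n, heads):
    """Posterior probabilities (P(p = 1 | data), P(p = 1/2 | data)) for a in (0, 1)."""
    # Likelihood of the observed sequence under each hypothesis.
    like_two_headed = 1.0 if heads == n else 0.0  # any tail rules out p = 1
    like_fair = 0.5 ** n
    weight_two_headed = a * like_two_headed
    weight_fair = (1 - a) * like_fair
    total = weight_two_headed + weight_fair  # normalizing constant
    return weight_two_headed / total, weight_fair / total

def bayes_estimate(a, n, heads):
    """Posterior mean of p."""
    post_one, post_half = two_point_posterior(a, n, heads)
    return 1.0 * post_one + 0.5 * post_half

def p_n(a, n):
    """Closed-form estimate when every toss lands heads."""
    return (a + (1 - a) * 0.5 ** (n + 1)) / (a + (1 - a) * 0.5 ** n)

print(bayes_estimate(0.3, 5, 5), p_n(0.3, 5))
print(bayes_estimate(0.3, 5, 4))
```

A single tail makes the two-headed hypothesis impossible, so the estimate drops to 1 / 2 no matter how strongly the prior favored p = 1.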
14. Show that
15. Show that
Suppose that X = (X1, X2, ..., Xn) is a random sample of size n from the Poisson distribution with parameter a. Moreover, suppose that a has a prior gamma distribution with shape parameter k and scale parameter b. Let
Yn = X1 + X2 + ··· + Xn.
16. Show that the posterior distribution of a given X is gamma with shape parameter k + Yn and scale parameter b / (nb + 1).
It follows that the gamma distribution is conjugate to the Poisson distribution.
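The conjugate update for the Poisson case can be sketched in the same way as for the coin. The function names and the sample of counts below are hypothetical; the prior values k = 2 and b = 1.5 are arbitrary.

```python
def gamma_poisson_update(k, b, counts):
    """Posterior (shape, scale) of a gamma(k, b) prior after Poisson observations."""
    n = len(counts)
    y = sum(counts)
    return k + y, b / (n * b + 1)

def gamma_mean(shape, scale):
    """Mean of a gamma distribution with the given shape and scale."""
    return shape * scale

# Hypothetical sample of size 5 with Yn = 12.
counts = [3, 1, 4, 2, 2]
shape, scale = gamma_poisson_update(2, 1.5, counts)

# Updating one observation at a time should give the same posterior.
seq_shape, seq_scale = 2, 1.5
for c in counts:
    seq_shape, seq_scale = gamma_poisson_update(seq_shape, seq_scale, [c])

print(shape, scale, gamma_mean(shape, scale))
```

The posterior mean printed at the end is the shape times the scale, (k + Yn)b / (nb + 1).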
17. Show that the Bayes' estimator of a is Vn = (k + Yn)b / (nb + 1).
18. Show that bias(Vn | a) = (kb - a) / (nb + 1) and hence that Vn is asymptotically unbiased.
Note that, as before, we cannot choose k and b to make Vn unbiased.
19. Show that the mean square error of Vn is as follows, and hence that Vn is consistent:
MSE(Vn | a) = [(nb^2 - 2kb)a + a^2 + k^2 b^2] / (nb + 1)^2.
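The closed form in Exercise 19 can be checked numerically, using the fact that Yn has the Poisson distribution with parameter na. This is a sketch only: the function names, the truncation of the Poisson sum at 200 terms, and the parameter values are assumptions, and exponents in the formula are read as squares.

```python
import math

def mse_formula(a, n, k, b):
    """Closed-form MSE of Vn = (k + Yn) b / (n b + 1); b**2 is the square of b."""
    return ((n * b**2 - 2 * k * b) * a + a**2 + k**2 * b**2) / (n * b + 1) ** 2

def mse_numeric(a, n, k, b, terms=200):
    """E[(Vn - a)^2 | a], summing over a truncated Poisson(n a) distribution of Yn."""
    lam = n * a
    prob = math.exp(-lam)  # P(Yn = 0)
    total = 0.0
    for y in range(terms):
        v = (k + y) * b / (n * b + 1)
        total += prob * (v - a) ** 2
        prob *= lam / (y + 1)  # recurrence: P(Yn = y + 1) from P(Yn = y)
    return total

print(mse_formula(2.0, 5, 3, 0.5), mse_numeric(2.0, 5, 3, 0.5))
```

The truncation is adequate only when na is small relative to the number of terms; larger values of na would need more terms.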
Suppose that X = (X1, X2, ..., Xn) is a random sample of size n from the normal distribution with mean µ and variance d^2, where µ is unknown and d^2 is known. Moreover, suppose that µ has a prior normal distribution with mean a and variance b^2, both known. Let
Yn = (X1 + X2 + ··· + Xn).
20. Show that the posterior distribution of µ given X is normal with mean (Yn b^2 + a d^2) / (d^2 + n b^2) and variance d^2 b^2 / (d^2 + n b^2).
Therefore, the normal distribution is conjugate for the normal distribution with unknown mean and known variance. Moreover, it follows that the Bayes' estimator of µ is
Un = (Yn b^2 + a d^2) / (d^2 + n b^2).
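The normal case can be sketched as follows. The posterior mean is the estimator Un above; the posterior variance d^2 b^2 / (d^2 + n b^2) is the standard result for this model. The function name, the argument names `b2` and `d2` (for the prior variance and the sampling variance), and the data are hypothetical.

```python
def normal_posterior(a, b2, d2, data):
    """Posterior (mean, variance) of mu for a normal(a, b2) prior and known variance d2."""
    n = len(data)
    y = sum(data)
    mean = (y * b2 + a * d2) / (d2 + n * b2)
    variance = d2 * b2 / (d2 + n * b2)
    return mean, variance

# Hypothetical data with prior mean 0, prior variance 4, sampling variance 1.
data = [1.2, 0.7, 1.9, 1.4]
mean, variance = normal_posterior(0.0, 4.0, 1.0, data)

# Equivalent precision-weighted form: weights 1 / b2 on the prior mean
# and n / d2 on the sample mean.
n = len(data)
sample_mean = sum(data) / n
weighted = (0.0 / 4.0 + n * sample_mean / 1.0) / (1 / 4.0 + n / 1.0)

print(mean, variance, weighted)
```

Written this way, the estimator is a weighted average of the prior mean and the sample mean, and the weight on the sample mean tends to 1 as n grows.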
21. Show that bias(Un | µ) = d^2(a - µ) / (d^2 + n b^2) and hence that Un is asymptotically unbiased.
22. Show that MSE(Un | µ) = [n d^2 b^4 + d^4(a - µ)^2] / (d^2 + n b^2)^2 and hence that Un is consistent.