Suppose again that we have an observable random variable X for an experiment that takes values in a set S. Suppose also that the distribution of X depends on an unknown parameter a, taking values in a parameter space A. Specifically, we will denote the density function of X at x by f(x | a). In general, both X and a are vector valued.
The likelihood function L is the function obtained by reversing the roles of x and a; that is, we view a as the variable and x as the given information (which is precisely the point of view in estimation):
L(a | x) = f(x | a) for a in A and x in S.
In the method of maximum likelihood, we try to find a value u(x) of the parameter a that maximizes L(a | x) for each x in S. If we can do this, then u(X) is called a maximum likelihood estimator of a. The method is intuitively appealing--we try to find the values of the parameters that would have most likely produced the data we in fact observed.
Since the natural logarithm function ln is strictly increasing, the maximum value of L(a | x), if it exists, will occur at the same points as the maximum value of ln[L(a | x)]. This latter function is called the log likelihood function and in many cases is easier to work with than the likelihood function (usually because the density f(x | a) has a product structure).
An important special case is when a = (a1, a2, ..., ak) is a vector of k real parameters, so that A is a subset of R^k. In this case, the maximum likelihood problem is to maximize a function of several variables. If A is a continuous set, the methods of calculus can be used: if the maximum value occurs at a point a in the interior of A, then L(· | x) has a local maximum at a and therefore
(d/dai)L(a | x) = 0 for i = 1, 2, ..., k.
On the other hand, the maximum value may occur at a boundary point of A, or may not exist at all.
Consider next the case where X = (X1, X2, ..., Xn) is a random sample of size n from a distribution with density function g(x | a). Then the joint density of X is the product of the marginal densities, so the likelihood function in this special case becomes
L(a | x) = f(x | a) = g(x1 | a)g(x2 | a)···g(xn | a) where x = (x1, x2, ..., xn).
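Because the likelihood of a random sample is a product, its logarithm is a sum of log densities, which is usually the form that is easiest to compute with. The following is a minimal sketch of that idea in Python. The function names and the exponential density chosen for g are illustrative assumptions, not part of the text above.

```python
import math

def log_likelihood(log_g, a, xs):
    """Log likelihood of parameter a for a random sample xs.

    log_g(x, a) is the log of the common density g(x | a); the product
    g(x1 | a)...g(xn | a) becomes a sum after taking logarithms.
    """
    return sum(log_g(x, a) for x in xs)

def log_exponential_density(x, rate):
    # Hypothetical choice of g: the exponential density rate * exp(-rate * x).
    return math.log(rate) - rate * x

sample = [0.5, 1.2, 0.3, 2.0]   # made-up data for illustration

# Direct product of densities, for comparison with the sum of logs.
product = 1.0
for x in sample:
    product *= math.exp(log_exponential_density(x, 0.8))

total = log_likelihood(log_exponential_density, 0.8, sample)
print(math.exp(total), product)
```

The two printed values should agree up to rounding, since exponentiating the sum of logs recovers the product.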
In the following subsections, we will study maximum likelihood estimation in a number of classical cases.
Suppose that we have a coin with unknown probability p of heads. We toss the coin n times and record the sequence of heads and tails. Thus, the data (I1, I2, ..., In) is a random sample of size n from the Bernoulli distribution with success parameter p. Let Xn = I1 + I2 + ··· + In denote the number of heads and Mn = Xn / n the proportion of heads (the sample mean).
1. Suppose that p varies in (0, 1). Show that the maximum likelihood estimator of p is Mn.
Recall that Mn is also the method of moments estimator of p.
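The claim in Exercise 1 can be checked numerically before it is proved. The sketch below maximizes the Bernoulli log likelihood over a grid of values in (0, 1) and compares the maximizer with the sample proportion. The toss data and the helper names are hypothetical, and a grid search is only a stand-in for the calculus argument the exercise asks for.

```python
import math

def bernoulli_log_likelihood(p, heads, n):
    # ln L(p | x) = Xn ln(p) + (n - Xn) ln(1 - p) for p in (0, 1).
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

def grid_mle(heads, n, steps=1000):
    # Search an evenly spaced grid strictly inside (0, 1).
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, heads, n))

tosses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # hypothetical data: 1 = heads
n = len(tosses)
heads = sum(tosses)                        # Xn
sample_mean = heads / n                    # Mn
p_hat = grid_mle(heads, n)
print(p_hat, sample_mean)
```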
2. Suppose that the coin is either fair or two-headed, so p varies in {1/2, 1}. Show that the maximum likelihood estimator of p is as given below, and interpret the result:
Un = 1 if Xn = n; Un = 1/2 if Xn < n.
Exercises 1 and 2 show that the maximum likelihood estimator of a parameter, like the solution to any maximization problem, depends critically on the domain.
3. Show that
4. Show that
5. Show that Un is uniformly better than Mn on the parameter space {1/2, 1}.
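One way to explore Exercise 5 is to compute the mean square errors exactly from the binomial distribution of Xn. The sketch below does this for one sample size, reading "uniformly better" as "mean square error no larger at every point of the parameter space". That reading, the helper names, and the choice n = 10 are assumptions made for illustration.

```python
from math import comb

def exact_mse(estimator, n, p):
    """Exact mean square error of estimator(Xn, n), where Xn ~ binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (estimator(k, n) - p) ** 2
        for k in range(n + 1)
    )

def u_n(heads, n):
    # Maximum likelihood estimator on the parameter space {1/2, 1}.
    return 1.0 if heads == n else 0.5

def m_n(heads, n):
    # Sample proportion of heads.
    return heads / n

n = 10
results = {p: (exact_mse(u_n, n, p), exact_mse(m_n, n, p)) for p in (0.5, 1.0)}
for p, (mse_u, mse_m) in results.items():
    print(f"p = {p}: MSE(Un) = {mse_u:.6f}, MSE(Mn) = {mse_m:.6f}")
```

Repeating the computation for other values of n gives a quick sanity check on the general argument.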
In the following exercises, recall that if (X1, X2, ..., Xn) is a random sample from a distribution with mean µ and variance d^2, then the method of moments estimators of µ and d^2 are, respectively,
Mn = (X1 + X2 + ··· + Xn) / n and Tn^2 = [(X1 - Mn)^2 + (X2 - Mn)^2 + ··· + (Xn - Mn)^2] / n.
Of course, Mn is the sample mean and Tn^2 = (n - 1)Sn^2 / n where Sn^2 is the sample variance.
6. Suppose that (X1, X2, ..., Xn) is a random sample from the Poisson distribution with unknown parameter a > 0. Show that the maximum likelihood estimator of a is Mn.
7. Suppose that (X1, X2, ..., Xn) is a random sample from the normal distribution with unknown mean µ in R and variance d^2 > 0. Show that the maximum likelihood estimators of µ and d^2 are respectively Mn and Tn^2.
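The two-parameter case in Exercise 7 can also be probed numerically. The sketch below evaluates the normal log likelihood at (Mn, Tn^2) and at a few nearby parameter values. The sample is made up, and checking a handful of neighbors only illustrates the result; it does not establish a global maximum.

```python
import math

def normal_log_likelihood(mu, var, xs):
    # Sum of ln g(x | mu, var) for the normal density with mean mu, variance var.
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))

xs = [4.9, 5.6, 4.4, 6.1, 5.0, 5.3]          # hypothetical sample
n = len(xs)
m_n = sum(xs) / n                             # sample mean Mn
t_n2 = sum((x - m_n) ** 2 for x in xs) / n    # Tn^2: divides by n, not n - 1

best = normal_log_likelihood(m_n, t_n2, xs)

# Perturb each parameter slightly; none of these should beat (Mn, Tn^2).
neighbors = [
    normal_log_likelihood(m_n + dm, t_n2 * scale, xs)
    for dm in (-0.05, 0.0, 0.05)
    for scale in (0.9, 1.0, 1.1)
    if (dm, scale) != (0.0, 1.0)
]
print(best, max(neighbors))
```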
8. Suppose that (X1, X2, ..., Xn) is a random sample from the gamma distribution with known shape parameter k and unknown scale parameter b > 0. Show that the maximum likelihood estimator of b is Vn = Mn / k.
9. Run the gamma estimation experiment 1000 times, updating every 10 runs, for several values of the shape parameter k and the scale parameter b. In each case, compare the method of moments estimator Un with the maximum likelihood estimator Vn. Which estimator seems to work better in terms of mean square error?
10. Suppose that (X1, X2, ..., Xn) is a random sample from the beta distribution with parameters a > 0 and b = 1. Show that the maximum likelihood estimator of a is
Vn = -n / [ln(X1) + ln(X2) + ··· + ln(Xn)].
11. Run the beta estimation experiment 1000 times, updating every 10 runs, for several values of a. In each case, compare the method of moments estimator Un with the maximum likelihood estimator Vn. Which estimator seems to work better in terms of mean square error?
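For readers without access to the applet, a rough stand-in for the beta estimation experiment can be sketched in Python. The form Un = Mn / (1 - Mn) for the method of moments estimator is an assumption here, obtained by solving E(X) = a / (a + 1) for a. The function name, seed, sample size, and value of a are likewise illustrative.

```python
import math
import random

def simulate_beta_estimators(a, n, runs, seed=0):
    """Empirical mean square errors of Un and Vn for beta(a, 1) samples."""
    rng = random.Random(seed)
    sq_err_u = sq_err_v = 0.0
    for _ in range(runs):
        # Inverse-CDF sampling: if U is uniform on (0, 1], U**(1/a) is beta(a, 1).
        xs = [(1.0 - rng.random()) ** (1.0 / a) for _ in range(n)]
        m = sum(xs) / n
        u = m / (1.0 - m)                        # method of moments (assumed form)
        v = -n / sum(math.log(x) for x in xs)    # maximum likelihood
        sq_err_u += (u - a) ** 2
        sq_err_v += (v - a) ** 2
    return sq_err_u / runs, sq_err_v / runs

mse_u, mse_v = simulate_beta_estimators(a=2.0, n=20, runs=1000)
print(f"empirical MSE: Un = {mse_u:.4f}, Vn = {mse_v:.4f}")
```

Varying a and n, as the exercise suggests, is more informative than any single run.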
12. Suppose that (X1, X2, ..., Xn) is a random sample from the Pareto distribution with shape parameter a > 0. Show that the maximum likelihood estimator of a is
Vn = n / [ln(X1) + ln(X2) + ··· + ln(Xn)].
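The Pareto estimator is easy to try on simulated data. The sketch below assumes the Pareto distribution has scale parameter 1, so that the density is a / x^(a + 1) for x >= 1, which matches Python's `random.paretovariate`. The true shape value, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random

def pareto_mle(xs):
    # Vn = n / (ln X1 + ... + ln Xn), assuming support [1, infinity).
    return len(xs) / sum(math.log(x) for x in xs)

rng = random.Random(1)
a = 3.0                                   # true shape, chosen for illustration
xs = [rng.paretovariate(a) for _ in range(5000)]
v_n = pareto_mle(xs)
print(f"true shape {a}, estimate {v_n:.3f}")
```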
In this section we will study an estimation problem that is a good source of insight. In a sense, this estimation problem is the continuous analogue of an estimation problem studied in the section on Order Statistics in the chapter Finite Sampling Models.
Suppose that (X1, X2, ..., Xn) is a random sample from the uniform distribution on the interval [0, a], where a > 0 is an unknown parameter.
13. Show that the method of moments estimator of a is Un = 2Mn.
14. Show that Un is unbiased.
15. Show that var(Un) = a^2 / (3n), so Un is consistent.
16. Show that the maximum likelihood estimator of a is X(n), the n'th order statistic.
17. Show that E[X(n)] = na / (n + 1), so Vn = (n + 1)X(n) / n is unbiased.
18. Show that var[Vn] = a^2 / [n(n + 2)], so Vn is consistent.
19. Show that the asymptotic relative efficiency of Vn to Un is infinite.
The last exercise shows that Vn is a much better estimator than Un; in fact, an estimator such as Vn, whose mean square error decreases on the order of 1 / n^2, is called super efficient. Now, having found a really good estimator, let's see if we can find a really bad one. A natural candidate is an estimator based on X(1), the first order statistic.
20. Show that X(1) has the same distribution as a - X(n).
21. Show that E[X(1)] = a / (n + 1) and hence Wn = (n + 1)X(1) is unbiased.
22. Show that var[Wn] = na^2 / (n + 2), so Wn is not even consistent.
23. Run the uniform estimation experiment 1000 times, updating every 10 runs, for several values of a. In each case, compare the empirical bias and mean square error of the estimators with their theoretical values. Rank the estimators in terms of empirical mean square error.
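A rough stand-in for the uniform estimation experiment can be written in a few lines of Python. The sketch below simulates the three estimators Un, Vn, and Wn and prints the empirical bias and mean square error next to the theoretical mean square errors from Exercises 15, 18, and 22. Since all three estimators are unbiased, the theoretical mean square error equals the variance. The function name, seed, and parameter values are illustrative assumptions.

```python
import random

def uniform_experiment(a, n, runs, seed=0):
    """Empirical bias and MSE of Un, Vn, Wn for samples from uniform [0, a]."""
    rng = random.Random(seed)
    names = ("Un", "Vn", "Wn")
    total = dict.fromkeys(names, 0.0)
    total_sq = dict.fromkeys(names, 0.0)
    for _ in range(runs):
        xs = [rng.uniform(0.0, a) for _ in range(n)]
        estimates = {
            "Un": 2.0 * sum(xs) / n,          # method of moments
            "Vn": (n + 1) * max(xs) / n,      # based on the n'th order statistic
            "Wn": (n + 1) * min(xs),          # based on the first order statistic
        }
        for name, value in estimates.items():
            total[name] += value - a
            total_sq[name] += (value - a) ** 2
    bias = {name: total[name] / runs for name in names}
    mse = {name: total_sq[name] / runs for name in names}
    return bias, mse

a, n = 1.0, 10
bias, mse = uniform_experiment(a, n, runs=1000)
theory = {
    "Un": a**2 / (3 * n),
    "Vn": a**2 / (n * (n + 2)),
    "Wn": n * a**2 / (n + 2),
}
for name in ("Un", "Vn", "Wn"):
    print(f"{name}: bias {bias[name]:+.4f}, MSE {mse[name]:.4f}, theory {theory[name]:.4f}")
```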
Returning to the general setting, suppose now that h is a one-to-one function from the parameter space A onto a set B. We can view b = h(a) as a new parameter taking values in the space B, and it is easy to re-parameterize the joint density function with the new parameter. Thus, let
f1(x | b) = f[x | h^-1(b)] for x in S, b in B.
The corresponding likelihood function is
L1(b | x) = L[h^-1(b) | x] for b in B and x in S.
24. Suppose that u(x) in A maximizes L(· | x) for each x in S. Show that h[u(x)] in B maximizes L1(· | x) for each x in S.
It follows from Exercise 24 that if U is a maximum likelihood estimator for a, then V = h(U) is a maximum likelihood estimator for b = h(a). This result is known as the invariance property.
25. Suppose that (X1, X2, ..., Xn) is a random sample from the Poisson distribution with mean µ, and let p = P(Xi = 0) = e^(-µ). Find the maximum likelihood estimator of p in two ways:
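The invariance property can be checked numerically for this example. The sketch below compares two routes to an estimate of p: maximizing the re-parameterized log likelihood directly over a grid of values of p, and applying h(µ) = e^(-µ) to the sample mean, which Exercise 6 identifies as the maximum likelihood estimate of the Poisson parameter. The count data, grid resolution, and function names are assumptions for illustration.

```python
import math

def poisson_log_likelihood(mu, xs):
    # ln L(mu | x), dropping the term -sum(ln(xj!)) that does not involve mu.
    return sum(xs) * math.log(mu) - len(xs) * mu

def reparameterized_log_likelihood(p, xs):
    # L1(p | x) = L(h^-1(p) | x) with h(mu) = exp(-mu), so h^-1(p) = -ln(p).
    return poisson_log_likelihood(-math.log(p), xs)

xs = [2, 0, 3, 1, 1, 4, 0, 2]             # hypothetical counts
m_n = sum(xs) / len(xs)                    # maximum likelihood estimate of mu

# Route 1: maximize the re-parameterized log likelihood directly over p.
grid = [i / 10000 for i in range(1, 10000)]
p_direct = max(grid, key=lambda p: reparameterized_log_likelihood(p, xs))

# Route 2: apply h to the maximum likelihood estimate of mu.
p_invariance = math.exp(-m_n)
print(p_direct, p_invariance)
```

The two values should agree up to the grid spacing.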
If the function h is not one-to-one, the maximum likelihood problem for the new parameter vector b = h(a) is not well-defined, because we cannot parametrize the joint density function in terms of b. However, there is a natural generalization of the maximum likelihood problem in this case. Define
L1(b | x) = max{L[a | x]: a in A, h(a) = b} for b in B and x in S.
26. Suppose again that u(x) in A maximizes L(· | x) for each x in S. Show that h[u(x)] in B maximizes L1(· | x) for each x in S.
The result in the last exercise extends the invariance property to many-to-one transformations of the parameter: if U is a maximum likelihood estimator for a, then V = h(U) is a maximum likelihood estimator for b = h(a).
27. Suppose that (I1, I2, ..., In) is a random sample of size n from the Bernoulli distribution with unknown success parameter p in (0, 1). Find the maximum likelihood estimator of p(1 - p), the variance of the sampling distribution.
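The function h(p) = p(1 - p) is two-to-one on (0, 1), so this exercise is a natural test case for the generalized likelihood L1(b | x) = max{L(a | x): h(a) = b}. The sketch below builds that induced likelihood explicitly, maximizes it over a grid of values of b, and compares the result with h(Mn). The data, grid, and function names are hypothetical.

```python
import math

def log_likelihood(p, heads, n):
    # Bernoulli log likelihood for p in (0, 1).
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

def induced_log_likelihood(b, heads, n):
    # Both roots of p(1 - p) = b are candidates, and L1(b | x) is the
    # larger of the two likelihoods.
    root = math.sqrt(1 - 4 * b)
    candidates = [(1 - root) / 2, (1 + root) / 2]
    return max(log_likelihood(p, heads, n) for p in candidates)

heads, n = 7, 10                           # hypothetical data
m_n = heads / n

# b ranges over (0, 1/4], the image of (0, 1) under h.
grid = [i / 4000 for i in range(1, 1001)]
b_direct = max(grid, key=lambda b: induced_log_likelihood(b, heads, n))
b_invariance = m_n * (1 - m_n)
print(b_direct, b_invariance)
```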
28. Suppose that (X1, X2, ..., Xn) is a random sample from the normal distribution with unknown mean µ in R and variance d^2 > 0. Find the maximum likelihood estimator of µ^2 + d^2.