
3. Maximum Likelihood


The General Method

Suppose again that we have an observable random variable X for an experiment, taking values in a set S. Suppose also that the distribution of X depends on an unknown parameter a, taking values in a parameter space A. Specifically, we denote the density function of X at x by f(x | a). In general, both X and a are vector-valued.

The likelihood function L is the function obtained by reversing the roles of x and a; that is, we view a as the variable and x as the given information (which is precisely the point of view in estimation):

L(a | x) = f(x | a) for a in A and x in S.

In the method of maximum likelihood, we try to find a value u(x) of the parameter a that maximizes L(a | x) for each x in S. If we can do this, then u(X) is called a maximum likelihood estimator of a. The method is intuitively appealing: we try to find the parameter values that would most likely have produced the data we actually observed.

Since the natural logarithm function ln is strictly increasing, the maximum value of L(a | x), if it exists, will occur at the same points as the maximum value of ln[L(a | x)]. This latter function is called the log likelihood function, and in many cases it is easier to work with than the likelihood function itself (usually because the density f(x | a) has a product structure, which the logarithm turns into a sum).

Special Cases

An important special case is when a = (a1, a2, ..., ak) is a vector of k real parameters, so that $A \subseteq \mathbb{R}^k$. In this case, the maximum likelihood problem is to maximize a function of several variables. If A is a region of $\mathbb{R}^k$ (an interval, for example, when k = 1) and L(· | x) is differentiable, the methods of calculus can be used: if the maximum value occurs at a point a in the interior of A, then L(· | x) has a local maximum at a, and therefore

$$\frac{\partial}{\partial a_i} L(a \mid x) = 0, \quad i = 1, 2, \ldots, k.$$

On the other hand, the maximum value may occur at a boundary point of A, or may not exist at all.

Consider next the case where X = (X1, X2, ..., Xn) is a random sample of size n from a distribution with density function g(x | a). Since the variables are independent, the joint density of X is the product of the marginal densities, so the likelihood function in this special case becomes

L(a | x) = f(x | a) = g(x1 | a)g(x2 | a)···g(xn | a) where x = (x1, x2, ..., xn).
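Because the likelihood of a random sample is a product of n densities, it is usually evaluated on the log scale, where the product becomes a sum; this also avoids floating-point underflow. The following Python sketch assumes a one-dimensional parameter and simply maximizes the log likelihood over a finite grid of candidate values rather than using calculus. The helper names `log_likelihood`, `grid_mle`, and `bernoulli_log_g` are hypothetical, chosen for illustration.

```python
import math

def log_likelihood(sample, log_g, a):
    """ln L(a | x) = sum of ln g(x_j | a) over the sample."""
    return sum(log_g(x, a) for x in sample)

def grid_mle(sample, log_g, grid):
    """Return the grid point with the largest log likelihood (a crude search)."""
    return max(grid, key=lambda a: log_likelihood(sample, log_g, a))

def bernoulli_log_g(x, p):
    """Log of the Bernoulli density g(x | p) = p^x (1 - p)^(1 - x), x in {0, 1}."""
    return x * math.log(p) + (1 - x) * math.log(1 - p)

tosses = [1, 0] * 1000                        # 2000 made-up tosses, half heads
p_grid = [k / 1000 for k in range(1, 1000)]   # interior points only, avoiding log(0)

# The raw product of 2000 densities underflows; the log likelihood stays finite.
product = math.prod(math.exp(bernoulli_log_g(x, 0.5)) for x in tosses)
print(product)                                      # 0.0
print(log_likelihood(tosses, bernoulli_log_g, 0.5))
print(grid_mle(tosses, bernoulli_log_g, p_grid))    # 0.5
```

A grid search is only a sketch; in practice one uses calculus, as above, or a numerical optimizer.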

In the following subsections, we will study maximum likelihood estimation in a number of classical cases.

The Bernoulli Distribution

Suppose that we have a coin with unknown probability p of heads. We toss the coin n times and record the sequence of heads and tails. Thus, the data (I1, I2, ..., In) is a random sample of size n from the Bernoulli distribution with success parameter p. Let Xn = I1 + I2 + ··· + In denote the number of heads and Mn = Xn / n the proportion of heads (the sample mean).

Mathematical Exercise 1. Suppose that p varies in (0, 1). Show that the maximum likelihood estimator of p is Mn.

Recall that Mn is also the method of moments estimator of p.

Mathematical Exercise 2. Suppose that the coin is either fair or two-headed, so p varies in {1/2, 1}. Show that the maximum likelihood estimator of p is as given below, and interpret the result:

Un = 1 if Xn = n; Un = 1/2 if Xn < n.

Exercises 1 and 2 show that the maximum likelihood estimator of a parameter, like the solution to any maximization problem, depends critically on the domain.
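The dependence on the domain can be seen numerically. In this sketch (function names hypothetical, data made up), the same tosses are fitted twice: once with p ranging over a fine grid in (0, 1), approximating Exercise 1, and once with p restricted to {1/2, 1}, as in Exercise 2.

```python
import math

def mle_open_interval(tosses, grid_size=10_000):
    """Approximate MLE of p over (0, 1) by searching interior grid points k / grid_size."""
    n, x = len(tosses), sum(tosses)
    grid = [k / grid_size for k in range(1, grid_size)]
    # ln L(p | data) = x ln p + (n - x) ln(1 - p)
    return max(grid, key=lambda p: x * math.log(p) + (n - x) * math.log(1 - p))

def mle_fair_or_two_headed(tosses):
    """MLE of p over {1/2, 1}: compare the two likelihood values directly."""
    n, x = len(tosses), sum(tosses)
    likelihood = {0.5: 0.5 ** n, 1.0: 1.0 if x == n else 0.0}
    return max(likelihood, key=likelihood.get)

data = [1, 1, 0, 1]                      # 1 = heads, 0 = tails
print(mle_open_interval(data))           # 0.75
print(mle_fair_or_two_headed(data))      # 0.5
print(mle_fair_or_two_headed([1] * 6))   # 1.0
```

For the data 1, 1, 0, 1 the first search lands at 0.75 = Mn, while the second returns 1/2, because a single tail makes the two-headed coin impossible.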

Mathematical Exercise 3. Show that

  1. $E(U_n) = 1$ if $p = 1$; $E(U_n) = 1/2 + (1/2)^{n+1}$ if $p = 1/2$.
  2. Un is biased, but asymptotically unbiased.

Mathematical Exercise 4. Show that

  1. $\mathrm{MSE}(U_n) = 0$ if $p = 1$; $\mathrm{MSE}(U_n) = (1/2)^{n+2}$ if $p = 1/2$.
  2. Un is consistent.

Mathematical Exercise 5. Show that Un is uniformly better than Mn on the parameter space {1/2, 1}.
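The formulas in Exercises 3 through 5 can be checked exactly by summing over the binomial distribution of Xn. The sketch below uses Python's `fractions.Fraction` so the results are exact rationals; `exact_moments` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def u_estimator(x, n):
    """MLE of p on {1/2, 1}: 1 if every toss is heads, otherwise 1/2."""
    return Fraction(1) if x == n else Fraction(1, 2)

def exact_moments(n, p):
    """Exact E(U_n), MSE(U_n) and MSE(M_n), summing over X_n ~ Binomial(n, p)."""
    p = Fraction(p)
    mean_u = mse_u = mse_m = Fraction(0)
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1 - p) ** (n - x)
        u, m = u_estimator(x, n), Fraction(x, n)
        mean_u += prob * u
        mse_u += prob * (u - p) ** 2
        mse_m += prob * (m - p) ** 2
    return mean_u, mse_u, mse_m

for n in (1, 5, 10):
    print(n, *exact_moments(n, Fraction(1, 2)))
```

For example, with n = 5 and p = 1/2 this gives E(U5) = 33/64, MSE(U5) = 1/128, and MSE(M5) = 1/20.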

Other Basic Distributions

In the following exercises, recall that if (X1, X2, ..., Xn) is a random sample from a distribution with mean $\mu$ and variance $\sigma^2$, then the method of moments estimators of $\mu$ and $\sigma^2$ are, respectively,

  1. $M_n = \frac{1}{n}\sum_{j=1}^{n} X_j$
  2. $T_n^2 = \frac{1}{n}\sum_{j=1}^{n} (X_j - M_n)^2$

Of course, $M_n$ is the sample mean and $T_n^2 = (n - 1)S_n^2 / n$, where $S_n^2$ is the sample variance.
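As a quick illustration of the relation between T_n^2 and S_n^2, the following sketch computes both method of moments estimators by hand and compares them with Python's `statistics` module, where `pvariance` uses divisor n and `variance` uses divisor n - 1. The data values and the helper name `moment_estimators` are made up for illustration.

```python
import math
import statistics

def moment_estimators(sample):
    """Return (M_n, T_n^2): the sample mean and the mean squared deviation (divisor n)."""
    n = len(sample)
    m = sum(sample) / n
    t2 = sum((x - m) ** 2 for x in sample) / n
    return m, t2

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
m, t2 = moment_estimators(data)
s2 = statistics.variance(data)                          # S_n^2, divisor n - 1
print(m, t2)                                            # 5.0 4.0
print(math.isclose(t2, (n - 1) * s2 / n))               # True
print(math.isclose(t2, statistics.pvariance(data)))     # True
```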

Mathematical Exercise 6. Suppose that (X1, X2, ..., Xn) is a random sample from the Poisson distribution with unknown parameter a > 0. Show that the maximum likelihood estimator of a is Mn.
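A crude numerical check of Exercise 6 (not a substitute for the proof): evaluate the Poisson log likelihood on a grid of candidate values of a and compare the maximizing grid point with the sample mean. The counts, the grid bounds, and the function name are arbitrary choices for illustration.

```python
import math

def poisson_log_likelihood(sample, a):
    """ln L(a | x) = sum_j [x_j ln a - a - ln(x_j!)], for a > 0."""
    return sum(x * math.log(a) - a - math.lgamma(x + 1) for x in sample)

counts = [2, 0, 3, 1, 4]                       # made-up data; M_n = 2.0
grid = [k / 1000 for k in range(1, 10_001)]    # candidate values of a in (0, 10]
a_hat = max(grid, key=lambda a: poisson_log_likelihood(counts, a))
print(a_hat, sum(counts) / len(counts))
```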

Mathematical Exercise 7. Suppose that (X1, X2, ..., Xn) is a random sample from the normal distribution with unknown mean $\mu \in \mathbb{R}$ and variance $\sigma^2 > 0$. Show that the maximum likelihood estimators of $\mu$ and $\sigma^2$ are, respectively, $M_n$ and $T_n^2$.
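For the two-parameter normal case, a simple sanity check (again, not a proof) is that moving away from (M_n, T_n^2) lowers the log likelihood. The sketch below uses made-up data, a hypothetical function name, and a handful of arbitrary perturbations.

```python
import math
import statistics

def normal_log_likelihood(sample, mu, var):
    """ln L(mu, var | x) for a normal sample with mean mu and variance var > 0."""
    n = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)

data = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5]      # made-up observations
m_n = statistics.fmean(data)                # M_n
t2_n = statistics.pvariance(data)           # T_n^2 (divisor n, not n - 1)
best = normal_log_likelihood(data, m_n, t2_n)

# Nudging either parameter away from (M_n, T_n^2) should strictly decrease ln L.
for d_mu, d_var in [(0.05, 0), (-0.05, 0), (0, 0.05), (0, -0.05), (0.05, 0.05)]:
    assert normal_log_likelihood(data, m_n + d_mu, t2_n + d_var) < best
print("no perturbation beat (M_n, T_n^2)")
```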

Mathematical Exercise 8. Suppose that (X1, X2, ..., Xn) is a random sample from the gamma distribution with known shape parameter k and unknown scale parameter b > 0. Show that the maximum likelihood estimator of b is Vn = Mn / k.

Simulation Exercise 9. Run the gamma estimation experiment 1000 times, updating every 10 runs, for several values of the shape parameter k and the scale parameter b. In each case, compare the method of moments estimator Un with the maximum likelihood estimator Vn. Which estimator seems to work better in terms of mean square error?
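Exercise 9 relies on the applet; a rough stand-alone analogue for the maximum likelihood estimator alone is sketched below. It assumes that Python's `random.gammavariate(alpha, beta)` is parameterized by shape alpha and scale beta (which matches its documented density), and it compares the empirical mean square error of Vn with b^2/(nk): since Vn = Mn / k is unbiased, its mean square error equals var(Mn)/k^2 = b^2/(nk). The method of moments estimator Un is not reproduced here, and the parameter values, seed, and function names are hypothetical.

```python
import random
import statistics

def gamma_scale_mle(sample, k):
    """MLE of the scale b when the shape k is known: V_n = M_n / k."""
    return statistics.fmean(sample) / k

def bias_and_mse(k, b, n, runs, seed=2024):
    """Empirical bias and MSE of V_n over `runs` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = [
        gamma_scale_mle([rng.gammavariate(k, b) for _ in range(n)], k)  # shape k, scale b
        for _ in range(runs)
    ]
    bias = statistics.fmean(estimates) - b
    mse = statistics.fmean((v - b) ** 2 for v in estimates)
    return bias, mse

k, b, n = 2.0, 3.0, 20
bias, mse = bias_and_mse(k, b, n, runs=2000)
print(f"bias {bias:+.4f}, MSE {mse:.4f}, theoretical MSE {b**2 / (n * k):.4f}")
```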

Mathematical Exercise 10. Suppose that (X1, X2, ..., Xn) is a random sample from the beta distribution with parameters a > 0 and b = 1. Show that the maximum likelihood estimator of a is

$$V_n = -\frac{n}{\sum_{j=1}^{n} \ln X_j}.$$

Simulation Exercise 11. Run the beta estimation experiment 1000 times, updating every 10 runs, for several values of a. In each case, compare the method of moments estimator Un with the maximum likelihood estimator Vn. Which estimator seems to work better in terms of mean square error?
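A similar stand-alone comparison for the beta case is sketched below. It assumes that the method of moments estimator Un from the earlier section is Mn / (1 - Mn), obtained by solving Mn = a / (a + 1), and it samples with Python's `random.betavariate(a, 1.0)`. The function names, seed, and parameter values are hypothetical; the printout shows the two empirical mean square errors side by side and leaves the comparison to the reader.

```python
import math
import random
import statistics

def beta_mom(sample):
    """Method of moments for beta(a, 1): solve M_n = a / (a + 1), giving M_n / (1 - M_n)."""
    m = statistics.fmean(sample)
    return m / (1 - m)

def beta_mle(sample):
    """Maximum likelihood for beta(a, 1): V_n = -n / sum(ln X_j)."""
    return -len(sample) / sum(math.log(x) for x in sample)

def compare(a, n, runs, seed=0):
    """Empirical MSE of (U_n, V_n) over `runs` simulated samples of size n."""
    rng = random.Random(seed)
    se_mom = se_mle = 0.0
    for _ in range(runs):
        sample = [rng.betavariate(a, 1.0) for _ in range(n)]
        se_mom += (beta_mom(sample) - a) ** 2
        se_mle += (beta_mle(sample) - a) ** 2
    return se_mom / runs, se_mle / runs

for a in (0.5, 2.0, 5.0):
    print(a, compare(a, n=20, runs=1000))
```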

Mathematical Exercise 12. Suppose that (X1, X2, ..., Xn) is a random sample from the Pareto distribution with shape parameter a > 0 and scale parameter 1 (so that the distribution is supported on [1, ∞)). Show that the maximum likelihood estimator of a is

$$V_n = \frac{n}{\sum_{j=1}^{n} \ln X_j}.$$
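Python's standard library generates Pareto samples with scale 1 through `random.paretovariate(alpha)`, so the estimator is easy to try out. In this sketch the true shape value, seed, sample size, and function name are arbitrary choices.

```python
import math
import random

def pareto_shape_mle(sample):
    """MLE of the shape a for a Pareto sample with scale 1: V_n = n / sum(ln X_j)."""
    return len(sample) / sum(math.log(x) for x in sample)

rng = random.Random(7)
true_a = 3.0
sample = [rng.paretovariate(true_a) for _ in range(5000)]   # values in [1, infinity)
print(pareto_shape_mle(sample))   # should land near 3.0 for a sample this large
```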

The Uniform Distribution on [0, a]

In this section we will study an estimation problem that is a good source of insight. In a sense, this estimation problem is the continuous analogue of an estimation problem studied in the section on Order Statistics in the chapter Finite Sampling Models.

Suppose that (X1, X2, ..., Xn) is a random sample from the uniform distribution on the interval [0, a], where a > 0 is an unknown parameter.

Mathematical Exercise 13. Show that the method of moments estimator of a is Un = 2Mn.

Mathematical Exercise 14. Show that Un is unbiased.

Mathematical Exercise 15. Show that $\operatorname{var}(U_n) = a^2 / (3n)$, so Un is consistent.

Mathematical Exercise 16. Show that the maximum likelihood estimator of a is X(n), the nth order statistic (the sample maximum).
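In this case the maximum occurs at a point where L(· | x) is discontinuous, so setting a derivative to zero does not locate it. For a sample of nonnegative values, L(a | x) = a^(-n) when a ≥ max xj and 0 otherwise. The short sketch below (hypothetical function name and made-up data) evaluates it at a few points.

```python
def uniform_likelihood(a, sample):
    """L(a | x) for uniform[0, a], assuming every x_j >= 0: a^(-n) if a >= max(x), else 0."""
    n = len(sample)
    return a ** (-n) if a >= max(sample) else 0.0

sample = [0.8, 2.3, 1.7, 0.4]        # made-up data; the sample maximum is 2.3
for a in (2.0, 2.29, 2.3, 2.5, 3.0):
    print(a, uniform_likelihood(a, sample))
```

Values of a below 2.3 give likelihood 0, and the likelihood is largest at a = 2.3, the largest observation, after which it decreases.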

Mathematical Exercise 17. Show that E[X(n)] = na / (n + 1), so Vn = (n + 1)X(n) / n is unbiased.

Mathematical Exercise 18. Show that $\operatorname{var}(V_n) = a^2 / [n(n + 2)]$, so Vn is consistent.

Mathematical Exercise 19. Show that the asymptotic relative efficiency of Vn to Un is infinite.

The last exercise shows that Vn is a much better estimator than Un; in fact, an estimator such as Vn, whose mean square error decreases on the order of $1/n^2$, is called super efficient. Now, having found a really good estimator, let's see if we can find a really bad one. A natural candidate is an estimator based on X(1), the first order statistic.

Mathematical Exercise 20. Show that X(1) has the same distribution as a - X(n).

Mathematical Exercise 21. Show that E[X(1)] = a / (n + 1) and hence Wn = (n + 1)X(1) is unbiased.

Mathematical Exercise 22. Show that $\operatorname{var}(W_n) = n a^2 / (n + 2)$, so Wn is not even consistent.

Simulation Exercise 23. Run the uniform estimation experiment 1000 times, updating every 10 runs, for several values of a. In each case, compare the empirical bias and mean square error of the estimators with their theoretical values. Rank the estimators in terms of empirical mean square error.
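Outside the applet, the experiment in Exercise 23 can be approximated with a short simulation. The sketch below assumes a = 1, n = 10, and 5000 runs (arbitrary choices; the function names are hypothetical) and reports the empirical bias and mean square error of Un, Vn, and Wn next to the theoretical values from Exercises 15, 18, and 22, which are the mean square errors because all three estimators are unbiased.

```python
import random
import statistics

def estimators(sample):
    """Return (U_n, V_n, W_n) for a sample from the uniform distribution on [0, a]."""
    n = len(sample)
    u = 2 * statistics.fmean(sample)       # method of moments: 2 M_n
    v = (n + 1) * max(sample) / n          # unbiased rescaling of the MLE X_(n)
    w = (n + 1) * min(sample)              # unbiased estimator built from X_(1)
    return u, v, w

def simulate(a, n, runs, seed=1):
    """Print empirical bias and MSE of U_n, V_n, W_n; return the empirical MSEs."""
    rng = random.Random(seed)
    draws = [estimators([rng.uniform(0, a) for _ in range(n)]) for _ in range(runs)]
    theory = {"U": a**2 / (3 * n), "V": a**2 / (n * (n + 2)), "W": n * a**2 / (n + 2)}
    empirical = {}
    for name, column in zip("UVW", zip(*draws)):
        bias = statistics.fmean(column) - a
        mse = statistics.fmean((e - a) ** 2 for e in column)
        empirical[name] = mse
        print(f"{name}: bias {bias:+.4f}  MSE {mse:.5f}  (theory: bias 0, MSE {theory[name]:.5f})")
    return empirical

mses = simulate(a=1.0, n=10, runs=5000)
```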

The Invariance Property

Returning to the general setting, suppose now that h is a one-to-one function from the parameter space A onto a set B. We can view b = h(a) as a new parameter taking values in the space B, and it is easy to re-parameterize the joint density function with the new parameter. Thus, let

$$f_1(x \mid b) = f\left[x \mid h^{-1}(b)\right], \quad x \in S,\ b \in B.$$

The corresponding likelihood function is

$$L_1(b \mid x) = L\left[h^{-1}(b) \mid x\right], \quad b \in B,\ x \in S.$$

Mathematical Exercise 24. Suppose that u(x) in A maximizes L(· | x) for each x in S. Show that h[u(x)] in B maximizes L1(· | x) for each x in S.

It follows from Exercise 24 that if U is a maximum likelihood estimator for a, then V = h(U) is a maximum likelihood estimator for b = h(a). This result is known as the invariance property.

Mathematical Exercise 25. Suppose that (X1, X2, ..., Xn) is a random sample from the Poisson distribution with mean µ, and let $p = P(X_i = 0) = e^{-\mu}$. Find the maximum likelihood estimator of p in two ways:

  1. Directly, by finding the likelihood function corresponding to the parameter p.
  2. By using the result of Exercise 6 and the invariance property.
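A numerical cross-check of the two routes in Exercise 25, using made-up counts: maximize the likelihood written directly in terms of p = e^(-µ) over a grid in (0, 1), and compare with the value that the invariance property predicts from the Poisson estimator Mn. The function name and grid spacing are hypothetical choices.

```python
import math

counts = [2, 0, 3, 1, 4]                   # made-up Poisson data; M_n = 2.0
m_n = sum(counts) / len(counts)

def log_l1(p, sample):
    """Log likelihood in the new parameter p = e^(-mu), i.e. mu = -ln p, for 0 < p < 1."""
    mu = -math.log(p)
    return sum(x * math.log(mu) - mu - math.lgamma(x + 1) for x in sample)

grid = [k / 20_000 for k in range(1, 20_000)]
p_direct = max(grid, key=lambda p: log_l1(p, counts))   # route 1: direct maximization
p_invariance = math.exp(-m_n)                           # route 2: h(M_n), h(mu) = e^(-mu)
print(p_direct, p_invariance)
```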

If the function h is not one-to-one, the maximum likelihood problem for the new parameter vector b = h(a) is not well-defined, because we cannot parametrize the joint density function in terms of b. However, there is a natural generalization of the maximum likelihood problem in this case. Define

$$L_1(b \mid x) = \max\{L(a \mid x) : a \in A,\ h(a) = b\}, \quad b \in B,\ x \in S.$$

Mathematical Exercise 26. Suppose again that u(x) in A maximizes L(· | x) for each x in S. Show that h[u(x)] in B maximizes L1(· | x) for each x in S.

The result in the last exercise extends the invariance property to many-to-one transformations of the parameter: if U is a maximum likelihood estimator for a, then V = h(U) is a maximum likelihood estimator for b = h(a).

Mathematical Exercise 27. Suppose that (I1, I2, ..., In) is a random sample of size n from the Bernoulli distribution with unknown success parameter p in (0, 1). Find the maximum likelihood estimator of p(1 - p), the variance of the Bernoulli distribution being sampled.
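Because h(p) = p(1 - p) is two-to-one on (0, 1), this is a case for the generalized likelihood L1 defined above. The sketch below evaluates ln L1(b | x) by taking the larger of ln L at the two roots of p(1 - p) = b, then maximizes over a grid of b values in (0, 1/4). The data (7 heads in 10 tosses) and function names are hypothetical.

```python
import math

def log_l(p, x, n):
    """Bernoulli log likelihood for x heads in n tosses, 0 < p < 1."""
    return x * math.log(p) + (n - x) * math.log(1 - p)

def log_l1(b, x, n):
    """ln L1(b | x): the larger of ln L over the roots p of p(1 - p) = b lying in (0, 1)."""
    root = math.sqrt(1 - 4 * b)
    candidates = [(1 - root) / 2, (1 + root) / 2]
    return max(log_l(p, x, n) for p in candidates if 0 < p < 1)

x, n = 7, 10
grid = [k / 100_000 for k in range(1, 25_000)]   # b in (0, 1/4)
b_hat = max(grid, key=lambda b: log_l1(b, x, n))
m = x / n
print(b_hat, m * (1 - m))
```

The maximizing grid value agrees with Mn(1 - Mn) = 0.21 to within the grid spacing, as the invariance property predicts.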

Mathematical Exercise 28. Suppose that (X1, X2, ..., Xn) is a random sample from the normal distribution with unknown mean $\mu \in \mathbb{R}$ and variance $\sigma^2 > 0$. Find the maximum likelihood estimator of $\mu^2 + \sigma^2$.
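A small numerical illustration of the invariance property here, with made-up data: plug the maximum likelihood estimates M_n and T_n^2 into h(µ, σ^2) = µ^2 + σ^2. Since T_n^2 = (1/n)ΣXj^2 - M_n^2, the result also equals the second sample moment, which the sketch checks up to floating-point rounding.

```python
import statistics

data = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5]           # made-up observations
m_n = statistics.fmean(data)                     # MLE of mu
t2_n = statistics.pvariance(data)                # divisor n: the MLE of sigma^2
via_invariance = m_n**2 + t2_n                   # h(M_n, T_n^2) with h(mu, s2) = mu^2 + s2
second_moment = statistics.fmean(x * x for x in data)
print(via_invariance, second_moment)             # equal up to rounding
```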