Consider again the basic statistical model, in which we have a random experiment that results in an observable random variable X taking values in a set S. Once again, the experiment is typically to sample n objects from a population and record a vector of measurements for each item. In this case, X has the form
X = (X1, X2, ..., Xn), where Xi is the vector of measurements for the i'th item.
Suppose that a is a real parameter of the distribution of X, taking values in a parameter space A ⊆ R. Let f(· | a) denote the probability density function of X for a ∈ A. Note that the expected value, variance, and covariance operators also depend on a, although we will suppress this to keep the notation from becoming too unwieldy. Finally, let Da denote the derivative operator with respect to a.
Suppose now that b = b(a) is a parameter of interest. In this section we will consider the general problem of finding the best estimator of b(a) among a given class of unbiased estimators. Recall that if U is an unbiased estimator of b(a), then var(U) is the mean square error. Thus, if U and V are unbiased estimators of b(a) and
var(U) ≤ var(V) for all a ∈ A,
then U is a uniformly better estimator than V. On the other hand, it may be the case that U has smaller variance for some values of a while V has smaller variance for other values of a. If U is uniformly better than any other unbiased estimator of b(a), then U is a Uniformly Minimum Variance Unbiased Estimator (UMVUE).
In this section, we will show that under mild conditions, there is a lower bound on the variance of any unbiased estimator of the parameter b(a). Thus, if we can find an estimator that achieves this lower bound for all a ∈ A, then the estimator must be a UMVUE.
The assumption that we must make, which we will call the basic condition, is that for any function h from S into R with E[|h(X)|] < ∞,
Da E[h(X)] = E{h(X) Da ln[f(X | a)]}.
1. Show that this condition is equivalent to the assumption that the derivative operator Da can be interchanged with the expected value operator E.
Generally speaking, the fundamental assumption will be satisfied if f(x | a) is differentiable as a function of a, with a derivative that is jointly continuous in x and a, and if the support set {x: f(x | a) > 0} does not depend on a.
2. Show that E{Da ln[f(X | a)]} = 0. Hint: Use the basic condition with h(x) = 1 for x in S.
Now let h be a function that satisfies the basic condition.
3. Show that cov{h(X), Da ln[f(X | a)]} = Da E[h(X)]. Hint: First note that the covariance is simply the expected value of the product of the variables, since the second variable has mean 0 by the previous exercise. Then simply use the basic condition.
4. Show that var{Da ln[f(X | a)]} = E{[Da ln[f(X | a)]]^2}. Hint: The variable has mean 0.
5. Finally, use the Cauchy-Schwarz inequality to establish the Cramer-Rao lower bound:
var[h(X)] ≥ {Da E[h(X)]}^2 / E{[Da ln[f(X | a)]]^2}.
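The identities in Exercises 2, 3, and 5 can be checked numerically on a small example. The following is only a sketch under stated assumptions: the Bernoulli model with n = 3 and p = 0.3, the statistic h(x) = x1 x2, and all function names are illustrative choices that do not come from the text. Expected values are computed exactly by enumerating the 2^n outcomes, and Da E[h(X)] is approximated by a central difference.

```python
from itertools import product

def joint_pmf(x, p):
    """Joint density of n independent Bernoulli(p) indicators at the outcome x."""
    prob = 1.0
    for xi in x:
        prob *= p if xi == 1 else 1.0 - p
    return prob

def score(x, p):
    """Da ln f(x | a) with a = p: derivative of the log joint density."""
    return sum(xi / p - (1 - xi) / (1.0 - p) for xi in x)

def expect(func, p, n):
    """Exact expected value of func(X) by enumerating all 2^n outcomes."""
    return sum(func(x) * joint_pmf(x, p) for x in product((0, 1), repeat=n))

def h(x):
    # Hypothetical statistic chosen for illustration; here E[h(X)] = p^2.
    return x[0] * x[1]

n, p, eps = 3, 0.3, 1e-6

# Exercise 2: the derivative of the log density has mean 0.
mean_score = expect(lambda x: score(x, p), p, n)

# Exercise 3: cov{h(X), Da ln f(X | a)} equals Da E[h(X)].
# Because the score has mean 0, the covariance is just E[h(X) * score].
cov_h_score = expect(lambda x: h(x) * score(x, p), p, n)
deriv_mean_h = (expect(h, p + eps, n) - expect(h, p - eps, n)) / (2 * eps)

# Exercise 5: var[h(X)] is at least {Da E[h(X)]}^2 / E{[Da ln f(X | a)]^2}.
fisher = expect(lambda x: score(x, p) ** 2, p, n)
var_h = expect(lambda x: h(x) ** 2, p, n) - expect(h, p, n) ** 2
cr_bound = deriv_mean_h ** 2 / fisher

print(mean_score, cov_h_score, deriv_mean_h, var_h, cr_bound)
```

In this example h(X) is not an unbiased estimator of p, which is why the numerator of the bound is the derivative of E[h(X)] rather than 1.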
6. Suppose that X = (X1, X2, ..., Xn) is a random sample of size n from the distribution of a random variable X having density function g. Show that
var[h(X)] ≥ {Da E[h(X)]}^2 / (n E{[Da ln[g(X | a)]]^2}).
Hint: The joint density is the product of the marginal densities. Use log properties, independence, and Exercise 2.
Now suppose that b(a) is a parameter of interest and h(X) is an unbiased estimator of b(a).
7. Use the general Cramer-Rao lower bound to show that
var[h(X)] ≥ {Da b(a)}^2 / E{[Da ln[f(X | a)]]^2}.
8. Show that equality holds in Exercise 7 if and only if
h(x) - b(a) = u(a) Da ln[f(x | a)] for all x
for some function u(a). Hint: Recall that equality holds in the Cauchy-Schwarz inequality if and only if the random variables are linear transformations of each other. Recall also that Da ln[f(X | a)] has mean 0.
9. Suppose that X = (X1, X2, ..., Xn) is a random sample of size n from the distribution of a random variable X having density function g. Show that
var[h(X)] ≥ {Da b(a)}^2 / (n E{[Da ln[g(X | a)]]^2}).
The quantity E{[Da ln[f(X | a)]]2} that occurs in the denominator of the lower bounds of Exercises 5 and 7 is called the Fisher Information Number of X, named after Sir Ronald Fisher.
The following exercise gives an alternate version of the expectation in the denominators of Exercises 6 and 9 that is usually easier to compute.
10. Show that if the appropriate derivatives exist and if the appropriate interchanges are permissible, then
E{[Da ln[g(X | a)]]^2} = -E{Da^2 ln[g(X | a)]}.
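As a numerical illustration of the identity in Exercise 10, the sketch below evaluates both sides for a Poisson density g(x | a), for which Da ln[g(x | a)] = x / a - 1 and Da^2 ln[g(x | a)] = -x / a^2. The parameter value a = 2.5, the truncation point, and the function name are assumptions made for the example only.

```python
import math

def poisson_pmf_table(a, x_max):
    """Return [g(0 | a), ..., g(x_max | a)] for the Poisson density, built recursively."""
    pmf = [math.exp(-a)]
    for x in range(1, x_max + 1):
        pmf.append(pmf[-1] * a / x)
    return pmf

a = 2.5
x_max = 100  # truncation point; the neglected tail mass is negligible for a = 2.5
pmf = poisson_pmf_table(a, x_max)

# First form: E{[Da ln g(X | a)]^2}, with Da ln g(x | a) = x / a - 1.
info_squared_score = sum((x / a - 1.0) ** 2 * g for x, g in enumerate(pmf))

# Second form: -E{Da^2 ln g(X | a)}, with Da^2 ln g(x | a) = -x / a^2.
info_second_deriv = -sum((-x / a ** 2) * g for x, g in enumerate(pmf))

print(info_squared_score, info_second_deriv, 1.0 / a)
```

The second form only needs the mean of X, while the first needs a second moment, which is one reason the second form tends to be easier to work with by hand.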
Suppose that (I1, I2, ..., In) is a random sample of size n from the Bernoulli distribution with parameter p. The basic assumption is satisfied.
11. Show that p(1 - p) / n is the CR lower bound for the variance of unbiased estimators of p.
12. Show that the sample mean (or equivalently the sample proportion) Mn attains the CR bound and hence is a UMVUE of p.
Suppose that (X1, X2, ..., Xn) is a random sample of size n from the Poisson distribution with parameter a. The basic assumption is satisfied.
13. Show that a / n is the CR lower bound for the variance of unbiased estimators of a.
14. Show that the sample mean Mn attains the CR bound and hence is a UMVUE of a.
Suppose that (X1, X2, ..., Xn) is a random sample of size n from the normal distribution with mean µ and variance d^2. The basic assumption is satisfied with respect to µ and with respect to d^2. Recall also that E[(X - µ)^4] = 3d^4.
15. Show that d^2 / n is the CR lower bound for the variance of unbiased estimators of µ.
16. Show that the sample mean Mn attains the CR bound and hence is a UMVUE of µ.
17. Show that 2d^4 / n is the CR lower bound for the variance of any unbiased estimator of d^2.
18. Show (or recall) that the sample variance S^2 has variance 2d^4 / (n - 1) and hence does not attain the CR lower bound in Exercise 17.
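A small simulation can make the gap between var(S^2) and the CR lower bound concrete. This is a rough Monte Carlo sketch, not part of the original exercises: the values µ = 0, d = 1, n = 5, the number of replications, the seed, and the function names are all arbitrary assumptions, and the estimate is subject to sampling error.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance S^2 (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def simulate_var_of_sample_variance(mu, d, n, replications, seed):
    """Monte Carlo estimate of var(S^2) for normal samples of size n."""
    rng = random.Random(seed)
    values = [
        sample_variance([rng.gauss(mu, d) for _ in range(n)])
        for _ in range(replications)
    ]
    # The empirical variance of the simulated S^2 values estimates var(S^2).
    return sample_variance(values)

mu, d, n = 0.0, 1.0, 5
estimated = simulate_var_of_sample_variance(mu, d, n, replications=100_000, seed=12345)
theoretical = 2 * d ** 4 / (n - 1)  # variance of S^2 from Exercise 18
cr_bound = 2 * d ** 4 / n           # CR lower bound from Exercise 17

print(estimated, theoretical, cr_bound)
```

The estimate should land near 2d^4 / (n - 1) and visibly above 2d^4 / n; the ratio of the two is n / (n - 1), so the gap shrinks as n grows.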
19. Show that if µ is known, then the statistic below attains the CR lower bound and hence is a UMVUE of d^2:
W^2 = (1 / n) Σ_{i = 1, ..., n} (Xi - µ)^2.
20. Show that if µ is unknown, no unbiased estimator of d^2 attains the CR lower bound.
Suppose that (X1, X2, ..., Xn) is a random sample of size n from the gamma distribution with scale parameter b and shape parameter k. The basic assumption is satisfied with respect to b.
21. Show that b^2 / (nk) is the CR lower bound for the variance of unbiased estimators of b.
22. Show that if k is known, then Mn / k attains the CR bound and hence is a UMVUE of b.
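The claim in Exercise 22 can be explored by simulation. The sketch below is illustrative only: the scale b = 3, shape k = 2, sample size n = 10, the replication count, the seed, and the function name are assumptions for the example. It uses `random.gammavariate(alpha, beta)` from the Python standard library, where alpha is the shape and beta is the scale.

```python
import random

def simulate_scaled_mean(b, k, n, replications, seed):
    """Simulate Mn / k for gamma samples; return its empirical mean and variance."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        # gammavariate(alpha, beta): alpha is the shape k, beta is the scale b.
        sample = [rng.gammavariate(k, b) for _ in range(n)]
        estimates.append(sum(sample) / (n * k))
    mean = sum(estimates) / replications
    variance = sum((u - mean) ** 2 for u in estimates) / (replications - 1)
    return mean, variance

b, k, n = 3.0, 2.0, 10  # hypothetical scale, shape, and sample size
emp_mean, emp_var = simulate_scaled_mean(b, k, n, replications=50_000, seed=2024)
cr_bound = b ** 2 / (n * k)  # CR lower bound from Exercise 21

print(emp_mean, emp_var, cr_bound)
```

Up to simulation error, the empirical mean should be close to b (unbiasedness) and the empirical variance close to b^2 / (nk).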
Suppose that (X1, X2, ..., Xn) is a random sample of size n from the uniform distribution on (0, a).
23. Show that the fundamental assumption is not satisfied.
24. Show that the CR lower bound for the variance of unbiased estimators of a is a^2 / n.
25. Show (or recall) that [(n + 1) / n] X(n), where X(n) = max{X1, X2, ..., Xn} is the largest order statistic, is unbiased and has variance a^2 / [n(n + 2)], which is smaller than the CR bound in the previous exercise.
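The variance in Exercise 25 can be verified with exact rational arithmetic. The sketch below relies on the standard fact that the maximum of n independent uniform(0, a) variables has density n x^(n - 1) / a^n on (0, a), so that its k'th moment is n a^k / (n + k). The value a = 3, the range of n, and the function names are assumptions made for the example.

```python
from fractions import Fraction

def max_moment(n, k, a):
    """E[X(n)^k] for the maximum of n uniform(0, a) variables: n a^k / (n + k)."""
    return Fraction(n, n + k) * a ** k

def scaled_max_variance(n, a):
    """Exact mean and variance of the estimator [(n + 1) / n] X(n)."""
    c = Fraction(n + 1, n)
    mean = c * max_moment(n, 1, a)
    second_moment = c ** 2 * max_moment(n, 2, a)
    return mean, second_moment - mean ** 2

a = Fraction(3)  # hypothetical value of the parameter
for n in range(1, 8):
    mean, var = scaled_max_variance(n, a)
    formal_bound = a ** 2 / n  # the expression from Exercise 24
    print(n, mean, var, formal_bound)
```

For every n the estimator has mean a and variance a^2 / [n(n + 2)], which is below a^2 / n. This does not contradict the Cramer-Rao theorem, because the theorem's hypothesis fails for this model.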
The reason that the basic assumption is not satisfied is that the support set {x: f(x | a) > 0} depends on the parameter a.
We now consider a somewhat specialized problem, but one that fits the general theme of this section. Suppose that X1, X2, ..., Xn are observable real-valued random variables that are uncorrelated and have the same unknown mean µ, but possibly different standard deviations. Let di = sd(Xi) for i = 1, 2, ..., n. We will consider estimators of µ that are linear functions of the outcome variables:
Y = Σ_{i = 1, ..., n} ci Xi, where c1, ..., cn are to be determined.
26. Show that Y is unbiased if and only if Σ_{i = 1, ..., n} ci = 1.
27. Compute the variance of Y in terms of c1, c2, ..., cn and d1, d2, ..., dn.
28. Use Lagrange multipliers to show that the variance is minimized, subject to the unbiasedness constraint, when
cj = (1 / dj^2) / Σ_{i = 1, ..., n} (1 / di^2) for j = 1, 2, ..., n.
This exercise shows how to construct the Best Linear Unbiased Estimator (BLUE) of µ, assuming that d1, d2, ..., dn are known.
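The weights in Exercise 28 are easy to compute directly. The following sketch uses hypothetical standard deviations (1, 2, 4) and illustrative function names; the variance formula it relies on, var(Y) = Σ ci^2 di^2, follows from the assumption that the variables are uncorrelated.

```python
def blue_weights(sds):
    """Inverse-variance weights cj = (1 / dj^2) / sum_i (1 / di^2)."""
    precisions = [1.0 / d ** 2 for d in sds]
    total = sum(precisions)
    return [w / total for w in precisions]

def linear_estimator_variance(weights, sds):
    """var(Y) = sum_i ci^2 di^2 for uncorrelated X1, ..., Xn."""
    return sum(c ** 2 * d ** 2 for c, d in zip(weights, sds))

sds = [1.0, 2.0, 4.0]  # hypothetical known standard deviations
weights = blue_weights(sds)
equal_weights = [1.0 / len(sds)] * len(sds)  # weights of the ordinary sample mean

blue_var = linear_estimator_variance(weights, sds)
mean_var = linear_estimator_variance(equal_weights, sds)

print(weights, blue_var, mean_var)
```

The weights sum to 1, so the estimator is unbiased, and observations with smaller standard deviations receive more weight. With unequal standard deviations the weighted estimator has smaller variance than the ordinary sample mean.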
Suppose now that di = d for each i, so that the outcome variables have the same standard deviation. In particular, this would be the case if the outcome variables form a random sample of size n from a distribution with mean µ and standard deviation d.
29. Show that in this case the variance is minimized when ci = 1 / n for each i, and hence Y is the sample mean.
This exercise shows that the sample mean Mn is the best linear unbiased estimator of µ when the standard deviations are the same, and that moreover, we do not need to know the value of the standard deviation.