### 1. Estimators

#### The Basic Statistical Model

As usual, our starting point is a random experiment with a sample space and a probability measure P. In the basic statistical model, we have an observable random variable X taking values in a set S. Recall that in general, X can have quite a complicated structure. For example, if the experiment is to sample n objects from a population and record various measurements of interest, then

X = (X1, X2, ..., Xn)

where Xi is the vector of measurements for the i'th object. The most important special case is when X1, X2, ..., Xn are independent and identically distributed (IID). In this case the n random variables form a random sample of size n from the common distribution.

Recall also that a statistic is an observable function of the outcome variable of the random experiment:

W = h(X).

Thus, a statistic is simply a random variable derived from the data variable X, with the assumption that W is also observable. Typically, W is also vector-valued.

#### Parameters

In the general sense, a parameter a is a function of the distribution of X, taking values in a parameter space A. Usually, the distribution of X will have k real parameters of interest, so that a = (a1, a2, ..., ak), and A is a subset of $\mathbb{R}^k$. In many cases, one or more of the parameters are unknown and must be estimated from the outcome vector X. This is one of the most important and basic of all statistical problems, and is the subject of this chapter.

#### Basic Properties of Estimators

Suppose now that we have an unknown real parameter a taking values in a parameter space $A \subseteq \mathbb{R}$. A real-valued statistic W that is used to estimate a is called, appropriately enough, an estimator of a. Thus, the estimator is a random variable and hence has a distribution, a mean, a variance, and so on. When we actually run the experiment and observe the data, the observed value w (a single number) is the estimate of the parameter a.

The (random) error is the difference between the estimator and the parameter:

W - a.

The expected value of the error is known as the bias:

bias(W) = E(W - a)

1. Use basic properties of expected value to show that

bias(W) = E(W) - a.

Thus, the estimator is said to be unbiased if the bias is 0 for all values of a, equivalently if the expected value of the estimator is the parameter being estimated: E(W) = a for a in A.

The quality of the estimator is usually measured by computing the mean square error:

$\text{MSE}(W) = E[(W - a)^2]$.

2. Use basic properties of expected value and variance to show that

$\text{MSE}(W) = \text{var}(W) + \text{bias}^2(W)$.

In particular, if the estimator is unbiased, then the mean square error of W is simply the variance of W.
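The decomposition of mean square error into variance plus squared bias can be checked numerically. The sketch below is only an illustration under assumed settings: the data are normal with mean a = 2 and standard deviation 1, and the estimator is a hypothetical shrunken sample mean W = 0.9 Mn, chosen so that the bias is visibly nonzero. The function names and the seed are assumptions, not part of the text.

```python
import random
import statistics

def simulate_estimator(a=2.0, n=5, shrink=0.9, runs=20000, seed=1):
    """Simulate W = shrink * (sample mean) for normal data with mean a, sd 1."""
    rng = random.Random(seed)
    return [shrink * statistics.fmean(rng.gauss(a, 1.0) for _ in range(n))
            for _ in range(runs)]

def decomposition(values, a):
    """Return the empirical (MSE, variance, bias) of the simulated estimates."""
    mse = statistics.fmean((w - a) ** 2 for w in values)
    bias = statistics.fmean(values) - a
    var = statistics.pvariance(values)  # divides by the number of runs
    return mse, var, bias

a = 2.0
mse, var, bias = decomposition(simulate_estimator(a=a), a)
print(f"bias = {bias:.4f}")
print(f"MSE = {mse:.4f}, var + bias^2 = {var + bias ** 2:.4f}")
```

With the empirical variance computed by dividing by the number of runs, the two printed quantities agree up to floating-point rounding, since the identity holds for any list of numbers, not only in expectation.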

Ideally, we would like to have unbiased estimators with small mean square error. However, this is not always possible, and Exercise 2 shows the delicate relationship between bias and mean square error. In the next section, we will see an example with two estimators of a parameter that are multiples of each other; one is unbiased, but the other has smaller mean square error.

However, if we have two unbiased estimators of a, denoted U and V, we naturally prefer the one with the smaller variance (mean square error). The relative efficiency of V to U is simply the ratio of the variances:

var(U) / var(V).
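As an illustration of relative efficiency, the following sketch compares two estimators of the center of a normal distribution: U, the sample mean, and V, the sample median. This pairing is an assumption made for the example (both are unbiased here by symmetry); the sample size, number of runs, and seed are likewise arbitrary choices.

```python
import random
import statistics

def efficiency_median_to_mean(n=25, runs=20000, seed=2):
    """Estimate var(U) / var(V) with U = sample mean and V = sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(runs):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)

print(f"relative efficiency of median to mean: {efficiency_median_to_mean():.3f}")
```

A ratio below 1 indicates that V (the median) has the larger variance, so for normal data the sample mean would be preferred.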

#### Asymptotic Properties

Consider the special case where the data variable X has the form

X = (X1, X2, ...)

and where we have a real-valued parameter a of interest. Again, this is the standard situation that occurs when we sample repeatedly from a population; typically, Xi is the vector of measurements for the i'th object in the sample. Thus, for each n, (X1, ..., Xn) are the observation variables for the sample of size n. In this situation, we usually have a general formula that defines an estimator of a for any sample size. Technically, this gives a sequence of estimators of a:

Wn = hn(X1, X2, ..., Xn), n = 1, 2, ...

In this case, we can discuss the asymptotic properties of the estimators as n increases. Most of the definitions are natural generalizations of the ones above.

The sequence of estimators Wn is said to be asymptotically unbiased for a if

$\text{bias}(W_n) \to 0$ as $n \to \infty$ for $a \in A$.

3. Show that Wn is asymptotically unbiased if and only if

$E(W_n) \to a$ as $n \to \infty$ for $a \in A$.
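A small simulation can make asymptotic unbiasedness concrete. The example below is an assumption chosen for illustration: it uses the divide-by-n variance estimator, which equals (n - 1) / n times the usual sample variance and therefore has bias equal to minus the distribution variance divided by n. The data are uniform on (0, 1), with variance 1/12; the helper names and seed are hypothetical.

```python
import random

def biased_variance(sample):
    """T_n = (1/n) * sum (x - mean)^2, the divide-by-n variance estimator."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n

def empirical_bias(n, runs=20000, seed=3):
    """Average of T_n over many uniform(0, 1) samples, minus the true variance."""
    rng = random.Random(seed)
    true_var = 1.0 / 12.0  # variance of the uniform distribution on (0, 1)
    total = 0.0
    for _ in range(runs):
        total += biased_variance([rng.random() for _ in range(n)])
    return total / runs - true_var

for n in (2, 5, 20, 50):
    print(n, round(empirical_bias(n), 5), "theory:", round(-1.0 / (12 * n), 5))
```

The empirical bias should shrink toward 0 roughly like 1/n, which is the behavior the definition describes.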

Suppose that Un and Vn are two sequences of estimators that are asymptotically unbiased for a. The asymptotic relative efficiency of Vn to Un is the following limit, if it exists:

$\lim_{n \to \infty} [\text{var}(U_n) / \text{var}(V_n)]$.

Naturally, we expect our estimators to improve, in some sense, as n increases. Specifically, the sequence of estimators Wn is said to be consistent for a if Wn converges to a in probability as n increases:

$P[|W_n - a| > r] \to 0$ as $n \to \infty$ for any $r > 0$ and any $a \in A$.

4. Suppose that $\text{MSE}(W_n) \to 0$ as $n \to \infty$ for any $a \in A$. Show that $W_n$ is consistent for a. Hint: Use Markov's inequality.

The condition in Exercise 4 is known as mean-square consistency. Thus, mean-square consistency implies simple consistency. This is simply a statistical version of the theorem that states that mean-square convergence implies convergence in probability.
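The definition of consistency can be explored by estimating the probability that the estimator misses the parameter by more than r, for increasing n. The sketch below assumes uniform data on (0, 1), so the parameter is the mean 0.5 and the estimator is the sample mean; the tolerance r = 0.05, the number of runs, and the seed are arbitrary illustrative choices.

```python
import random

def tail_probability(n, r=0.05, runs=2000, seed=4):
    """Estimate P(|M_n - 0.5| > r) for the mean of n uniform(0, 1) variables."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(runs):
        m = sum(rng.random() for _ in range(n)) / n
        if abs(m - 0.5) > r:
            misses += 1
    return misses / runs

for n in (10, 100, 1000):
    print(n, tail_probability(n))
```

The estimated probabilities should decrease toward 0 as n grows. Here the mean square error of the sample mean is 1 / (12 n), which tends to 0, so this is also an instance of mean-square consistency.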

#### The Sample Mean and Variance

Suppose that (X1, X2, ..., Xn) is a random sample of size n from the distribution of a real-valued random variable X with mean µ and variance $d^2$. Recall that the sample mean and sample variance, respectively, are defined by

$M_n = \frac{1}{n} \sum_{i=1}^n X_i$,

$S_n^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M_n)^2$.

The properties of these statistics are studied in detail in the chapter on Random Samples. Here, we will restate some of these properties in the language of estimation.

5. Show or recall that

1. E(Mn) = µ, so Mn is an unbiased estimator of µ.
2. $\text{var}(M_n) = d^2 / n$, so $M_n$ is a consistent estimator of µ.
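The two properties in Exercise 5 can be checked by simulation. The sketch below assumes a gamma sampling distribution with shape 2 and scale 1.5 (so the mean is 3 and the variance is 4.5); these values, the function name, and the seed are illustrative assumptions. It uses `random.gammavariate(alpha, beta)` from the Python standard library, whose mean is alpha times beta.

```python
import random
import statistics

def sample_means(n, shape=2.0, scale=1.5, runs=10000, seed=5):
    """Simulate `runs` sample means, each from n gamma(shape, scale) variables."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gammavariate(shape, scale) for _ in range(n))
            for _ in range(runs)]

shape, scale = 2.0, 1.5
mu = shape * scale        # distribution mean
d2 = shape * scale ** 2   # distribution variance
for n in (5, 20, 80):
    means = sample_means(n, shape, scale)
    print(n,
          "empirical bias:", round(statistics.fmean(means) - mu, 4),
          "empirical var:", round(statistics.variance(means), 4),
          "d^2 / n:", round(d2 / n, 4))
```

The empirical bias should stay near 0 for every n, while the empirical variance should track the variance divided by n and shrink as n grows.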

6. In the sample mean experiment, set the sampling distribution to gamma. Increase the sample size with the scroll bar and note graphically and numerically the unbiased and consistent properties. Run the experiment 1000 times, updating every 10 runs.

7. Run the normal estimation experiment 1000 times, updating every 10 runs, for several values of the parameters. In each case, compare the empirical bias and mean square error of Mn with the theoretical values.

The consistency of Mn as an estimator of µ is simply the weak law of large numbers. Moreover, there are a number of important special cases of the results in Exercise 5. See the section on Empirical Distributions in the chapter on Random Samples for the details.

• If $X = I_A$, the indicator variable for an event A that has probability p, then the sample mean of $X_i$, i = 1, 2, ..., n is the relative frequency $f_n$ of A. Hence $f_n$ is an unbiased and consistent estimator of p.
• If F denotes the distribution function of X, then for fixed x, the empirical distribution function $F_n(x)$ is simply the sample mean for the random sample $I\{X_i \le x\}$, i = 1, 2, ..., n. Hence $F_n(x)$ is an unbiased and consistent estimator of F(x).
• If X is discrete and f denotes the density function of X, then for fixed x, the empirical density function $f_n(x)$ is simply the sample mean for the random sample $I\{X_i = x\}$, i = 1, 2, ..., n. Hence $f_n(x)$ is an unbiased and consistent estimator of f(x).
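The special cases above all reduce to averaging indicator variables. As a minimal sketch, the code below computes the empirical distribution function as the sample mean of the indicators of the events that an observation is at most x, and compares it with the true distribution function. The exponential distribution with rate 1, the sample size, and the seed are assumptions made for the example.

```python
import math
import random

def empirical_cdf(sample, x):
    """F_n(x): the sample mean of the indicator variables I{X_i <= x}."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

rng = random.Random(6)
sample = [rng.expovariate(1.0) for _ in range(5000)]
for x in (0.5, 1.0, 2.0):
    # The exponential(1) distribution function is F(x) = 1 - exp(-x).
    print(x, round(empirical_cdf(sample, x), 4), round(1 - math.exp(-x), 4))
```

Replacing the condition `xi <= x` with `xi == x` would give the empirical density function for a discrete distribution in the same way.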

8. In the matching experiment, the random variable is the number of matches. Run the simulation 1000 times, updating every 10 runs, and note the apparent convergence of

1. the sample mean to the distribution mean,
2. the sample standard deviation to the distribution standard deviation,
3. the empirical density function to the distribution density function.

In the following problems, we assume that $d_4 = E[(X - \mu)^4]$ is finite.

9. Show or recall that

1. $E(S_n^2) = d^2$, so $S_n^2$ is an unbiased estimator of $d^2$.
2. $\text{var}(S_n^2) = \frac{1}{n}\left[d_4 - \frac{n - 3}{n - 1} d^4\right]$, so $S_n^2$ is a consistent estimator of $d^2$.
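The mean and variance of the sample variance in Exercise 9 can be compared with simulated values. The sketch below assumes standard normal data, for which the variance is 1 and the fourth central moment is 3; in that case the variance formula reduces to 2 / (n - 1). The function names, sample size, and seed are illustrative assumptions.

```python
import random
import statistics

def var_s2_theory(n, d2, d4):
    """(1/n) * [d4 - (n - 3) * d2**2 / (n - 1)], where d2 is the variance
    and d4 is the fourth central moment of the sampling distribution."""
    return (d4 - (n - 3) * d2 ** 2 / (n - 1)) / n

def simulate_s2(n, runs=20000, seed=7):
    """Simulate `runs` sample variances, each from n standard normal variables."""
    rng = random.Random(seed)
    return [statistics.variance([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(runs)]

n = 10
values = simulate_s2(n)
print("mean of sample variance:", round(statistics.fmean(values), 4), "(theory: 1)")
print("variance of sample variance:", round(statistics.variance(values), 4),
      "(theory:", round(var_s2_theory(n, 1.0, 3.0), 4), ")")
```

Note that `statistics.variance` divides by n - 1, matching the definition of the sample variance used in this section.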

10. Run the exponential experiment 1000 times with an update frequency of 10. Note the apparent convergence of the sample standard deviation to the distribution standard deviation.

Recall that if µ is known, a natural estimator of $d^2$ is

$W_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2$.

11. Show or recall that

1. $E(W_n^2) = d^2$, so $W_n^2$ is an unbiased estimator of $d^2$.
2. $\text{var}(W_n^2) = \frac{1}{n}(d_4 - d^4)$, so $W_n^2$ is a consistent estimator of $d^2$.

12. Show that the asymptotic relative efficiency of $S_n^2$ to $W_n^2$ is 1.
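The limit in Exercise 12 can be previewed numerically by evaluating the two variance formulas from Exercises 9 and 11 for increasing n. This is a sketch, not a proof. It assumes the exponential distribution with rate 1, for which the variance is 1 and the fourth central moment is 9; the function names are hypothetical.

```python
def var_s2(n, d2, d4):
    """Variance of the sample variance: (1/n) * [d4 - (n - 3) * d2**2 / (n - 1)]."""
    return (d4 - (n - 3) * d2 ** 2 / (n - 1)) / n

def var_w2(n, d2, d4):
    """Variance of the known-mean estimator: (1/n) * (d4 - d2**2)."""
    return (d4 - d2 ** 2) / n

def efficiency_s2_to_w2(n, d2, d4):
    """Relative efficiency of the sample variance to the known-mean estimator:
    the variance of the latter divided by the variance of the former."""
    return var_w2(n, d2, d4) / var_s2(n, d2, d4)

# Exponential distribution with rate 1: variance 1, fourth central moment 9.
for n in (2, 10, 100, 10000):
    print(n, round(efficiency_s2_to_w2(n, 1.0, 9.0), 5))
```

The ratio is below 1 for small n, reflecting the cost of estimating the mean, and approaches 1 as n increases.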

13. Run the normal estimation experiment 1000 times, updating every 10 runs, for several values of the parameters. In each case, compare the empirical bias and mean square error of $S_n^2$ and of $W_n^2$ to their theoretical values. Which estimator seems to work better?

The estimators of the mean and variance that we have considered in this section have been natural in a sense. However, for other parameters, it is not clear how to even find a reasonable estimator in the first place. In the next several sections, we will consider the problem of constructing estimators.