
2. The Binomial Distribution


Suppose that our random experiment is to perform Bernoulli trials $I_1, I_2, \ldots$ with success parameter p. In this section we will study the random variable $X_n$ that gives the number of successes in the first n trials. This variable has a simple expression in terms of the indicator variables:

Mathematical Exercise 1. Show that $X_n = I_1 + I_2 + \cdots + I_n$.

The Density Function

Mathematical Exercise 2. Suppose that $K \subseteq N = \{1, 2, \ldots, n\}$ and $\#(K) = k$. Use the assumptions of Bernoulli trials to show that

$$P(I_j = 1 \text{ for } j \in K \text{ and } I_j = 0 \text{ for } j \in N - K) = p^k (1 - p)^{n - k}.$$

Recall that the number of subsets of size k from a set of size n is the binomial coefficient

$$C(n, k) = \frac{n!}{k!\,(n - k)!}$$

Mathematical Exercise 3. Use Exercise 2 and basic properties of probability to show that

$$P(X_n = k) = C(n, k)\, p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n.$$

The distribution with this density function is known as the binomial distribution with parameters n and p. The binomial family of distributions is one of the most important in probability.
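As a minimal Python sketch (standard library only; `math.comb` requires Python 3.8 or later), the density in Exercise 3 can be evaluated directly. The name `binomial_pmf` is our own choice, not a library function.

```python
from math import comb


def binomial_pmf(n: int, p: float, k: int) -> float:
    """Binomial density P(X_n = k) = C(n, k) p^k (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # outside the support {0, 1, ..., n}
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Density for n = 10 and p = 0.7, the setting of Exercise 4
density = [binomial_pmf(10, 0.7, k) for k in range(11)]
print([round(f, 4) for f in density])
print(sum(density))  # 1 up to rounding error, as Exercise 7 asserts
```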

Simulation Exercise 4. In the binomial coin experiment, vary n and p with the scrollbars, and note the shape and location of the density function. Now with n = 10 and p = 0.7, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the density function.

Mathematical Exercise 5. A fair die is tossed 5 times. Give explicitly the density function of the number of aces.

Mathematical Exercise 6. A student takes a multiple choice test with 10 questions, each with 4 choices. If the student blindly guesses, find the probability that he will get at least 5 questions correct.
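Exercises 5 and 6 reduce to evaluating the density, so a short sketch with exact rational arithmetic (Python's `fractions.Fraction`) can be used to check hand computations. The helper `binomial_pmf_exact` is a hypothetical name introduced here.

```python
from fractions import Fraction
from math import comb


def binomial_pmf_exact(n, p, k):
    """Exact binomial density C(n, k) p^k (1 - p)^(n - k), with p a Fraction."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exercise 5: number of aces in 5 tosses of a fair die (p = 1/6)
aces = {k: binomial_pmf_exact(5, Fraction(1, 6), k) for k in range(6)}
for k, prob in aces.items():
    print(k, prob)

# Exercise 6: at least 5 correct out of 10 when guessing among 4 choices (p = 1/4)
at_least_5 = sum(binomial_pmf_exact(10, Fraction(1, 4), k) for k in range(5, 11))
print(at_least_5, float(at_least_5))
```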

Mathematical Exercise 7. Use the binomial theorem to show that the binomial density function really is a (discrete) density function.

Mathematical Exercise 8. Show that

  1. $P(X_n = k) > P(X_n = k - 1)$ if and only if k < (n + 1)p.
  2. $P(X_n = k) = P(X_n = k - 1)$ if and only if (n + 1)p is an integer between 1 and n, and k = (n + 1)p.

Thus, the density function increases for k < (n + 1)p and decreases for k > (n + 1)p, reaching its largest value at $\lfloor (n + 1)p \rfloor$; this integer is the mode of the distribution. (Recall that $\lfloor x \rfloor$ is the largest integer that is not greater than x.) In the case that m = (n + 1)p is an integer between 1 and n, there are two consecutive modes, at m - 1 and m. In any event, the binomial distribution is unimodal.
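A brute-force check of the mode formula, sketched in Python under the assumption 0 < p < 1: search the density for its largest value and compare the result with $\lfloor (n + 1)p \rfloor$. The helper names are illustrative. When (n + 1)p is an integer the two modes tie, and floating-point rounding decides which of them the search reports.

```python
from math import comb, floor


def binomial_pmf(n, p, k):
    """Binomial density C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mode_by_search(n, p):
    """A value of k maximizing the density (the first maximum found), by brute force."""
    values = [binomial_pmf(n, p, k) for k in range(n + 1)]
    return max(range(n + 1), key=values.__getitem__)


for n, p in [(10, 0.7), (20, 0.13), (15, 0.5)]:
    # (15, 0.5) is a tie case: (n + 1)p = 8, so 7 and 8 are both modes
    print(n, p, mode_by_search(n, p), floor((n + 1) * p))
```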

Mathematical Exercise 9. Suppose that U is a random variable having the binomial distribution with parameters n and p. Show that n - U has the binomial distribution with parameters n and 1 - p.

  1. Give a probabilistic proof, based on Bernoulli trials
  2. Give an analytic proof, based on density functions

Famous Problems

In 1693, Samuel Pepys asked Isaac Newton whether it is more likely to get at least one ace in 6 rolls of a die or at least two aces in 12 rolls of a die. This problem is known as Pepys' problem; naturally, Pepys had fair dice in mind.

Simulation Exercise 10. Guess the answer to Pepys' problem based on empirical data. With fair dice and n = 6, run the simulation of the dice experiment 500 times and compute the relative frequency of at least one ace. Now with n = 12, run the simulation 500 times and compute the relative frequency of at least two aces. Compare the results.

Mathematical Exercise 11. Solve Pepys' problem using the binomial distribution.

Mathematical Exercise 12. Which is more likely: at least one ace with 4 throws of a fair die or at least one double ace in 24 throws of two fair dice? This is known as De Méré's problem, named after the Chevalier de Méré.
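Both famous problems come down to binomial tail probabilities. The sketch below evaluates them exactly with `fractions.Fraction`; the helper `at_least` is our own name for $P(X_n \ge j)$. It is best used to check answers to Exercises 11 and 12 after working them by hand.

```python
from fractions import Fraction
from math import comb


def at_least(n, p, j):
    """P(X_n >= j) for the binomial distribution, computed exactly."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(j, n + 1))


ace = Fraction(1, 6)
# Pepys' problem
pepys_6 = at_least(6, ace, 1)    # at least one ace in 6 rolls
pepys_12 = at_least(12, ace, 2)  # at least two aces in 12 rolls
# De Mere's problem
demere_4 = at_least(4, ace, 1)                # at least one ace in 4 throws
demere_24 = at_least(24, Fraction(1, 36), 1)  # at least one double ace in 24 throws of two dice

for name, value in [("pepys_6", pepys_6), ("pepys_12", pepys_12),
                    ("demere_4", demere_4), ("demere_24", demere_24)]:
    print(name, float(value))
```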

Moments

We will compute the mean and variance of the binomial distribution in several different ways. The method using indicator variables is the best.

Mathematical Exercise 13. Use Exercise 1 and basic properties of expected value to show that

$$E(X_n) = np.$$

This makes intuitive sense, since p should be approximately the proportion of successes in a large number of trials.

Mathematical Exercise 14. Compute the mean using the density function.

Mathematical Exercise 15. Use Exercise 1 and properties of variance to show that

$$\operatorname{var}(X_n) = np(1 - p)$$

Mathematical Exercise 16. Sketch the graph of the variance as a function of p. Note in particular that the variance is largest when p = 1/2 and smallest when p = 0 or p = 1.

Mathematical Exercise 17. Compute the variance using the density function.
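To check Exercises 13 through 17 numerically, the mean and variance can be summed directly over the density and compared with np and np(1 - p). This is a floating-point sketch with hypothetical helper names; the parameter values are arbitrary.

```python
from math import comb


def binomial_pmf(n, p, k):
    """Binomial density C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def moments_from_density(n, p):
    """Mean and variance computed as sums over the density."""
    mean = sum(k * binomial_pmf(n, p, k) for k in range(n + 1))
    second = sum(k * k * binomial_pmf(n, p, k) for k in range(n + 1))  # E(X_n^2)
    return mean, second - mean**2


for n, p in [(10, 0.7), (25, 0.5), (40, 0.05)]:
    mean, var = moments_from_density(n, p)
    print(n, p, mean, n * p, var, n * p * (1 - p))
```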

Mathematical Exercise 18. Show that the probability generating function is given by

$$E\left(t^{X_n}\right) = (1 - p + pt)^n, \quad t \in \mathbb{R}$$

Mathematical Exercise 19. Use the probability generating function in Exercise 18 to compute the mean and variance.
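The probability generating function can also be expanded mechanically. The sketch below (helper name hypothetical) multiplies out $(1 - p + pt)^n$ with exact fractions, confirms that the coefficient of $t^k$ is $P(X_n = k)$, and reads off the mean and variance from $G'(1) = E(X_n)$ and $G''(1) = E[X_n(X_n - 1)]$, using $\operatorname{var}(X_n) = G''(1) + G'(1) - [G'(1)]^2$.

```python
from fractions import Fraction
from math import comb


def pgf_coefficients(n, p):
    """Coefficients c_0, ..., c_n of G(t) = (1 - p + p t)^n, by repeated multiplication."""
    coeffs = [Fraction(1)]  # the constant polynomial 1
    for _ in range(n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] += c * (1 - p)  # multiply by the constant term 1 - p
            nxt[i + 1] += c * p    # multiply by the linear term p t
        coeffs = nxt
    return coeffs


n, p = 8, Fraction(3, 10)
c = pgf_coefficients(n, p)
# The coefficient of t^k is the binomial density at k
assert all(c[k] == comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

g1 = sum(k * ck for k, ck in enumerate(c))            # G'(1) = E(X_n)
g2 = sum(k * (k - 1) * ck for k, ck in enumerate(c))  # G''(1) = E[X_n(X_n - 1)]
mean = g1
var = g2 + g1 - g1**2
print(mean, var)
```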

Mathematical Exercise 20. Use the identity $j\,C(n, j) = n\,C(n - 1, j - 1)$ for n, j = 1, 2, ... to show that

$$E\left(X_n^k\right) = np\, E\left[(X_{n-1} + 1)^{k-1}\right], \quad n, k = 1, 2, \ldots$$
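The recursion in Exercise 20 can be spot-checked with exact arithmetic. This sketch assumes the convention $X_0 = 0$ (needed when n = 1) and compares both sides for small n and k; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def moment(n, p, k):
    """E(X_n^k), computed exactly from the binomial density."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) * j**k for j in range(n + 1))


def shifted_moment(n, p, k):
    """E[(X_n + 1)^k], computed exactly; for n = 0 this is 1 since X_0 = 0."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) * (j + 1) ** k for j in range(n + 1))


p = Fraction(2, 7)
for n in range(1, 8):
    for k in range(1, 5):
        lhs = moment(n, p, k)
        rhs = n * p * shifted_moment(n - 1, p, k - 1)
        assert lhs == rhs, (n, k)
print("recursion verified for n = 1..7, k = 1..4")
```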

Mathematical Exercise 21. Use the recursion result in Exercise 20 to give yet one more derivation of the mean and variance.

Simulation Exercise 22. In the binomial coin experiment, vary n and p with the scrollbars and note the location and size of the mean/standard deviation bar. Now with p = 0.7, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

Mathematical Exercise 23. A certain type of missile has failure probability 0.02. Compute the mean and standard deviation of the number of failures in 50 launches.

Mathematical Exercise 24. A fair die is rolled 1000 times. Give the mean and standard deviation of the number of aces.
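Exercises 23 and 24 are direct applications of the formulas for the mean and variance. A minimal sketch (the helper `binomial_mean_sd` is our own name):

```python
from math import sqrt


def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of the binomial distribution."""
    return n * p, sqrt(n * p * (1 - p))


missile = binomial_mean_sd(50, 0.02)  # Exercise 23: failures in 50 launches
aces = binomial_mean_sd(1000, 1 / 6)  # Exercise 24: aces in 1000 rolls
print(missile)
print(aces)
```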

The Galton Board

The Galton board is a triangular array of pegs. The rows are numbered 0, 1, ... from top downward. Row n has n + 1 pegs numbered from 0 at the left to n at the right. Thus a peg can be uniquely identified by the ordered pair (n, k) where n is the row number and k is the peg number in that row. The Galton board is named after Francis Galton.

Now suppose that a ball is dropped from above the top peg (0, 0). Each time the ball hits a peg, it bounces to the right with probability p and to the left with probability 1 - p, independently from bounce to bounce.

Mathematical Exercise 25. Show that the number of the peg that the ball hits in row n has the binomial distribution with parameters n and p.
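A Monte Carlo sketch of the Galton board, assuming that a ball at peg (j, k) moves to peg (j + 1, k + 1) after a right bounce and to peg (j + 1, k) after a left bounce. The function name and the seed are arbitrary choices; the empirical mean of the peg number in row n should be close to np.

```python
import random


def galton_peg(n, p, rng):
    """Peg number reached in row n: each bounce goes right (+1) with probability p."""
    peg = 0
    for _ in range(n):        # one bounce per row, rows 0 .. n - 1
        if rng.random() < p:  # bounce to the right
            peg += 1
    return peg


rng = random.Random(2024)  # fixed seed so the run is reproducible
n, p, runs = 10, 0.3, 20000
pegs = [galton_peg(n, p, rng) for _ in range(runs)]
freq = [pegs.count(k) / runs for k in range(n + 1)]
print("empirical mean", sum(pegs) / runs, "vs np =", n * p)
print("relative frequencies", [round(f, 3) for f in freq])
```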

Simulation Exercise 26. In the Galton board experiment, set n = 10 and p = 0.1. Click step several times and watch the balls fall through the pegs. Repeat for p = 0.3, 0.5, 0.7, and 0.9.

Simulation Exercise 27. In the Galton board experiment, set n = 15 and p = 0.1. Run the simulation 100 times, updating after each run. Note the general shape of the paths through the board. Repeat for p = 0.3, 0.5, 0.7, and 0.9.

Sums of Independent Binomial Variables

Next we will establish an important invariance property of the binomial distribution.

Mathematical Exercise 28. Use the representation in terms of indicator variables to show that if m and n are positive integers then

  1. $X_{m+n} - X_m$ has the same distribution as $X_n$ (binomial with parameters n and p).
  2. $X_{m+n} - X_m$ and $X_m$ are independent.

Thus, the random process $X_n$, n = 1, 2, ... has stationary, independent increments.

Mathematical Exercise 29. Suppose that U and V are independent random variables for an experiment, that U has the binomial distribution with parameters m and p, and that V has the binomial distribution with parameters n and p. Show that U + V has the binomial distribution with parameters m + n and p.

  1. Give a probabilistic proof, using Exercise 28.
  2. Give an analytic proof using density functions.
  3. Give an analytic proof using probability generating functions.
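For the analytic proof in part 2, the density of U + V is the discrete convolution of the two densities. The following sketch (helper names hypothetical) checks with exact fractions that convolving the binomial densities with parameters (m, p) and (n, p) gives the binomial density with parameters (m + n, p), for one choice of m, n, and p.

```python
from fractions import Fraction
from math import comb


def binomial_density(n, p):
    """List of P(X = k), k = 0, ..., n, computed exactly."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(f, g):
    """Density of U + V for independent U, V with densities f and g on 0, 1, 2, ..."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj  # P(U = i) P(V = j) contributes to P(U + V = i + j)
    return h


m, n, p = 4, 6, Fraction(1, 3)
assert convolve(binomial_density(m, p), binomial_density(n, p)) == binomial_density(m + n, p)
print("Binomial(4, 1/3) + Binomial(6, 1/3) has the Binomial(10, 1/3) density")
```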

Connection to the Hypergeometric Distribution

Mathematical Exercise 30. Suppose that m < n. Show that

$$P(X_m = j \mid X_n = k) = \frac{C(m, j)\, C(n - m, k - j)}{C(n, k)}, \quad j = 0, 1, \ldots, m.$$

Interestingly, the distribution in Exercise 30 does not depend on p. It is known as the hypergeometric distribution with parameters n, m, and k. Try to interpret this result probabilistically.
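A quick exact check that the conditional density does not depend on p: the sketch computes $P(X_m = j \mid X_n = k)$ from joint probabilities, using the independence of $X_m$ and $X_n - X_m$ from Exercise 28, and compares it with the formula of Exercise 30 for several values of p. The values m = 20, n = 100, k = 30 are those of Exercise 31; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def binom(n, p, k):
    """Exact binomial density; zero outside 0, ..., n."""
    if not 0 <= k <= n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional(m, n, k, j, p):
    """P(X_m = j | X_n = k), using independence of X_m and X_n - X_m."""
    joint = binom(m, p, j) * binom(n - m, p, k - j)  # P(X_m = j, X_n - X_m = k - j)
    return joint / binom(n, p, k)


def hypergeometric(m, n, k, j):
    """C(m, j) C(n - m, k - j) / C(n, k), the formula of Exercise 30."""
    return Fraction(comb(m, j) * comb(n - m, k - j), comb(n, k))


m, n, k = 20, 100, 30  # Exercise 31: 30 heads in 100 tosses, first 20 tosses
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)):
    for j in range(0, min(m, k) + 1):
        assert conditional(m, n, k, j, p) == hypergeometric(m, n, k, j)
print("conditional density matches the hypergeometric formula for every p tried")
```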

Mathematical Exercise 31. A coin is tossed 100 times and results in 30 heads. Find the density function of the number of heads in the first 20 tosses.

The Normal Approximation

Simulation Exercise 32. In the binomial timeline experiment, set p = 0.1. Start with n = 1 and successively increase n by 1, noting the shape of the density function each time. With n = 100, run the experiment 1000 times, updating every 10 runs. Repeat for p = 0.3, 0.5, 0.7, and 0.9.

The characteristic bell shape that you should observe in Exercise 32 is an example of the central limit theorem, because the binomial variable can be written as a sum of n independent, identically distributed random variables (the indicator variables).

Mathematical Exercise 33. Show that the distribution of the standardized variable given below converges to the standard normal distribution as n increases:

$$\frac{X_n - np}{\sqrt{np(1 - p)}}$$

This version of the central limit theorem is known as the De Moivre-Laplace theorem, and is named after Abraham de Moivre and Pierre-Simon Laplace. From a practical point of view, Exercise 33 means that, for large n, the distribution of $X_n$ is approximately normal, with mean np and variance np(1 - p). Just how large n needs to be for the normal approximation to work well depends on the value of p. The rule of thumb is that we need np and n(1 - p) to be at least 5.
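A sketch of the normal approximation in Python, using `math.erf` for the standard normal distribution function. The text does not discuss it, but the sketch also applies the usual continuity correction (widening the interval by 1/2 on each side) by default, since for moderate n it often improves the approximation; set `continuity=False` for the plain approximation. The helper names are hypothetical.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binomial_interval_exact(n, p, a, b):
    """P(a <= X_n <= b), summed from the binomial density."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))


def binomial_interval_normal(n, p, a, b, continuity=True):
    """Normal approximation to P(a <= X_n <= b) with mean np and variance np(1 - p).

    continuity=True widens [a, b] by 1/2 on each side (the continuity
    correction); this refinement is an assumption, not part of the text.
    """
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    lo, hi = (a - 0.5, b + 0.5) if continuity else (a, b)
    return normal_cdf((hi - mu) / sigma) - normal_cdf((lo - mu) / sigma)


# Setting of Exercise 34: n = 15, p = 0.5, event {5 <= X_15 <= 10}
print(binomial_interval_exact(15, 0.5, 5, 10))
print(binomial_interval_normal(15, 0.5, 5, 10))
print(binomial_interval_normal(15, 0.5, 5, 10, continuity=False))
```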

Simulation Exercise 34. In the binomial timeline experiment, set p = 0.5 and n = 15. Run the experiment 1000 times with an update frequency of 100. Compute and compare the following:

  1. $P(5 \le X_{15} \le 10)$
  2. The relative frequency of the event $\{5 \le X_{15} \le 10\}$
  3. The normal approximation to $P(5 \le X_{15} \le 10)$

Simulation Exercise 35. In the binomial timeline experiment, set p = 0.3 and n = 20. Run the experiment 1000 times with an update frequency of 100. Compute and compare the following:

  1. $P(5 \le X_{20} \le 10)$
  2. The relative frequency of the event $\{5 \le X_{20} \le 10\}$
  3. The normal approximation to $P(5 \le X_{20} \le 10)$

Simulation Exercise 36. In the binomial timeline experiment, set p = 0.8 and n = 30. Run the experiment 1000 times with an update frequency of 100. Compute and compare the following:

  1. $P(22 \le X_{30} \le 27)$
  2. The relative frequency of the event $\{22 \le X_{30} \le 27\}$
  3. The normal approximation to $P(22 \le X_{30} \le 27)$

Mathematical Exercise 37. Suppose that in a certain district, 40% of the registered voters prefer candidate A. A random sample of 50 registered voters is selected.

  1. Give the mean and variance of the number in the sample who prefer A.
  2. Find the probability that fewer than 19 voters in the sample prefer A.
  3. Compute the normal approximation to the probability in part 2.
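The three parts of Exercise 37 can be computed as follows. This sketch treats the sample count as binomial with n = 50 and p = 0.4, as the exercise intends, reads "fewer than 19" as $X \le 18$, and, as an assumption beyond the text, applies the continuity correction in part 3.

```python
from math import comb, erf, sqrt

n, p = 50, 0.4
mean, var = n * p, n * p * (1 - p)  # part 1: mean np and variance np(1 - p)

# Part 2: "fewer than 19" means X <= 18, summed from the density
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(19))

# Part 3: normal approximation with continuity correction (an assumption)
z = (18.5 - mean) / sqrt(var)
approx = 0.5 * (1 + erf(z / sqrt(2)))

print(mean, var, exact, approx)
```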

Reliability

The binomial distribution arises frequently in the context of reliability. Suppose that a system consists of n components which operate independently. Each component is either good, with probability p, or defective, with probability 1 - p. Thus, the components are Bernoulli trials. Now suppose that the system as a whole functions properly if and only if at least k of the n components are good. In reliability terms, such a system is called, appropriately enough, a k out of n system. The probability that the system functions properly is called the reliability of the system.

Mathematical Exercise 38. Comment on the reasonableness of the assumptions that the components behave like Bernoulli trials.

Mathematical Exercise 39. Show that the reliability of a k out of n system is $R_{n,k}(p) = P(X \ge k)$, where X has the binomial distribution with parameters n and p.

Mathematical Exercise 40. Show that $R_{n,n}(p) = p^n$. An n out of n system is called a series system.

Mathematical Exercise 41. Show that $R_{n,1}(p) = 1 - (1 - p)^n$. A 1 out of n system is called a parallel system.
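A minimal sketch of the reliability function $R_{n,k}(p) = P(X \ge k)$, with the series and parallel formulas of Exercises 40 and 41 printed alongside for comparison. The function name `reliability` is our own; the case n = 10, p = 0.9 matches Exercise 42.

```python
from math import comb


def reliability(n, k, p):
    """R_{n,k}(p) = P(X >= k): probability that at least k of n independent components are good."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p = 0.9
print("series   (10 of 10):", reliability(10, 10, p), p**10)
print("parallel (1 of 10): ", reliability(10, 1, p), 1 - (1 - p) ** 10)
print("4 of 10:            ", reliability(10, 4, p))
```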

Simulation Exercise 42. In the binomial coin experiment, set n = 10 and p = 0.9 and run the simulation 1000 times, updating every 100 runs. Compute the empirical reliability and compare it with the true reliability in each of the following cases:

  1. 10 out of 10 (series) system.
  2. 1 out of 10 (parallel) system.
  3. 4 out of 10 system.

Mathematical Exercise 43. Consider a system with n = 4 components. Sketch the graphs of $R_{4,1}$, $R_{4,2}$, $R_{4,3}$, and $R_{4,4}$ on the same set of axes.

Mathematical Exercise 44. An n out of 2n - 1 system is a majority rules system.

  1. Compute the reliability of a 2 out of 3 system.
  2. Compute the reliability of a 3 out of 5 system.
  3. For what values of p is a 3 out of 5 system more reliable than a 2 out of 3 system?
  4. Sketch the graphs of $R_{3,2}$ and $R_{5,3}$ on the same set of axes.
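To explore part 3 numerically before solving it, the two majority-rules reliabilities can be tabulated for a few values of p. The sketch uses the same kind of hypothetical `reliability` helper; a table suggests the answer but does not prove it.

```python
from math import comb


def reliability(n, k, p):
    """R_{n,k}(p) = P(X >= k) for X binomial with parameters n and p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    r32, r53 = reliability(3, 2, p), reliability(5, 3, p)
    print(f"p = {p}: R_3,2 = {r32:.4f}  R_5,3 = {r53:.4f}")
```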

Simulation Exercise 45. In the binomial coin experiment, compute the empirical reliability, based on 100 runs, in each of the following cases. Compare your results to the true probabilities.

  1. A 2 out of 3 system with p = 0.3
  2. A 3 out of 5 system with p = 0.3
  3. A 2 out of 3 system with p = 0.8
  4. A 3 out of 5 system with p = 0.8

Mathematical Exercise 46. Show that $R_{2n-1,\,n}(1/2) = 1/2$.
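An exact spot check of Exercise 46 for n = 1, ..., 10, using `fractions.Fraction` so that the equality is tested without rounding error (helper name hypothetical):

```python
from fractions import Fraction
from math import comb


def reliability(n, k, p):
    """R_{n,k}(p) = P(X >= k), computed exactly when p is a Fraction."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


half = Fraction(1, 2)
for n in range(1, 11):
    # majority rules system: n out of 2n - 1 components, each good with probability 1/2
    assert reliability(2 * n - 1, n, half) == half
print("R_{2n-1,n}(1/2) = 1/2 for n = 1, ..., 10")
```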