Virtual Laboratories > Bernoulli Trials > 1 2 3 4 [5] 6 7

5. The Negative Binomial Distribution


Suppose again that our random experiment is to perform Bernoulli trials I1, I2, ... In this section we will study the random variable Yk that gives the trial number of the k'th success. Note that Y1 is the number of trials needed to get the first success, which we now know has the geometric distribution. Recall also that Xn, the number of successes in the first n trials, has the binomial distribution with parameters n and p.

The Density Function

Mathematical Exercise 1. Show that Yk = n if and only if In = 1 and Xn-1 = k - 1.

Mathematical Exercise 2. Use Exercise 1, independence, and the binomial distribution to show that

P(Yk = n) = C(n - 1, k - 1)pk(1 - p)n - k for n = k, k + 1, k + 2, ...

The distribution defined by the density function in Exercise 2 is known as the negative binomial distribution; it has two parameters, the number of successes k and the success probability p.

Simulation Exercise 3. In the negative binomial experiment, vary k and p with the scroll bars and note the shape of the density function. Now set k = 2 and p = 0.4 and run the experiment with an update frequency of 10. Watch the apparent convergence of the relative frequency function to the density function.

Mathematical Exercise 4. Show that the binomial and negative binomial sequences are inverse to each other in the sense that

Xn k if and only if Yk n

Thus, argue that any event that can be expressed in terms of the negative binomial variables can also be expressed in terms of the binomial variables.

Mathematical Exercise 5. Show that

P(Yk = n) > P(Yk = n - 1) if and only if n < (k - 1 + p) / p.

Thus, the density function at first increases and then decreases, reaching its largest value at the greatest integer in (k - 1 + p) / p. This integer is the mode of the distribution and hence the negative binomial distribution is unimodal.

Mathematical Exercise 6. A fair die is rolled until 3 aces occur. Find the probability that at least 15 rolls are required.

Sums of Independent Geometric Variables

We will define random variables that give the number of trials between successive successes:

Z1 = Y1 and Zk = Yk - Yk-1 for k = 2, 3, ...

Mathematical Exercise 7. Argue that these variables are independent and each has the geometric distribution with parameter p. Moreover,

Yk = Z1 + Z2 + ··· + Zk.

The mean, variance and probability generating function of Yk now follow easily from results for the geometric distribution.

Mathematical Exercise 8. Show that E(Yk) = k / p.

Mathematical Exercise 9. Show that var(Yk) = k(1 - p) / p2.

Mathematical Exercise 10. Show that E(tYk) = [pt / (1 - t + tp)]k for |t| < 1 / (1 - p).

Mathematical Exercise 11. Suppose that U and V are independent random variables for an experiment, and that U has the negative binomial distribution with parameters j and p and V has the negative binomial distribution with parameters k and p. Show that U + V has the negative binomial distribution with parameters j + k and p.

  1. Give a probabilistic proof, based on Bernoulli trials.
  2. Give a proof based on moment generating functions.

Simulation Exercise 12. In the negative binomial experiment, vary k and p with the scroll bars and note the location and size of the mean/standard deviation bar. Now set k = 3 and p = 0.25 and run the experiment with an update frequency of 10. Watch the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

Mathematical Exercise 13. A certain type of missile has failure probability 0.02. Find the mean and standard deviation of the launch number of the fourth failure.

Normal Approximation

Simulation Exercise 14. In the negative binomial experiment, start with p = 0.5 and k = 1. Successively increase k by 1, noting the shape of the density function each time. Repeat with p = 0.3 and p = 0.8.

Even though you are limited to k = 5, you can still see the characteristic bell shape. This is a consequence of the central limit theorem because the negative binomial variable can be written as a sum of k independent, identically distributed (geometric) random variables.

Mathematical Exercise 15. Show that the distribution of the standardized variable below converges to the standard normal distribution as k increases.

(Yk - k / p) / [k(1 - p) / p]1/2 = (pYk - k) / [k(1 - p)]1/2.

Simulation Exercise 16. In the negative binomial experiment, set p = 0.5 and k = 5. Run the experiment 1000 times with an update frequency of 100. Compute and compare each of the following:

  1. P(8 Y5 15)
  2. The relative frequency of the event {8 Y5 15}.
  3. The normal approximation to P(8 Y5 15).

Mathematical Exercise 17. A coin is tossed until the 50th head occurs.

  1. Assuming that the coin is fair, find the normal approximation of the probability that the coin is tossed at least 125 times.
  2. Suppose that you perform this experiment, and 125 tosses are required. Do you believe that the coin is fair?

The Banach Match Problem

Suppose that an absent-minded professor (is there any other kind?) has N matches in his right pocket and N matches in his left pocket. When he needs a match to light his pipe, he is equally likely to choose a match from either pocket. We want to compute the density function of the random variable W that gives the number of matches remaining when the professor first discovers that one of the pockets is empty. This is known as the Banach match problem, named for the mathematician Stefan Banach, who evidently behaved in the manner described.

We can recast the problem in terms of the negative binomial distribution. Clearly the match choices form a sequence of Bernoulli trials with parameter p = 1/2. Specifically, we can consider a match from the right pocket as a win for player R, and a match from the left pocket as a win for player L. In a hypothetical infinite sequence of trials, let Y denote the number of trials necessary for R to win N + 1 trials, and Z the number of trials necessary for L to win N + 1 trials. Note that Y and Z each have the negative binomial distribution with parameters N + 1 and p.

Mathematical Exercise 18. For k = 0, 1, ..., N, show that

  1. L wins N - k games at the moment when R wins N + 1 games if and only if Y = 2N - k + 1
  2. {Y = 2N - k + 1} is equivalent to the event that the professor first discovers that the right pocket is empty and that the left pocket has k matches
  3. P(Y = 2N - k + 1) = C(2N - k, N)(1/2)2N - k + 1.

Mathematical Exercise 19. For k = 0, 1, ..., N, show that

  1. R wins N - k games at the moment when L wins N + 1 games if and only if Z = 2N - k + 1
  2. {Z = 2N - k + 1} is equivalent to the event that the professor first discovers that the left pocket is empty and that the right pocket has k matches
  3. P(Z = 2N - k + 1) = C(2N - k, N)(1/2)2N - k + 1.

Mathematical Exercise 20. Combine the results of the previous two exercises to conclude that

P(W = k) = C(2N - k, N) (1/2)2N - k for k = 0, 1, ..., N.

We can also solve the non-symmetric Banach match problem, using the same methods as above. Thus, suppose that the professor reaches for a match in his right pocket with probability p and in his left pocket with probability 1 - p, where 0 < p < 1..The essential change in the analysis is that Y has the negative binomial distribution with parameters N + 1 and p, while Z has the negative binomial distribution with parameters N + 1 and 1 - p.

Mathematical Exercise 21. Show that

P(W = k) = C(2N - k, N)[pN + 1 (1 - p)N - k + (1 - p)N pN - k] for k = 0, 1, ..., N.

The Problem of Points

Suppose that two teams (or individuals) A and B play a sequence of Bernoulli trials, where p is the probability that player A wins a trial. For nonnegative integers n and m, let Fn,m(p) denote the probability that A wins n points before B wins m points. Computing Fn,m(p) is an historically famous problem, known as the problem of points, that was solved by Pierre de Fermat and by Blaise Pascal.

Mathematical Exercise 22. Comment on the validity of the Bernoulli trial assumptions (independence of trials and constant probability of success) for games of sport that have a skill component as well as a random component.

There is an easy solution to the problem of points using the binomial distribution; this was essentially Pascal's solution. Let us pretend that n + m - 1 trials are played, regardless of the outcomes, and let Xn + m - 1 denote the number of trials that A wins. By definition, Xn + m - 1 has the binomial distribution with parameters n + m - 1 and p.

Mathematical Exercise 23. Show that A wins n trials before B wins m trials if and only if

Xn + m - 1 gteq.gif (844 bytes) n.

Mathematical Exercise 24. Use the result of the previous exercise to show that

Fn,m(p) = sumk = n, ..., n + m -1 C(n + m - 1, k) pk(1 - p)n + m - 1 - k.

Simulation Exercise 25. In the problem of points experiment, vary the parameters n, m, and p, and note how the probability changes. Now with n = 10, m = 5, and p = 0.5, run the problem 1000 times with an update frequency of 10. Note the apparent convergence of the relative frequency to the probability.

There is also an easy solution to the problem of points using the negative binomial distribution In a sense, this has to be the case, given the equivalence between the binomial and negative binomial processes. First, let us pretend that the games go on forever, regardless of the outcomes, and let Yn denote the number of games needed for A to win n games. By definition, Yn has the negative binomial distribution with parameters n and p.

Mathematical Exercise 26. Show that A wins n trials before B wins m trials if and only if

Yn n + m -1

Mathematical Exercise 27. Use the result of the previous exercise to show that

Fn,m(p) = sumj = n, ..., n + m - 1 C(j - 1, n - 1) pn(1 - p)j - n.

Simulation Exercise 28. In the problem of points experiments, vary the parameters j, k, and p, and note how the probability changes. Now with n = 10, m = 10, and p = 0.7, run the problem 1000 times with an update frequency of 10. Note the apparent convergence of the relative frequency to the probability.

Mathematical Exercise 29. Show that for fixed n and m, Fn,m(p) increases from 0 to 1 as p increases from 0 to 1.

Simulation Exercise 30. In the problem of points experiment, vary the parameters n, m, and p, and note how the probability changes. Now with n = 5, m = 10, and p = 0.3, run the problem 1000 times with an update frequency of 10. Note the apparent convergence of the relative frequency to the probability.

Mathematical Exercise 31. Show that Fn,m(p) decreases as n increases for fixed m and p, and that Fn,m(p) increases as m increases for fixed n and p.

Simulation Exercise 32. In the problem of points experiments, vary the parameters n, m, and p, and note how the probability changes. Now with n = 10, m = 15, and p = 0.3, run the problem 1000 times with an update frequency of 10. Note the apparent convergence of the relative frequency to the probability.

Mathematical Exercise 33. Condition on the outcome of the first trial to derive the following recurrence relation and boundary conditions (this was essentially Fermat's solution):

  1. Fn,m(p) = pFn - 1,m(p) + (1 - p)Fn,m - 1(p), for n, m = 1, 2, ...
  2. Fn,0(p) = 0, F0,m(p) = 1.

Series of Games

The special case n = m is important, because Fn,n(p) is the probability that A wins a best n of 2n - 1 series. Such series, especially when n = 2, 3, or 4, are frequently used in championship tournaments.

Mathematical Exercise 34. Suppose that p = 0.6. Explicitly find the probability that team A wins

  1. A best 3 of 5 game series (n = 3).
  2. A best 4 of 7 game series (n = 4).

Simulation Exercise 35. In the problem of points experiments, vary the parameters n, m and p (keeping n = m), and note how the probability changes. Now simulate a best 3 of 5 series by selecting n = m = 3, p = 0.6. Run the experiment 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency to the true probability.

Mathematical Exercise 36. Show that Fn,n(1 - p) = 1 - Fn,n(p) for any n and p.

  1. Try to give both a probabilistic argument and an analytic argument.
  2. Show that this condition means that the graph of Fn,n is symmetric with respect to p = 1/2.
  3. Show that this condition implies that Fn,n(1/2) = 1/2.

Simulation Exercise 37. In the problem of points experiments, vary the parameters n, m and p (keeping n = m), and note how the probability changes. Now simulate a best 4 of 7 series by selecting n = m = 4, p = 0.45. Run the experiment 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency to the true probability.

Mathematical Exercise 38. Suppose that if n > m. Show that Fn,n(p) > Fm,m(p) if and only if and p > 1/2. Interpret the result.

Division of Stakes

The problem of points originated from a question posed by Chevalier de Mere, who was interested in the fair division of stakes when a game is interrupted. Specifically, suppose that players A and B each put up C monetary units, and then play Bernoulli trials until one of them wins a specified number of trials. The winner then takes the entire 2C fortune.

Mathematical Exercise 39. If the game is interrupted when A needs to win n more trials and B needs to win m more trials, argue that the fortune should be divided between A and B, respectively, as follows:

  1. 2C Fn,m(p) for A,
  2. 2C[1 - Fn,m(p)] for B.

Mathematical Exercise 40. Suppose that players A and B bet $50 each. The players toss a fair coin until one of them has 10 wins; the winner takes the entire fortune. Suppose that the game is interrupted by the gambling police when A has 5 wins and B has 3 wins. How should the stakes be divided?