Suppose that our random experiment is to perform Bernoulli trials I_1, I_2, .... In this section we will study the random variable X_n that gives the number of successes in the first n trials. This variable has a simple expression in terms of the indicator variables:
1. Show that X_n = I_1 + I_2 + ··· + I_n.
2. Suppose that K ⊆ N = {1, 2, ..., n} and #(K) = k. Use the assumptions of Bernoulli trials to show that
P(I_j = 1 for j ∈ K and I_j = 0 for j ∈ N - K) = p^k (1 - p)^{n - k}.
Recall that the number of subsets of size k from a set of size n is the binomial coefficient
C(n, k) = n! / [k!(n - k)!].
3. Use Exercise 2 and basic properties of probability to show that
P(X_n = k) = C(n, k) p^k (1 - p)^{n - k} for k = 0, 1, ..., n.
The distribution with this density function is known as the binomial distribution with parameters n and p. The binomial family of distributions is one of the most important in probability.
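The density formula is easy to evaluate numerically. The following is a minimal sketch, not part of the original exercises: it assumes Python's standard `math.comb` for C(n, k), and the function name `binomial_pmf` is a hypothetical choice made for illustration.

```python
from math import comb


def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(X_n = k) for n Bernoulli trials with success probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Density for n = 5 trials with p = 1/2, listed for k = 0, 1, ..., 5.
density = [binomial_pmf(5, 0.5, k) for k in range(6)]
print(density)
# The values of a discrete density function should add up to 1.
print(sum(density))
```

Changing the arguments of `binomial_pmf` gives the density for other values of n and p.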
4. In the
binomial coin experiment, vary n and p with the scrollbars, and note the
shape and location of the density function. Now with n = 10 and p
= 0.7, run the simulation 1000 times, updating every 10 runs. Note the apparent
convergence of the relative frequency function to the density function.
5. A fair
die is tossed 5 times. Give explicitly the density function of the number of aces.
6. A student
takes a multiple choice test with 10 questions, each with 4 choices. If the student
blindly guesses, find the probability that he will get at least 5 questions correct.
7. Use the binomial theorem to show that the binomial density function really is a (discrete) density function.
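8. Show that P(X_n = k) > P(X_n = k - 1) if and only if k < (n + 1)p.
Thus, for 0 < p < 1, the density function at first increases and then decreases, reaching its largest value at the greatest integer less than or equal to (n + 1)p.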
9. Suppose that U
is a random variable having the binomial distribution with parameters n and p.
Show that n - U has the binomial distribution with parameters n
and 1 - p.
In 1693, Samuel Pepys asked Isaac Newton whether it is more likely to get at least one ace in 6 rolls of a die or at least two aces in 12 rolls of a die. This problem is known as Pepys' problem; naturally, Pepys had fair dice in mind.
10. Guess the answer to Pepys' problem based on empirical data. With fair dice and n = 6, run the simulation of the dice experiment 500 times and compute the relative frequency of at least one ace. Now with n = 12, run the simulation 500 times and compute the relative frequency of at least two aces. Compare the results.
11. Solve Pepys'
problem using the binomial distribution.
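One way to organize the computation for Pepys' problem is through the upper tail of the binomial distribution. The sketch below is an illustration under stated assumptions: the helper name `binomial_tail` is hypothetical, and an ace is taken to have probability 1/6 on each roll of a fair die.

```python
from math import comb


def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X_n >= k), where X_n has the binomial distribution with parameters n and p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_ace = 1 / 6  # probability of an ace on one roll of a fair die

one_in_six = binomial_tail(6, p_ace, 1)      # at least one ace in 6 rolls
two_in_twelve = binomial_tail(12, p_ace, 2)  # at least two aces in 12 rolls

print(f"at least 1 ace in 6 rolls:   {one_in_six:.4f}")
print(f"at least 2 aces in 12 rolls: {two_in_twelve:.4f}")
```

The same helper applies to any "at least k successes" question for Bernoulli trials.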
12. Which is more likely: at least one ace with 4 throws of a fair die or at least one double ace in 24 throws of two fair dice? This is known as De Mere's problem, named after Chevalier De Mere.
We will compute the mean and variance of the binomial distribution several different ways. The method using indicator variables is the best.
13. Use Exercise 1 and basic properties of expected value to show that
E(X_n) = np.
This makes intuitive sense, since p should be approximately the proportion of successes in a large number of trials.
14. Compute the mean using the density function.
15. Use Exercise 1 and properties of variance to show that
var(X_n) = np(1 - p).
16. Sketch the
graph of the variance as a function of p. Note in particular that the variance is
largest when p = 1/2 and smallest when p = 0 or p = 1.
17. Compute the
variance using the density function.
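The formulas np and np(1 - p) can be checked numerically against the sums over the density function. The following is a sketch only, with hypothetical helper names; it does not replace the derivations asked for in the exercises.

```python
from math import comb


def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(X_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_and_variance(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed directly from the density function."""
    mean = sum(k * binomial_pmf(n, p, k) for k in range(n + 1))
    second_moment = sum(k * k * binomial_pmf(n, p, k) for k in range(n + 1))
    return mean, second_moment - mean**2


n, p = 10, 0.7
mean, variance = mean_and_variance(n, p)
print(mean, n * p)                # both should be close to 7
print(variance, n * p * (1 - p))  # both should be close to 2.1
```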
18. Show that the probability generating function is given by
E(t^{X_n}) = (1 - p + pt)^n for t in R.
19. Use the
probability generating function in Exercise 18 to compute the mean and variance.
20. Use the identity j C(n, j) = n C(n - 1, j - 1) for n, j = 1, 2, ... to show that
E(X_n^k) = np E[(X_{n-1} + 1)^{k-1}] for n, k = 1, 2, ...
21. Use the
recursion result in Exercise 20 to give yet one more derivation of the mean and variance.
22. In the
binomial coin experiment, vary n and p with the scrollbars and note the
location and size of the mean/standard deviation bar. Now with p = 0.7, run the
simulation 1000 times, updating every 10 runs. Note the apparent convergence of the sample
mean and standard deviation to the distribution mean and standard deviation.
23. A certain type
of missile has failure probability 0.02. Compute the mean and standard deviation of the
number of failures in 50 launches.
24. A fair die is
rolled 1000 times. Give the mean and standard deviation of the number of aces.
The Galton board is a triangular array of pegs. The rows are numbered 0, 1, ... from top downward. Row n has n + 1 pegs numbered from 0 at the left to n at the right. Thus a peg can be uniquely identified by the ordered pair (n, k) where n is the row number and k is the peg number in that row. The Galton board is named after Francis Galton.
Now suppose that a ball is dropped from above the top peg (0, 0). Each time the ball hits a peg, it bounces to the right with probability p and to the left with probability 1 - p, independently from bounce to bounce.
25. Show that the number of the peg that the ball hits in row n has the binomial distribution with parameters n and p.
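The Galton board can also be simulated directly. The sketch below is an illustration, not the applet referred to in the exercises: the function names, the seed, and the number of runs are all assumptions. It uses the fact that the peg number in row n equals the number of bounces to the right in rows 0 through n - 1.

```python
import random
from collections import Counter
from math import comb


def drop_ball(n: int, p: float, rng: random.Random) -> int:
    """Peg number hit in row n: the count of right bounces among n bounces."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate(n: int, p: float, runs: int, seed: int = 0) -> list[float]:
    """Relative frequency of each peg number 0..n in row n over the given runs."""
    rng = random.Random(seed)
    counts = Counter(drop_ball(n, p, rng) for _ in range(runs))
    return [counts[k] / runs for k in range(n + 1)]


n, p = 10, 0.3
empirical = simulate(n, p, runs=20000)
exact = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Compare relative frequencies with the binomial density, peg by peg.
for k in range(n + 1):
    print(k, round(empirical[k], 4), round(exact[k], 4))
```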
26. In the
Galton board experiment, set n = 10 and p = 0.1. Click
step several times and watch the balls fall through the pegs. Repeat for p = 0.3,
0.5, 0.7, and 0.9.
27. In the
Galton board experiment, set n = 15 and p = 0.1. Run
the simulation 100 times, updating after each run. Note the general shape of the paths
through the board. Repeat for p = 0.3, 0.5, 0.7, and 0.9.
Next we will establish an important invariance property of the binomial distribution.
28. Use the representation in terms of indicator variables to show that if m and n are positive integers then X_{m+n} - X_m has the same distribution as X_n and is independent of X_1, X_2, ..., X_m.
Thus, the random process X_n, n = 1, 2, ... has stationary, independent increments.
29. Suppose that U and V are independent random variables for an experiment, that U has the binomial distribution with parameters m and p, and that V has the binomial distribution with parameters n and p. Show that U + V has the binomial distribution with parameters m + n and p.
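The closure of the binomial family under sums of independent variables can be checked numerically by convolving two densities. This is a sketch with hypothetical function names, assuming the two variables are independent and share the same p; it is a sanity check, not a proof.

```python
from math import comb


def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sum_density(m: int, n: int, p: float) -> list[float]:
    """Density of U + V by direct convolution, for independent U and V
    with binomial distributions (m, p) and (n, p) respectively."""
    return [
        sum(
            binomial_pmf(m, p, i) * binomial_pmf(n, p, s - i)
            for i in range(max(0, s - n), min(m, s) + 1)
        )
        for s in range(m + n + 1)
    ]


m, n, p = 3, 4, 0.3
convolved = sum_density(m, n, p)
direct = [binomial_pmf(m + n, p, s) for s in range(m + n + 1)]
# The largest discrepancy should be at the level of rounding error.
print(max(abs(a - b) for a, b in zip(convolved, direct)))
```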
30. Suppose that m < n. Show that
P(X_m = j | X_n = k) = C(m, j) C(n - m, k - j) / C(n, k) for j = 0, 1, ..., m.
Interestingly, the distribution in Exercise 30 is independent of p. It is known as the hypergeometric distribution with parameters n, m and k. Try to interpret this result probabilistically.
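The independence from p can be observed numerically. The sketch below uses hypothetical function names. It computes the conditional density from binomial probabilities, using the fact that the successes in the first m trials and in the remaining n - m trials are independent, and compares the result with the hypergeometric formula for several values of p.

```python
from math import comb


def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_density(m: int, n: int, k: int, p: float) -> list[float]:
    """P(X_m = j | X_n = k) for j = 0..m, computed from binomial probabilities."""
    total = binomial_pmf(n, p, k)
    result = []
    for j in range(m + 1):
        rest = k - j  # successes required in trials m + 1, ..., n
        if 0 <= rest <= n - m:
            joint = binomial_pmf(m, p, j) * binomial_pmf(n - m, p, rest)
        else:
            joint = 0.0
        result.append(joint / total)
    return result


def hypergeometric_density(m: int, n: int, k: int) -> list[float]:
    """C(m, j) C(n - m, k - j) / C(n, k) for j = 0..m (0 where impossible)."""
    return [
        comb(m, j) * comb(n - m, k - j) / comb(n, k) if 0 <= k - j <= n - m else 0.0
        for j in range(m + 1)
    ]


m, n, k = 4, 10, 3
target = hypergeometric_density(m, n, k)
for p in (0.2, 0.5, 0.9):
    cond = conditional_density(m, n, k, p)
    print(p, max(abs(a - b) for a, b in zip(cond, target)))
```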
31. A coin is
tossed 100 times and results in 30 heads. Find the density function of the number of heads
in the first 20 tosses.
32. In the binomial timeline experiment, set p = 0.1. Start with n = 1 and successively increase n by 1. Note the shape of the density function each time. With n = 100, run the experiment 1000 times, updating every 10 runs. Repeat for p = 0.3, 0.5, 0.7, and 0.9.
The characteristic bell shape that you should observe in Exercise 32 is an example of the central limit theorem, because the binomial variable can be written as a sum of n independent, identically distributed random variables (the indicator variables).
33. Show that the distribution of the standardized variable given below converges to the standard normal distribution as n increases:
(X_n - np) / [np(1 - p)]^{1/2}.
This version of the central limit theorem is known as the DeMoivre-Laplace theorem, and is named after Abraham DeMoivre and Simeon Laplace. From a practical point of view, Exercise 33 means that, for large n, the distribution of X_n is approximately normal, with mean np and variance np(1 - p). Just how large n needs to be for the normal approximation to work well depends on the value of p. The rule of thumb is that we need np and n(1 - p) to be at least 5.
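The quality of the normal approximation can be examined numerically. The following sketch makes two assumptions that are not in the text above: the function names are hypothetical, and the approximation uses a continuity correction of 1/2, a common refinement. The standard normal distribution function is obtained from `math.erf`.

```python
from math import comb, erf, sqrt


def binomial_cdf(n: int, p: float, k: int) -> float:
    """Exact P(X_n <= k) from the binomial density."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def normal_approx_cdf(n: int, p: float, k: int) -> float:
    """Normal approximation to P(X_n <= k).

    The + 0.5 is a continuity correction (an assumption of this sketch).
    """
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf((k + 0.5 - mu) / sigma)


n, p = 50, 0.4  # np = 20 and n(1 - p) = 30, so the rule of thumb is satisfied
for k in (15, 20, 25):
    print(k, round(binomial_cdf(n, p, k), 4), round(normal_approx_cdf(n, p, k), 4))
```

Trying a small n or a value of p near 0 or 1 shows how the agreement deteriorates when np or n(1 - p) falls below 5.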
34. In the
binomial timeline
experiment, set p = 0.5 and n = 15. Run the experiment
1000 times with an update frequency of 100. Compute and compare the following:
35. In the
binomial timeline
experiment, set p = 0.3 and n = 20. Run the experiment
1000 times with an update frequency of 100. Compute and compare the following:
36. In the
binomial timeline
experiment, set p = 0.8 and n = 30. Run the experiment
1000 times with an update frequency of 100. Compute and compare the following:
37. Suppose
that in a certain district, 40% of the registered voters prefer candidate A. A
random sample of 50 registered voters is selected.
The binomial distribution arises frequently in the context of reliability. Suppose that a system consists of n components which operate independently. Each component is either good, with probability p, or defective, with probability 1 - p. Thus, the components are Bernoulli trials. Now suppose that the system as a whole functions properly if and only if at least k of the n components are good. In reliability terms, such a system is called, appropriately enough, a k out of n system. The probability that the system functions properly is called the reliability of the system.
38. Comment on the
reasonableness of the assumptions that the components behave like Bernoulli trials.
39. Show that the reliability of a k out of n system is R_{n,k}(p) = P(X ≥ k) where X has the binomial distribution with parameters n and p.
40. Show that R_{n,n}(p) = p^n. An n out of n system is called a series system.
41. Show that R_{n,1}(p) = 1 - (1 - p)^n. A 1 out of n system is called a parallel system.
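The reliability of a k out of n system is an upper tail probability of the binomial distribution, so it can be computed with a short function. This is a minimal sketch; the name `reliability` is hypothetical, and the checks against the series and parallel formulas are numerical illustrations rather than proofs.

```python
from math import comb


def reliability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent components are good,
    when each component is good with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


n, p = 4, 0.9
# Series system (n out of n) compared with p^n.
print(reliability(n, n, p), p**n)
# Parallel system (1 out of n) compared with 1 - (1 - p)^n.
print(reliability(n, 1, p), 1 - (1 - p) ** n)
# Reliability for every k from 1 to n.
print([round(reliability(n, k, p), 4) for k in range(1, n + 1)])
```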
42. In the binomial coin experiment, set n = 10 and p = 0.9 and run the simulation 1000 times, updating every 100 runs. Compute the empirical reliability and compare with the true reliability in each of the following cases:
43. Consider a system with n = 4 components. Sketch the graphs of R_{4,1}, R_{4,2}, R_{4,3}, and R_{4,4} on the same set of axes.
44. An n
out of 2n - 1 system is a majority rules system.
45. In the
binomial coin
experiment, compute the empirical reliability, based on 100 runs, in each of
the following cases. Compare your results to the true probabilities.
46. Show that R_{2n-1,n}(1/2) = 1/2.