Virtual Laboratories > Bernoulli Trials > [1] 2 3 4 5 6 7
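The Bernoulli trials process, named after James Bernoulli, is one of the simplest yet most important random processes in probability. Essentially, the process is the mathematical abstraction of coin tossing, but because of its wide applicability, it is usually stated in terms of a sequence of generic trials that satisfy the following assumptions: each trial has two possible outcomes, generically called success and failure; the trials are independent; and on each trial, the probability of success is p and the probability of failure is 1 - p.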
Mathematically, we can describe the Bernoulli trials process with a sequence of indicator random variables:
$I_1, I_2, I_3, \ldots$
An indicator variable is a random variable that takes only the values 1 and 0, which in this setting denote success and failure, respectively. The j'th indicator variable simply records the outcome of trial j. Thus, the indicator variables are independent and have the same density function:
$P(I_j = 1) = p, \quad P(I_j = 0) = 1 - p$
Thus, the Bernoulli trials process is characterized by a single parameter p.
As we noted earlier, the most obvious example of Bernoulli trials is coin tossing, where success means heads and failure means tails. The parameter p is the probability of heads (so in general, the coin is biased).
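A sequence of Bernoulli trials can be imitated with a pseudorandom number generator. The following is a minimal sketch, not the coin experiment applet itself; the function name `bernoulli_trials` and the `seed` argument are assumptions introduced only for illustration.

```python
import random


def bernoulli_trials(n, p, seed=None):
    """Return n independent indicator values (1 = success, 0 = failure)."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so the event {rng.random() < p}
    # has probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]


# Twenty tosses of a coin for several values of the parameter p.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    outcomes = bernoulli_trials(20, p, seed=1)
    print(p, outcomes, "successes:", sum(outcomes))
```

Fixing the seed makes a run reproducible; omitting it gives a fresh sequence each time.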
1. In the basic coin experiment, set n = 20 and p = 0.1. Run the experiment and observe the outcomes. Repeat with p = 0.3, 0.5, 0.7, 0.9.
2. Use the basic assumptions to show that
$P(I_1 = i_1, I_2 = i_2, \ldots, I_n = i_n) = p^k (1 - p)^{n-k}$, where $k = i_1 + i_2 + \cdots + i_n$.
3. Suppose that $I_1, I_2, I_3, \ldots$ is a Bernoulli trials process with parameter p. Show that $1 - I_1, 1 - I_2, 1 - I_3, \ldots$ is a Bernoulli trials sequence with parameter 1 - p.
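The claims in Exercises 2 and 3 can be checked numerically for a small case. The sketch below is a sanity check, not a proof; the helper `joint_probability` and the values n = 4, p = 0.3 are illustrative assumptions. It multiplies the marginal probabilities (which is what independence allows), compares the result with p^k (1 - p)^(n-k), and also compares it with the probability of the flipped sequence under parameter 1 - p.

```python
from itertools import product


def joint_probability(outcomes, p):
    """Probability of a 0/1 sequence, using independence of the trials."""
    prob = 1.0
    for i in outcomes:
        prob *= p if i == 1 else 1 - p
    return prob


n, p = 4, 0.3  # illustrative values
total = 0.0
max_gap = 0.0
for outcomes in product((0, 1), repeat=n):
    k = sum(outcomes)  # number of successes in the sequence
    direct = joint_probability(outcomes, p)
    formula = p**k * (1 - p) ** (n - k)
    max_gap = max(max_gap, abs(direct - formula))
    # Exercise 3: swapping success and failure swaps p and 1 - p.
    flipped = tuple(1 - i for i in outcomes)
    max_gap = max(max_gap, abs(direct - joint_probability(flipped, 1 - p)))
    total += direct

print("sum over all sequences:", total)
print("largest discrepancy:", max_gap)
```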
In a sense, the most general example of Bernoulli trials occurs when an experiment is replicated. Specifically, suppose that we have a basic random experiment and an event of interest A. Suppose now that we create a compound experiment that consists of independent replications of the basic experiment. Define success on trial j to mean that event A occurred on the j'th run, and define failure on trial j to mean that event A did not occur on the j'th run. This clearly defines a Bernoulli trials process with parameter p = P(A).
Bernoulli trials are also formed when we sample from a dichotomous population. Specifically, suppose that we have a population of two types of objects, which we will refer to as type 0 and type 1. For example, the objects could be persons, classified as male or female, or the objects could be components, classified as good or defective. We select n objects at random from the population; by definition, this means that each object in the population at the time of the draw is equally likely to be chosen. If the sampling is with replacement, then each object drawn is replaced before the next draw. In this case, successive draws are independent, so the types of the objects in the sample form a sequence of Bernoulli trials, in which the parameter p is the proportion of type 1 objects in the population. If the sampling is without replacement, then the successive draws are dependent, so the types of the objects in the sample do not form a sequence of Bernoulli trials. However, if the population size is large compared to the sample size, the dependence caused by not replacing the objects may be negligible, so that for all practical purposes, the types of the objects in the sample can be treated as a sequence of Bernoulli trials. Additional discussion of sampling from a dichotomous population is in the chapter Finite Sampling Models.
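To see how quickly the dependence fades, one can compare the probability that the first two draws are both of type 1 under the two sampling schemes. This is a small exact calculation; the function names, the population sizes, and the 30% share of type 1 objects are all illustrative assumptions.

```python
from fractions import Fraction


def both_type1_with_replacement(N, r):
    """P(first two draws are type 1) when r of N objects are type 1."""
    p = Fraction(r, N)
    return p * p  # independent draws


def both_type1_without_replacement(N, r):
    """Same event, but the first object is not returned to the population."""
    return Fraction(r, N) * Fraction(r - 1, N - 1)


for N in (10, 100, 10_000):
    r = N * 3 // 10  # 30% of the population is type 1 (illustrative)
    w = both_type1_with_replacement(N, r)
    wo = both_type1_without_replacement(N, r)
    print(N, float(w), float(wo), "difference:", float(w - wo))
```

The difference between the two probabilities shrinks as N grows, which is the sense in which sampling without replacement from a large population is approximately a Bernoulli trials process.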
For future reference, let us compute the mean, variance, and probability generating function of a generic indicator variable I with P(I = 1) = p.
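As a numerical cross-check on these three quantities, the sketch below computes them directly from the two-point distribution of I rather than from any closed-form answer. The helper name `indicator_summary` is hypothetical and is used here only for illustration.

```python
def indicator_summary(p, t):
    """Mean, variance and E(t**I) for an indicator I with P(I = 1) = p."""
    dist = {0: 1 - p, 1: p}  # the two-point distribution of I
    mean = sum(x * q for x, q in dist.items())
    second_moment = sum(x**2 * q for x, q in dist.items())
    variance = second_moment - mean**2
    pgf = sum(t**x * q for x, q in dist.items())
    return mean, variance, pgf


# Tabulate the variance on a grid of p values to see where it peaks.
grid = [j / 10 for j in range(11)]
variances = {p: indicator_summary(p, 1.0)[1] for p in grid}
for p, v in variances.items():
    print(p, round(v, 4))
print("largest variance at p =", max(variances, key=variances.get))
```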
4. Show that $E(I) = p$.
5. Show that $\mathrm{var}(I) = p(1 - p)$.
6. Show that $E(t^I) = 1 - p + pt$ for t in R.
7. Sketch the graph of the variance in Exercise 5 as a function of p. Note in particular that the variance is largest when p = 1/2 and smallest when p = 0 or p = 1.
8. Suppose that a student takes a multiple choice test. The test has 10 questions, each of which has 4 possible answers (only one correct). If the student blindly guesses the answer to each question, do the questions form a sequence of Bernoulli trials? If so, identify the trial outcomes and the parameter p.
9. Candidate A is running for office in a certain district. Twenty persons are selected at random from the population of registered voters and asked if they prefer candidate A. Do the responses form a sequence of Bernoulli trials? If so, identify the trial outcomes and the meaning of the parameter p.
10. An American roulette wheel has 38 slots; 18 are red, 18 are black, and 2 are green. A gambler plays roulette 15 times, betting on red each time. Do the outcomes form a sequence of Bernoulli trials? If so, identify the trial outcomes and the parameter p.
11. Two tennis players play a set of 6 games. Do the games form a sequence of Bernoulli trials? If so, identify the trial outcomes and the meaning of the parameter p.
Suppose that each person in a population, independently of all others, has a certain disease with probability p. The disease can be identified by a blood test, but of course the test has a cost.
For a group of k > 1 persons, we will compare two strategies. The first is to test the k persons individually, so that k tests are required. The second strategy is to pool the blood samples of the k persons and test the pooled sample first. We assume that the test is negative if and only if all k persons are free of the disease; in this case just one test is required. On the other hand, the test is positive if and only if at least one person has the disease, in which case we then have to test the persons individually; in this case k + 1 tests are required. Let X denote the number of tests required for the pooled strategy.
12. Show that
a. $P(X = 1) = (1 - p)^k$, $P(X = k + 1) = 1 - (1 - p)^k$
b. $E(X) = 1 + k[1 - (1 - p)^k]$
13. Show that, in terms of expected value, the pooled strategy is better than the basic strategy if and only if
$p < 1 - (1/k)^{1/k}$.
The critical value $p_k = 1 - (1/k)^{1/k}$, as a function of k in the interval [2, 20], is shown in the graph below:
[Figure: graph of the critical value $p_k$ as a function of k, for k from 2 to 20]
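14. Show that the maximum value of $p_k$ occurs at k = 3, with $p_3 \approx 0.307$, and that $p_k \to 0$ as $k \to \infty$.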
From Exercises 13 and 14, it follows that if p > 0.307, pooling never makes sense, regardless of the size of the group k. At the other extreme, if p is very small, so that the disease is quite rare, pooling is better unless the group size k is very large.
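The critical values are simple to tabulate. The following sketch evaluates 1 - (1/k)^(1/k) for k from 2 to 20 and reports where the largest value occurs; the function name `critical_value` is an assumption made for illustration.

```python
def critical_value(k):
    """Threshold 1 - (1/k)**(1/k): pooling k samples wins iff p is below it."""
    return 1 - (1 / k) ** (1 / k)


values = {k: critical_value(k) for k in range(2, 21)}
for k, pk in values.items():
    print(k, round(pk, 4))

best_k = max(values, key=values.get)
print("largest critical value at k =", best_k, "value:", round(values[best_k], 4))
```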
Now suppose that we have n persons. For any k that divides n, we can partition the population into n / k groups of k each, and pool the blood samples in each group. Note that k = 1 corresponds to individual testing. Let $X_i$ denote the number of tests required for group i.
15. Argue that if k > 1, then $X_1, X_2, \ldots, X_{n/k}$ are independent and each has the distribution given in Exercise 12.
The total number of tests required for this partitioning scheme is
$Y_k = X_1 + X_2 + \cdots + X_{n/k}$.
16. Show that the expected total number of tests is
$E(Y_k) = n$ if k = 1, and $E(Y_k) = n[1 + 1/k - (1 - p)^k]$ if k > 1.
Thus, in terms of expected value, the optimal strategy is to group the population into n / k groups of size k, where k minimizes the function defined in the previous exercise. It is difficult to get a closed-form expression for the optimal value of k, but this value can be determined numerically for specific n and p.
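A brute-force search over the divisors of n is one way to carry out this numerical determination. The sketch below assumes each pooled group needs one test, plus k more when the pool is positive (probability 1 - (1 - p)^k). The function names and the values n = 120, p = 0.02 are hypothetical choices for illustration, not values taken from the text.

```python
def expected_total_tests(n, k, p):
    """Expected number of tests when n persons form n/k pools of size k."""
    if k == 1:
        return float(n)  # individual testing: exactly one test per person
    # One pooled test, plus k individual tests if the pool is positive.
    per_group = 1 + k * (1 - (1 - p) ** k)
    return (n // k) * per_group


def optimal_pool_size(n, p):
    """Divisor k of n that minimizes the expected total number of tests."""
    divisors = [k for k in range(1, n + 1) if n % k == 0]
    return min(divisors, key=lambda k: expected_total_tests(n, k, p))


n, p = 120, 0.02  # hypothetical values
k = optimal_pool_size(n, p)
print("optimal pool size:", k)
print("expected tests:   ", expected_total_tests(n, k, p))
print("individual tests: ", n)
```

Only divisors of n are searched because the partitioning scheme requires groups of equal size.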
17. For the following values of n and p, find the optimal pooling size k and the expected number of tests.