Virtual Laboratories > Bernoulli Trials > [1] 2 3 4 5 6 7

Introduction


Bernoulli Trials

The Bernoulli trials process, named after James Bernoulli, is one of the simplest yet most important random processes in probability. Essentially, the process is the mathematical abstraction of coin tossing, but because of its wide applicability, it is usually stated in terms of a sequence of generic trials that satisfy the following assumptions:

  1. Each trial has two possible outcomes, generically called success and failure.
  2. The trials are independent. Intuitively, the outcome of one trial has no influence over the outcome of another trial.
  3. On each trial, the probability of success is p and the probability of failure is 1 - p.

Mathematically, we can describe the Bernoulli trials process with a sequence of indicator random variables:

$I_1, I_2, I_3, \ldots$

An indicator variable is a random variable that takes only the values 1 and 0, which in this setting denote success and failure, respectively. The j'th indicator variable simply records the outcome of trial j. Thus, the indicator variables are independent and have the same density function:

$P(I_j = 1) = p, \quad P(I_j = 0) = 1 - p$

Thus, the Bernoulli trials process is characterized by a single parameter p.

As we noted earlier, the most obvious example of Bernoulli trials is coin tossing, where success means heads and failure means tails. The parameter p is the probability of heads (so in general, the coin is biased).

Simulation Exercise 1. In the basic coin experiment, set n = 20 and p = 0.1. Run the experiment and observe the outcomes. Repeat with p = 0.3, 0.5, 0.7, 0.9.
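
The coin experiment applet is not reproduced here, but a rough stand-in can be sketched in Python. The function name `bernoulli_trials` and the fixed seed are assumptions made for illustration only; they are not part of the original experiment.

```python
import random

def bernoulli_trials(n, p, rng=None):
    """Return n independent indicator values (1 = success, 0 = failure)."""
    if rng is None:
        rng = random.Random()
    # rng.random() is uniform on [0, 1), so the comparison holds with probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(2024)  # arbitrary seed, used only so that runs are repeatable
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    outcomes = bernoulli_trials(20, p, rng)
    print(f"p = {p}: {outcomes}  successes = {sum(outcomes)}")
```

Each run with a different seed gives a different sequence, but the proportion of successes should tend to sit near p.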

Mathematical Exercise 2. Use the basic assumptions to show that

$P(I_1 = i_1, I_2 = i_2, \ldots, I_n = i_n) = p^k (1 - p)^{n-k}$ where $k = i_1 + i_2 + \cdots + i_n$.
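
The formula in Exercise 2 can be checked numerically, although such a check is not a proof. The sketch below assumes a hypothetical helper `joint_probability` and an arbitrary choice of p = 0.3 and n = 4.

```python
from itertools import product

def joint_probability(outcomes, p):
    """Probability of a particular 0/1 sequence of independent trials."""
    k = sum(outcomes)   # number of successes in the sequence
    n = len(outcomes)   # number of trials
    return p ** k * (1 - p) ** (n - k)

p = 0.3
# The probability depends on the sequence only through k, the number of successes.
print(joint_probability((1, 0, 0, 1), p), joint_probability((0, 1, 1, 0), p))

# Summing over all 2^n sequences should give 1, up to rounding error.
total = sum(joint_probability(seq, p) for seq in product((0, 1), repeat=4))
print(total)
```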

Mathematical Exercise 3. Suppose that $I_1, I_2, I_3, \ldots$ is a Bernoulli trials process with parameter p. Show that $1 - I_1, 1 - I_2, 1 - I_3, \ldots$ is a Bernoulli trials sequence with parameter 1 - p.

Generic Examples

In a sense, the most general example of Bernoulli trials occurs when an experiment is replicated. Specifically, suppose that we have a basic random experiment and an event of interest A. Suppose now that we create a compound experiment that consists of independent replications of the basic experiment. Define success on trial j to mean that event A occurred on the j'th run, and define failure on trial j to mean that event A did not occur on the j'th run. This clearly defines a Bernoulli trials process with parameter p = P(A).
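
As a concrete illustration of replication, the sketch below takes the basic experiment to be a roll of two fair dice and the event A to be "the sum is 7". Both choices, along with the function names, are assumptions made only for this example.

```python
import random
from fractions import Fraction

def roll_sum(rng):
    """Basic experiment (assumed for illustration): the sum of two fair dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def indicator_sequence(n, rng):
    """Replicate the basic experiment n times; record 1 when the sum is 7."""
    return [1 if roll_sum(rng) == 7 else 0 for _ in range(n)]

# Exact p = P(A): count the ordered pairs (a, b) with a + b = 7 among 36 pairs.
favorable = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 7)
p = Fraction(favorable, 36)

rng = random.Random(7)  # arbitrary seed
trials = indicator_sequence(10_000, rng)
print("p =", p, " empirical frequency =", sum(trials) / len(trials))
```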

Bernoulli trials are also formed when we sample from a dichotomous population. Specifically, suppose that we have a population of two types of objects, which we will refer to as type 0 and type 1. For example, the objects could be persons, classified as male or female, or the objects could be components, classified as good or defective. We select n objects at random from the population; by definition, this means that each object in the population at the time of the draw is equally likely to be chosen. If the sampling is with replacement, then each object drawn is replaced before the next draw. In this case, successive draws are independent, so the types of the objects in the sample form a sequence of Bernoulli trials, in which the parameter p is the proportion of type 1 objects in the population. If the sampling is without replacement, then the successive draws are dependent, so the types of the objects in the sample do not form a sequence of Bernoulli trials. However, if the population size is large compared to the sample size, the dependence caused by not replacing the objects may be negligible, so that for all practical purposes, the types of the objects in the sample can be treated as a sequence of Bernoulli trials. Additional discussion of sampling from a dichotomous population is in the chapter Finite Sampling Models.
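
To see how quickly the dependence fades as the population grows, the following sketch compares the exact probability that the first two draws are both of type 1, with and without replacement. The population sizes and the 30% proportion of type 1 objects are illustrative assumptions, as are the function names.

```python
from fractions import Fraction

def both_type1_with_replacement(m, r):
    """P(first two draws are type 1) when r of the m objects are type 1."""
    p = Fraction(r, m)
    return p * p

def both_type1_without_replacement(m, r):
    """Same probability when the first object is not returned before the second draw."""
    return Fraction(r, m) * Fraction(r - 1, m - 1)

for m in (10, 100, 10_000):
    r = m * 3 // 10  # 30% of the population is type 1 (illustrative choice)
    w = both_type1_with_replacement(m, r)
    wo = both_type1_without_replacement(m, r)
    print(m, float(w), float(wo), float(w - wo))
```

The last column is the gap between the two sampling schemes; it shrinks as m increases while the proportion of type 1 objects stays fixed.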

Moments

For future reference, let us compute the mean, variance, and probability generating function of a generic indicator variable I with P(I = 1) = p.

Mathematical Exercise 4. Show that E(I) = p.

Mathematical Exercise 5. Show that var(I) = p(1 - p).

Mathematical Exercise 6. Show that $E(t^I) = 1 - p + pt$ for t in R.

Mathematical Exercise 7. Sketch the graph of the variance in Exercise 5 as a function of p. Note in particular that the variance is largest when p = 1/2 and smallest when p = 0 or p = 1.
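
The results in Exercises 4 through 7 can be sanity-checked by computing the moments directly from the two-point density. This is a numerical sketch under assumed names (`indicator_moments`) and an arbitrary value t = 2, not a substitute for the derivations.

```python
def indicator_moments(p, t):
    """Mean, variance, and generating function value E(t^I) of an indicator I."""
    dist = {0: 1 - p, 1: p}  # density of I
    mean = sum(x * q for x, q in dist.items())
    var = sum((x - mean) ** 2 * q for x, q in dist.items())
    pgf = sum(t ** x * q for x, q in dist.items())
    return mean, var, pgf

# Tabulate the moments on a small grid of p values.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    mean, var, pgf = indicator_moments(p, t=2.0)
    print(p, mean, var, pgf)
```

On this grid the variance column peaks at p = 0.5 and vanishes at the endpoints, in line with Exercise 7.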

Exercises

Mathematical Exercise 8. Suppose that a student takes a multiple choice test. The test has 10 questions, each of which has 4 possible answers (only one correct). If the student blindly guesses the answer to each question, do the questions form a sequence of Bernoulli trials? If so, identify the trial outcomes and the parameter p.

Mathematical Exercise 9. Candidate A is running for office in a certain district. Twenty persons are selected at random from the population of registered voters and asked if they prefer candidate A. Do the responses form a sequence of Bernoulli trials? If so, identify the trial outcomes and the meaning of the parameter p.

Mathematical Exercise 10. An American roulette wheel has 38 slots; 18 are red, 18 are black, and 2 are green. A gambler plays roulette 15 times, betting on red each time. Do the outcomes form a sequence of Bernoulli trials? If so, identify the trial outcomes and the parameter p.

Mathematical Exercise 11. Two tennis players play a set of 6 games. Do the games form a sequence of Bernoulli trials? If so, identify the trial outcomes and the meaning of the parameter p.

The Pooled Blood Test

Suppose that each person in a population, independently of all others, has a certain disease with probability p. The disease can be identified by a blood test, but of course the test has a cost.

For a group of k > 1 persons, we will compare two strategies. The first is to test the k persons individually, so that k tests are required. The second strategy is to pool the blood samples of the k persons and test the pooled sample first. We assume that the test is negative if and only if all k persons are free of the disease; in this case just one test is required. On the other hand, the test is positive if and only if at least one person has the disease, in which case we then have to test the persons individually; in this case k + 1 tests are required. Let X denote the number of tests required for the pooled strategy.

Mathematical Exercise 12. Show that

  1. $P(X = 1) = (1 - p)^k, \quad P(X = k + 1) = 1 - (1 - p)^k$.
  2. $E(X) = (k + 1) - k (1 - p)^k$.

Mathematical Exercise 13. Show that, in terms of expected value, the pooled strategy is better than the basic strategy if and only if

$p < 1 - (1 / k)^{1 / k}$.

The critical value $p_k = 1 - (1 / k)^{1 / k}$, as a function of k in the interval [2, 20], is shown in the graph below:

[Figure: the graph of $p_k$]

Mathematical Exercise 14. Show that

  1. The maximum value of $p_k$ occurs at k = 3, and $p_3 \approx 0.307$.
  2. $p_k$ decreases to 0 as k increases from 3 to infinity.
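
The critical values can be tabulated directly for integer group sizes, which gives a numerical check of Exercise 14 on the range k = 2, ..., 20. The helper name `critical_value` is an assumption made for this sketch.

```python
def critical_value(k):
    """Critical value 1 - (1/k)^(1/k): pooling k samples wins when p is below it."""
    return 1 - (1 / k) ** (1 / k)

values = {k: critical_value(k) for k in range(2, 21)}
best_k = max(values, key=values.get)  # group size with the largest critical value

for k, pk in values.items():
    print(k, round(pk, 4))
print("largest critical value at k =", best_k)
```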

From Exercises 13 and 14, it follows that if p > 0.307, pooling never makes sense, regardless of the size of the group k. At the other extreme, if p is very small, so that the disease is quite rare, pooling is better unless the group size k is very large.

Now suppose that we have n persons. For any k that divides n, we can partition the population into n / k groups of k each, and pool the blood samples in each group. Note that k = 1 corresponds to individual testing. Let $X_i$ denote the number of tests required for group i.

Mathematical Exercise 15. Argue that if k > 1, $X_1, X_2, \ldots, X_{n/k}$ are independent and each has the distribution given in Exercise 12.

The total number of tests required for this partitioning scheme is

$Y_k = X_1 + X_2 + \cdots + X_{n/k}$.

Mathematical Exercise 16. Show that the expected total number of tests is

  1. $E(Y_k) = n$ if k = 1.
  2. $E(Y_k) = n[1 + 1 / k - (1 - p)^k]$ if k > 1.

Thus, in terms of expected value, the optimal strategy is to group the population into n / k groups of size k, where k minimizes the function defined in the previous exercise. It is difficult to get a closed-form expression for the optimal value of k, but this value can be determined numerically for specific n and p.
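
One way to carry out the numerical search is to evaluate the expected total over every divisor of n and take the minimum. The sketch below does this under assumed function names; the case k = 1 is handled separately because the grouped formula applies only for k > 1. The values n = 120 and p = 0.02 are illustrative and are not taken from the exercises.

```python
def expected_total_tests(n, k, p):
    """Expected total tests for n persons split into n/k groups of size k."""
    if n % k != 0:
        raise ValueError("k must divide n")
    if k == 1:
        return float(n)  # individual testing
    return n * (1 + 1 / k - (1 - p) ** k)

def optimal_group_size(n, p):
    """Search the divisors of n for the k that minimizes the expected total."""
    divisors = [k for k in range(1, n + 1) if n % k == 0]
    best_k = min(divisors, key=lambda k: expected_total_tests(n, k, p))
    return best_k, expected_total_tests(n, best_k, p)

k, tests = optimal_group_size(120, 0.02)  # illustrative inputs
print(k, tests)
```

A linear scan over divisors is more than fast enough for populations of this size, and it avoids any assumption about the shape of the function being minimized.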

Mathematical Exercise 17. For the following values of n and p, find the optimal pooling size k and the expected number of tests.

  1. n = 100, p = 0.01
  2. n = 1000, p = 0.05
  3. n = 1000, p = 0.001
