4. Probability Measure


Suppose that we have a random experiment with sample space S. The probability of an event is a measure of how likely the event is to occur when the experiment is run.

Axioms

Mathematically, a probability measure (or distribution) P for a random experiment is a real-valued function defined on the collection of events that satisfies the following axioms:

  1. P(A) >= 0 for any event A.
  2. P(S) = 1
  3. P[unionj in J Aj] = sumj in J P(Aj) if {Aj: j in J} is a countable, pairwise disjoint collection of events.

Axiom 3 is known as countable additivity, and states that the probability of a union of a finite or countably infinite collection of disjoint events is the sum of the corresponding probabilities. The axioms are known as the Kolmogorov axioms, in honor of Andrey Kolmogorov.

Axioms 1 and 2 are really just a matter of convention; we choose to measure the probability of an event with a number between 0 and 1 (as opposed, say, to a number between -5 and 7). Axiom 3, however, is fundamental and inescapable. It is required for probability for precisely the same reason that it is required for other measures of the "size" of a set, such as length, area, and volume.

On the other hand, uncountable additivity (the extension of axiom 3 to an uncountable index set J) is unreasonable for probability, just as it is for other measures. For example, an interval of positive length in R is a union of uncountably many points, each of which has length 0.

We now have defined the three essential ingredients that model a random experiment: 

  1. the sample space S,
  2. the sigma algebra of events A,
  3. the probability measure P.

Together these define a probability space (S, A, P).

The Law of Large Numbers

Intuitively, the probability of an event is supposed to measure the long-term relative frequency of the event. Specifically, suppose that we repeat the experiment indefinitely. (Note that this actually creates a new, compound experiment.) For an event A in the basic experiment, let Nn(A) denote the number of times A occurred (the frequency of A) in the first n runs. (Note that this is a random variable in the compound experiment.) Thus,

Pn(A) = Nn(A) / n

is the relative frequency of A in the first n runs. If we have chosen the correct probability measure for the experiment, then in some sense we expect that the relative frequency of each event should converge to the probability of the event:

Pn(A) converges to P(A) as n converges to infinity.

The precise statement of this is the law of large numbers or law of averages, one of the fundamental theorems in probability. To emphasize the point, note that in general there will be lots of possible probability measures for an experiment, in the sense of the axioms. However, only the true probability measure will satisfy the law of large numbers.

It follows that if we have the data from n runs of the experiment, the observed relative frequency Pn(A) can be used as an approximation for P(A); this approximation is called the empirical probability of A.
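The convergence of relative frequencies can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the basic experiment is taken to be the roll of one fair die, A is the event that the score is at most 2 (so that P(A) = 1/3), and the helper name `relative_frequency` is hypothetical.

```python
import random

def relative_frequency(event, run_experiment, n, rng):
    """Return P_n(A) = N_n(A) / n for n independent runs of the experiment."""
    count = sum(1 for _ in range(n) if event(run_experiment(rng)))
    return count / n

def roll_die(rng):
    # Assumed basic experiment: one fair six-sided die.
    return rng.randint(1, 6)

def score_at_most_2(outcome):
    # Assumed event A = {1, 2}, with P(A) = 1/3.
    return outcome <= 2

rng = random.Random(0)  # fixed seed so that the runs are reproducible
estimates = {n: relative_frequency(score_at_most_2, roll_die, n, rng)
             for n in (10, 100, 10_000, 100_000)}
for n, p_n in estimates.items():
    print(n, p_n)
```

For small n the empirical probability can be far from 1/3; for large n it should typically settle close to it, which is the behavior that the law of large numbers makes precise.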

Mathematical Exercise 1. Show that Pn satisfies the axioms of a probability measure (given the data from n runs of the experiment).

The Distribution of a Random Variable

Suppose that X is a random variable for the experiment, taking values in a set T.

Mathematical Exercise 2. Show that P(X in B), as a function of B subset T, defines a probability measure on T. Hint: Recall that the inverse image preserves all set operations.

The probability measure in the previous exercise is called the probability distribution of X. Thus, any random variable X for an experiment defines a new probability space:

  1. A set of outcomes T (the possible values of X).
  2. A collection of events (the subsets of T).
  3. A probability measure on these events (the probability distribution of X).

Moreover, recall that the outcome of the experiment itself can be thought of as a random variable. Specifically, if we take X to be the identity function on S, then X is a random variable and

P(X in A) = P(A).

Thus, any probability measure can be thought of as the distribution of a random variable.
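As a small illustration of how a random variable induces a distribution, the sketch below computes P(X in B) through the inverse image of B. The setting is an assumption made for the example only: two fair dice with equally likely outcomes, and X equal to the sum of the scores. The names `dist_X` and `X` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice, all 36 outcomes equally likely.
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a collection of outcomes in S)."""
    return Fraction(len(set(event)), len(S))

def X(s):
    # Assumed random variable: the sum of the two scores.
    return s[0] + s[1]

# T is the set of possible values of X.
T = sorted({X(s) for s in S})

def dist_X(B):
    """P(X in B), computed as the probability of the inverse image of B."""
    return P([s for s in S if X(s) in B])

print(dist_X({7}))      # probability that the sum is 7
print(dist_X(set(T)))   # total mass of the distribution of X
```

Because the inverse image of a disjoint union is the disjoint union of the inverse images, `dist_X` inherits additivity from P.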

Measures

How can we construct probability measures? As noted briefly above, there are other measures of the "size" of sets; in many cases, these can be converted into probability measures.

First, a (nonnegative) measure m on S is a mapping on the (measurable) subsets of S that satisfies axioms 1 and 3 above. In general, m(A) is allowed to be infinite for a subset A. However, if m(S) is positive and finite, then m can easily be rescaled into a probability measure.

Mathematical Exercise 3. Show that if m is a measure on S with m(S) finite and positive, then P defined below is a probability measure on S.

P(A) = m(A) / m(S) for A subset S.

In the context of Exercise 3, m(S) is called the normalizing constant. In the next two subsections, we consider some very important special cases.

Discrete Distributions

Suppose that S is a finite, nonempty set. Clearly, counting measure # is a finite measure on S:

#(A) = the number of elements in A for A subset S.

The corresponding probability measure is called the discrete uniform distribution on S, and is particularly important in combinatorial and sampling experiments:

P(A) = #(A) / #(S) for A subset S.

We can give a more general construction for countable sample spaces that can be used to define many probability measures.

Mathematical Exercise 4. Suppose that S is nonempty and countable, and that g is a nonnegative real-valued function defined on S. Show that m defined below is a measure on S:

m(A) = sumx in A g(x) for A subset S.

Thus, if m(S) is finite and positive, then P(A) = m(A) / m(S) defines a probability measure by Exercise 3. Distributions of this type are said to be discrete. Discrete distributions are studied in detail in the chapter on Distributions.
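To make the construction concrete, here is a minimal sketch for a finite S, using exact rational arithmetic. The function name `make_probability`, the set S, and the weight function g(x) = x are all hypothetical choices made for illustration; a countably infinite S would require summing a series instead.

```python
from fractions import Fraction

def make_probability(g, S):
    """Return P with P(A) = m(A) / m(S), where m(A) is the sum of g over A."""
    def m(A):
        return sum((Fraction(g(x)) for x in A), Fraction(0))
    total = m(S)  # the normalizing constant m(S)
    if total <= 0:
        raise ValueError("m(S) must be positive")
    # The caller is expected to pass only subsets A of S.
    return lambda A: m(A) / total

# Hypothetical example: S = {1, 2, 3, 4} with weight g(x) = x, so m(S) = 10.
S = {1, 2, 3, 4}
P = make_probability(lambda x: x, S)
print(P({1, 2}))  # m({1, 2}) / m(S) = 3/10
print(P(S))       # 1
```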

Mathematical Exercise 5. In the setting of the previous exercise, show that if S is finite and g is a constant function, then the corresponding probability measure P is the discrete uniform distribution on S.

Continuous Distributions

We define n-dimensional measure on Rn (also called Lebesgue measure, in honor of Henri Lebesgue) by

mn(A) = integralA 1dx for A subset Rn.

Note that if n > 1, the integral above is a multiple integral; x = (x1, x2, ..., xn) and dx = dx1dx2...dxn. The countable additivity axiom holds because of an essential property of integrals, which we will assume. In particular, note from calculus that

  1. m1(A) is the length of A for A subset R.
  2. m2(A) is the area of A for A subset R2.
  3. m3(A) is the volume of A for A subset R3.

Now, if S is a subset of Rn with mn(S) positive and finite, then

P(A) = mn(A) / mn(S)

is a probability measure on S by Exercise 3, called the continuous uniform distribution on S.
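The ratio of areas that defines the continuous uniform distribution can be approximated by sampling. The sketch below rests on assumptions chosen purely for illustration: S is the square [-1, 1] x [-1, 1], A is the unit disk, and the helper names are hypothetical. Under these assumptions P(A) = pi / 4.

```python
import math
import random

def estimate_uniform_probability(in_A, sample_S, n, rng):
    """Estimate P(A) by the fraction of n uniform points of S that fall in A."""
    hits = sum(1 for _ in range(n) if in_A(sample_S(rng)))
    return hits / n

def sample_square(rng):
    # Assumed sample space S = [-1, 1] x [-1, 1], with area m2(S) = 4.
    return rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)

def in_disk(point):
    # Assumed event A: the unit disk, with area m2(A) = pi.
    x, y = point
    return x * x + y * y <= 1.0

rng = random.Random(1)  # fixed seed for reproducibility
estimate = estimate_uniform_probability(in_disk, sample_square, 200_000, rng)
print(estimate, math.pi / 4)
```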

We can generalize this construction to produce many other distributions. Suppose that g is a nonnegative real-valued function defined on S. Define

m(A) = integralA g(x) dx for A subset S.

Then m is a measure on S. Thus if m(S) is finite and positive, then P(A) = m(A) / m(S) defines a probability measure as in Exercise 3. Distributions of this type are said to be continuous. Continuous distributions are studied in detail in the chapter on Distributions.
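A one-dimensional numerical sketch may help here. It assumes S = [0, 1] and the hypothetical function g(x) = x^2, and it approximates the integrals with a simple midpoint rule rather than computing them exactly; the helper names are illustrative.

```python
def midpoint_integral(g, a, b, steps=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def g(x):
    # Hypothetical nonnegative function, chosen only for illustration.
    return x * x

# Assumed sample space S = [0, 1]; the normalizing constant m(S) is 1/3.
m_S = midpoint_integral(g, 0.0, 1.0)

def P_interval(a, b):
    """P([a, b]) = m([a, b]) / m(S) for an interval [a, b] inside S."""
    return midpoint_integral(g, a, b) / m_S

print(P_interval(0.0, 0.5))  # exact value is (1/2)^3 = 0.125
```

Note that intervals of equal length no longer have equal probability: under this choice of g, the right half of S carries most of the mass.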

It is important to note again that, unlike in many other areas of mathematics, the low-dimensional spaces (n = 1, 2, 3) do not play a special role, except for exposition. For example, in the cicada data, some of the variables recorded are body weight, body length, wing width, and wing length. A probability model for these variables would specify a distribution on a subset of R4.

Basic Rules of Probability

Suppose that we have a random experiment with sample space S and probability measure P. In the following exercises, A and B are events.

Mathematical Exercise 6. Show that P(Ac) = 1 - P(A).

Mathematical Exercise 7. Show that P(Ø) = 0.

Mathematical Exercise 8. Show that P(B intersect Ac) = P(B) - P(A intersect B).

Mathematical Exercise 9. Show that if A subset B then P(B intersect Ac) = P(B) - P(A).

Recall that B intersect Ac is sometimes written B - A when A subset B. With this notation, the result in the previous exercise has the attractive form

P(B - A) = P(B) - P(A).

Mathematical Exercise 10. Show that if A subset B then P(A) <= P(B).

Mathematical Exercise 11. Suppose that {Aj: j in J} is a countable collection of events. Prove Boole's inequality (named after George Boole):

P[unionj Aj] <= sumj P(Aj).

Hint: Let J = {1, 2, ...} and define B1 = A1, B2 = A2 intersect A1c, B3 = A3 intersect A1c intersect A2c, ... Show that B1, B2, ... are pairwise disjoint and have the same union as A1, A2, .... Use the additivity axiom of probability and the result of Exercise 10.
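The construction in the hint can be tried out on small finite sets. In the sketch below, the helper `disjointify` and the three sample events are hypothetical; the code only illustrates how the sets B1, B2, ... are formed and does not replace the proof.

```python
def disjointify(events):
    """Return B_1, B_2, ... where B_i is A_i minus the union of A_1, ..., A_(i-1)."""
    seen = set()      # union of the events processed so far
    result = []
    for A in events:
        result.append(set(A) - seen)
        seen |= set(A)
    return result

# Hypothetical events in a small sample space.
A = [{1, 2, 3}, {2, 3, 4}, {1, 5}]
B = disjointify(A)
print(B)
```

Each B_i is a subset of the corresponding A_i, the B_i are pairwise disjoint, and their union equals the union of the A_i, which are the three facts the hint relies on.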

Mathematical Exercise 12. Suppose that {Aj: j in J} is a countable collection of events with P(Aj) = 0 for each j in J. Use Boole's inequality to show that

P[unionj Aj] = 0.

Mathematical Exercise 13. Suppose that {Aj: j in J} is a countable collection of events. Prove Bonferroni's inequality (named after Carlo Bonferroni):

P[intersectj Aj] >= 1 - sumj [1 - P(Aj)].

Hint: Apply Boole's inequality to {Ajc: j in J}.

Mathematical Exercise 14. Suppose that {Aj: j in J} is a countable collection of events with P(Aj) = 1 for each j in J. Use Bonferroni's inequality to show that

P[intersectj Aj] = 1.

Mathematical Exercise 15. Suppose that A and B are events in an experiment with P(A) = 1. Show that P(A intersect B) = P(B).

Mathematical Exercise 16. Prove the law of total probability: if {Aj: j in J} is a countable collection of events that partition the sample space S, then for any event B,

P(B) = sumj P(Aj intersect B).

The Inclusion-Exclusion Formula

The inclusion-exclusion formulas provide a method for computing the probability of a union of events in terms of the probabilities of various intersections of the events.

Mathematical Exercise 17. Show that if A and B are events then

P(A union B) = P(A) + P(B) - P(A intersect B).

Mathematical Exercise 18. Show that if A, B, and C are events then

P(A union B union C) = P(A) + P(B) + P(C) - P(A intersect B) - P(A intersect C) - P(B intersect C) + P(A intersect B intersect C)

The last two exercises can be generalized to a union of n events Ai, i = 1, 2, ..., n. The generalization is known as the inclusion-exclusion formula. To simplify the formulation, let N denote the index set {1, 2, ..., n}. Define

  1. pJ = P[intersectj in J Aj] for J subset N.
  2. qk = sum{J: #(J) = k} pJ for k in N

Mathematical Exercise 19. Show that P[unioni = 1, ..., n Ai] = sumk = 1, ..., n (-1)^(k - 1) qk.
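The formula can be checked numerically on a small example. The following sketch assumes a uniform distribution on a hypothetical 12-point sample space and three arbitrarily chosen events; `inclusion_exclusion` is an illustrative helper name, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_exclusion(events, P):
    """Compute the probability of the union as the alternating sum of the q_k."""
    n = len(events)
    total = Fraction(0)
    for k in range(1, n + 1):
        # q_k: sum of the probabilities of all k-fold intersections.
        q_k = sum(
            (P(set.intersection(*J)) for J in combinations(events, k)),
            Fraction(0),
        )
        total += (-1) ** (k - 1) * q_k
    return total

# Assumed setting: uniform distribution on S = {1, ..., 12}.
S = set(range(1, 13))

def P(A):
    return Fraction(len(A), len(S))

events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8, 10, 12}]
print(inclusion_exclusion(events, P), P(set.union(*events)))
```

The two printed values agree, as the formula predicts. A numerical check of this kind is of course not a proof.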

The general Bonferroni inequalities state that if the sum on the right is truncated after k terms (k < n), then the truncated sum is an upper bound for the probability of the union if k is odd (so that the last term has a positive sign) and is a lower bound for the probability of the union if k is even (so that the last term has a negative sign).

If you go back and look at your proofs of the basic properties in Exercises 6-19, you will see that they hold for any finite measure m, not just probability. The only change is that the number 1 is replaced by m(S). In particular, the inclusion-exclusion rule is as important in combinatorics (the study of counting measure) as it is in probability.

Computational Exercises

Mathematical Exercise 20. Suppose that we roll 2 fair dice and record the sequence of scores. Let A denote the event that the first die score is less than 3 and B the event that the sum of the dice scores is 6.

  1. Define the sample space S mathematically.
  2. Since the dice are fair, argue that the uniform distribution on S is appropriate.
  3. Find P(A).
  4. Find P(B).
  5. Find P(A intersect B).
  6. Find P(A union B).
  7. Find P(B intersect Ac).

Simulation Exercise 21. In the dice experiment, set n = 2. Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.
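Where the dice applet is not at hand, a simulation of this kind can be sketched in a few lines. The code below is a stand-in written under assumptions, not the lab's own software: it uses Python's pseudo-random generator for the two fair dice, an arbitrary seed, and a hypothetical function name `simulate_dice`.

```python
import random

def simulate_dice(runs, rng):
    """Run the two-dice experiment `runs` times and return empirical probabilities."""
    counts = {"A": 0, "B": 0, "A and B": 0, "A or B": 0, "B and not A": 0}
    for _ in range(runs):
        first, second = rng.randint(1, 6), rng.randint(1, 6)
        in_A = first < 3            # first die score is less than 3
        in_B = first + second == 6  # sum of the scores is 6
        counts["A"] += in_A
        counts["B"] += in_B
        counts["A and B"] += in_A and in_B
        counts["A or B"] += in_A or in_B
        counts["B and not A"] += in_B and not in_A
    return {name: c / runs for name, c in counts.items()}

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
empirical = simulate_dice(100, rng)
for name, p in empirical.items():
    print(name, p)
```

With only 100 runs the empirical probabilities will usually differ noticeably from the exact values; the empirical measure still obeys the basic rules, such as the inclusion-exclusion formula for two events.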

Mathematical Exercise 22. Consider the experiment in which 2 cards are dealt from a standard deck and the sequence of cards is recorded. For i = 1, 2, let Hi denote the event that card i is a heart.

  1. Define the sample space S mathematically
  2. Argue that if the deck is well-shuffled, then the uniform distribution on S is appropriate.
  3. Find P(H1)
  4. Find P(H1 intersect H2)
  5. Find P(H1c intersect H2)
  6. Find P(H2)
  7. Find P(H1 union H2).

Simulation Exercise 23. In the card experiment, set n = 2. Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.

Mathematical Exercise 24. Recall that in Buffon's coin experiment, a coin with radius r <= 1/2 is tossed "randomly" on a floor with square tiles of side length 1, and the coordinates of the center of the coin are recorded, relative to the center of the square in which the coin lands. Let A denote the event that the coin does not touch the sides of the square.

  1. Define the sample space S mathematically.
  2. Argue that the uniform distribution on S is appropriate.
  3. Find P(A).
  4. Find P(Ac).

Simulation Exercise 25. In Buffon's coin experiment, set r = 0.2. Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.

Mathematical Exercise 26. Suppose that A and B are events in an experiment with P(A) = 1 / 3, P(B) = 1 / 4, P(A intersect B) = 1 / 10. Express each of the following events in the language of the experiment and find its probability:

  1. A intersect Bc
  2. A union B
  3. Ac union Bc
  4. Ac intersect Bc
  5. A union Bc

Mathematical Exercise 27. Suppose that A, B, and C are events in an experiment with

P(A) = 0.3, P(B) = 0.2, P(C) = 0.4, P(A intersect B) = 0.04,

P(A intersect C) = 0.1, P(B intersect C) = 0.1, P(A intersect B intersect C) = 0.01

Express each of the following events in set notation and find its probability:

  1. At least one of the three events occurs.
  2. None of the three events occurs.
  3. Exactly one of the three events occurs.
  4. Exactly two of the three events occur.

Mathematical Exercise 28. A pair of fair dice is rolled repeatedly until the sum of the scores is either 5 or 7. The sequence of scores on the final roll is recorded. Let A be the event that the sum is 5 rather than 7.

  1. Define the sample space S mathematically.
  2. Argue that since the dice are fair, S should be given the uniform distribution.
  3. Find P(A).

Probabilities of the type in the last exercise are important in the game of craps.

Mathematical Exercise 29. An experiment consists of tossing 3 fair coins and recording the sequence of scores. Let A be the event that the first coin is heads and B the event that there are exactly 2 heads.

  1. Define the sample space S mathematically.
  2. Argue that since the coins are fair, S should be given the uniform distribution.
  3. Find P(A).
  4. Find P(B)
  5. Find P(A intersect B)
  6. Find P(A union B).
  7. Find P(Ac union Bc).
  8. Find P(Ac intersect Bc).
  9. Find P(A union Bc).

Mathematical Exercise 30. A box contains 12 marbles: 5 are red, 4 are green, and 3 are blue. Three marbles are chosen at random, without replacement.

  1. Define a sample space for which the outcomes are equally likely.
  2. Find P(A) where A is the event that the chosen marbles are all the same color.
  3. Find P(B) where B is the event that the chosen marbles are all different colors.

Mathematical Exercise 31. Repeat the last exercise under the assumption that the marbles are chosen with replacement.

Data Analysis Exercise 32. For the M&M data set, let R denote the event that a bag has at least 10 red candies, T the event that a bag has at least 57 candies total, and W the event that a bag weighs at least 50 grams. Find the empirical probability of the following events:

  1. R
  2. T
  3. W
  4. R intersect T
  5. T intersect Wc.

Data Analysis Exercise 33. For the cicada data, let W denote the event that a cicada weighs at least 0.20 grams, F the event that a cicada is female, and T the event that a cicada is type tredecula. Find the empirical probability of

  1. W
  2. F
  3. T
  4. W intersect F
  5. F union T union W

Uniqueness and Extension

Recall that the collection of events of the experiment forms a sigma algebra A. In some cases, A is generated by some smaller collection of basic events B, that is,

A = sigma(B).

We often would like to know that the probabilities of the basic events completely determine the entire probability measure. This turns out to be true if the collection of basic events is closed under intersection. Specifically, suppose that if B, C in B then B intersect C in B (such a collection B is called a pi system). If P1 and P2 are probability measures on A and P1(B) = P2(B) for B in B, then P1(A) = P2(A) for any A in A.

For example, the standard (Borel) sigma algebra on R is generated by the collection of all open intervals of finite length, which is clearly closed under intersection. Thus, a probability measure P on R is completely determined by its values on the finite open intervals. In addition, the sigma algebra on R is generated by the collection of closed, infinite intervals of the form (-infinity, x]. Thus, a probability measure P on R is completely determined by its values on these intervals.
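As a brief worked step, the probabilities of the closed, infinite intervals give the probability of any half-open interval directly, by the difference rule of Exercise 9. For real numbers a < b:

```latex
(-\infty, a] \subseteq (-\infty, b], \qquad (a, b] = (-\infty, b] \setminus (-\infty, a]
\quad\Longrightarrow\quad
P\bigl((a, b]\bigr) = P\bigl((-\infty, b]\bigr) - P\bigl((-\infty, a]\bigr).
```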

Next, suppose that we have n sets S1, S2, ..., Sn with sigma algebras A1, A2, ..., An, respectively. Recall that the product set

S = S1 × S2 × ··· × Sn

is a natural sample space for an experiment that consists of multiple measurements, or for a compound experiment that consists of performing n basic experiments in sequence. Usually, we give S the sigma algebra A generated by the collection of product sets of the form

A = A1 × A2 × ··· × An where Ai in Ai for each i.

This collection of product sets is closed under intersection, and hence a probability measure on S is completely determined by its values on these product sets.
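The closure under intersection can be seen from the following identity, written here for two product sets with Ai and Bi in the sigma algebra on Si for each i:

```latex
(A_1 \times \cdots \times A_n) \cap (B_1 \times \cdots \times B_n)
= (A_1 \cap B_1) \times \cdots \times (A_n \cap B_n).
```

Each factor on the right is an intersection of two sets in the corresponding sigma algebra, and so is again a set in that sigma algebra.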

Generalizing, suppose that we have an infinite sequence of sets S1, S2, ... with sigma algebras A1, A2, ..., respectively. The product set

S = S1 × S2 × ···.

is a natural sample space for an experiment that consists of infinitely many measurements, or for a compound experiment that consists of combining an infinite sequence of basic experiments. Usually, we give S the sigma algebra A generated by the collection of product sets of the form

A = A1 × A2 × ··· × An × Sn+1 × Sn+2 × ··· where n is a positive integer and Ai in Ai for each i.

This collection of product sets is closed under intersection, and hence a probability measure on S is completely determined by its values on these product sets.