Suppose that we have a random experiment with sample space S. The probability of an event is a measure of how likely the event is to occur when the experiment is run.
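Mathematically, a probability measure (or distribution) P for a random experiment is a real-valued function defined on the collection of events that satisfies the following axioms:
1. $P(A) \ge 0$ for every event $A$.
2. $P(S) = 1$.
3. If $\{A_j : j \in J\}$ is a countable, pairwise disjoint collection of events, then $P\left(\bigcup_{j \in J} A_j\right) = \sum_{j \in J} P(A_j)$.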
Axiom 3 is known as countable additivity, and states that the probability of a union of a finite or countably infinite collection of disjoint events is the sum of the corresponding probabilities. The axioms are known as the Kolmogorov axioms, in honor of Andrey Kolmogorov.
Axioms 1 and 2 are really just a matter of convention; we choose to measure the probability of an event with a number between 0 and 1 (as opposed, say, to a number between -5 and 7). Axiom 3, however, is fundamental and inescapable. It is required for probability for precisely the same reason that it is required for other measures of the "size" of a set, such as cardinality for finite sets, length for subsets of R, area for subsets of R2, and volume for subsets of R3.
On the other hand, uncountable additivity (the extension of axiom 3 to an uncountable index set J) is unreasonable for probability, just as it is for other measures. For example, an interval of positive length in R is a union of uncountably many points, each of which has length 0.
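We have now defined the three essential ingredients that model a random experiment:
1. The sample space S.
2. The collection of events (a sigma algebra of subsets of S).
3. The probability measure P.
Together these define a probability space.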
Intuitively, the probability of an event is supposed to measure the long-term relative frequency of the event. Specifically, suppose that we repeat the experiment indefinitely. (Note that this actually creates a new, compound experiment.) For an event A in the basic experiment, let Nn(A) denote the number of times A occurred (the frequency of A) in the first n runs. (Note that this is a random variable in the compound experiment.) Thus,
Pn(A) = Nn(A) / n
is the relative frequency of A in the first n runs. If we have chosen the correct probability measure for the experiment, then in some sense we expect that the relative frequency of each event should converge to the probability of the event:
$P_n(A) \to P(A)$ as $n \to \infty$.
The precise statement of this is the law of large numbers or law of averages, one of the fundamental theorems in probability. To emphasize the point, note that in general there will be lots of possible probability measures for an experiment, in the sense of the axioms. However, only the true probability measure will satisfy the law of large numbers.
It follows that if we have the data from n runs of the experiment, the observed relative frequency Pn(A) can be used as an approximation for P(A); this approximation is called the empirical probability of A.
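The convergence of relative frequencies can be illustrated with a small simulation. The sketch below is only an illustration under assumed choices that are not part of the text: the experiment is a single roll of a fair die, the event is A = {2, 4, 6}, and the helper names `empirical_probability`, `roll_die`, and `is_even` are hypothetical.

```python
import random

def empirical_probability(event, run_experiment, n, rng):
    """Relative frequency Pn(A) = Nn(A) / n of an event in n independent runs."""
    count = sum(1 for _ in range(n) if event(run_experiment(rng)))
    return count / n

def roll_die(rng):
    """One run of the illustrative experiment: roll a fair six-sided die."""
    return rng.randint(1, 6)

def is_even(outcome):
    """Indicator of the event A = {2, 4, 6}."""
    return outcome % 2 == 0

rng = random.Random(2024)  # fixed seed so that the run is repeatable
estimates = {
    n: empirical_probability(is_even, roll_die, n, rng)
    for n in (10, 100, 10_000)
}
for n, estimate in estimates.items():
    print(f"n = {n:>6}: Pn(A) = {estimate:.4f}   (P(A) = 0.5)")
```

For small n the empirical probability can be far from 1/2; the law of large numbers only describes what happens as n grows.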
1. Show that Pn satisfies the axioms of a probability measure (given the data from n runs of the experiment).
Suppose that X is a random variable for the experiment, taking values in a set T.
2. Show that $P(X \in B)$, as a function of $B \subseteq T$, defines a probability measure on $T$. Hint: Recall that the inverse image preserves all set operations.
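The probability measure in the previous exercise is called the probability distribution of X. Thus, any random variable X for an experiment defines a new probability space:
1. The set of outcomes T.
2. The collection of events, consisting of the (measurable) subsets of T.
3. The probability measure $B \mapsto P(X \in B)$, the distribution of X.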
Moreover, recall that the outcome of the experiment itself can be thought of as a random variable. Specifically, if we take X to be the identity function on S, then X is a random variable and
$P(X \in A) = P(A)$.
Thus, any probability measure can be thought of as the distribution of a random variable.
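To make the distribution of a random variable concrete, here is a minimal sketch on a finite sample space. The choices are assumptions made for illustration only: S is the set of ordered scores of two fair dice with the uniform measure, and X is the sum of the scores. The names `dist_X` and `prob_X_in` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space S: ordered pairs of scores from two fair dice, with uniform P.
S = list(product(range(1, 7), repeat=2))
P_outcome = {s: Fraction(1, len(S)) for s in S}

def X(s):
    """Random variable: the sum of the two scores, with values in T = {2, ..., 12}."""
    return s[0] + s[1]

# Distribution of X: P(X = t) is the probability of the inverse image of {t}.
dist_X = defaultdict(Fraction)
for s, p in P_outcome.items():
    dist_X[X(s)] += p

def prob_X_in(B):
    """P(X in B) for a subset B of T."""
    return sum((dist_X[t] for t in B if t in dist_X), Fraction(0))

print(prob_X_in({7, 11}))        # P(X is 7 or 11)
print(prob_X_in(range(2, 13)))   # P(X in T), which should be 1
```

Because the inverse image preserves set operations, additivity of `prob_X_in` on disjoint subsets of T is inherited from additivity of P on S.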
How can we construct probability measures? As noted briefly above, there are other measures of the "size" of sets; in many cases, these can be converted into probability measures.
First, a (nonnegative) measure m on S is a mapping on the (measurable) subsets of S that satisfies axioms 1 and 3 above. In general, m(A) is allowed to be infinite for a subset A. However, if m(S) is positive and finite, then m can easily be rescaled into a probability measure.
3. Show that if m is a measure on S with m(S) finite and positive, then P defined below is a probability measure on S.
$P(A) = m(A) / m(S)$ for $A \subseteq S$.
In the context of Exercise 3, m(S) is called the normalizing constant. In the next two subsections, we consider some very important special cases.
Suppose that S is a finite, nonempty set. Clearly, counting measure # is a finite measure on S:
$\#(A)$ = the number of elements in $A$, for $A \subseteq S$.
The corresponding probability measure is called the discrete uniform distribution on S, and is particularly important in combinatorial and sampling experiments:
$P(A) = \#(A) / \#(S)$ for $A \subseteq S$.
We can give a more general construction for countable sample spaces that can be used to define many probability measures.
4. Suppose that S is nonempty and countable, and that g is a nonnegative real-valued function defined on S. Show that m defined below is a measure on S:
$m(A) = \sum_{x \in A} g(x)$ for $A \subseteq S$.
Thus, if m(S) is finite and positive, then P(A) = m(A) / m(S) defines a probability measure by Exercise 3. Distributions of this type are said to be discrete. Discrete distributions are studied in detail in the chapter on Distributions.
5. In the setting of the previous exercise, show that if S is finite and g is a positive constant function, then the corresponding probability measure P is the discrete uniform distribution on S.
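The construction of a discrete distribution from a weight function g can be sketched in a few lines. This is a minimal sketch for a finite set S only (a countably infinite S would need a convergent series rather than a finite sum), and the function name `make_probability` and the example weights are hypothetical.

```python
from fractions import Fraction

def make_probability(g, S):
    """Return P with P(A) = m(A) / m(S), where m(A) is the sum of g(x) over x in A."""
    weights = {x: Fraction(g(x)) for x in S}
    if any(w < 0 for w in weights.values()):
        raise ValueError("g must be nonnegative")
    total = sum(weights.values(), Fraction(0))  # normalizing constant m(S)
    if total <= 0:
        raise ValueError("m(S) must be positive")

    def P(A):
        return sum((weights[x] for x in A), Fraction(0)) / total

    return P

S = range(1, 7)
P_weighted = make_probability(lambda x: x, S)  # g(x) = x, so m(S) = 21
P_uniform = make_probability(lambda x: 5, S)   # constant g, as in Exercise 5

print(P_weighted({5, 6}))  # 11/21
print(P_uniform({5, 6}))   # 1/3
```

With a constant g the constant cancels in the ratio m(A) / m(S), which is why the second measure agrees with the discrete uniform distribution #(A) / #(S).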
We define n-dimensional measure on Rn (also called Lebesgue measure, in honor of Henri Lebesgue) by
$m_n(A) = \int_A 1 \, dx$ for $A \subseteq \mathbb{R}^n$.
Note that if $n > 1$, the integral above is a multiple integral.
Now, if S is a subset of Rn with mn(S) positive and finite, then
P(A) = mn(A) / mn(S)
is a probability measure on S by Exercise 3, called the continuous uniform distribution on S.
We can generalize this construction to produce many other distributions. Suppose that g is a nonnegative real-valued function defined on S. Define
$m(A) = \int_A g(x) \, dx$ for $A \subseteq S$.
Then m is a measure on S. Thus if m(S) is finite and positive, then P(A) = m(A) / m(S) defines a probability measure as in Exercise 3. Distributions of this type are said to be continuous. Continuous distributions are studied in detail in the chapter on Distributions.
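A rough numerical sketch of this construction in one dimension follows. It is restricted to the case where S and A are intervals, it approximates the integrals with a simple midpoint rule, and the example functions g(x) = x^2 on [0, 1] and g(x) = 1 on [0, 2] are assumptions chosen only for illustration. The helper names `integrate` and `make_continuous_probability` are hypothetical.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def make_continuous_probability(g, lo, hi):
    """Return P for intervals [a, b] inside S = [lo, hi], with P = m / m(S)."""
    total = integrate(g, lo, hi)  # normalizing constant m(S)
    if total <= 0:
        raise ValueError("m(S) must be positive")

    def P(a, b):
        return integrate(g, a, b) / total

    return P

# g(x) = x^2 on S = [0, 1]: m(S) = 1/3 and m([0, 1/2]) = 1/24, so P = 1/8.
P = make_continuous_probability(lambda x: x * x, 0.0, 1.0)
half = P(0.0, 0.5)

# Constant g on S = [0, 2] gives the continuous uniform distribution:
# P([0.5, 1]) = length 0.5 / length 2 = 1/4.
uniform = make_continuous_probability(lambda x: 1.0, 0.0, 2.0)
quarter = uniform(0.5, 1.0)

print(round(half, 6), round(quarter, 6))
```

The results are floating-point approximations, so they should be compared with the exact values only up to a small tolerance.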
It is important to note again that, unlike many other areas of mathematics, the low-dimensional spaces (n = 1, 2, 3) do not play a special role, except for exposition. For example, in the cicada data, some of the variables recorded are body weight, body length, wing width, and wing length. A probability model for these variables would specify a distribution on a subset of R4.
Suppose that we have a random experiment with sample space S and probability measure P. In the following exercises, A and B are events.
6. Show that $P(A^c) = 1 - P(A)$.
7. Show that $P(\emptyset) = 0$.
8. Show that $P(B \cap A^c) = P(B) - P(A \cap B)$.
9. Show that if $A \subseteq B$ then $P(B \cap A^c) = P(B) - P(A)$.
Recall that $B \cap A^c$ is sometimes written $B - A$ when $A \subseteq B$. With this notation, the result in the previous exercise has the attractive form
$P(B - A) = P(B) - P(A)$.
10. Show that if $A \subseteq B$ then $P(A) \le P(B)$.
11. Suppose that $\{A_j : j \in J\}$ is a countable collection of events. Prove Boole's inequality (named after George Boole):
$P\left(\bigcup_{j \in J} A_j\right) \le \sum_{j \in J} P(A_j)$.
Hint: Let $J = \{1, 2, \ldots\}$ and define $B_1 = A_1$, $B_2 = A_2 \cap A_1^c$, $B_3 = A_3 \cap A_1^c \cap A_2^c$, and so on. Show that $B_1, B_2, \ldots$ are pairwise disjoint and have the same union as $A_1, A_2, \ldots$. Then use the additivity axiom of probability and, since $B_j \subseteq A_j$ for each $j$, the result of Exercise 10.
12. Suppose that $\{A_j : j \in J\}$ is a countable collection of events with $P(A_j) = 0$ for each $j \in J$. Use Boole's inequality to show that
$P\left(\bigcup_{j \in J} A_j\right) = 0$.
13. Suppose that $\{A_j : j \in J\}$ is a countable collection of events. Prove Bonferroni's inequality (named after Carlo Bonferroni):
$P\left(\bigcap_{j \in J} A_j\right) \ge 1 - \sum_{j \in J} [1 - P(A_j)]$.
Hint: Apply Boole's inequality to $\{A_j^c : j \in J\}$.
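Boole's and Bonferroni's inequalities can be checked numerically on a small example. The sketch below is not a proof. It assumes a sample space of two fair dice with the uniform measure, and the three events are chosen arbitrarily for illustration. Following the hint, Bonferroni's inequality is applied to the complements of the events used for Boole's inequality.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # two fair dice, uniform measure

def P(A):
    """Discrete uniform probability of an event A (a subset of S)."""
    return Fraction(len(A), len(S))

events = [
    {s for s in S if s[0] == 1},         # first score is 1
    {s for s in S if s[0] + s[1] == 6},  # sum of the scores is 6
    {s for s in S if s[0] == s[1]},      # both scores are equal
]
complements = [S - A for A in events]

union = set.union(*events)
intersection_of_complements = set.intersection(*complements)

# Boole: P(union of A_j) <= sum of P(A_j).
boole_bound = sum(P(A) for A in events)
# Bonferroni for C_j = complement of A_j:
# P(intersection of C_j) >= 1 - sum of [1 - P(C_j)].
bonferroni_bound = 1 - sum(1 - P(C) for C in complements)

print(P(union), "<=", boole_bound)
print(P(intersection_of_complements), ">=", bonferroni_bound)
```

Both bounds can be uninformative: the Boole bound may exceed 1 and the Bonferroni bound may be negative when the events are not suitably rare or suitably likely.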
14. Suppose that $\{A_j : j \in J\}$ is a countable collection of events with $P(A_j) = 1$ for each $j \in J$. Use Bonferroni's inequality to show that
$P\left(\bigcap_{j \in J} A_j\right) = 1$.
15. Suppose that A and B are events in an experiment with P(A) = 1. Show that $P(A \cap B) = P(B)$.
16. Prove the law of total probability: if $\{A_j : j \in J\}$ is a countable collection of events that partition the sample space $S$, then for any event $B$,
$P(B) = \sum_{j \in J} P(A_j \cap B)$.
The inclusion-exclusion formulas provide a method for computing the probability of a union of events in terms of the probabilities of various intersections of the events.
17. Show that if A and B are events then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
18. Show that if A, B, and C are events then
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
The last two exercises can be generalized to a union of n events $A_i$, $i = 1, 2, \ldots, n$. The generalization is known as the inclusion-exclusion formula. To simplify the formulation, let $N$ denote the index set $\{1, 2, \ldots, n\}$. Define
$q_k = \sum_{J \subseteq N,\ \#(J) = k} P\left(\bigcap_{j \in J} A_j\right)$ for $k \in N$.
19. Show that
$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{k=1}^{n} (-1)^{k-1} q_k$.
The general Bonferroni inequalities state that if the sum on the right is truncated after k terms (k < n), then the truncated sum is an upper bound for the probability of the union if k is odd (so that the last term has a positive sign) and a lower bound for the probability of the union if k is even (so that the last term has a negative sign).
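The inclusion-exclusion formula and the truncated bounds can be checked on a small example. In the sketch below, q(k) is taken to be the sum, over all index sets J of size k, of the probability of the intersection of the events indexed by J. The sample space (two fair dice with the uniform measure) and the four events are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

S = set(product(range(1, 7), repeat=2))  # two fair dice, uniform measure

def P(A):
    """Discrete uniform probability of an event A (a subset of S)."""
    return Fraction(len(A), len(S))

events = [
    {s for s in S if s[0] == 1},
    {s for s in S if s[0] + s[1] == 6},
    {s for s in S if s[0] == s[1]},
    {s for s in S if s[1] > 4},
]
n = len(events)

def q(k):
    """Sum of P(intersection of A_j, j in J) over all index sets J of size k."""
    return sum(
        (P(set.intersection(*(events[j] for j in J)))
         for J in combinations(range(n), k)),
        Fraction(0),
    )

def partial_sum(k):
    """Inclusion-exclusion sum truncated after k terms."""
    return sum(((-1) ** (i - 1) * q(i) for i in range(1, k + 1)), Fraction(0))

p_union = P(set.union(*events))
for k in range(1, n + 1):
    print(k, partial_sum(k), "vs", p_union)
```

The full sum (k = n) reproduces the probability of the union exactly, while the truncated sums alternate between upper bounds (k odd) and lower bounds (k even).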
If you go back and look at your proofs of the basic properties in Exercises 6-19, you will see that they hold for any finite measure m, not just probability. The only change is that the number 1 is replaced by m(S). In particular, the inclusion-exclusion rule is as important in combinatorics (the study of counting measure) as it is in probability.
20. Suppose that we roll 2 fair dice and record the sequence of scores. Let A denote the event that the first die score is less than 3 and B the event that the sum of the dice scores is 6.
21. In the dice experiment, set n = 2. Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.
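For readers without access to the dice experiment applet, the following sketch is a stand-in and not the applet itself. It enumerates the 36 equally likely outcomes to get exact probabilities for A, B, and their intersection, and then simulates 100 runs with Python's pseudorandom generator to get empirical probabilities. The seed and the helper names are arbitrary, hypothetical choices.

```python
import random
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def in_A(s):
    return s[0] < 3            # first score is less than 3

def in_B(s):
    return s[0] + s[1] == 6    # sum of the scores is 6

def in_A_and_B(s):
    return in_A(s) and in_B(s)

def exact(event):
    """Exact probability under the discrete uniform distribution on S."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def empirical(event, runs):
    """Relative frequency of the event in the recorded runs."""
    return sum(1 for s in runs if event(s)) / len(runs)

rng = random.Random(0)  # fixed seed, illustrative only
runs = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(100)]

for name, event in [("A", in_A), ("B", in_B), ("A and B", in_A_and_B)]:
    print(f"{name}: exact = {exact(event)}, empirical = {empirical(event, runs):.2f}")
```

With only 100 runs, the empirical values will typically differ noticeably from the exact ones.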
22. Consider the experiment in which 2 cards are dealt from a standard deck and the sequence of cards recorded. For i = 1, 2, let $H_i$ denote the event that card i is a heart.
23. In the card experiment, set n = 2. Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.
24. Recall that in Buffon's coin experiment, a coin with radius $r \le 1/2$ is tossed "randomly" on a floor with square tiles of side length 1, and the coordinates of the center of the coin are recorded, relative to the center of the square in which the coin lands. Let A denote the event that the coin does not touch the sides of the square.
25. In Buffon's coin experiment, set r = 0.2. Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.
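Buffon's coin experiment can also be imitated with a short simulation. The sketch below rests on an assumption about what "randomly" means: the center of the coin is taken to be uniformly distributed on the unit square centered at the origin. Under that assumption the coin misses the sides exactly when both coordinates of the center are within 1/2 - r of the origin, so P(A) is the area (1 - 2r)^2 of the inner square. The function names are hypothetical.

```python
import random

def coin_misses_sides(x, y, r):
    """Event A: a coin of radius r centered at (x, y) does not touch the sides
    of the unit square centered at the origin."""
    return abs(x) < 0.5 - r and abs(y) < 0.5 - r

def simulate(r, n, rng):
    """Empirical probability of A in n tosses, center uniform on the square."""
    hits = 0
    for _ in range(n):
        x, y = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
        hits += coin_misses_sides(x, y, r)
    return hits / n

r = 0.2
exact = (1 - 2 * r) ** 2  # area of the inner square, about 0.36
rng = random.Random(1)    # fixed seed, illustrative only
estimate_100 = simulate(r, 100, rng)
estimate_large = simulate(r, 100_000, rng)

print(f"exact = {exact:.4f}")
print(f"100 runs: {estimate_100:.2f}, 100000 runs: {estimate_large:.4f}")
```

The 100-run estimate corresponds to the exercise; the larger run is included only to show the estimate settling near the exact value.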
26. Suppose that A and B are events in an experiment with $P(A) = 1/3$, $P(B) = 1/4$, $P(A \cap B) = 1/10$. Express each of the following events in the language of the experiment and find its probability:
27. Suppose that A, B, and C are events in an experiment with $P(A) = 0.3$, $P(B) = 0.2$, $P(C) = 0.4$, $P(A \cap B) = 0.04$, $P(A \cap C) = 0.1$, $P(B \cap C) = 0.1$, $P(A \cap B \cap C) = 0.01$. Express each of the following events in set notation and find its probability:
28. A pair of fair dice is rolled repeatedly until the sum of the scores is either 5 or 7. The sequence of scores on the final roll is recorded. Let A be the event that the sum is 5 rather than 7.
Probabilities of the type in the last exercise are important in the game of craps.
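One way to approach a probability of this type is sketched below. The argument is an assumption of this sketch and not necessarily the intended solution. Rolls are independent, so summing over the number k of rolls that are neither 5 nor 7 before the deciding roll gives a geometric series, and P(A) = p5 / (p5 + p7), where p5 and p7 are the probabilities of a sum of 5 and of 7 on a single roll. The code computes this value exactly and compares it with a simulation.

```python
import random
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))
p5 = Fraction(sum(1 for a, b in rolls if a + b == 5), len(rolls))  # 4/36
p7 = Fraction(sum(1 for a, b in rolls if a + b == 7), len(rolls))  # 6/36

# P(A) = sum over k >= 0 of (1 - p5 - p7)^k * p5 = p5 / (p5 + p7).
p_A = p5 / (p5 + p7)

def play(rng):
    """Roll two dice until the sum is 5 or 7; return True if it is 5."""
    while True:
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total in (5, 7):
            return total == 5

rng = random.Random(7)  # fixed seed, illustrative only
n = 20_000
estimate = sum(play(rng) for _ in range(n)) / n

print(f"exact P(A) = {p_A}, simulated = {estimate:.3f}")
```

The exact value works out to 2/5, since there are 4 ways to roll a 5 and 6 ways to roll a 7.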
29. An experiment consists of tossing 3 fair coins and recording the sequence of scores. Let A be the event that the first coin is heads and B the event that there are exactly 2 heads.
30. A box contains 12 marbles: 5 are red, 4 are green, and 3 are blue. Three marbles are chosen at random, without replacement.
31. Repeat the last exercise under the assumption that the marbles are chosen with replacement.
32. For the M&M data set, let R denote the event that a bag has at least 10 red candies, T the event that a bag has at least 57 candies total, and W the event that a bag weighs at least 50 grams. Find the empirical probability of the following events:
33. For the cicada data, let W denote the event that a cicada weighs at least 0.20 grams, F the event that a cicada is female, and T the event that a cicada is type tredecula. Find the empirical probability of each of the following events:
Recall that the collection of events of the experiment forms a sigma algebra $\mathcal{A}$. In some cases, $\mathcal{A}$ is generated by some smaller collection of basic events $\mathcal{B}$, that is,
$\mathcal{A} = \sigma(\mathcal{B})$.
We often would like to know that the probabilities of the basic events completely determine the entire probability measure. This turns out to be true if the basic events are closed under intersection. Specifically, suppose that if $B, C \in \mathcal{B}$ then $B \cap C \in \mathcal{B}$ ($\mathcal{B}$ is called a pi system). If $P_1$ and $P_2$ are probability measures on $\mathcal{A}$ and $P_1(B) = P_2(B)$ for $B \in \mathcal{B}$, then $P_1(A) = P_2(A)$ for any $A \in \mathcal{A}$.
For example, the standard (Borel) sigma algebra on R is generated by the collection of all open intervals of finite length, which is clearly closed under intersection. Thus, a probability measure P on R is completely determined by its values on the finite open intervals. In addition, the sigma algebra on R is generated by the collection of closed, infinite intervals of the form $(-\infty, x]$. Thus, a probability measure P on R is completely determined by its values on these intervals.
Next, suppose that we have n sets $S_1, S_2, \ldots, S_n$ with sigma algebras $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_n$, respectively. Recall that the product set
$S = S_1 \times S_2 \times \cdots \times S_n$
is a natural sample space for an experiment that consists of multiple measurements, or for a compound experiment that consists of performing n basic experiments in sequence. Usually, we give $S$ the sigma algebra $\mathcal{A}$ generated by the collection of product sets of the form
$A = A_1 \times A_2 \times \cdots \times A_n$ where $A_i \in \mathcal{A}_i$ for each $i$.
This collection of product sets is closed under intersection, and hence a probability measure on S is completely determined by its values on these product sets.
Generalizing, suppose that we have an infinite sequence of sets $S_1, S_2, \ldots$ with sigma algebras $\mathcal{A}_1, \mathcal{A}_2, \ldots$, respectively. The product set
$S = S_1 \times S_2 \times \cdots$
is a natural sample space for an experiment that consists of infinitely many measurements, or for a compound experiment that consists of combining an infinite sequence of basic experiments. Usually, we give $S$ the sigma algebra $\mathcal{A}$ generated by the collection of product sets of the form
$A = A_1 \times A_2 \times \cdots \times A_n \times S_{n+1} \times S_{n+2} \times \cdots$ where $n$ is a positive integer and $A_i \in \mathcal{A}_i$ for each $i$.
This collection of product sets is closed under intersection, and hence a probability measure on S is completely determined by its values on these product sets.