
7. Transformations of Variables


The General Problem

As usual, we start with a random experiment having a sample space and a probability measure P. Suppose that we have a random variable X for the experiment, taking values in S, and a transformation r: S to T. Then Y = r(X) is a new random variable taking values in T. If the distribution of X is known, how do we find the distribution of Y? In a superficial sense, the solution is easy.

Mathematical Exercise 1. Show that

P(Y in B) = P[X in r^{-1}(B)] for B \subseteq T.

However, frequently the distribution of X is known either through its distribution function F or its density function f, and we would similarly like to find the distribution function or density function of Y. This is a difficult problem in general, because as we will see, even simple transformations of variables with simple distributions can lead to variables with complex distributions. We will solve this problem in various special cases.

Discrete Transformations

Mathematical Exercise 2. Suppose that X has a discrete distribution with density f (and hence S is countable). Show that Y has a discrete distribution with density function g given by

g(y) = \sum_{x \in r^{-1}(y)} f(x) for y in T.
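The formula in Exercise 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name pushforward_density, the choice of X uniform on {-2, -1, 0, 1, 2}, and the transformation r(x) = x^2 are ours and are not part of the text.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward_density(f, r):
    """Return the density g of Y = r(X) when X has discrete density f.

    f is a dict mapping each value x to its probability f(x).
    g(y) is the sum of f(x) over all x with r(x) == y.
    """
    g = defaultdict(Fraction)
    for x, p in f.items():
        g[r(x)] += p
    return dict(g)


# Illustrative assumption: X is uniform on {-2, -1, 0, 1, 2} and r(x) = x^2.
f = {x: Fraction(1, 5) for x in range(-2, 3)}
g = pushforward_density(f, lambda x: x * x)

for y in sorted(g):
    print(y, g[y])
```

Exact fractions are used so that the values of g can be compared directly with a hand computation.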

Mathematical Exercise 3. Suppose that X has a continuous distribution on a subset S of R^n, with density f, and that T is countable. Show that Y has a discrete distribution with density function g given by

g(y) = \int_{r^{-1}(y)} f(x) dx for y in T.

Mathematical Exercise 4. Suppose that a pair of fair dice is rolled and the sequence of scores (X1, X2) is recorded. Find the density of the following random variables:

  1. Y = X1 + X2.
  2. Z = X1 - X2.
  3. U = min{X1, X2}
  4. V = max{X1, X2}
  5. (Y, Z)
  6. (U, V)

Mathematical Exercise 5. Suppose that T has the density function f(t) = r exp(-rt), t > 0 where r > 0 is a parameter. (This is the exponential distribution with rate parameter r). Find the density function of the following random variables:

  1. floor(T) (the largest integer less than or equal to T).
  2. ceil(T) (the smallest integer greater than or equal to T).

Mathematical Exercise 6. Suppose that (X, Y) has density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Let I denote the indicator variable of the event {X > 1/2} and J the indicator variable of the event {Y > 1/2}. Find the density of (I, J).

Continuous Distributions

Suppose that Y = r(X) where X and Y have continuous distributions, and X has known density f. In many cases, the density of Y can be found by first finding the distribution function of Y (using basic rules of probability) and then computing the derivative of the distribution function.
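The distribution function method can be sketched numerically. In the following illustration we assume (our choice, not the text's) that X has the exponential distribution with rate 1 and that Y = (X - 1)^2. The density of Y is approximated by a central difference of G rather than computed in closed form.

```python
import math


def F(x):
    """Distribution function of X, assumed exponential with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def G(y):
    """Distribution function of Y = (X - 1)^2.

    The event {Y <= y} is the event {1 - sqrt(y) <= X <= 1 + sqrt(y)}.
    """
    if y <= 0:
        return 0.0
    s = math.sqrt(y)
    return F(1 + s) - F(1 - s)


def g(y, h=1e-6):
    """Density of Y, approximated by a central difference of G."""
    return (G(y + h) - G(y - h)) / (2 * h)


print(G(1.0))   # P(Y <= 1) = P(0 <= X <= 2)
print(g(0.25))  # compare with exp(-1.5) + exp(-0.5)
```

Note that G has a different form for y < 1 and y > 1, since 1 - sqrt(y) leaves the support of X when y > 1.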

Mathematical Exercise 7. Suppose that X is uniformly distributed on the interval (-2, 2). Let Y = X^2.

  1. Find the distribution function of Y.
  2. Find the density function of Y. Sketch the graph.

Mathematical Exercise 8. Suppose that X is uniformly distributed on the interval (-1, 3). Let Y = X^2.

  1. Find the distribution function of Y.
  2. Find the density function of Y. Sketch the graph.

The last exercise shows that even a simple transformation of a simple distribution can produce a complicated distribution.

Mathematical Exercise 9. Suppose that X has density function f(x) = a / x^{a + 1} for x > 1, where a > 0 is a parameter (this is the Pareto distribution with shape parameter a). Let Y = ln(X).

  1. Find the distribution function of Y.
  2. Find the density function of Y. Sketch the graph.

Note that the random variable Y in the previous exercise has the exponential distribution with rate parameter a.

Mathematical Exercise 10. Suppose that (X, Y) has density f(x, y) = exp(-x -y) for x > 0, y > 0. Thus X and Y are independent, and each has the exponential distribution with parameter 1. Let Z = Y / X.

  1. Find the distribution function of Z.
  2. Find the density function of Z.

Mathematical Exercise 11. Absolute value of a random variable. Suppose that X has a continuous distribution on R with distribution function F and density function f. Show that

  1. |X| has distribution function G(y) = F(y) - F(-y) for y > 0.
  2. |X| has density function g(y) = f(y) + f(-y) for y > 0.

Mathematical Exercise 12. Continuation. Suppose that the density f of X is symmetric with respect to 0. Let J denote the sign of X, so that J = 1 if X > 0, J = 0 if X = 0, and J = -1 if X < 0. Show that

  1. |X| has distribution function G(y) = 2F(y) - 1 for y > 0.
  2. |X| has density function g(y) = 2f(y) for y > 0.
  3. J is uniformly distributed on {-1, 1}
  4. |X| and J are independent.

The Uniform Distribution on (0, 1)

A remarkable fact is that the uniform distribution on (0, 1) can be transformed into any other distribution on R. This is particularly important for simulations, since many computer languages have an algorithm for generating random numbers, which are simulations of independent variables, each uniformly distributed on (0, 1). Conversely, any continuous distribution supported on an interval of R can be transformed into the uniform distribution on (0, 1).

Suppose first that F is a distribution function and let F^{-1} denote the quantile function.

Mathematical Exercise 13. Suppose that U is uniformly distributed on (0, 1). Show that X = F^{-1}(U) has distribution function F.

Assuming that we can compute F^{-1}, the previous exercise shows how we can simulate a distribution with distribution function F. To rephrase the result, we can simulate a variable with distribution function F by simply computing a random quantile.
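The random quantile method can be sketched as follows. The helper name simulate, the seed, and the example distribution function F(x) = x^2 on (0, 1) (with quantile function sqrt(u)) are assumptions made only for illustration.

```python
import math
import random


def simulate(quantile, n, rng):
    """Return n simulated values F^{-1}(U), with U a random number."""
    return [quantile(rng.random()) for _ in range(n)]


# Illustrative assumption: F(x) = x^2 on (0, 1), so F^{-1}(u) = sqrt(u).
def F(x):
    return x * x


def F_inv(u):
    return math.sqrt(u)


rng = random.Random(2024)  # fixed seed so that the run is reproducible
sample = simulate(F_inv, 20000, rng)

# The fraction of simulated values <= x should be close to F(x).
for x in (0.25, 0.5, 0.75):
    empirical = sum(v <= x for v in sample) / len(sample)
    print(x, round(empirical, 3), F(x))
```

Any other quantile function can be passed to simulate in place of F_inv.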

Mathematical Exercise 14. Suppose that X has a continuous distribution on an interval S and that the distribution function F is strictly increasing on S. Show that U = F(X) has the uniform distribution on (0, 1).
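The claim in Exercise 14 can be checked empirically, although a simulation is of course not a proof. In this sketch we assume that X has the exponential distribution with rate 1, so that F(x) = 1 - exp(-x). The seed and sample size are arbitrary choices.

```python
import math
import random

rng = random.Random(7)  # fixed seed so that the run is reproducible

# Illustrative assumption: X is exponential with rate 1, F(x) = 1 - exp(-x).
xs = [rng.expovariate(1.0) for _ in range(20000)]
us = [1.0 - math.exp(-x) for x in xs]

# For the uniform distribution on (0, 1), P(U <= t) = t.
for t in (0.1, 0.5, 0.9):
    frac = sum(u <= t for u in us) / len(us)
    print(t, round(frac, 3))
```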

Mathematical Exercise 15. Show how to simulate, with a random number, the uniform distribution on the interval (a, b).

Mathematical Exercise 16. Show how to simulate, with a random number, the exponential distribution with rate parameter r > 0.

Mathematical Exercise 17. Show how to simulate, with a random number, the Pareto distribution with shape parameter a > 0.

The Change of Variables Formula

When the transformation r is one-to-one and smooth, there is a formula for the density of Y directly in terms of the density of X. This is known as the change of variables formula. 

We will explore the one-dimensional case first, where the concepts and formulas are simplest. Thus, suppose that random variable X has a continuous distribution on an interval S of R, with distribution function F and density function f. Suppose that Y = r(X) where r is a differentiable function from S onto an interval T. As usual, we will let G denote the distribution function of Y and g the density function of Y.

Mathematical Exercise 18. Suppose that r is strictly increasing on S. Show that for y in T

  1. G(y) = F[r^{-1}(y)]
  2. g(y) = f[r^{-1}(y)] dr^{-1}(y) / dy

Mathematical Exercise 19. Suppose that r is strictly decreasing on S. Show that for y in T

  1. G(y) = 1 - F[r^{-1}(y)]
  2. g(y) = -f[r^{-1}(y)] dr^{-1}(y) / dy

The density formulas in part 2 of Exercises 18 and 19 can be combined: if r is strictly monotone on S, then the density g of Y is given by

g(y) = f[r^{-1}(y)] |dr^{-1}(y) / dy| for y in T.
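As a quick sketch of the one-dimensional formula, the following Python fragment builds g from f, the inverse of r, and the derivative of the inverse. The function name transformed_density and the example (X uniform on (0, 1) with the strictly decreasing map r(x) = -ln(x)) are assumptions chosen for illustration.

```python
import math


def transformed_density(f, r_inv, dr_inv):
    """Density of Y = r(X) for strictly monotone r.

    f      : density of X
    r_inv  : the inverse function of r
    dr_inv : the derivative of the inverse function
    """
    return lambda y: f(r_inv(y)) * abs(dr_inv(y))


# Illustrative assumption: X is uniform on (0, 1) and r(x) = -ln(x),
# which is strictly decreasing, so the inverse is exp(-y) for y > 0.
f = lambda x: 1.0 if 0 < x < 1 else 0.0
g = transformed_density(f, lambda y: math.exp(-y), lambda y: -math.exp(-y))

print(g(1.0), math.exp(-1.0))

# Midpoint rule on (0, 30): the density should integrate to about 1.
n, b = 30000, 30.0
h = b / n
total = sum(g((k + 0.5) * h) for k in range(n)) * h
print(total)
```

The absolute value is what allows the same code to handle increasing and decreasing transformations.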

The generalization of this result is basically a theorem in multivariate calculus. Suppose that X is a random variable taking values in a subset S of R^n and that X has a continuous distribution with probability density function f. Suppose that Y = r(X) where r is a one-to-one differentiable function from S onto a subset T of R^n. The Jacobian (named in honor of Carl Gustav Jacob Jacobi) of the inverse function

x = r^{-1}(y)

is the determinant of the first derivative matrix of the inverse function, that is, the matrix whose (i, j) entry is the derivative of x_i with respect to y_j. We will denote the Jacobian by J(y). The multivariate change of variables formula states that the density g of Y is given by

g(y) = f[r^{-1}(y)] |J(y)| for y in T.
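The multivariate formula can be sketched numerically in two dimensions. In the fragment below, the Jacobian is approximated by central differences instead of being computed analytically. The example is an assumption of ours: (X1, X2) has density exp(-x1 - x2) for x1 > 0, x2 > 0, and (Y1, Y2) = (X1 + X2, X1 / (X1 + X2)), so that the inverse is x1 = y1 y2, x2 = y1 (1 - y2).

```python
import math


def jacobian_det_2d(r_inv, y1, y2, h=1e-6):
    """Determinant of the derivative matrix of r_inv at (y1, y2),
    approximated with central differences."""
    a1, a2 = r_inv(y1 + h, y2)
    b1, b2 = r_inv(y1 - h, y2)
    c1, c2 = r_inv(y1, y2 + h)
    d1, d2 = r_inv(y1, y2 - h)
    dx1_dy1, dx2_dy1 = (a1 - b1) / (2 * h), (a2 - b2) / (2 * h)
    dx1_dy2, dx2_dy2 = (c1 - d1) / (2 * h), (c2 - d2) / (2 * h)
    return dx1_dy1 * dx2_dy2 - dx1_dy2 * dx2_dy1


def transformed_density(f, r_inv):
    """Density of Y = r(X): f evaluated at the inverse, times |J|."""
    def g(y1, y2):
        x1, x2 = r_inv(y1, y2)
        return f(x1, x2) * abs(jacobian_det_2d(r_inv, y1, y2))
    return g


# Illustrative assumption: density exp(-x1 - x2) on the positive quadrant.
f = lambda x1, x2: math.exp(-x1 - x2) if x1 > 0 and x2 > 0 else 0.0
# Inverse of (y1, y2) = (x1 + x2, x1 / (x1 + x2)), valid for y1 > 0, 0 < y2 < 1.
r_inv = lambda y1, y2: (y1 * y2, y1 * (1 - y2))

g = transformed_density(f, r_inv)
print(g(2.0, 0.3), 2.0 * math.exp(-2.0))
```

For this example the Jacobian is -y1, so the two printed values should agree up to the finite-difference error.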

Mathematical Exercise 20. Suppose that X is uniformly distributed on the interval (2, 4). Find the density function of Y = X^2.

Mathematical Exercise 21. Suppose that X has the density function f(x) = x^2 / 3 for -1 < x < 2. Find the density function of Y = X^{1/3}.

Mathematical Exercise 22. Suppose that X has the Pareto distribution with shape parameter a > 0. Find the density function of Y = 1/X. The distribution of Y is the beta distribution with parameters a and b = 1.

Mathematical Exercise 23. Suppose that X and Y are independent and each is uniformly distributed on (0, 1). Let U = X + Y and V = X - Y.

  1. Sketch the range of (X, Y) and the range of (U, V).
  2. Find the density function of (U, V).
  3. Find the density function of U.
  4. Find the density function of V.

Some of the results of the previous exercise will be generalized in the next subsection.

Mathematical Exercise 24. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1. Let U = XY and V = Y/X.

  1. Sketch the range of (X, Y) and the range of (U, V).
  2. Find the density of (U, V).
  3. Find the density function of U.
  4. Find the density function of V.

Linear Transformations

Linear transformations are among the most common and important transformations. Moreover, the change of variables theorem has a particularly simple form when the linear transformation is expressed in matrix form. Thus, as above, suppose that X is a random variable taking values in a subset S of R^n and that X has a continuous distribution on S with probability density function f. Let

Y = AX

where A is an invertible n × n matrix. Recall that the transformation y = Ax is one-to-one, and the inverse transformation is

x = A^{-1} y.

Note that Y takes values in the subset T = {Ax: x in S} of R^n.

Mathematical Exercise 25. Show that the Jacobian is J(y) = det(A^{-1}) for y in T.

Mathematical Exercise 26. Apply the change of variables theorem to show that Y has density function

g(y) = f(A^{-1} y) |det(A^{-1})| for y in T.
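For a 2 × 2 matrix, the density formula for a linear transformation can be sketched directly. The helper names det2, inv2, and linear_density are hypothetical, and the example density f(x1, x2) = x1 + x2 on the unit square and the matrix A are assumptions chosen only for illustration.

```python
def det2(a):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    """Inverse of an invertible 2 x 2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]


def linear_density(f, a):
    """Density of Y = AX: f evaluated at the inverse image, times |det| of the inverse."""
    a_inv = inv2(a)
    scale = abs(det2(a_inv))

    def g(y1, y2):
        x1 = a_inv[0][0] * y1 + a_inv[0][1] * y2
        x2 = a_inv[1][0] * y1 + a_inv[1][1] * y2
        return f(x1, x2) * scale

    return g


# Illustrative assumption: f(x1, x2) = x1 + x2 on the unit square.
f = lambda x1, x2: x1 + x2 if 0 < x1 < 1 and 0 < x2 < 1 else 0.0
A = [[1.0, 1.0],
     [1.0, -1.0]]

g = linear_density(f, A)
print(g(1.0, 0.2))  # the point (1.0, 0.2) is the image of (0.6, 0.4)
print(g(3.0, 0.0))  # outside T, so the density is 0
```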

The uniform distribution is preserved under linear transformations:

Mathematical Exercise 27. Suppose that X is uniformly distributed on S. Show that Y is uniformly distributed on T.

Mathematical Exercise 28. Suppose that (X, Y, Z) is uniformly distributed on the cube (0, 1)^3. Find the density function of

(U, V, W) where U = X + Y, V = Y + Z, W = X + Z.

Mathematical Exercise 29. Suppose that (X, Y) has density function f(x, y) = exp[-(x + y)] for x > 0, y > 0 (thus, X and Y are independent, and each has the exponential distribution with parameter 1). Find the density function of

(U, V) where U = X + 2Y, V = 3X - Y.

Convolution

The most important of all transformations is simple addition.

Mathematical Exercise 30. Suppose that X and Y are independent, discrete random variables, taking values in subsets S and T of R, with density functions f and g, respectively. Show that the density of Z = X + Y is

(f * g)(z) = \sum_x f(x) g(z - x)

where the sum is over {x in R: x in S and z - x in T}. The density f * g is called the discrete convolution of f and g.
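The discrete convolution can be sketched in Python by looping over all pairs (x, y) and accumulating the product of probabilities at z = x + y, which is equivalent to the sum over x above. The function name convolve and the two small example densities are assumptions for illustration only.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(f, g):
    """Discrete convolution of densities given as dicts {value: probability}."""
    h = defaultdict(Fraction)
    for x, p in f.items():
        for y, q in g.items():
            h[x + y] += p * q  # contributes f(x) g(z - x) with z = x + y
    return dict(h)


# Illustrative assumption: X is uniform on {0, 1}, Y is uniform on {1, 2, 3}.
f = {0: Fraction(1, 2), 1: Fraction(1, 2)}
g = {y: Fraction(1, 3) for y in (1, 2, 3)}

h = convolve(f, g)
for z in sorted(h):
    print(z, h[z])
```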

Mathematical Exercise 31. Suppose that X and Y are independent, continuous random variables, taking values in subsets S and T of R, with density functions f and g, respectively. Show that the density of Z = X + Y is

(f * g)(z) = \int_{R} f(x) g(z - x) dx.

The density f * g is called the continuous convolution of f and g.
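When the convolution integral is awkward to compute by hand, it can be approximated numerically. The sketch below uses a simple midpoint rule. The function name convolve_at and the example (f uniform on (0, 1), g exponential with rate 1) are our assumptions. The integration limits are passed explicitly because f vanishes outside (0, 1).

```python
import math


def convolve_at(f, g, z, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f(x) g(z - x) over (lo, hi)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += f(x) * g(z - x)
    return total * h


# Illustrative assumption: f is the uniform density on (0, 1) and
# g is the exponential density with rate 1.
f = lambda x: 1.0 if 0 < x < 1 else 0.0
g = lambda y: math.exp(-y) if y > 0 else 0.0

# For z > 1 the exact value is exp(-z) (e - 1).
print(convolve_at(f, g, 2.0, 0.0, 1.0), math.exp(-2.0) * (math.e - 1.0))
```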

Mathematical Exercise 32. Show that convolution (either discrete or continuous) satisfies the following properties:

  1. f * g = g * f (the commutative property)
  2. f * (g * h) = (f * g) * h (the associative property)

Note that if X1, X2, ..., Xn are independent and identically distributed with common density function f, then

Y = X1 + X2 + ··· + Xn

has density function f^{*n}, the n-fold convolution of f with itself.

Mathematical Exercise 33. Suppose two fair dice are rolled. Find the density of the sum of the scores.

Simulation Exercise 34. In the dice experiment, select two fair dice. Run the simulation 1000 times, updating every 10 runs and note the apparent convergence of the empirical density function to the true density function.

Mathematical Exercise 35. For an ace-six flat die, faces 1 and 6 occur with probability 1/4 each and the other faces with probability 1/8 each. Suppose that an ace-six flat die is rolled twice. Find the density function of the sum of the scores.

Simulation Exercise 36. In the dice experiment, select two ace-six flat dice. Run the simulation 1000 times, updating every 10 runs and note the apparent convergence of the empirical density function to the true density function.

Mathematical Exercise 37. A fair die and an ace-six flat die are rolled. Find the density function of the sum of the scores.

Mathematical Exercise 38. Suppose that X has the exponential distribution with rate parameter a > 0, Y has the exponential distribution with rate parameter b > 0, and that X and Y are independent. Find the density of Z = X + Y.

Mathematical Exercise 39. Let f denote the density function of the uniform distribution on (0, 1). Compute f^{*2} and f^{*3}. Graph the three densities.

Several important parametric families of distributions are closed under convolution. That is, when two independent random variables have distributions that belong to the family, then so does the sum. This is a very special property and indeed is one of the reasons why such families are important.

Mathematical Exercise 40. Recall that f(n) = exp(-t) t^n / n! for n = 0, 1, 2, ... is the probability density function of the Poisson distribution with parameter t > 0. Suppose that X and Y are independent variables, and that X has the Poisson distribution with parameter a > 0 while Y has the Poisson distribution with parameter b > 0. Show that X + Y has the Poisson distribution with parameter a + b. Hint: You will need to use the binomial theorem.

Mathematical Exercise 41. Recall that f(k) = C(n, k) p^k (1 - p)^{n - k} for k = 0, 1, ..., n is the probability density function of the binomial distribution with parameters n in {1, 2, ...} and p in (0, 1). Suppose that X and Y are independent variables, and that X has the binomial distribution with parameters n and p while Y has the binomial distribution with parameters m and p. Show that X + Y has the binomial distribution with parameters n + m and p. Hint: You will need to use the binomial theorem.

Minimum and Maximum

Suppose that X1, X2, ..., Xn are independent real-valued random variables and that Xi has distribution function Fi for each i. The minimum and maximum transformations are very important in a number of applications. Specifically, let

U = min{X1, X2, ..., Xn}, V = max{X1, X2, ..., Xn}

and let G and H denote the distribution functions of U and V, respectively.

Mathematical Exercise 42.  Show that

  1. V <= x if and only if X1 <= x, X2 <= x, ..., Xn <= x.
  2. H(x) = F1(x) F2(x) ··· Fn(x) for x in R.

Mathematical Exercise 43. Show that

  1. U > x if and only if X1 > x, X2 > x, ..., Xn > x.
  2. G(x) = 1 - [1 - F1(x)][1 - F2(x)] ··· [1 - Fn(x)] for x in R.
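The product formulas in Exercises 42 and 43 translate directly into code. In this sketch, the helper names max_cdf, min_cdf, and uniform_cdf are hypothetical, and the example (one variable uniform on (0, 1), another uniform on (0, 2)) is an assumption for illustration.

```python
def max_cdf(cdfs):
    """Distribution function H of the maximum: the product of the Fi."""
    def H(x):
        prod = 1.0
        for F in cdfs:
            prod *= F(x)
        return prod
    return H


def min_cdf(cdfs):
    """Distribution function G of the minimum: 1 minus the product of the (1 - Fi)."""
    def G(x):
        prod = 1.0
        for F in cdfs:
            prod *= 1.0 - F(x)
        return 1.0 - prod
    return G


def uniform_cdf(b):
    """Distribution function of the uniform distribution on (0, b)."""
    return lambda x: min(max(x / b, 0.0), 1.0)


# Illustrative assumption: X1 is uniform on (0, 1) and X2 is uniform on (0, 2).
cdfs = [uniform_cdf(1.0), uniform_cdf(2.0)]
H = max_cdf(cdfs)
G = min_cdf(cdfs)

print(H(0.5))  # 0.5 * 0.25
print(G(0.5))  # 1 - 0.5 * 0.75
```

The variables need to be independent for these formulas to hold, but they do not need to be identically distributed.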

If Xi has a continuous distribution with density function fi for each i, then U and V also have continuous distributions, and the densities can be obtained by differentiating the distribution functions in Exercises 42 and 43.

Mathematical Exercise 44. Suppose that X1, X2, ..., Xn are independent random variables, each uniformly distributed on (0, 1). Find the distribution and density function of

  1. U = min{X1, X2, ..., Xn}
  2. V = max{X1, X2, ..., Xn}

Note that U and V in the previous exercise have beta distributions.

Simulation Exercise 45. In the order statistic experiment, select the uniform distribution. 

  1. Set k = 1 (this gives the minimum U). Vary n with the scroll bar and note the shape of the density function. With n = 5, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the empirical density function to the true density function.
  2. Vary n with the scroll bar, set k = n each time (this gives the maximum V), and note the shape of the density function. With n = 5, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the empirical density function to the true density function.

Mathematical Exercise 46. Suppose that X1, X2, ..., Xn are independent random variables, and that Xi has the exponential distribution with rate parameter ri > 0 for each i.

  1. Find the distribution function of U = min{X1, X2, ..., Xn}.
  2. Find the distribution function of V = max{X1, X2, ..., Xn}.
  3. Find the density functions of U and V in the special case that ri = r for each i.

Note that the minimum U in part 1 has the exponential distribution with rate parameter

r1 + r2 + ··· + rn.

Simulation Exercise 47. In the order statistic experiment, select the exponential distribution. 

  1. Set k = 1 (this gives the minimum U). Vary n with the scroll bar and note the shape of the density function. With n = 5, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the empirical density function to the true density function.
  2. Vary n with the scroll bar, set k = n each time (this gives the maximum V), and note the shape of the density function. With n = 5, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the empirical density function to the true density function.

Mathematical Exercise 48. Suppose that n fair dice are rolled. Find the density function of the

  1. minimum score
  2. maximum score.

Simulation Exercise 49. In the dice experiment, select fair dice and select each of the following random variables. Vary n with the scroll bar and note the shape of the density function. With n = 4, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the empirical density function to the true density function.

  1. minimum score
  2. maximum score.

Mathematical Exercise 50. Suppose that n ace-six flat dice are rolled (faces 1 and 6 each have probability 1/4; faces 2, 3, 4, 5 each have probability 1/8). Find the density function of the

  1. minimum score
  2. maximum score.

Simulation Exercise 51. In the dice experiment, select ace-six flat dice and select each of the following random variables. Vary n with the scroll bar and note the shape of the density function. With n = 4, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the empirical density function to the true density function.

  1. minimum score
  2. maximum score.

For a related topic, see the discussion of order statistics in the chapter Random Samples.