The binomial distribution has the following properties:
• The mean of the distribution ($\mu_x$) is equal to n * P.
• The variance ($\sigma_x^2$) is n * P * (1 - P).
• The standard deviation ($\sigma_x$) is sqrt[n * P * (1 - P)].

Chapter 5 Continuous Distributions

Probability Distributions: Discrete vs Continuous
If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable.

Examples:
1. Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire fighter's weight could take on any value between 150 and 250 pounds.
2. Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be just any number between 0 and plus infinity: we could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.
Discrete Probability Distribution
If a random variable is a discrete variable, its probability distribution is called a discrete probability distribution.

Continuous Probability Distribution
If a random variable is a continuous variable, its probability distribution is called a continuous probability distribution. A continuous probability distribution differs from a discrete probability distribution in several ways.
• The probability that a continuous random variable will assume a particular value is zero.
• As a result, a continuous probability distribution cannot be expressed in tabular form.
• Instead, an equation or formula is used to describe a continuous probability distribution.
Most often, the equation used to describe a continuous probability distribution is called a probability density function (PDF).
For a continuous probability distribution, the density function has the following properties:
• Since the continuous random variable is defined over a continuous range of values (called the domain of the variable), the graph of the density function will also be continuous over that range.
• The area bounded by the curve of the density function and the x-axis is equal to 1, when computed over the domain of the variable.
• The probability that a random variable assumes a value between a and b is equal to the area under the density function bounded by a and b.

For example, consider the probability density function shown in the graph below. Suppose we wanted to know the probability that the random variable X was less than or equal to a. The probability that X is less than or equal to a is equal to the area under the curve between minus infinity and a, as indicated by the shaded area.
The probability density function (PDF) of the random variable X is defined as follows:
1. Discrete: if X is a discrete random variable, then the PDF is a probability (in this case it is often called a probability mass function): p(x) = P(X = x) for all real numbers x.
2. Continuous: if X is a continuous random variable, then the PDF is a rate: f(x) = d/dx F(x) wherever the derivative exists, where F(x) = P(X ≤ x) is the cumulative distribution function (cdf).
Probability density function: a function f is a probability density function if
\[ f(x) \ge 0 \quad \text{for all } x \]
and
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]
Probability density functions can be used to determine the probability that a continuous random variable lies between two values, say a and b. This probability is denoted by $P(a \le X \le b)$ and is given by,
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]

Example:
A clock stops at random at any time during the day. Let X be the time (hours plus fractions of hours) at which the clock stops. The pdf for X is known to be
\[ f(x) = \begin{cases} \dfrac{1}{24}, & 0 \le x \le 24 \\ 0, & \text{otherwise} \end{cases} \]
To find the probability that the clock stops between 14.00 and 14.75 hours:
\[ P(14 \le X \le 14.75) = \int_{14}^{14.75} \frac{1}{24}\,dx = \left[ \frac{x}{24} \right]_{14}^{14.75} = \frac{0.75}{24} = \frac{1}{32} \]
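The clock calculation can also be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `clock_pdf` and `integrate` are made up for illustration, and the midpoint rule is just one convenient way to approximate the area under the density.

```python
def clock_pdf(x):
    """Uniform density for the stopping time X (in hours): 1/24 on [0, 24]."""
    return 1 / 24 if 0 <= x <= 24 else 0.0


def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# P(14 <= X <= 14.75) is the area under the density between 14 and 14.75.
prob = integrate(clock_pdf, 14, 14.75)
# The total area over the whole day should be 1.
total = integrate(clock_pdf, 0, 24)

print(round(prob, 5))   # 0.03125, i.e. 1/32
print(round(total, 5))  # 1.0
```

Because the density is constant, the midpoint rule is exact here up to floating-point rounding; for a curved density the step count would control the accuracy.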
Example 1
Let f(x) = [equation missing] for [equation missing] and f(x) = 0 for all other values of x. Answer each of the following questions about this function.
(a) Show that f(x) is a probability density function.
First note that in the range [equation missing], f(x) is clearly positive, and outside of this range we've defined it to be zero. [equation missing]
(b) Find [equation missing]
(c) Find $P(X \ge 6)$.
Note that in this case $P(X \ge 6)$ is equivalent to $P(6 \le X \le 10)$ since 10 is the largest value that X can be. So the probability that X is greater than or equal to 6 is, [equation missing]
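Example 2
It has been determined that the probability density function for the wait in line at a counter is given by,
\[ f(t) = \begin{cases} 0, & t < 0 \\ 0.1\,e^{-t/10}, & t \ge 0 \end{cases} \]
where t is the number of minutes spent waiting in line. Answer each of the following questions about this probability density function.
(a) Verify that this is in fact a probability density function.
This function is clearly positive or zero, so there's not much to do here other than compute the integral.
\[ \int_{-\infty}^{\infty} f(t)\,dt = \int_{0}^{\infty} 0.1\,e^{-t/10}\,dt = \lim_{u \to \infty} \int_{0}^{u} 0.1\,e^{-t/10}\,dt = \lim_{u \to \infty} \left( 1 - e^{-u/10} \right) = 1 \]
(b) Determine the probability that a person will wait in line for at least 6 minutes.
\[ P(X \ge 6) = \int_{6}^{\infty} 0.1\,e^{-t/10}\,dt = e^{-3/5} = 0.54881 \]
(c) Determine the mean wait in line.
\[ \mu = \int_{-\infty}^{\infty} t\,f(t)\,dt = \int_{0}^{\infty} 0.1\,t\,e^{-t/10}\,dt = \left[ -(t + 10)\,e^{-t/10} \right]_{0}^{\infty} = 10 \]
So the mean wait in line is 10 minutes.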
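As a sanity check on Example 2, the three parts can be approximated numerically. This is only a sketch under stated assumptions: the names `wait_pdf`, `simpson`, and `UPPER` are illustrative, Simpson's rule is swapped in for the exact integration, and the infinite upper limit is replaced by a cutoff of 400 minutes, beyond which the density 0.1e^(-t/10) is negligible. For part (c) the result is compared with 10 minutes, the closed-form value of the integral of 0.1·t·e^(-t/10) over [0, ∞).

```python
import math

RATE = 0.1  # from the density f(t) = 0.1 * exp(-t / 10)


def wait_pdf(t):
    """Density of the waiting time in minutes; zero for negative t."""
    return RATE * math.exp(-RATE * t) if t >= 0 else 0.0


def simpson(f, a, b, steps=20_000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


UPPER = 400.0  # assumed cutoff standing in for infinity

area = simpson(wait_pdf, 0, UPPER)                        # part (a): about 1
p_at_least_6 = simpson(wait_pdf, 6, UPPER)                # part (b): about exp(-0.6)
mean_wait = simpson(lambda t: t * wait_pdf(t), 0, UPPER)  # part (c): about 10

print(round(area, 5), round(p_at_least_6, 5), round(mean_wait, 5))
```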
5.1 CDF and Expectation for Continuous RVs

Expectation and Variance
With discrete random variables, the expectation was $\sum_x x\,P(X = x)$, where P(X = x) was the p.d.f. It may come as no surprise that to find the expectation of a continuous random variable, we integrate rather than sum, i.e.:
\[ E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx \]
where f(x) is the p.d.f. of X.
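To make the "integrate rather than sum" idea concrete, here is a small sketch that approximates E(X) as the integral of x·f(x) for the clock example (uniform density 1/24 on [0, 24]). The function names and the midpoint rule are illustrative choices, not something the notes prescribe.

```python
def clock_pdf(x):
    """Uniform density on [0, 24] hours, as in the clock example."""
    return 1 / 24 if 0 <= x <= 24 else 0.0


def expectation(pdf, lower, upper, steps=100_000):
    """Approximate the integral of x * pdf(x) over [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += x * pdf(x) * width
    return total


mean_stop_time = expectation(clock_pdf, 0, 24)
print(round(mean_stop_time, 6))  # 12.0
```

The result, 12 hours, is the midpoint of the day, as expected for a clock that is equally likely to stop at any time.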
Percentile
Definition 1: A percentile is a measure that tells us what percent of the total frequency scored at or below that measure. A percentile rank is the percentage of scores that fall at or below a given score.
Definition 2: A percentile is a measure that tells us what percent of the total frequency scored below that measure. A percentile rank is the percentage of scores that fall below a given score.

Formula: To find the percentile rank of a score, x, out of a set of n scores, where x is included:
\[ \text{percentile rank} = \frac{B + 0.5E}{n} \times 100 \]
where
B = number of scores below x
E = number of scores equal to x
n = number of scores

Or, to find the percentile rank of a score, x, out of a set of n scores, where x is not included:
[equation missing]

Example: If Jason graduated 25th out of a class of 150 students, then 125 students were ranked below Jason. Jason's percentile rank would be:
\[ \frac{125 + 0.5(1)}{150} \times 100 \approx 83.7 \approx 84\text{th percentile} \]
or
[equation missing]
Jason's standing in the class at the 84th percentile is as high as or higher than 84% of the graduates.
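The percentile rank calculation can be sketched in a few lines. The code assumes the rank is computed as (B + 0.5E)/n × 100 with x included in the n scores, which is the form that reproduces the 84th, 45th, and 58th percentile results in these notes; the function names are hypothetical.

```python
def percentile_rank_from_counts(below, equal, n):
    """Percentile rank (B + 0.5 * E) / n * 100, given the counts directly."""
    return (below + 0.5 * equal) / n * 100


def percentile_rank(scores, x):
    """Percentile rank of score x within the list of scores (x included)."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return percentile_rank_from_counts(below, equal, len(scores))


# Jason: 125 students ranked below him, 1 score equal to his own, 150 in all.
jason = percentile_rank_from_counts(125, 1, 150)
print(round(jason, 1))  # 83.7, about the 84th percentile

# The twenty math test scores used in the practice problems.
scores = [50, 65, 70, 72, 72, 78, 80, 82, 84, 84,
          85, 86, 88, 88, 90, 94, 96, 98, 98, 99]
rank_84 = percentile_rank(scores, 84)
rank_86 = percentile_rank(scores, 86)
print(round(rank_84, 1), round(rank_86, 1))  # 45.0 57.5
```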
1. The math test scores were: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99. Find the percentile rank for a score of 84 on this test.
Be sure the scores are ordered from smallest to largest. Locate the 84.
Solution Using Formula: there are 8 scores below 84 and 2 scores equal to 84, so
\[ \frac{8 + 0.5(2)}{20} \times 100 = 45 \]
Solution Using Visualization: Since there are 2 values equal to 84, assign one to the group "above 84" and the other to the group "below 84".
50, 65, 70, 72, 72, 78, 80, 82, 84, | 84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99
Nine of the 20 scores fall to the left of the bar, and 9/20 = 45%.
The score of 84 is at the 45th percentile for this test.

2. The math test scores were: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99. Find the percentile rank for a score of 86 on this test.
Be sure the scores are ordered from smallest to largest. Locate the 86.
Solution Using Formula: there are 11 scores below 86 and 1 score equal to 86, so
\[ \frac{11 + 0.5(1)}{20} \times 100 = 57.5 \approx 58 \]
Solution Using Visualization: Since there is only one value equal to 86, it will be counted as "half" of a data value for the group "above 86" as well as the group "below 86".
50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 8|6, 88, 88, 90, 94, 96, 98, 98, 99
The score of 86 is at the 58th percentile for this test.

3. Quartiles can be thought of as percentile measures. Remember that quartiles break the data set into 4 equal parts. If 100% is broken into four equal parts, we have subdivisions at 25%, 50%, and 75%, creating the:
First quartile (lower quartile), at the 25th percentile.
Median (or second quartile), at the 50th percentile.
Third quartile (upper quartile), at the 75th percentile.

Test Scores   Frequency   Cumulative Frequency
76-80         3           3
81-85         7           10
86-90         6           16
91-95         4           20

For the table above, find the intervals in which the first, second and third quartiles lie.
If there are a total of 20 scores, the first quartile will be located (25% · 20 = 5) five values up from the bottom. This puts the first quartile in the interval 81-85.
In a similar fashion, the second quartile will be located (50% · 20 = 10) ten values up from the bottom, in the interval 81-85.
The third quartile will be located (75% · 20 = 15) fifteen values up from the bottom, in the interval 86-90.
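The quartile lookup in the frequency table can be expressed as a short routine: build the cumulative frequencies, then find the first interval whose cumulative count reaches the target position. This is a sketch of that counting rule only; the variable and function names are illustrative.

```python
from itertools import accumulate

intervals = ["76-80", "81-85", "86-90", "91-95"]
frequencies = [3, 7, 6, 4]

# Running totals of the frequencies: [3, 10, 16, 20]
cumulative = list(accumulate(frequencies))
n = cumulative[-1]  # total number of scores (20)


def interval_for_position(position):
    """Return the first interval whose cumulative frequency reaches position."""
    for label, cum in zip(intervals, cumulative):
        if position <= cum:
            return label
    raise ValueError("position exceeds the number of scores")


# Quartile q sits q * 25% of the way up from the bottom: positions 5, 10, 15.
quartile_intervals = {q: interval_for_position(q * n / 4) for q in (1, 2, 3)}
print(quartile_intervals)  # {1: '81-85', 2: '81-85', 3: '86-90'}
```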
The Pth percentile is found in position P(n + 1)/100.
Example
Let's find the 15th percentile of the starting salary data. Here's the sorted version.
08820 10800 12000 12500 13000 14000 15000 16000 16500
16600 16700 16900 16900 17000 17000 17600 17880 18000
18000 18000 18000 18000 18000 18000 18000 18000 18000
18500 18680 19100 20000 20000 20000 20000 20000 20300
20900 22000 23000 23000 23000 23000 23400 24000 25000
25000 26000 26000 27000 30000 30000 32500 37000 48000
There are n = 54 observations, so our position is 15(54 + 1)/100 = 15(55)/100 = 8.25. Round to position 8; the corresponding observation is 16000. The 15th percentile is approximately 16000.
A better approach uses interpolation. The 8.25 position is really between positions 8 and 9, which hold 16000 and 16500 respectively. One-quarter of the way from 16000 to 16500 is 0.75(16000) + 0.25(16500) = 16125. So 16125 is a (better) approximation to the 15th percentile. However, it's not that different from 16000, and either would suffice.
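The position rule P(n + 1)/100 with linear interpolation can be sketched as follows. The function name `percentile` and the clamping at the ends of the data are assumptions made for this illustration; the notes only cover positions that fall inside the data.

```python
import math

salaries = [
    8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500,
    16600, 16700, 16900, 16900, 17000, 17000, 17600, 17880, 18000,
    18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300,
    20900, 22000, 23000, 23000, 23000, 23000, 23400, 24000, 25000,
    25000, 26000, 26000, 27000, 30000, 30000, 32500, 37000, 48000,
]


def percentile(sorted_data, p):
    """P-th percentile at 1-based position p * (n + 1) / 100, interpolated."""
    n = len(sorted_data)
    position = p * (n + 1) / 100
    lower = math.floor(position)
    fraction = position - lower
    # Assumed behaviour outside the data: clamp to the smallest or largest value.
    if lower < 1:
        return float(sorted_data[0])
    if lower >= n:
        return float(sorted_data[-1])
    low_value = sorted_data[lower - 1]  # observation at position `lower`
    high_value = sorted_data[lower]     # observation at position `lower + 1`
    return low_value + fraction * (high_value - low_value)


p15 = percentile(salaries, 15)
print(p15)  # 16125.0
```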
According to the textbook definition, when a test score is at the 80th percentile, 80% of scores are at or below that score and 20% exceed it.
For a continuous rv X with cdf F(x), the (100p)th percentile η(p) is defined by
\[ p = F(\eta(p)) = P(X \le \eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy \]
That is, η(p) is the value at or below which (100p)% of the distribution lies: the cdf evaluated at η(p) equals p. For example, for an exponential distribution with parameter λ,
\[ F(\eta(p); \lambda) = 1 - e^{-\lambda \eta(p)} = p \]
and solving for η(p) gives $\eta(p) = -\ln(1 - p)/\lambda$.

The Median
The median is the 50th percentile: 50% of the data are below the median and 50% are above.
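As an illustration of percentiles of a continuous distribution, the relation 1 − e^(−λη(p)) = p can be solved for η(p) = −ln(1 − p)/λ and evaluated directly. The sketch below does this for the median (p = 0.5). The rate λ = 0.1 is borrowed from the waiting-time density 0.1e^(−t/10) in Example 2 purely as sample input, and the function names are hypothetical.

```python
import math


def exponential_cdf(x, rate):
    """F(x) = 1 - exp(-rate * x) for x >= 0, and 0 otherwise."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0


def exponential_percentile(p, rate):
    """Solve F(eta) = p for eta: eta = -ln(1 - p) / rate."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1 - p) / rate


# Median wait (50th percentile) for the assumed rate of 0.1 per minute.
median_wait = exponential_percentile(0.5, 0.1)
print(round(median_wait, 3))  # 6.931

# Plugging the percentile back into the cdf should return p.
check = exponential_cdf(median_wait, 0.1)
```

Note that the median wait (about 6.9 minutes) is below the mean wait of 10 minutes, which is typical of a right-skewed distribution.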