7.1 The Central Limit Theorem for Sample Means (Averages) and Proportions
SAMPLING DISTRIBUTIONS
Let’s say we are interested in knowing the average incubation period for the Covid-19 virus in humans. This average is for all humans and is, therefore, a parameter, which we will label as $\mu$.
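In our example of the virus incubation period, let’s say we took our initial sample of 50 individuals from all over the world and recorded the incubation periods they experienced. From those 50 observations, we will calculate their average incubation period, called the sample average or the sample mean, $\bar{x}$. Suppose this initial sample mean is 4.5 days. If we repeated the process with many different samples of 50 individuals, each sample would give its own sample mean.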
If we created a histogram or a dot plot of these sample means, the distribution that we’d obtain is called the sampling distribution of the sample means. Each sample mean is an estimator for the population mean. That is, looking only at our initial sample mean of 4.5 days, our best estimate for the average virus incubation period in humans is also 4.5 days. However, if we instead take the average of all of the sample means, that average will be a better estimate of the population mean. In fact, as we add more and more sample results to the average of the sample means, we get closer and closer to the actual population mean, and this happens using the samples alone, without us knowing anything about the population mean. As the number of samples increases without bound, the average of the sample means approaches the population mean $\mu$.
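The claim that the average of many sample means settles near the population mean can be checked with a small simulation. The sketch below is illustrative only: the population of incubation periods is invented (a gamma-shaped list with a mean near 5 days, not real Covid-19 data), and the names `POPULATION` and `sample_mean` are hypothetical.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Invented population of incubation periods in days (assumption, not real data).
# gammavariate(4.0, 1.25) has a theoretical mean of 4.0 * 1.25 = 5 days.
POPULATION = [random.gammavariate(4.0, 1.25) for _ in range(100_000)]
population_mean = statistics.fmean(POPULATION)

def sample_mean(size=50):
    """Mean of one simple random sample drawn from the population."""
    return statistics.fmean(random.sample(POPULATION, size))

# Collect many sample means, then average the first 1, 10, 100, ... of them.
sample_means = [sample_mean() for _ in range(5_000)]
for count in (1, 10, 100, 5_000):
    running_average = statistics.fmean(sample_means[:count])
    print(f"{count:>5} samples: average of sample means = {running_average:.3f}")

print(f"population mean = {population_mean:.3f}")
```

In a real study the population mean is unknown, so the last line is available only because the population here is simulated.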
The Central Limit Theorem (CLT) gives us a simple and elegant picture of what the sampling distribution of a sample statistic (such as the sample mean or sample proportion) looks like, provided certain conditions are met. The CLT is one of the most powerful and useful ideas in all of statistics.
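The sampling distribution of the sample means comes closer and closer to a normal distribution as the sample size $n$ increases.
Suppose a random variable $X$ has a mean of $\mu$ and a standard deviation of $\sigma$. If we repeatedly draw random samples of size $n$, the sample means $\bar{X}$ have a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$.
How large should the sample size be for the CLT to apply? A common rule of thumb is that a sample size of at least 30 is large enough.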
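To see how sample size affects the shape and spread of the sampling distribution, the sketch below draws repeated samples from a strongly right-skewed population and compares the simulated standard deviation of the sample means with the usual formula $\sigma/\sqrt{n}$. The exponential population, the sample sizes 2, 30, and 100, and the helper names are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(11)  # fixed seed so the sketch is reproducible

# Assumed population: exponential with rate 1, which is strongly right-skewed.
MU = 1.0     # population mean
SIGMA = 1.0  # population standard deviation

def simulate_sample_means(n, repetitions=4_000):
    """Draw `repetitions` samples of size n and return their means."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]

def skewness(values):
    """Skewness of the values; numbers near 0 indicate a symmetric shape."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean(((v - center) / spread) ** 3 for v in values)

results = {}
for n in (2, 30, 100):
    means = simulate_sample_means(n)
    results[n] = {
        "sd_simulated": statistics.stdev(means),
        "sd_theory": SIGMA / math.sqrt(n),
        "skewness": skewness(means),
    }
    print(n, results[n])
```

The skewness of the sample means should shrink toward 0 as $n$ grows, which is one way to see the distribution becoming more nearly normal.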
Review & Practice
Review: Sampling distribution of a sample mean
Please complete the following practice exercises:
EXAMPLE
The amount of coffee that people drink per day is normally distributed with a mean of
1. What is the distribution of $X$?
2. What is the distribution of $\bar{X}$?
3. What is the probability that one randomly selected person drinks between 15.5 and 16.5 ounces of coffee per day?
4. For the people in the sample, find the probability that the average coffee consumption is between 15.5 and 16.5 ounces of coffee per day.
5. For part 4, is the assumption that the distribution is normal necessary?
6. Find the IQR for the average coffee consumption of the coffee drinkers in the sample.
SHOW SOLUTION
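1. In this question, the random variable $X$ is the amount of coffee, in ounces, that one person drinks per day. We are told that $X$ is normally distributed, so $X \sim N(\mu, \sigma)$.
2. $\bar{X}$ is a random variable representing sample means, and from the CLT we know that the distribution of sample means is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. So the answer is: $\bar{X} \sim N\left(\mu, \sigma/\sqrt{n}\right)$.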
DESMOS CALCULATOR
Calculator Usage Guide
In the first input box on Desmos calculator, enter:
4. For the
First click on the circle graph icon to the left of the input entry in the first box to turn off the graph from the last part. In the second input box on Desmos calculator, enter:
5. YES. The assumption that the population distribution is normal is necessary here for the sampling distribution of sample means to be normal, because the sample size is less than 30 and the CLT alone cannot be relied on.
6. …IQR for the average of
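Recall that to find a value from a given area, we need inversecdf, which takes the area to the left.
For $Q_1$, the area to the left is 0.25.
For $Q_3$, the area to the left is 0.75.
In input box #4, enter:
IQR = $Q_3 - Q_1$ = 16.954 – 15.046 = 1.908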
Show me the steps for STATKEY
3. …probability that one randomly selected person drinks between
Which distribution applies here:
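We’re asked about the amount of coffee, in ounces, that one person drinks per day, which is the random variable $X$.
Probability that one randomly selected person drinks between 15.5 and 16.5 ounces:
Between 15.5 and 16.5 is an interval of values of the random variable $X$, so we need the area under the distribution of $X$ between those two values.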
On StatKey main page, click on NORMAL under Theoretical Distributions.
Click on Edit Parameters and enter your MEAN and STANDARD DEVIATION. For this question, mean
Select both Right Tail and Left Tail checkboxes at the top left. Click on the blue box below the left tail and change that to 15.5 and hit enter. Next, click on the blue box below the right tail and change that to 16.5 and hit enter. We’re looking for the area in between. Why?
Answer: 0.066.
4. For the
Again, which distribution applies here:
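Notice that in this question, we are not asked about an individual amount, but rather about the average amount of daily coffee consumption for the people in the sample, so the distribution of the sample mean $\bar{X}$ applies.
Click on Edit Parameters and enter your MEAN and STANDARD DEVIATION for the random variable $\bar{X}$, that is, mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.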
Select both Right Tail and Left Tail checkboxes at the top left. Click on the blue box below the left tail and change that to 15.5 and hit enter. Next, click on the blue box below the right tail and change that to 16.5 and hit enter. We’re looking for the area in between. Answer: 0.276
5. YES. The assumption that the population distribution is normal is necessary here for the sampling distribution of sample means to be normal, because the sample size is less than 30 and the CLT alone cannot be relied on.
6. …IQR for the average of
Once again, the applicable distribution is that of $\bar{X}$.
For the IQR we need $Q_1$ and $Q_3$.
Deselect all checkboxes except the Left Tail checkbox, or make sure Left Tail is selected. Click on the blue box representing area (in the middle of the graph) and change the value to 0.25 if it is not so already. When you hit enter, the value below the number line will be your cutoff point, $Q_1$.
For $Q_3$, change the area to 0.75 and hit enter.
Answer: IQR = 16.954 – 15.046 = 1.908
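The same calculations can be sketched in Python with the standard library's `statistics.NormalDist`. The parameter values below ($\mu = 16$ ounces, $\sigma = 6$ ounces, $n = 18$) are assumptions: they are not stated in the surviving text, but they are consistent with the answers quoted above (0.066, 0.276, and IQR = 1.908).

```python
from math import sqrt
from statistics import NormalDist

# Assumed parameters, chosen to be consistent with the quoted answers.
MU = 16.0    # population mean, in ounces (assumption)
SIGMA = 6.0  # population standard deviation, in ounces (assumption)
N = 18       # sample size (assumption)

individual = NormalDist(MU, SIGMA)             # distribution of X
sample_mean = NormalDist(MU, SIGMA / sqrt(N))  # distribution of the sample mean

# Part 3: one randomly selected person drinks between 15.5 and 16.5 ounces.
p_individual = individual.cdf(16.5) - individual.cdf(15.5)

# Part 4: the sample average is between 15.5 and 16.5 ounces.
p_average = sample_mean.cdf(16.5) - sample_mean.cdf(15.5)

# Part 6: IQR of the sample mean; inv_cdf takes the area to the left.
q1 = sample_mean.inv_cdf(0.25)
q3 = sample_mean.inv_cdf(0.75)
iqr = q3 - q1

print(f"P(15.5 < X < 16.5)     = {p_individual:.3f}")
print(f"P(15.5 < X-bar < 16.5) = {p_average:.3f}")
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
```

`inv_cdf` plays the same role as inversecdf in Desmos or the Left Tail area box in StatKey: it returns the cutoff value for a given area to the left.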
CLT for Sample Proportions
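In our earlier example with the coronavirus, if we were instead interested in knowing the proportion of people who do not believe that Covid-19 has caused a public health crisis, that proportion would be a population parameter, $p$, which we would estimate with a sample proportion, $\hat{p}$.
The sampling distribution of the sample proportion $\hat{p}$ is approximately normal, with a mean of $p$ and a standard deviation of $\sqrt{p(1-p)/n}$, where $p$ is the population proportion and $n$ is the sample size.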
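A short simulation can illustrate the behavior of sample proportions. The sketch below assumes the usual standard error formula $\sqrt{p(1-p)/n}$ and uses made-up values ($p = 0.30$, $n = 200$) that do not come from any survey; `sample_proportion` is a hypothetical helper.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

P = 0.30      # assumed population proportion (invented for illustration)
N = 200       # assumed sample size
REPS = 5_000  # number of simulated samples

def sample_proportion(n, p):
    """Proportion of 'successes' in one random sample of size n."""
    return sum(random.random() < p for _ in range(n)) / n

proportions = [sample_proportion(N, P) for _ in range(REPS)]

simulated_mean = statistics.fmean(proportions)
simulated_sd = statistics.stdev(proportions)
theoretical_sd = math.sqrt(P * (1 - P) / N)

print(f"mean of sample proportions = {simulated_mean:.4f} (population p = {P})")
print(f"sd of sample proportions   = {simulated_sd:.4f} (formula gives {theoretical_sd:.4f})")
```

A histogram of `proportions` would show the roughly bell-shaped curve the CLT describes.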
Review & Practice
Example: Sampling distribution of a sample proportion
Please complete the following practice exercises:
EXAMPLE
A furniture company wants to know if the percent of their customers satisfied with their furniture purchase has decreased from last year’s rating of 91%. The company surveys 152 customers to gather feedback.
1. What is the distribution of $\hat{p}$?
2. What is the probability that more than 95% of those surveyed are satisfied with their purchase?
3. What is the probability that between 88% and 92% of those surveyed are satisfied with their purchase?
4. What is the range of satisfaction percentage rating that will place customer satisfaction in the middle 95% of the sampling distribution?
SHOW SOLUTION
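1. We’re comparing with last year’s rating of 91%, so $p = 0.91$, and the sample size is $n = 152$. The standard deviation of $\hat{p}$ is $\sqrt{0.91(1 - 0.91)/152} \approx 0.0232$, so $\hat{p} \sim N(0.91, 0.0232)$.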
DESMOS CALCULATOR
Calculator Usage Guide
In the first input box on Desmos calculator, enter:
3. What is the probability that between 88% and 92% of those surveyed are satisfied with their purchase?
Change the Min to 0.88 and Max to 0.92 for the input boxes under Find Cumulative Probability (CDF). Area is shaded and the answer is displayed. Area/Probability
NOTE: Leaving the Min or Max entry blank defaults its value to $-\infty$ or $\infty$, respectively.
4. What is the range of satisfaction percentage rating that will place customer satisfaction in the middle 95% of the sampling distribution?
Unlike in the previous two parts of the question, the 95% here refers to the area in the middle of the sampling distribution. We need to find the cut off points that separate the middle 95% of the sampling distribution of sample proportions. We use inversecdf to find cutoff points. Inversecdf takes area to the left.
If the area in the middle is 95%, then the two tail areas shaded in blue must have their areas add up to 5% which means the area of each tail is 2.5%. So the left boundary cut off point for the middle 95% has 2.5% area to its left whereas the right boundary cut off has 97.5% area to its left.
Lower (left) cutoff =
Upper (right) cutoff =
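For readers who prefer code to a calculator, the furniture example can be sketched with `statistics.NormalDist`. The values $p = 0.91$ and $n = 152$ come from the problem statement; the variable names are hypothetical, and the normal approximation with standard error $\sqrt{p(1-p)/n}$ is assumed.

```python
from math import sqrt
from statistics import NormalDist

P = 0.91  # last year's satisfaction rate, used as the population proportion
N = 152   # number of customers surveyed

standard_error = sqrt(P * (1 - P) / N)
p_hat = NormalDist(P, standard_error)  # approximate distribution of p-hat

# Part 2: more than 95% of those surveyed are satisfied (right-tail area).
p_more_than_95 = 1 - p_hat.cdf(0.95)

# Part 3: between 88% and 92% of those surveyed are satisfied.
p_between_88_and_92 = p_hat.cdf(0.92) - p_hat.cdf(0.88)

# Part 4: middle 95%, so 2.5% lies to the left of the lower cutoff
# and 97.5% lies to the left of the upper cutoff.
lower_cutoff = p_hat.inv_cdf(0.025)
upper_cutoff = p_hat.inv_cdf(0.975)

print(f"standard error        = {standard_error:.4f}")
print(f"P(p-hat > 0.95)       = {p_more_than_95:.4f}")
print(f"P(0.88 < p-hat < 0.92) = {p_between_88_and_92:.4f}")
print(f"middle 95%: {lower_cutoff:.4f} to {upper_cutoff:.4f}")
```

The two `inv_cdf` calls mirror the inversecdf step described above, since both take the area to the left of the cutoff.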
SUMMARY
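Suppose $X$ is a random variable with mean $\mu$ and standard deviation $\sigma$, and we draw random samples of size $n$.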
CLT for SAMPLE MEANS
- The shape of the sampling distribution of sample means will be:
  - approximately normal if the sample size $n$ is large enough (usually $n \geq 30$) OR
  - normal if the random variable $X$ is normally distributed (that is, the population is normally distributed) regardless of the sample size
- The sampling distribution of sample means $\bar{X}$ will have a mean $\mu_{\bar{X}} = \mu$ and standard deviation (also called the standard error of the mean) $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
CLT for SAMPLE PROPORTIONS
- The shape of the sampling distribution of sample proportions will be approximately normal if $np \geq 5$ AND $nq \geq 5$, where $q = 1 - p$. Note that some textbooks require at least 10 for the above.
- The sampling distribution of sample proportions $\hat{p}$ will have a mean $p$ and standard deviation (also called the standard error of the proportion) $\sqrt{pq/n}$
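The summary can be turned into two small helper functions. This is a minimal sketch, not a standard API: the function names are hypothetical, and the default threshold of 5 is an assumption (as noted above, some textbooks use 10 instead).

```python
import math

def mean_standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def proportion_clt_check(n, p, threshold=5):
    """Return (conditions_met, standard_error) for a sample proportion.

    conditions_met is True when both n*p and n*(1 - p) reach the threshold.
    The threshold of 5 is an assumed default; pass 10 for the stricter rule.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    conditions_met = (
        expected_successes >= threshold and expected_failures >= threshold
    )
    standard_error = math.sqrt(p * (1 - p) / n)
    return conditions_met, standard_error

# Furniture example: n = 152, p = 0.91 gives n*(1 - p) = 13.68.
ok_furniture, se_furniture = proportion_clt_check(152, 0.91)
ok_strict, _ = proportion_clt_check(152, 0.91, threshold=10)

# A smaller hypothetical survey of 50 customers gives n*(1 - p) = 4.5.
ok_small, _ = proportion_clt_check(50, 0.91)

print(ok_furniture, ok_strict, ok_small)
```

Checking the conditions before using the normal model guards against applying it to samples that are too small for the approximation to be reasonable.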