UNIT S1 STUDY GUIDE
Bootstrapping
When constructing confidence intervals in Unit 4, we invoked the Central Limit Theorem (CLT) to describe the sampling distribution of the sample statistic (a sample mean or sample proportion). That approach required either assumptions about the distribution of the population or a sample large enough for the sampling distribution to take the familiar bell-shaped curve. If nothing is known about the population, especially when the sample is small, we reach a dead end. With Bootstrapping, we can create a sampling distribution using only the one sample we have. The process is fairly straightforward:
- STEP 1: Make many, many copies of your sample data. This collection of copies is the simulated population.
- STEP 2: Take a sample of size n from the simulated population and compute the sample statistic (sample mean or sample proportion) from it. This sample is called a Bootstrap sample, and its sample statistic is a Bootstrap statistic.
- STEPS 1 and 2 essentially boil down to randomly selecting n values (the same size as the original sample) with replacement from the original sample. So instead of making multiple copies as in STEP 1, we simply draw a new sample of size n with replacement from the original sample.
What is Sampling With and Without Replacement?
- STEP 3: Repeat STEP 2 many, many times, each time obtaining a new Bootstrap statistic from a new Bootstrap sample. These repeated resamples are Bootstrap replicates.
If we graph (histogram or dotplot) all of the Bootstrap statistics from STEP 3, the graph shows the Bootstrap sampling distribution. With a sufficiently large number of Bootstrap samples, it is a pretty good approximation of the shape and spread of the actual sampling distribution of the sample statistic. The mean of the Bootstrap statistics is the Bootstrap mean, and the standard deviation of the Bootstrap statistics is the Bootstrap Standard Error. These results can be used to compute confidence intervals and more.
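The three steps can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the StatKey workflow: the function name `bootstrap_statistics`, the seed, the number of replicates and the data values are all made up for the example.

```python
import random
import statistics


def bootstrap_statistics(sample, stat=statistics.mean, reps=5000, seed=0):
    """Return `reps` Bootstrap statistics computed from `sample`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(sample)
    boot_stats = []
    for _ in range(reps):  # STEP 3: repeat many times
        # STEPS 1-2: draw n values WITH replacement from the original sample
        resample = rng.choices(sample, k=n)
        boot_stats.append(stat(resample))
    return boot_stats


# Made-up data, for illustration only
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 12.6]
boot_stats = bootstrap_statistics(sample)

boot_mean = statistics.mean(boot_stats)  # Bootstrap mean
boot_se = statistics.stdev(boot_stats)   # Bootstrap Standard Error
print(f"Bootstrap mean = {boot_mean:.3f}, Bootstrap SE = {boot_se:.3f}")
```

Passing a different function as `stat` (for example `statistics.median`) gives the Bootstrap distribution of that statistic instead.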
BOOTSTRAPPING Explained
Methods for Computing Confidence Intervals
Percentile Method
Follow these steps to compute a bootstrap confidence interval using the Percentile method in StatKey:
- Use the Edit Data button to enter your data
- Click Generate 1,000 Samples a few times to generate several thousand bootstrap samples (say, 10K samples).
- Select “Two-Tail” and click on the middle box to change the default confidence level from 0.95 to your desired confidence level.
- The lower and upper bounds of the confidence interval are shown below the horizontal axis on the main graph. The endpoints are the lower and upper percentiles (e.g., 2.5th and 97.5th for a 95% CI).
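Outside StatKey, the same idea can be sketched in Python: sort the Bootstrap statistics and read off the lower and upper percentiles. The helper names `percentile` and `percentile_ci` and the data are hypothetical. Software packages use slightly different percentile conventions, so endpoints may differ a little from what StatKey reports.

```python
import random
import statistics


def percentile(sorted_vals, p):
    """Percentile of an already-sorted list by linear interpolation (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def percentile_ci(boot_stats, level=0.95):
    """Endpoints of the middle `level` share of the Bootstrap statistics."""
    tail = (1 - level) / 2  # e.g. 0.025 in each tail for a 95% CI
    ordered = sorted(boot_stats)
    return percentile(ordered, tail), percentile(ordered, 1 - tail)


# Made-up data, for illustration only
rng = random.Random(1)
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 12.6]
boot_stats = [statistics.mean(rng.choices(sample, k=len(sample)))
              for _ in range(10_000)]

lower, upper = percentile_ci(boot_stats, level=0.95)
print(f"95% percentile CI: ({lower:.2f}, {upper:.2f})")
```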
Plug-in Method
In the Plug-in method, you use the bootstrap distribution’s standard deviation as the estimated standard error and then apply the statistic ± (critical value × SE) formula to determine the confidence interval.
- Use the Edit Data button to enter your data
- Click Generate 1,000 Samples a few times to generate several thousand bootstrap samples (say, 10K samples).
- The confidence interval is given by \[\text{statistic} \pm \text{critical value} \times \text{SE}\] where
  - Statistic = the original sample statistic (e.g., mean or proportion)
  - SE = standard error from the bootstrap distribution
  - Critical Value = critical value from the standard normal distribution (for proportions) or the t-distribution (for means). We’ll cover t-distributions in Unit 5.
Note: Some resources use z-critical values from the standard normal distribution when constructing confidence intervals. For example, for a 95% confidence interval, the critical value is typically 1.96 (often rounded to 2 for simplicity).
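As a rough sketch of the Plug-in arithmetic in Python: the interval is centered on the original sample statistic, and the SE is the standard deviation of the Bootstrap statistics. The function name `plug_in_ci` and the data are made up. The example assumes the z critical value 1.96 mentioned in the note; for a mean from a sample this small, a t critical value would be somewhat larger and give a wider interval.

```python
import random
import statistics


def plug_in_ci(statistic, se, critical_value):
    """statistic ± critical value × SE, returned as (lower, upper)."""
    margin = critical_value * se
    return statistic - margin, statistic + margin


# Made-up data, for illustration only
rng = random.Random(2)
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 12.6]
boot_stats = [statistics.mean(rng.choices(sample, k=len(sample)))
              for _ in range(10_000)]

original_stat = statistics.mean(sample)  # original sample statistic, not the Bootstrap mean
boot_se = statistics.stdev(boot_stats)   # SE = SD of the Bootstrap statistics
z_star = 1.96                            # ASSUMPTION: z critical value for 95% confidence

lower, upper = plug_in_ci(original_stat, boot_se, z_star)
print(f"95% plug-in CI: ({lower:.2f}, {upper:.2f})")
```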
Finding Critical Values
Use Desmos or StatKey to find critical values. (Note: If you need more than 3 decimal places for the critical value, you may want to skip StatKey.)
Critical z-value using Desmos | Critical z-value using StatKey (no audio)
Critical t-value using Desmos | Critical t-value using StatKey (no audio)
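If you would rather check a z critical value in code, Python's standard library can do it, as in the sketch below. The function name `z_critical` is made up for illustration. Critical t-values are not covered here, because the standard library has no t-distribution; a package such as SciPy (`scipy.stats.t.ppf`) would be needed for those.

```python
from statistics import NormalDist


def z_critical(level):
    """z* such that the middle `level` of the standard normal lies within ±z*."""
    tail = (1 - level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```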
Example: PERCENTILE and Standard Error PLUG-IN METHODS
Example: Proportion of Lactose Intolerant German Adults
BOOTSTRAP CONFIDENCE INTERVALS – PRACTICE