
UNIT S1 STUDY GUIDE

Bootstrapping
When constructing confidence intervals in Unit 4, we invoked the Central Limit Theorem (CLT) to describe the sampling distribution of the sample statistic (a sample mean or sample proportion). That approach required either assumptions about the distribution of the population or a sample large enough for the sampling distribution to take the familiar bell shape. If nothing is known about the population, especially when the sample is small, we reach a dead end. With Bootstrapping, we can build an approximate sampling distribution using only the one sample we have. The process is fairly straightforward:

  • STEP 1: Make many, many copies of your sample data. This collection of copies is the simulated population.
  • STEP 2: Take a sample of size n from the simulated population and compute the sample statistic (sample mean or sample proportion) from it.
    This sample is called a Bootstrap sample, and its sample statistic is called a Bootstrap statistic.
  • STEPS 1 and 2 essentially boil down to randomly selecting n values (n being the size of the original sample) with replacement from the original sample. So instead of making multiple copies as in STEP 1, we simply draw a new sample of size n with replacement from the original sample.
    What is Sampling With and Without Replacement?
  • STEP 3: Repeat STEP 2 many, many times, each time obtaining a new Bootstrap statistic from a new Bootstrap sample. These repeated resamples are called Bootstrap replicates.
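
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are made up, and the function name bootstrap_statistics, the default of 10,000 replicates, and the fixed seed are choices made for this example, not part of the study guide.

```python
import random
import statistics


def bootstrap_statistics(sample, n_replicates=10_000, statistic=statistics.mean, seed=0):
    """Return one Bootstrap statistic per Bootstrap sample (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    stats = []
    for _ in range(n_replicates):
        # STEPS 1 and 2 combined: draw n values WITH replacement
        # from the original sample, so a value may appear more than once.
        resample = rng.choices(sample, k=n)
        stats.append(statistic(resample))
    return stats


# Made-up data, for illustration only.
sample = [12, 15, 9, 20, 14, 11, 17, 13]

# STEP 3: many Bootstrap samples, one Bootstrap statistic each.
boot_means = bootstrap_statistics(sample)

# Contrast: drawing n values WITHOUT replacement only reshuffles the
# original sample, so its mean never changes and nothing is learned.
shuffled = random.Random(0).sample(sample, k=len(sample))
```

Because each resample is drawn with replacement, the Bootstrap means vary from one resample to the next, and that variation is what the Bootstrap sampling distribution captures.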

If we graph all of the Bootstrap statistics from STEP 3 (as a histogram or dotplot), the graph shows the Bootstrap sampling distribution. Given a sufficiently large number of Bootstrap samples, it is a good approximation of the shape and spread of the actual sampling distribution of the sample statistic. The mean of the Bootstrap statistics is the Bootstrap mean, and the standard deviation of the Bootstrap statistics is the Bootstrap Standard Error. These results can be used to compute confidence intervals and more.
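
As a rough sketch of how these quantities lead to confidence intervals, the Python below computes the Bootstrap mean, the Bootstrap Standard Error, and two 95% intervals: one by the percentile method and one by the Standard Error plug-in method. The data are made up, and the 95% level, the 10,000 replicates, and the normal multiplier (about 1.96) in the plug-in interval are assumptions of this example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Made-up data, for illustration only.
sample = [12, 15, 9, 20, 14, 11, 17, 13]
n = len(sample)

# One Bootstrap statistic (here, the mean) per Bootstrap sample.
boot_means = [statistics.mean(rng.choices(sample, k=n)) for _ in range(10_000)]

# Bootstrap mean and Bootstrap Standard Error
# (the standard deviation of the Bootstrap statistics).
boot_mean = statistics.mean(boot_means)
boot_se = statistics.stdev(boot_means)

# Percentile method: keep the middle 95% of the Bootstrap statistics.
# quantiles(..., n=40) returns 39 cut points spaced 2.5% apart, so the
# first and last are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(boot_means, n=40)
pct_low, pct_high = cuts[0], cuts[-1]

# Standard Error plug-in method: sample statistic +/- z* x Bootstrap SE,
# assuming the Bootstrap distribution is roughly bell shaped.
z_star = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
sample_mean = statistics.mean(sample)
se_low = sample_mean - z_star * boot_se
se_high = sample_mean + z_star * boot_se

print(f"Bootstrap mean: {boot_mean:.3f}, Bootstrap SE: {boot_se:.3f}")
print(f"95% percentile interval: ({pct_low:.3f}, {pct_high:.3f})")
print(f"95% SE plug-in interval: ({se_low:.3f}, {se_high:.3f})")
```

When the Bootstrap distribution is roughly symmetric and bell shaped, the two intervals tend to be close; a noticeable gap between them suggests skew, in which case the plug-in interval's normal assumption is less trustworthy.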

Example: PERCENTILE and Standard Error PLUG-IN METHODS

Example: Proportion of Lactose Intolerant German Adults

BOOTSTRAP CONFIDENCE INTERVALS – PRACTICE

License

Statistics Study Guide Copyright © by Ram Subedi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.