Sampling


It would be difficult to obtain the mileage rating for every American car manufactured. Instead, we can take a sample of American cars and use it to infer the mean mileage of all of them. Specifically, we want to find a range for the average mileage such that the probability that the true mean falls outside this range is small. Here are the data for 15 randomly selected 1993 models of American cars.

Manufacturer  Model       Mileage (mpg)
Buick         Century     22
Buick         Riviera     19
Cadillac      Seville     16
Chevy         Camaro      19
Chevy         Cavalier    25
Chrysler      LeBaron     23
Dodge         Colt        29
Ford          Escort      23
Ford          Mustang     22
Geo           Metro       46
Lincoln       Town Car    18
Oldsmobile    Achieva     24
Pontiac       Sunbird     23
Pontiac       Bonneville  19
Pontiac       Firebird    23

Since we cannot measure all American cars (the POPULATION), we will use our data (the SAMPLE) to estimate the true average mileage.
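
As a starting point, the mean of the sample can be computed directly. The short Python sketch below is an illustration only and is not part of the Day 1 spreadsheet; the variable names are made up for this example, and the values are copied from the table above.

```python
from statistics import mean

# Mileage values (mpg) copied from the data table, in table order.
mileage = [22, 19, 16, 19, 25, 23, 29, 23, 22, 46, 18, 24, 23, 19, 23]

n_cars = len(mileage)        # number of cars in the sample
sample_mean = mean(mileage)  # arithmetic mean of the sample

print(n_cars, sample_mean)   # 15 cars, sample mean 23.4 mpg
```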

To use the Bootstrap method, one key assumption is made:
We assume that the population is made up of cars like the ones in our data, in the same proportions. That is, we can represent the population of American cars with 1 million Buick Centuries, 1 million Ford Escorts, and so on.
Since it would be impossible to actually create such a universe, we instead select new random samples from our original data and use them to help estimate the true mean mileage of American cars.
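
Selecting a new random sample from the original data can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet's actual implementation: the function name `bootstrap_resample` and the seed value are assumptions made for this example. It relies on the standard library's `random.choices`, which draws with replacement.

```python
import random
from statistics import mean

# Mileage values (mpg) copied from the data table.
mileage = [22, 19, 16, 19, 25, 23, 29, 23, 22, 46, 18, 24, 23, 19, 23]

def bootstrap_resample(data, rng):
    """Draw len(data) values from data, with replacement (hypothetical helper)."""
    return rng.choices(data, k=len(data))

rng = random.Random(1)  # arbitrary seed, used only to make the run repeatable
resample = bootstrap_resample(mileage, rng)

print("resample:", resample)
print("mean of resample:", round(mean(resample), 2))
```

Because the draws are made with replacement, the same car can appear more than once in a resample, and some cars may not appear at all.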

Use the spreadsheet, Day 1, to further explore this problem and answer the following questions.

  1. In your own words, state the main assumption that is made.
  2. In the spreadsheet, click on the button "Randomly select one data point". What is the probability of getting 16?
  3. When we sample from our data set, we sample with replacement. This means that we are just as likely to select a given data point on our second draw as on our first. Why do you think we sample with replacement instead of sampling without replacement?
  4. Click the same button 15 times. This will be one trial. What is the mean of your trial and how does it compare to the actual mean of the data?
  5. Let's keep track of the trials. Click on the "Enter Trial" button. Each trial will be entered in column F of the spreadsheet. Next, click the "Erase Samples" button. Now, take another fifteen samples. What is the mean, and how does it compare to the mean of your previous trial? How about to the actual mean?
  6. Do at least ten trials. Next, click on the button "Sort Trials From Largest To Smallest". What is the range of the trials you found? How much "variability" is there in the trials?
  7. Based on these ten trials, what do you think this interval says about the "true" mean mileage of the population of American cars?
  8. What is one way to get an interval for the mean mileage that you can be more confident in?
  9. Click on the "Erase Trials" button. Again, run ten trials on the data and sort them from largest to smallest. How does this interval compare to the one you found previously?
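
For readers without the spreadsheet, the procedure in the questions above (draw 15 values with replacement, record the trial mean, repeat, then sort) can be approximated with a short Python sketch. The function name `run_trials`, the seed, and the choice of ten trials are assumptions made for illustration; the spreadsheet's buttons may work differently.

```python
import random
from statistics import mean

# Mileage values (mpg) copied from the data table.
mileage = [22, 19, 16, 19, 25, 23, 29, 23, 22, 46, 18, 24, 23, 19, 23]

def run_trials(data, n_trials, rng):
    """Return the means of n_trials resamples, sorted from largest to smallest."""
    means = []
    for _ in range(n_trials):
        # One trial: len(data) draws with replacement, then take the mean.
        trial = rng.choices(data, k=len(data))
        means.append(mean(trial))
    return sorted(means, reverse=True)

rng = random.Random(1993)  # arbitrary seed, used only to make the run repeatable
trials = run_trials(mileage, 10, rng)

print("trial means:", [round(m, 2) for m in trials])
print("range of trial means:", round(trials[0] - trials[-1], 2))
```

Changing the seed plays the role of clicking "Erase Trials" and starting over, and changing the number of trials lets you see how the spread of the trial means behaves with more repetitions.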
