Accuracy, Precision, Variation and Tolerance Module

 ●  Teacher Pages ●  Student Pages ●  Contact Information

Day 5 Lesson: Variation, Variance and Standard Deviation

People in a variety fields--business, medicine, technology, etc.--must deal with data on a regular basis, and one important characteristic of any set of data is the degree to which the data is "spread out." In other words, it is often very important to know how much variation exists within a set of data. Two common tools, or indicators, that can give us an idea of just how much variation we're dealing with in our data are variance and standard deviation. They both, of course, have formulae by which they can be calculated, but let's not worry about that right now. Rather, let's just try to get a feel for what information these numbers provide by experimenting a little on a spreadsheet and letting it take care of the complicated arithmetic.

To simulate some data, we'll let Excel generate some random numbers for us with the RANDBETWEEN function.
(Instructor: If this function is not already available on your copy of Excel, the Analysis ToolPak add-in macro must be installed. You will find the add-ins under the Tools menu.)

In cell A1 of a worksheet, enter

= RANDBETWEEN(20,30)

Then fill down to cell A15 to simulate fifteen measurements in the range of 20 to 30 units. This will give us 15 random numbers between 20 and 30 inclusive. Note: Random numbers in Excel, being random, will change every time the worksheet is updated. To prevent our data from changing as we analyze it, we can "lock them in" by copying the data and then choosing Paste Special from the Edit menu and selecting Values.

In cell in A17 enter the formula

=VAR(A1:A15)

In cell A18 enter the formula

=STDEV(A1: A15)

These formulae will return, respectively, the variance and standard deviation of simulated data. Now let's repeat the same steps in columns B, C, and D using progressively smaller ranges for the random numbers: Let the random numbers range from 22 to 28 in column B and from 24 to 26 in column C. Each column should contain 15 pieces of simulated data. Note: You can fill to the right the variance and standard deviation formulae rather than re-entering them each time.

1. Do the variances and standard deviations increase or decrease as you confine your random numbers to a smaller and smaller interval?
2. Find the means of each set of 15 measurements. Are they about the same? Why should this be so?
Repeat this experiment in columns D, E, and F, confining the random numbers as follows: Column D: 120-130; Column E: 122-128; Column F: 124-126.
3. About how much bigger were the means in this experiment than in the first experiment (columns A-C)?
4. Did the same pattern among variances and standard deviations hold as in the first experiment (see question 1)?
5. How did the variances and standard deviations differ from the first experiment? Explain.
6. In column A the simulated data was allowed to span from 20 to 30 units, a maximum range of 10 units, larger than the ranges in B and C. But it is possible to have a large range and still have a small variance and standard deviation. For example, consider these 15 measurements that also have a span from 20 to 30 units:
{20.0, 24.8, 24.8, 24.9, 24.9, 24.9, 25.0, 25.0, 25.0, 25.1, 25.1, 25.1, 25.2, 25.2, 30.0}
7. The mean is these data is 25.0 units. How does this mean and range compare to your mean in column A?
8. How would you guess the variation and standard deviation compare to that in column A? (If you aren't sure or want confirmation, use Excel to actually finds these values for you.)
9. If columns A, B, and C all represent separate sets of measurements of the same quantity, which set of measurements is the most precise? Why?
10. Use the square root function in Excel, SQRT to take the square roots of the variances in your spreadsheet. What do you notice?
11. Let's take another look at the data you gathered on day 3 and try to get a handle on how "spread out" the measurements were. Which team was most precise in the measurements?
 This, by the way, is an option if the students have access to a large enough sample of brake pads. Each student could measure one or more of the pads and record their data in a spreadsheet. Then the data could then be analyzed as recommended above. One possible drawback to this option is that there are, in a sense, two variables: the actual thickness of the different brake pads and the accuracy of the measurements being made. If the students are fairly proficient with the necessary measuring techniques, however, the latter error is probably negligible when compared to the scale of variation among the pads, especially if the pads came from cars of different models and ages.