People in a variety
fields--business, medicine, technology, etc.--must deal with data on a
regular basis, and one important characteristic of any set of data is
the degree to which the data is "spread out." In other words, it is
often very important to know how much variation exists within a set of
data. Two common tools, or indicators, that can give us an idea of just
how much variation we're dealing with in our data are variance and
standard deviation. They both, of course, have formulae by which they
can be calculated, but let's not worry about that right now. Rather,
let's just try to get a feel for what information these numbers provide
by experimenting a little on a spreadsheet and letting it take care of
the complicated arithmetic.
To simulate some data, we'll let Excel generate some random
numbers for us with the RANDBETWEEN function.
(Instructor: If this function is not already available on your copy of
Excel, the Analysis ToolPak add-in macro must be installed. You will
find the add-ins under the Tools menu.)
In cell A1 of a worksheet, enter
= RANDBETWEEN(20,30)
Then fill down to cell A15 to simulate fifteen measurements in
the range of 20 to 30 units. This will give us 15 random numbers
between 20 and 30 inclusive. Note: Random numbers in Excel, being
random, will change every time the worksheet is updated. To prevent our
data from changing as we analyze it, we can "lock them in" by copying
the data and then choosing Paste Special from the Edit menu and
selecting Values.
In cell in A17 enter the formula
=VAR(A1:A15)
In cell A18 enter the formula
=STDEV(A1: A15)
These formulae will return, respectively, the variance and
standard deviation of simulated data. Now let's repeat the same steps
in columns B, C, and D using progressively smaller ranges for the
random numbers: Let the random numbers range from 22 to 28 in column B
and from 24 to 26 in column C. Each column should contain 15 pieces of
simulated data. Note: You can fill to the right the variance and
standard deviation formulae rather than re-entering them each time.
- Do the variances and standard deviations increase or
decrease as you confine your random numbers to a smaller and smaller
interval?
- Find the means of each set of 15 measurements. Are they
about the same? Why should this be so?
Repeat this experiment in columns D, E, and F, confining the random
numbers as follows: Column D: 120-130; Column E: 122-128; Column F:
124-126.
- About how much bigger were the means in this experiment
than in the first experiment (columns A-C)?
- Did the same pattern among variances and standard
deviations hold as in the first experiment (see question 1)?
- How did the variances and standard deviations differ from
the first experiment? Explain.
- In column A the simulated data was allowed to span from 20
to 30 units, a maximum range of 10 units, larger than the ranges in B
and C. But it is possible to have a large range and still have a small
variance and standard deviation. For example, consider these 15
measurements that also have a span from 20 to 30 units:
{20.0, 24.8, 24.8, 24.9, 24.9, 24.9, 25.0, 25.0, 25.0, 25.1, 25.1,
25.1, 25.2, 25.2, 30.0}
- The mean is these data is 25.0 units. How does this mean
and range compare to your mean in column A?
- How would you guess the variation and standard deviation
compare to that in column A? (If you aren't sure or want confirmation,
use Excel to actually finds these values for you.)
- If columns A, B, and C all represent separate sets of
measurements of the same quantity, which set of measurements is the
most precise? Why?
- Use the square root function in Excel, SQRT to take the
square roots of the variances in your spreadsheet. What do you notice?
- Let's take another look at the data you gathered on day 3
and try to get a handle on how "spread out" the measurements were.
Which team was most precise in the measurements?
This, by the way, is an option if the students have access to a large
enough sample of brake pads. Each student could measure one or more of
the pads and record their data in a spreadsheet. Then the data could
then be analyzed as recommended above. One possible drawback to this
option is that there are, in a sense, two variables: the actual
thickness of the different brake pads and the accuracy of the
measurements being made. If the students are fairly proficient with the
necessary measuring techniques, however, the latter error is probably
negligible when compared to the scale of variation among the pads,
especially if the pads came from cars of different models and ages.