Variance - Excel

In order to do this section, download this Excel file called Baseball.

Recall in the introduction of this lesson, the question of whether the perception in America that all baseball players are overpaid was raised. The Excel file you downloaded contains the salaries for two major league teams, the Chicago Cubs and the Montreal Expos in 1994. The salaries have been divided by a million in order to make the numbers easier to work with. For example, Ryne Sandberg (cell A5), has a salary listed as 5.975. This actually means \$5,975,000.

We will use this data to help us explore the answer to the question above.

1. One statistic used frequently is called the range. The range of a data set is the difference between the greatest and least values in the set.

1. Who are the Cubs players that are paid the highest and lowest?

2. What is the range of salaries for the Cubs?

2. In Excel you can do often tedious calculations quickly and accurately.

1. What is the total payroll for the Cubs salaries? (CELL A33)

2. What is the mean salary for the Cubs? (CELL A34)

3. What do you think Willie Banks (CELL A24) would say if you told him the answer you found for the mean salary?

4. Is the Cubs mean salary above or below the mean for all the teams? (Remember, this was \$1.2 million dollars.)

3. In column C,

1. Find the deviation (COLUMN C) of each players salary from the mean. (Recall, the deviation is the difference between the mean and the data point.)

2. Without doing any calculation, what would the mean of this list be?

3. In your own words, describe how Mark Grace's salary differs from the mean salary for the Cubs.

4. Now, describe Rey Sanchez's salary.

4. In column D,

1. Find the absolute value of the deviations. (COLUMN D) (Hint: In Excel, the command for absolute value is ABS().)

2. Find the sum and mean of this list. (CELLS D33 and D34)

3. In your own words, explain what this number means.

1. Calculate the square of each deviation. (COLUMN E)

2. Find the sum and mean of the square of the deviations (or variance). (CELLS E33 and E34)

5. A statistic that is commonly used is called the standard deviation. It is calculated by taking the square root of the variance. In a sense, it is an undoing of the squaring that we did to find the variance. The standard deviation therefore is really a sort of "average distance" of each point from the mean.

1. Calculate the standard deviation for the Cubs player salaries.

2. In your own words, explain what this number means in relation to the Cub salaries.

6. Now, for the Montreal Expos player salaries, find

1. The sum and mean of the salaries. (CELLS G33 and G34)

2. Each salaries deviation from the mean. (COLUMN H)

3. The absolute value of the deviations. (COLUMN I)

4. The sum and mean of the absolute value of the deviations. (CELLS I33 and I34)

5. The variance and standard deviation. (COLUMN J)

Use the data that you calculated about the Cubs and Expos salaries to answer the following questions. All answers should be defended using your the data and your calculations.

7. Which teams' players on average make more money?

8. Which team has a greater spread in their salaries?

9. Which teams' salaries are more centered around the mean of their salaries?

10. For what percent of the variance do Expo players Larry Walker and Marquis Grissom account?

11. An outlier is a point that is significantly different from the rest of the data set. For example, if a data set contained, {1,2,3,4,40}, 40 would be an outlier since it is significantly greater than the other points.

Would you consider Larry Walker's salary to be an outlier? How about Marquis Grissom's?

12. If the owner of the Cubs decided to give everyone on the team a \$500,000 raise, what would happen to the mean? to the variance? to the standard deviation?

13. After analyzing the data, do you think baseball players are overpaid?

EXTRA CREDIT

In the middle of the 1994 season, Ryne Sandberg retired. Since he retired he did not get to keep most of his salary. Answer the following questions by deleting Ryne Sandberg's salary from the Cubs' list.

14. Calculate the mean, variance and standard deviation for list minus Ryne Sandberg.

1. How much lower was the mean without Ryne Sandberg than with him?

2. How about the variance and standard deviation?

3. If Glenallen Hill (who makes \$1,000,000) decided to retire instead of Ryne Sandberg, what would happen to the mean? Would the difference be greater than or less than if Ryne Sandberg retired? Answer the same questions for the variance?