A procedure for finding the best fitting line: mean prediction error

One way to find the best fitting line is to see how close the predicted weight (Y') is to the actual weight (Y). In terms of our table, we want to know the difference Y - Y', which we will call the prediction error.

So we add another column to our table for the line with m = 1.

Height, X   Weight, Y   Predicted Weight, Y'   Y - Y'
       61         140                    163      -23
       64         141                    166      -25
       64         144                    166      -22
       66         158                    168      -10
       67         156                    169      -13
       67         174                    169        5
       68         160                    170      -10
       68         164                    170       -6
       68         170                    170        0
       69         172                    171        1
       70         170                    172       -2
       71         175                    173        2
       72         170                    174       -4
       72         174                    174        0
       73         176                    175        1
       74         180                    176        4
       75         192                    177       15

Median: X = 68, Y = 170
Mean: Y' = 170.76, Y - Y' = -5.1

slope = 1     int = 102     equation is Y'= 1.0 * X + 102

Let's make sure that we can interpret this new column.

For example, the first player's actual weight is 140 lb and his predicted weight is 163 lb. So we have Y - Y' = 140 - 163 = -23 lb. That is, the actual weight is 23 lb less than predicted.

Now look at the second to last player.

We have Y - Y' = 180 - 176 = 4 lb. The actual weight is 4 lb more than predicted.
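The new column can also be reproduced outside the spreadsheet. The following is a minimal Python sketch, assuming the heights and weights are typed in from the table above; the variable names are illustrative and not part of the original spreadsheet.

```python
# Heights (X) and actual weights in lb (Y), copied from the table above.
heights = [61, 64, 64, 66, 67, 67, 68, 68, 68, 69, 70, 71, 72, 72, 73, 74, 75]
weights = [140, 141, 144, 158, 156, 174, 160, 164, 170, 172, 170, 175, 170, 174, 176, 180, 192]

slope, intercept = 1.0, 102  # the line Y' = 1.0 * X + 102

# Predicted weight Y' and prediction error Y - Y' for every player.
predicted = [slope * x + intercept for x in heights]
errors = [y - y_pred for y, y_pred in zip(weights, predicted)]

print(errors[0])   # first player: 140 - 163 = -23.0
print(errors[-2])  # second to last player: 180 - 176 = 4.0
```

A negative value means the player weighs less than the line predicts; a positive value means he weighs more.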

How can we summarize how well the line fits the data?

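A reasonable way is to add up all of the Y - Y' values and then divide by the number of players, which gives the mean prediction error.

Let's do that.

Notice that the prediction errors add up to -87, so across the 17 players they have a mean of -5.1 (for a line with a slope of 1).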
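As a check on the arithmetic, here is a short Python sketch of the same summary. The function name `mean_prediction_error` is a hypothetical helper chosen for illustration; the data are the heights and weights from the table.

```python
heights = [61, 64, 64, 66, 67, 67, 68, 68, 68, 69, 70, 71, 72, 72, 73, 74, 75]
weights = [140, 141, 144, 158, 156, 174, 160, 164, 170, 172, 170, 175, 170, 174, 176, 180, 192]


def mean_prediction_error(heights, weights, slope, intercept):
    """Return (sum, mean) of the errors Y - Y' for the line Y' = slope * X + intercept."""
    errors = [y - (slope * x + intercept) for x, y in zip(heights, weights)]
    total = sum(errors)
    return total, total / len(errors)


total, mean = mean_prediction_error(heights, weights, 1.0, 102)
print(total, round(mean, 1))  # -87.0 -5.1
```

The sum and the mean carry the same information here, since the mean is just the sum divided by the 17 players.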

Let's see what happens for another line. In the previous section we found the equation of a line with m = 2 to be Y' = 2.0 * X + 34.

Here is the table for predicted weights for this equation.

Height, X   Weight, Y   Predicted Weight, Y'   Y - Y'
       61         140                    156      -16
       64         141                    162      -21
       64         144                    162      -18
       66         158                    166       -8
       67         156                    168      -12
       67         174                    168        6
       68         160                    170      -10
       68         164                    170       -6
       68         170                    170        0
       69         172                    172        0
       70         170                    174       -4
       71         175                    176       -1
       72         170                    178       -8
       72         174                    178       -4
       73         176                    180       -4
       74         180                    182       -2
       75         192                    184        8

Median: X = 68, Y = 170
Mean: Y' = 171.5, Y - Y' = -5.9

slope = 2     int = 34     equation is Y'= 2.0 * X + 34

Notice that the prediction error has a mean of -5.9 (for a line with a slope of 2). Can you think of a reason why adding the prediction errors might not be the best way to judge how well the line fits the data?

Exercises:

Let's see what happens for other lines. For each slope given below, use the spreadsheet to find the sum of the prediction errors.

Use m = -1; m = 0; m = +1.0; m = +2.0; m = +3.0; m = +3.5; m = +4.0
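If you would rather check your spreadsheet results with a script, the loop below is one possible sketch. It rests on an assumption: the intercept for each slope is chosen so that the line passes through the median point (68, 170). That choice reproduces both intercepts used in this section (102 for m = 1 and 34 for m = 2), but your spreadsheet may set the intercept differently. The helper name `error_summary` is hypothetical.

```python
heights = [61, 64, 64, 66, 67, 67, 68, 68, 68, 69, 70, 71, 72, 72, 73, 74, 75]
weights = [140, 141, 144, 158, 156, 174, 160, 164, 170, 172, 170, 175, 170, 174, 176, 180, 192]

median_x, median_y = 68, 170  # medians listed under the tables


def error_summary(slope):
    """Return (intercept, sum of errors, mean error) for a line with the given slope."""
    # ASSUMPTION: the line is drawn through the median point (68, 170).
    intercept = median_y - slope * median_x
    errors = [y - (slope * x + intercept) for x, y in zip(heights, weights)]
    total = sum(errors)
    return intercept, total, total / len(errors)


for slope in [-1, 0, 1.0, 2.0, 3.0, 3.5, 4.0]:
    intercept, total, mean = error_summary(slope)
    print(f"m = {slope:4}  int = {intercept:7.1f}  sum = {total:8.1f}  mean = {mean:6.2f}")
```

Compare the printed sums with the ones from your spreadsheet, and watch how the sum changes as the slope increases.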

Crickets, anyone?

Create a column of prediction errors for the cricket data. Again, use the slopes you have previously chosen to find the mean prediction error using the cricket data.

Continue to the next section: The Absolute Value of the Error Terms.