A procedure for finding the best fitting line: mean prediction error


One way of answering this question of finding the best fitting line is to see how close the predicted weight (Y') and the actual weight (Y) are. In terms of our table, we want to know about the difference Y-Y'.

So we add another column, Y - Y', to our table for the line with slope m = 1.

    Table 4
            Height, X   Weight, Y   Predicted Weight, Y'   Y - Y'
            61          140         163                    -23
            64          141         166                    -25
            64          144         166                    -22
            66          158         168                    -10
            67          156         169                    -13
            67          174         169                      5
            68          160         170                    -10
            68          164         170                     -6
            68          170         170                      0
            69          172         171                      1
            70          170         172                     -2
            71          175         173                      2
            72          170         174                     -4
            72          174         174                      0
            73          176         175                      1
            74          180         176                      4
            75          192         177                     15
Median:     68          170
Mean:                               170.76                 -5.1

slope = 1     int = 102     equation is Y'= 1.0 * X + 102

Let's make sure that we can interpret this new column.

For example, the first player's actual weight is 140 lb. and his predicted weight is 163 lb. So we have Y-Y' = 140 - 163 = -23 lb. That is, the actual weight is 23 lb. less than predicted.

Now look at the second-to-last player.

We have Y-Y' = 180 - 176 = 4 lb. The actual weight is 4 lb. more than predicted.
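
The arithmetic for these two players can be checked with a few lines of Python. This is only a sketch: the function names predict_weight and prediction_error are made up for illustration and are not part of the spreadsheet used in this tutorial.

```python
def predict_weight(height_in, slope=1.0, intercept=102.0):
    """Predicted weight Y' (lb) from height X (in): Y' = slope * X + intercept."""
    return slope * height_in + intercept


def prediction_error(actual_lb, predicted_lb):
    """Prediction error Y - Y'; negative when the actual weight is below the prediction."""
    return actual_lb - predicted_lb


# First player in Table 4: height 61, actual weight 140 lb.
first_error = prediction_error(140, predict_weight(61))

# Second-to-last player: height 74, actual weight 180 lb.
second_to_last_error = prediction_error(180, predict_weight(74))

print(first_error)           # -23.0
print(second_to_last_error)  # 4.0
```

A negative result means the player weighs less than the line predicts; a positive result means he weighs more.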

How can we summarize how well the line fits the data?

A reasonable way is to add up all of the Y-Y' values and take their mean.

Let's do that.

Notice that the prediction errors sum to -87 over the 17 players, so they have a mean of -5.1 (for a line with a slope of 1).
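
As a cross-check on the spreadsheet, here is a minimal Python sketch that rebuilds the Y-Y' column of Table 4 and averages it. The variable names are our own choices; the heights, weights, and the equation Y' = 1.0 * X + 102 are taken from the table.

```python
from statistics import mean

# Heights (X) and actual weights (Y) of the 17 players in Table 4.
heights = [61, 64, 64, 66, 67, 67, 68, 68, 68, 69, 70, 71, 72, 72, 73, 74, 75]
weights = [140, 141, 144, 158, 156, 174, 160, 164, 170, 172, 170, 175, 170, 174, 176, 180, 192]

SLOPE, INTERCEPT = 1, 102  # the line Y' = 1.0 * X + 102

# One prediction error Y - Y' per player.
errors = [y - (SLOPE * x + INTERCEPT) for x, y in zip(heights, weights)]

total_error = sum(errors)
mean_error = mean(errors)

print(total_error)           # -87
print(round(mean_error, 1))  # -5.1
```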


Let's see what happens for another line. In the previous section we found the equation of a line with m = 2 to be Y'= 2.0 * X + 34.

Here is the table for predicted weights for this equation.

    Table 5
            Height, X   Weight, Y   Predicted Weight, Y'   Y - Y'
            61          140         156                    -16
            64          141         162                    -21
            64          144         162                    -18
            66          158         166                     -8
            67          156         168                    -12
            67          174         168                      6
            68          160         170                    -10
            68          164         170                     -6
            68          170         170                      0
            69          172         172                      0
            70          170         174                     -4
            71          175         176                     -1
            72          170         178                     -8
            72          174         178                     -4
            73          176         180                     -4
            74          180         182                     -2
            75          192         184                      8
Median:     68          170
Mean:                               171.5                  -5.9

slope = 2     int = 34     equation is Y'= 2.0 * X + 34

Notice that the prediction errors have a mean of -5.9 (for a line with a slope of 2). Can you think of a reason why adding the prediction errors might not be the best way to judge how well the line fits the data?


Exercises:

Let's see what happens for other lines. For each slope given below, use the spreadsheet to find the sum of the prediction errors.

Use m = -1; m = 0; m = +1.0; m = +2.0; m = +3.0; m = +3.5; m = +4.0
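
If you would like to check your spreadsheet answers with a short program, the Python sketch below may help. It assumes that every line passes through the median point (68, 170). That assumption reproduces the two intercepts given in this section (102 for m = 1 and 34 for m = 2), but it is our guess at how the intercepts were chosen in the previous section. The function names are hypothetical.

```python
# Heights (X) and actual weights (Y) of the 17 players.
heights = [61, 64, 64, 66, 67, 67, 68, 68, 68, 69, 70, 71, 72, 72, 73, 74, 75]
weights = [140, 141, 144, 158, 156, 174, 160, 164, 170, 172, 170, 175, 170, 174, 176, 180, 192]

# Assumption: each line is anchored at the median height and median weight.
MEDIAN_HEIGHT, MEDIAN_WEIGHT = 68, 170


def intercept_through_median(slope):
    """Intercept that makes the line Y' = slope * X + intercept pass through (68, 170)."""
    return MEDIAN_WEIGHT - slope * MEDIAN_HEIGHT


def error_summary(slope):
    """Return (sum, mean) of the prediction errors Y - Y' for the given slope."""
    intercept = intercept_through_median(slope)
    errors = [y - (slope * x + intercept) for x, y in zip(heights, weights)]
    total = sum(errors)
    return total, total / len(errors)


# The slopes listed in the exercise.
for m in [-1, 0, 1.0, 2.0, 3.0, 3.5, 4.0]:
    total, avg = error_summary(m)
    print(f"m = {m:>4}: int = {intercept_through_median(m):>6}, sum = {total:7.1f}, mean = {avg:5.1f}")
```

To try a line that does not pass through the median point, replace intercept_through_median with your own intercept.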


Crickets, anyone?

Create a column of prediction errors for the cricket data. Again, use the slopes you have previously chosen to find the mean prediction error using the cricket data.


Continue to the next section: The Absolute Value of the Error Terms.