Analyzing Your Curve


We need a way to determine how well a curve fits the data. Statistically, we can measure this by finding the mean error. The mean error is the average distance between the actual data points and the values predicted by our equation. To find the mean error, we first need to calculate the predicted population for each year in our data set. To do this, we plug each year from the data set into our equation.

For example, for year = 1, the predicted value of population is,

y = 8.164 * e^(0.24 * 1) = 10.379

For year 5, the predicted value from our equation is,

y = 8.164 * e^(0.24 * 5) = 27.107

In the spreadsheet, do this for every year in the data set, placing the predicted values in column F.
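The same calculation can be sketched outside the spreadsheet. The short Python example below is only an illustration: the function name predict and the list of year values are assumptions, and the coefficients 8.164 and 0.24 are the rounded ones from the example equation above. Because those coefficients are rounded, the printed values may differ in the last decimal place from the figures quoted in the text.

```python
import math

A = 8.164  # coefficient from the example equation in the text
B = 0.24   # exponent rate from the example equation in the text


def predict(year):
    """Return the predicted population y = A * e^(B * year)."""
    return A * math.exp(B * year)


# Hypothetical year values for illustration only;
# replace them with the years column of your spreadsheet.
years = [1, 2, 3, 4, 5]

# This list plays the role of column F in the spreadsheet.
predicted = [predict(t) for t in years]

for t, y in zip(years, predicted):
    print(f"year {t}: predicted population {y:.3f}")
```

If you found a different equation in the previous section, change A and B to match it.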

Take a look at a plot of the original data points (black) and the predicted values from your equation (green).

The red lines show the distance from the curve you found to each actual data point. Our goal is to find the curve that makes these lines as short as possible.

Next, we find the error between the equation's predicted population and the actual population. This is just the difference between the two values. In the spreadsheet, under column G, find the error (column F - column C) for the current equation. Since we are only concerned with the size of the error and not its sign (that is, whether the prediction is greater or less than the actual value), take the absolute value of the error. In the spreadsheet, do this in column H.
Note: The command for the absolute value is ABS().
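As a rough sketch of what columns G and H hold, the Python example below computes the signed error and the absolute error for each data point. The numbers in it are made-up placeholders, not the population data from the spreadsheet, and the variable names are assumptions chosen for illustration.

```python
# Made-up placeholder values for illustration only,
# NOT the actual data from the spreadsheet.
actual = [10.0, 13.5, 16.0, 22.0, 26.0]     # stands in for column C
predicted = [10.4, 13.2, 16.8, 21.3, 27.1]  # stands in for column F

# Column G: signed error, predicted minus actual (column F - column C).
errors = [p - a for p, a in zip(predicted, actual)]

# Column H: absolute error, the same idea as ABS() in the spreadsheet.
abs_errors = [abs(e) for e in errors]

for e, ae in zip(errors, abs_errors):
    print(f"error {e:+.2f}   absolute error {ae:.2f}")
```

Notice that a negative error (prediction too low) and a positive error (prediction too high) of the same size give the same absolute error.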

One way to judge how well the curve fits the data is to find the mean of the absolute errors. The smaller the mean error, the better the equation. Find the mean error in the spreadsheet in cell H24. Another way to analyze the curve is to find the mean square error. Instead of finding the mean of the errors, we find the mean of the squared errors. In column J, find the square of each error. Then in cell J24, find the mean of the squared errors.

Again, the smaller the mean square error, the better the equation.
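Both measures can be written as small functions. The Python sketch below is one possible way to do it, assuming the actual and predicted values are given as two lists of equal length; the function names and the sample numbers are hypothetical and do not come from the spreadsheet.

```python
def mean_error(actual, predicted):
    """Mean of the absolute errors (what cell H24 holds)."""
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(abs_errors) / len(abs_errors)


def mean_square_error(actual, predicted):
    """Mean of the squared errors (what cell J24 holds)."""
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return sum(sq_errors) / len(sq_errors)


# Made-up numbers for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 5.0]

me = mean_error(actual, predicted)          # errors are 1, 0, 2
mse = mean_square_error(actual, predicted)  # squares are 1, 0, 4

print("mean error:", me)
print("mean square error:", mse)
```

To compare two candidate equations, compute these values for each one on the same data; the equation with the smaller value fits better by that measure.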

The main difference between mean error and mean square error is that the mean square error gives more weight to data values that are far from their predicted values. In other words, a data point that falls far from its prediction has a larger effect on the mean square error than on the mean error. This is because squaring magnifies large errors much more than small ones: an error of 2 becomes 4, but an error of 10 becomes 100. Statisticians generally use the mean square error in analyses, so we will too.
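To see this effect with numbers, the Python sketch below compares two made-up sets of absolute errors. Both sets have the same mean error, but one spreads the error evenly while the other puts it all on a single far-off point. The values are invented for illustration and are not taken from the spreadsheet.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Two made-up sets of absolute errors, for illustration only.
even_errors = [2.0, 2.0, 2.0, 2.0]     # every point is off by a little
outlier_errors = [0.0, 0.0, 0.0, 8.0]  # one point is off by a lot

# The mean error cannot tell the two cases apart.
mean_even = mean(even_errors)
mean_outlier = mean(outlier_errors)

# The mean square error is much larger when one point is far off.
mse_even = mean([e ** 2 for e in even_errors])
mse_outlier = mean([e ** 2 for e in outlier_errors])

print("mean error:       ", mean_even, "vs", mean_outlier)
print("mean square error:", mse_even, "vs", mse_outlier)
```

Here the mean error is 2 in both cases, while the mean square error is 4 for the even set and 16 for the set with one far-off point.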

Two questions to consider!

Do you think it is possible to find a curve that has a lower mean square error?

If we could find one, would this indicate a better fitting equation and thus a more reliable predictor of the population of the United States?

In the next section, we will explore a way to find the best-fitting curve to the data.


Exercises

Use the Polonium spreadsheet to answer the following questions.

1.) For the equation you found in the previous section, find the mean error and mean square error.

2.) Which data points contribute the most to your error?

3.) In your own words describe how well you think this equation fits the data.

Use the Dow Jones spreadsheet to answer the following questions.

4.) For the Dow Jones spreadsheet you analyzed earlier, find an equation to represent the data, along with its mean error and mean square error.

5.) Find the Dow Jones Average for the year 1996 according to the above equation.

6.) Look up the true value of the Dow Jones Average for today. How well did this equation predict it?

7.) If someone said they were investing in some of the stocks listed in the Dow Jones Average and they were predicting the average based on the equation above, do you think they would be happy or upset based on the trends in this graph? In other words, did the average rise like they expected it to?


