![]() |
![]() |
![]() |
|
|
|
|
Welcome to the ARPE Mini-Module! This activity is a day lesson that explores real data through graphing the scatterplot and determining the regression line. Please contact us both via email.
We encourage you to look at the full module as time allows.
First, you will need to Download the
excel file. The data are about cars repaired at the Technology Center
of DuPage.
The X values represent the age of the car and the Y
values represent the mileage.
Sketch the scatterplot of the dataset. When one of the variables
influences or helps explain the other, always plot this, called the
independent variable, on the horizontal axis (the X-axis) of a
scatterplot. Notice car age helps to explain car mileage so it is the
independent variable and plotted along the horizontal axis. Car
mileage is the dependent variable and plotted on the vertical axis.
What does that mean? Give another that has dependent and independent
variables.
Your scatterplot should look similar to the graph shown on the Excel
file. What does one point on the scatterplot represent? All of the
points demonstate the relationship of one variable with another. In
other words the graph shows how car age helps to determine car
mileage. What would you expect the relationship of car age to mileage
to be, a positive or negative association? Explain.
The regression line explains how the dependent variable Y (car
mileage) changes with independent variable X (car age). This
line is often used to predict the value for Y for a given
X. Each point changes the regression equation. Let's say a car
came into the TCD to be repaired. The car is 3 years old with
40,000 miles. Add this point to the dataset and note the change in
the regression line. Add a few more cars and explain what happens to
the regression line as data are added.
How good of a prediction can you make for y based on the value of
X? r2 is a measure of how successful the
regression equation is in explaining the dependent variable.
r2 ranges from 0 to 1. For instance, from this
dataset (before entering new values) r2 = 0.1205.
This means that approximately 12% of the dependent variable, car
mileage, is explained by the age of the car. There are other factors,
or variables, that contribute the other 88% explanation of the car's
mileage. What are some variables that you can think of that would
help explain a car's mileage? Notice that as you add car's repaired
the r2 value changes. Try to make the
r2 as high and as low as possible. Explain what you
had to do to accomplish both goals.
What other information could you ask the technicians at TCD to
collect in order to help explain the car mileage further? Detail
another dataset that you could collect, graph the scatterplot, and
regression line. What would be the dependent and independent
variables? How would you collect the data? Do you expect
the relationship to be positively or negatively associated?