Covariance - GSP

In order to do this section, download this Sketchpad file called Covariance.

To download GSP files, you must configure your browser (Netscape) to recognize these files. If your browser has not been set up for Sketchpad files yet, click here to learn how.


Note: If you are familiar with Sketchpad, continue with the worksheet below. If not, check out this short Sketchpad Tutorial that will teach you enough about Sketchpad to do this lesson.
The Sketchpad file has five points, P1, P2, P3, P4 and P5. We will use these points to represent different data points. For example, if your data point had one value of 2 and the other of 5, you could plot it at an x-coordinate of 2 and a y-coordinate of 5, that is, at (2,5). You can move these points anywhere to change either or both of the x and y coordinates of the data point. Each point's coordinates are shown in the upper right-hand corner so you can place the point where you need to. The dashed lines represent the means of the x and y values for the five data points. Move some of the points and watch what happens to the means. Their values are also shown in the upper right-hand corner. Ignore the squares for now.

  1. The mean data point is the point whose coordinates are (Xmean, Ymean). In other words, it is the point with the mean of the x values as its x coordinate and the mean of the y values as its y coordinate.

    1. Does the mean data point always have to be an actual data point? Explain.

    2. Place the points, P1 through P5 at (1,2), (2,1), (3,3), (4,5) and (5,4). What is the mean for the x data points? the y data points?

    3. Now, place the points at (1,4), (3,2), (2,2), (3,4) and (6,3). What is the mean data point?

    4. Move each point to a new position so that the points still have the same means as above. What are the coordinates of your points?

    5. How many possible positions are there for the points to have the same means as before?
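
If you would like to check a mean data point by computer as well as in Sketchpad, the calculation can be sketched in a few lines of Python. Python is our choice here and is not part of the original lesson; the function name `mean_point` and the sample data are made up for illustration.

```python
def mean_point(points):
    """Return (x_mean, y_mean) for a list of (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    return (x_mean, y_mean)

# Hypothetical data, not taken from the worksheet.
sample = [(1, 1), (2, 5), (6, 3)]
print(mean_point(sample))  # (3.0, 3.0)
```

Replace `sample` with your own five points to compare against the dashed lines in Sketchpad.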

  2. Previously, you learned about the variance of a data set. Recall that the variance is a measure of the amount that a set of data varies about its mean. Covariance, on the other hand, is a measure of how two data sets vary with respect to each other.

    Place the points at (0,1), (1,3), (1,4), (3,5) and (4,6).

    1. If you were to draw a straight line that would best fit this data, what would the sign (+,- or 0) of the slope of this line be?

    2. Measure the distance from the point (3,5) to the mean data point (not vice versa!) in the x (horizontal) direction.

    3. Similarly, measure the distance from (3,5) to the mean data point in the y (vertical) direction. Again, not vice versa!

      Similar to variance, we can measure the area of the rectangle formed from this point to the mean data point.

    4. Measure the area of the rectangle formed from point (3,5) to the mean data point.

      The area of this rectangle is the contribution to the covariance for this point!
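
The measurements in this problem amount to subtracting the mean from the point's coordinates and multiplying the two results. A minimal Python sketch follows, assuming a hypothetical helper named `signed_contribution` and made-up data. The order of subtraction (point minus mean) is what "not vice versa" refers to.

```python
def signed_contribution(point, points):
    """Return (dx, dy, dx * dy) for one point of a data set."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    dx = point[0] - x_mean  # point minus mean, not the reverse
    dy = point[1] - y_mean
    return dx, dy, dx * dy

# Hypothetical data, not taken from the worksheet; its mean data point is (3, 4).
sample = [(1, 2), (2, 4), (6, 6)]
print(signed_contribution((6, 6), sample))  # (3.0, 2.0, 6.0)
```

The absolute value of the last number is the area of the rectangle between the point and the mean data point.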

  3. Now, let's investigate what the biggest difference is between covariance and variance.

    1. Measure the distance from point (1,3) in the x and y directions to the mean data point.

    2. What is the sign (+,- or 0) of these distances?

    3. Measure the area of the rectangle formed from this point to the mean data point.

      The size of the contribution to the covariance is the area of the rectangle, but what we are really doing is multiplying the x and y deviations from their respective means. These deviations can be negative, so when we multiply them, we can get a positive or a negative number. The area of the rectangle tells how large the point's contribution to the covariance is, but we still have to decide whether it is positive or negative.

    4. One way to look at this is to look at the lines formed from plotting the means (the dashed lines in Sketchpad). The graph is then broken into 4 quadrants. Answer the following questions with (+, -, or 0).

      1. In the upper right hand quadrant, the x deviation is what sign?
        And the y deviation?
        Therefore, the contribution to the covariance is what sign?

      2. In the upper left hand quadrant, the x deviation is what sign?
        And the y deviation?
        Therefore, the contribution to the covariance is what sign?

      3. In the lower left hand quadrant, the x deviation is what sign?
        And the y deviation?
        Therefore, the contribution to the covariance is what sign?

      4. In the lower right hand quadrant, the x deviation is what sign?
        And the y deviation?
        Therefore, the contribution to the covariance is what sign?

    5. Find the x and y deviations for the other three points (0,1), (1,4) and (4,6).

    6. For these points, find the contribution and the correct sign to the covariance (the area of the rectangle).
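
Once you have answered the quadrant questions by hand, you can check the sign of any point's contribution with a short Python sketch like the one below. The names `sign` and `contribution_sign` and the sample numbers are hypothetical, chosen only for illustration.

```python
def sign(value):
    """Return '+', '-' or '0' for a number."""
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"

def contribution_sign(point, mean):
    """Return the signs of the x deviation, the y deviation and their product."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return sign(dx), sign(dy), sign(dx * dy)

# Hypothetical mean data point and data point, not taken from the worksheet.
print(contribution_sign((5, 1), (3, 4)))  # ('+', '-', '-')
```

A point that lies exactly on one of the dashed lines has a deviation of 0 in that direction, so its contribution is 0 as well.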

  4. Similar to variance, we sum each point's contribution to the covariance and divide by the number of points to find the total covariance.

    1. Find the total covariance for the points in problem 2. Show your work.
      Hint: Sum the areas of the rectangles, (watch the signs!) and divide by how many data points there are.

    2. Check your answer with the square labeled "Covariance".

    3. Place the points at (1,0), (2,1), (4,3), (4,4), (5,5). What is the sign (+,- or 0) of each point's contribution to the covariance?

    4. Place the points at (2,3), (1,5), (3,0), (2,6), (4,1). What is the sign (+,- or 0) of each point's contribution to the covariance?
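
The recipe in this problem (sum the signed contributions, then divide by the number of points) can be sketched in Python as follows. The function name `covariance` and the sample data are our own. Note that this follows the worksheet's divide-by-n definition; some software divides by n - 1 instead, so its answers may differ.

```python
def covariance(points):
    """Sum of (x - x_mean) * (y - y_mean) over all points, divided by n."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    total = sum((x - x_mean) * (y - y_mean) for x, y in points)
    return total / n

# Hypothetical data, not taken from the worksheet; its mean data point is (2, 2).
sample = [(0, 0), (0, 2), (4, 2), (4, 4)]
print(covariance(sample))  # 2.0
```

Here the four contributions are 4, 0, 0 and 4, so the covariance is 8 / 4 = 2.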

  5. Place the points at (1,4), (1,5), (3,4), (4,3), (5,1). Picture in your mind a line that passes through the data in such a way that it passes through or near as many points as possible.

    1. What is the sign of the slope of your line?

    2. What is the sign of each point's contribution to the covariance?

    3. What is the sign of the total covariance? (You shouldn't need to do any calculations).

      Move the points around so the sign of the slope of the best-fitting line is the same as you thought above and note the sign of the covariance.

      Now place the points at (0,1), (2,2), (4,3), (5,4), (6,4) and again picture in your mind a line that passes through the data in such a way that it passes through or near as many points as possible.

    4. What is the sign of the slope of your line?

    5. What is the sign of each point's contribution to the covariance?

    6. What is the sign of the total covariance? (You shouldn't need to do any calculations).

      Again, move the points to different positions so the sign of the slope of the best-fitting line is the same as you thought above.

    7. Conjecture about the relation between the sign of the slope of the best-fitting line and the sign of the covariance.
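
After you have written down your conjecture, one way to test it numerically is to compute both quantities for the two data sets in this problem. The sketch below assumes that "best-fitting line" means the ordinary least-squares line, which the worksheet does not state; the function names are our own. It also assumes the x values are not all equal, since otherwise the slope is undefined.

```python
def covariance(points):
    """Covariance with the worksheet's divide-by-n definition."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in points) / n

def least_squares_slope(points):
    """Slope of the least-squares line (assumes the x values are not all equal)."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

first = [(1, 4), (1, 5), (3, 4), (4, 3), (5, 1)]
second = [(0, 1), (2, 2), (4, 3), (5, 4), (6, 4)]
for points in (first, second):
    print("slope:", least_squares_slope(points), "covariance:", covariance(points))
```

Try a few more data sets of your own and compare the two signs each time.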

  6. If the data points fell on a straight horizontal line,

    1. What would the covariance be?
      Check your answer by moving the points into such a position.

    2. Explain your answer in geometric terms.
      Hint: What happens to the areas of the rectangles?
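
To go along with the hint, the following Python sketch lists the width, height and area of each point's rectangle. The function name `rectangles` and the sample points are hypothetical; any set of points with equal y values would do.

```python
def rectangles(points):
    """Return (width, height, area) of each point's rectangle to the mean data point."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    result = []
    for x, y in points:
        width = abs(x - x_mean)
        height = abs(y - y_mean)
        result.append((width, height, width * height))
    return result

# Hypothetical points on a horizontal line, not taken from the worksheet.
flat = [(1, 3), (2, 3), (4, 3), (7, 3), (9, 3)]
for rectangle in rectangles(flat):
    print(rectangle)
```

Look at the second and third numbers on each printed line and compare them with what you see in Sketchpad.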

  7. Place the points at (1,1), (2,1), (3,3), (4,4), (5,4).

    1. Find the covariance of this data set.

    2. Now, add 1 to each point's coordinates (i.e. (2,2), (3,2), (4,4), (5,5) and (6,5)).
      What's the covariance?

    3. Now subtract 1 from each original point's coordinates (i.e. (0,0), (1,0), (2,2), (3,3) and (4,3)).
      What's the covariance?

    4. Conjecture on what happens to the covariance when a constant is added to each data point's x and y coordinates?
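
If you want to try more constants than +1 and -1, the experiment in this problem can be repeated with a short Python sketch. The helper names `covariance` and `shift` are our own, and the covariance uses the worksheet's divide-by-n definition.

```python
import math

def covariance(points):
    """Covariance with the worksheet's divide-by-n definition."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in points) / n

def shift(points, c):
    """Add the constant c to both coordinates of every point."""
    return [(x + c, y + c) for x, y in points]

original = [(1, 1), (2, 1), (3, 3), (4, 4), (5, 4)]
for c in (1, -1, 10):
    before = covariance(original)
    after = covariance(shift(original, c))
    # isclose allows for tiny floating-point rounding differences
    print(c, before, after, math.isclose(before, after, abs_tol=1e-9))
```

The value 10 in the loop is an extra constant added here for illustration; change the list to test any constants you like.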

