Statistical Terms





Correlation Coefficient: a statistic used to describe the strength of the linear relationship between two variables. It is a number between -1 and 1 (inclusive) that measures how closely a set of data points tends to cluster about the regression line. If the correlation coefficient is close to +1, the variables have a strong positive relationship. If it is close to -1, they have a strong negative relationship. If it is near 0, little or no linear relationship exists.

One way to calculate the correlation coefficient (r) is to divide the covariance of X and Y by the product of the standard deviation of X and the standard deviation of Y.

Another way to calculate the correlation coefficient (r) is to multiply the slope of the regression line by the standard deviation of X and then divide by the standard deviation of Y.
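
As a rough illustration, the two calculations of r described above can be sketched in Python. The function names and the small dataset are made up for this example, and the covariance and standard deviation divide by the number of data points, matching the definitions in this glossary.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Average of the products of the deviation scores (divides by n).
    x_mean, y_mean = mean(xs), mean(ys)
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / len(xs)

def std_dev(values):
    # The variance of X is the covariance of X with itself.
    return math.sqrt(covariance(values, values))

def r_from_covariance(xs, ys):
    # r = cov(X, Y) / (sd(X) * sd(Y))
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

def r_from_slope(xs, ys):
    # r = slope * sd(X) / sd(Y), where slope = cov(X, Y) / var(X)
    slope = covariance(xs, ys) / covariance(xs, xs)
    return slope * std_dev(xs) / std_dev(ys)

# Illustrative data, not taken from the glossary.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_from_covariance(xs, ys), 4))  # 0.7746
print(round(r_from_slope(xs, ys), 4))       # 0.7746
```

Both routes give the same value because the slope already contains the covariance; multiplying by sd(X) and dividing by sd(Y) only rescales it.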














Covariance: a measure of how much two variables change with respect to one another.

It can be calculated by averaging the products of the deviation scores:
Σ (Xi - X mean)*(Yi - Y mean) divided by the number of data points.
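
A minimal Python sketch of this calculation follows. The function name and the sample data are assumptions chosen for illustration, and the result is the population form (dividing by the number of data points, as in the definition above).

```python
def covariance(xs, ys):
    """Population covariance: the mean of the products of the deviation scores."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # One product per data point: (Xi - X mean) * (Yi - Y mean)
    products = [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]
    return sum(products) / n

# Illustrative data, not taken from the glossary.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys))  # 1.2
```

A positive result means the two variables tend to rise together; a negative result means one tends to fall as the other rises.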














Mean: a statistical average found by dividing the sum of a set of data by the number of items of data.
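
For example, the definition translates directly into a short Python function (the name `mean` and the sample values are only illustrative):

```python
def mean(values):
    """Sum of the data divided by the number of items of data."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

print(mean([2, 4, 5, 4, 5]))  # 4.0
```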














Outlier: a point in a dataset that is far removed from the rest of the points. An outlier can distort the regression equation, because a single extreme point can pull the best-fitting line toward itself.
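
The effect can be seen in a small sketch that fits a line with and without one extreme point. The data and the helper `fit_line` are invented for illustration. The slope is computed as the sum of the deviation products divided by the sum of the squared X deviations, which equals covariance divided by variance because the number of data points cancels.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the best-fitting line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_products = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sum_squares = sum((x - x_mean) ** 2 for x in xs)
    slope = sum_products / sum_squares
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Five points lying exactly on Y = 2*X (illustrative data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
clean_slope, clean_intercept = fit_line(xs, ys)

# The same points plus a single outlier at (6, 0).
out_slope, out_intercept = fit_line(xs + [6], ys + [0])

print(round(clean_slope, 2), round(clean_intercept, 2))  # 2.0 0.0
print(round(out_slope, 2), round(out_intercept, 2))      # 0.29 4.0
```

In this example one added point is enough to flatten the slope from 2 to about 0.29.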














Regression Equation: the equation of the best-fitting line through a set of data.

It describes the relationship between two variables and has the form Y' = m*X + b, where m is the slope and b is the Y-intercept. Since the line passes through the mean data point, the point (X mean, Y mean) always satisfies the regression equation.
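
The following sketch fits a regression equation to a small made-up dataset and checks that the mean data point lies on the line. The names `fit_line` and `predict` are hypothetical helpers, and the fit uses the slope and Y-intercept formulas given elsewhere in this glossary.

```python
import math

def fit_line(xs, ys):
    """Return (m, b) for Y' = m*X + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_mean) ** 2 for x in xs) / n
    m = cov_xy / var_x         # slope = covariance / variance of X
    b = y_mean - m * x_mean    # Y-intercept
    return m, b

def predict(m, b, x):
    """Evaluate the regression equation Y' = m*X + b."""
    return m * x + b

# Illustrative data, not taken from the glossary.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

print(round(m, 2), round(b, 2))                      # 0.6 2.2
print(math.isclose(predict(m, b, x_mean), y_mean))   # True
```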














Slope: a number measuring the steepness of a line relative to the x-axis.

The slope of a line is usually calculated by dividing the amount of change in Y by the amount of change in X. The slope of the regression line can be calculated by dividing the covariance of X and Y by the variance of X.
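
Both calculations can be sketched in Python. The function names and the sample values below are illustrative assumptions, and the covariance and variance divide by the number of data points.

```python
def slope_between(p1, p2):
    """Change in Y divided by change in X between two points on a line."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("a vertical line has no defined slope")
    return (y2 - y1) / (x2 - x1)

def regression_slope(xs, ys):
    """Covariance of X and Y divided by the variance of X."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_mean) ** 2 for x in xs) / n
    return cov_xy / var_x

print(slope_between((1, 2), (3, 6)))                        # 2.0
print(regression_slope([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))   # 0.6
```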














Standard Deviation: the positive square root of the variance.
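
As a small sketch, the standard deviation can be computed by taking the square root of the variance. The function name `std_dev` is made up for this example. The result is compared with `statistics.pstdev` from Python's standard library, which also computes the population standard deviation (dividing by the number of data points).

```python
import math
import statistics

def std_dev(values):
    """Positive square root of the population variance."""
    n = len(values)
    m = sum(values) / n
    variance = sum((v - m) ** 2 for v in values) / n
    return math.sqrt(variance)

xs = [1, 2, 3, 4, 5]  # illustrative data
print(round(std_dev(xs), 4))                             # 1.4142
print(math.isclose(std_dev(xs), statistics.pstdev(xs)))  # True
```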














Variance: a statistic used to describe the spread of data about the mean.

It can be calculated by averaging the squares of the deviations from X mean:
Σ (Xi - X mean)^2 divided by the number of data points.

A quicker method when calculating the variance by hand is to take the mean of the squares minus the square of the mean (the mean of all the squared X values minus the square of X mean):
(Σ Xi^2 divided by the number of data points) minus (the square of X mean).
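
A short Python sketch can confirm that the two methods agree. The function names and the data are assumptions made for this example.

```python
def variance_from_deviations(values):
    """Average of the squared deviations from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean_of_squares = sum(v * v for v in values) / n
    m = sum(values) / n
    return mean_of_squares - m * m

xs = [1, 2, 3, 4, 5]  # illustrative data
print(variance_from_deviations(xs))  # 2.0
print(variance_shortcut(xs))         # 2.0
```

The shortcut is convenient by hand, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the deviation form is usually the safer choice in code.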














Y-intercept: the point at which a line crosses the Y-axis.

The Y-intercept of a regression line can be calculated by subtracting the product of the slope and X mean from the Y mean: b = Y mean - m * X mean.
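
This formula follows from the fact that the regression line passes through the mean data point. A brief derivation is given below, with a numeric check using illustrative values (X mean = 3, Y mean = 4, m = 0.6) that are assumptions for this example and not data from the glossary.

```latex
\begin{align*}
\bar{Y} &= m\,\bar{X} + b
  && \text{(the mean point lies on the line)} \\
b &= \bar{Y} - m\,\bar{X}
  && \text{(solve for } b\text{)} \\
b &= 4 - 0.6 \times 3 = 2.2
  && \text{(illustrative values)}
\end{align*}
```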