Chi-Square


Calculating D for a set of outcomes of an experiment is a convenient way of telling how far the results differed from what was expected. However, a better (and usual) way of describing the results is to calculate a value called chi-square. For the rest of this lesson when you see c2 it will mean chi-square.

Let's see how chi-square is computed and compare that with the way D is computed.

c2 = Sum[ (Expected - Obtained)^2 / Expected ]
D = Sum[ Abs(Expected - Obtained)]/Expected
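As a minimal sketch, the two formulas above can be written out in Python. The function names and the counts are assumptions made up for illustration; the counts are not the data from this lesson's table. Because every expected count here is the same (10), it makes no difference whether the division by Expected happens inside or outside the sum.

```python
def chi_square(expected, obtained):
    """Sum of (expected - obtained)^2 / expected over all categories."""
    return sum((e - o) ** 2 / e for e, o in zip(expected, obtained))


def d_statistic(expected, obtained):
    """Sum of |expected - obtained| divided by the expected count,
    following the formula for D as written above. This assumes every
    category has the same expected count."""
    return sum(abs(e - o) for e, o in zip(expected, obtained)) / expected[0]


# Hypothetical counts for 60 rolls of a six-sided die (made up for
# illustration; not taken from the lesson).
expected = [10, 10, 10, 10, 10, 10]
obtained = [8, 12, 6, 14, 10, 10]

c2 = chi_square(expected, obtained)
d = d_statistic(expected, obtained)
print("c2 =", round(c2, 3))
print("D  =", round(d, 3))
```

In these made-up counts, the two faces that are off by 4 contribute 16/10 each to c2, while the two faces that are off by 2 contribute only 4/10 each. In D, the same faces contribute 4/10 and 2/10, so the larger misses count for relatively less.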

Why do you think c2 is a more powerful statistic?

c2 is a more powerful measure of difference from an expectation because by squaring the difference it gives larger differences more weight in the final number.

For example, 5 is only a little bigger than 2, but 5^2 = 25 is much larger than 2^2 = 4.

So large differences, which shouldn't happen as often with a fair die, are amplified, while small differences, which are very probable even with a fair die, are given less weight.

Look at the following table to see how you compute c2 from data from 60 die rolls.



Now you should ask the same sort of question you asked involving the D statistic: How often will the expected and observed outcomes differ enough to produce c2 values as large as or larger than 4.0, by chance, for a six-sided die?

To answer that question, you repeat the experiment of rolling a die many times, computing c2 for each set of 60 rolls. You then prepare a table of c2 values. So that you don't have to sit around rolling dice, this spreadsheet will do the job for you. (Instructions)
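If you would rather not use the spreadsheet, the same experiment can be sketched in Python. This is only an illustration of the procedure described above, not a copy of the spreadsheet: the function names, the 1000 repetitions, the 20 bins, and the fixed seed are all assumptions.

```python
import random


def simulate_c2(rolls=60, sides=6, rng=random):
    """Roll a fair die `rolls` times and return c2 for that set of rolls."""
    counts = [0] * sides
    for _ in range(rolls):
        counts[rng.randrange(sides)] += 1
    expected = rolls / sides
    # Every face has the same expected count, so dividing once at the
    # end gives the same c2 and avoids small rounding errors.
    return sum((expected - c) ** 2 for c in counts) / expected


def tabulate(values, n_bins=20):
    """Count values in unit-wide ranges: 0 up to 1, 1 up to 2, and so on.
    The last range also collects anything larger."""
    table = [0] * n_bins
    for v in values:
        table[min(int(v), n_bins - 1)] += 1
    return table


rng = random.Random(1)  # fixed seed so the run can be repeated
c2_values = [simulate_c2(rng=rng) for _ in range(1000)]
table = tabulate(c2_values)

for low, count in enumerate(table):
    print(f"{low:2d} to {low + 1:2d}: {count}")
```

Each printed row is the number of simulated c2 values that fell in that range, with the lower end of the range included.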

Example of the spreadsheet in action.


Again, reading the table works the same way as it did for the D statistic. The first number in the table is the number of c2 values that fall between 0 and 1, the second number is the number of c2 values that fall between 1 and 2 (including 1), and so on. Look at the following graph. If your c2 is in the ranges with the highest ten values of the table, then it is statistically significant and your die is probably unfair; otherwise the die is probably fair.

As with the D statistic's graph, if your c2 falls to the far right of the graph, then your die is probably bogus; if it falls in the middle or to the left, the die is probably fair.
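One way to put a number on "far to the right" is to count how often a fair die produces a c2 at least as large as yours. The sketch below estimates that fraction by simulation for the value 4.0 asked about earlier. The function name, the 2000 repetitions, and the seed are assumptions chosen for illustration.

```python
import random


def fraction_at_least(observed_c2, trials=2000, rolls=60, sides=6, seed=1):
    """Estimate how often a fair die gives a c2 >= observed_c2."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    expected = rolls / sides
    hits = 0
    for _ in range(trials):
        counts = [0] * sides
        for _ in range(rolls):
            counts[rng.randrange(sides)] += 1
        # Equal expected counts: divide once at the end to avoid
        # rounding error when comparing against observed_c2.
        c2 = sum((expected - c) ** 2 for c in counts) / expected
        if c2 >= observed_c2:
            hits += 1
    return hits / trials


fraction = fraction_at_least(4.0)
print(f"Fraction of fair-die experiments with c2 >= 4.0: {fraction:.3f}")
```

A large fraction means a fair die produces such a value often, so it is no evidence against the die. A very small fraction means the value sits in the far right tail.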

BIG QUESTION 1

In all of the examples in this lesson, your calculations of c2 have been limited to certain circumstances. What were they?

BIG ANSWER 1

If you thought about it for a second, you would realize that the only examples you were given were about dice. That is true, but c2 can be generalized to any situation, and so can the programs that produce the tables, graphs, and values of c2. Also, the die was almost always tossed 60 times, so 10 was the expected value for each number. You will now see how to put c2 to use with any set of data.
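As a rough sketch of the general case, the expected counts do not have to be equal. They can be worked out from whatever proportions you expect and the total number of observations. The function name, the 3-to-1 split, and the counts below are hypothetical, invented only to show the arithmetic.

```python
def chi_square_from_proportions(obtained, proportions):
    """Compute c2 when each category has its own expected proportion."""
    total = sum(obtained)
    expected = [total * p for p in proportions]
    return sum((e - o) ** 2 / e for e, o in zip(expected, obtained))


# Hypothetical data: 100 observations expected to split 3 to 1.
obtained = [70, 30]
proportions = [0.75, 0.25]

c2 = chi_square_from_proportions(obtained, proportions)
print("c2 =", round(c2, 3))
```

Here the expected counts are 75 and 25, so c2 is 25/75 + 25/25. Note that each squared difference is divided by its own expected count, which matters once the expected counts differ.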


BIG QUESTION 2

What is the c2 of the data from the KEY PROBLEM at the beginning of the chapter?

BIG ANSWER 2

Okay first, the data from the Key Problem again.

Using this data and the example table at the beginning of this page, try to figure out the c2 for the Accidents at Irongate problem.

After you have given it a shot, look at the keyprobanswer.