The D Statistic


From the last section you should be asking the following questions. How can you measure how closely your observations fit your expectations? And how can you decide whether a given die is fair? Several steps are needed before you can answer those questions. First, you can measure how far your outcomes are from what you expected them to be.


You rolled a die six hundred times. You expected to see 100 of each face, ones through sixes, but you would almost never get exactly those numbers.
The table below shows your results and the absolute difference of each from the expected 100.


You can use the total of these differences to judge the fairness of a die, but there is a problem you have to consider first.

The larger the number of rolls, the greater the resulting total difference tends to be. This can happen even if the proportion of each face stays the same. For example, multiply the number of rolls by one hundred.

You would get the same proportions, but the total difference would also be multiplied by one hundred and equal 1200.
This seems like a lot, but being only 1200 off in 60000 rolls is pretty good.
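The scaling problem can be checked with a few lines of Python. This is only a sketch: the counts are hypothetical, chosen so that their differences from 100 add up to 12 (the 1200 above divided by one hundred), and the name total_abs_difference is an illustrative choice rather than anything from this lesson.

```python
# Hypothetical counts for 600 rolls (an assumption for illustration only);
# their absolute differences from the expected 100 add up to 12.
counts_600 = [101, 97, 99, 105, 98, 100]

def total_abs_difference(counts):
    """Add up |observed - expected| over the faces of a fair die."""
    expected = sum(counts) / len(counts)
    return sum(abs(c - expected) for c in counts)

# Same proportions, one hundred times as many rolls.
counts_60000 = [c * 100 for c in counts_600]

small = total_abs_difference(counts_600)
large = total_abs_difference(counts_60000)

# The raw totals differ by a factor of 100...
print(small, large)
# ...but as a share of all rolls they are identical.
print(small / sum(counts_600), large / sum(counts_60000))
```

The raw total grows with the number of rolls even though the die behaved the same way in both cases, which is why some kind of adjustment is needed.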

To adjust for this problem, divide each absolute difference by the number expected, and then add the results together.

DO IT


In the above example, dividing each difference by 100 gives
1/100 + 3/100 + 1/100 + 5/100 + 2/100 + 0/100 = 12/100 = 0.12


Call this number D, for the standard difference between expected and obtained outcomes.
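As a rough sketch, D can be written as a short Python function. The function name d_statistic and the counts are assumptions made for illustration: the counts are hypothetical values whose differences from 100 are 1, 3, 1, 5, 2 and 0, matching the sum above.

```python
def d_statistic(observed):
    """Sum over faces of |observed - expected| / expected, for a fair die.

    A fair die is assumed, so every face has the same expected count.
    """
    expected = sum(observed) / len(observed)
    return sum(abs(o - expected) / expected for o in observed)

# Hypothetical counts for 600 rolls (illustration only).
rolls_600 = [101, 97, 99, 105, 98, 100]
d_600 = d_statistic(rolls_600)

# The same proportions with one hundred times as many rolls.
d_60000 = d_statistic([c * 100 for c in rolls_600])

print(round(d_600, 2), round(d_60000, 2))
```

Both values should come out the same, because dividing by the expected count cancels the effect of simply rolling more often with the same proportions.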

Now get a D for your data from 60 rolls, where you expect 10 of each face.

The little lines on the graph indicate how far the totals were from the expected.
D = 1/10 + 4/10 + 0/10 + 2/10 + 3/10 + 2/10 = 12/10 = 1.2

So D in this case is 1.2 while in the case with 600 rolls it is a measly 0.12. You ask, so?

All of this doesn't tell you whether 1.2 or 0.12 is small or large for the purpose of determining whether the die is fair.

If a value of D would happen only rarely by chance, then it is called statistically significant.

The question you should ask yourself is: how large a value of D is needed before you can say that, for a fair die, such a value will be obtained very seldom by chance?

One way to answer this question is to produce many values of D and compare the value that you have with them.

Roll a die 60 times and enter the values in the function below and it will tell you what the D statistic is.

How did your D compare to the given D?
Is one D enough to make a judgment with?
No. To make a good comparison you need to produce many values of D, say 50, which would take a long time to do by hand. So you can use this nifty spreadsheet, with instructions, to generate many Ds and put them into a niftier table for you to compare your D to.
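If you would rather not use a spreadsheet, the same idea can be sketched in Python. Everything here is an assumption made for illustration: the function names, the seed, and the interval width of 0.5 are arbitrary choices, and the simulated die is Python's pseudo-random generator, not the spreadsheet's. Fractions are used so that values landing exactly on an interval edge are counted in the right interval.

```python
import random
from collections import Counter
from fractions import Fraction

def d_statistic(counts):
    """Sum over faces of |observed - expected| / expected, for a fair die."""
    expected = Fraction(sum(counts), len(counts))
    return sum(abs(c - expected) / expected for c in counts)

def simulate_d(num_rolls, rng, sides=6):
    """Roll a simulated fair die num_rolls times and return its D."""
    counts = [0] * sides
    for _ in range(num_rolls):
        counts[rng.randrange(sides)] += 1
    return d_statistic(counts)

def tabulate(values, width=Fraction(1, 2)):
    """Map each interval [low, low + width) to the number of values in it."""
    bins = Counter(v // width for v in values)
    return {(float(k * width), float((k + 1) * width)): bins[k]
            for k in sorted(bins)}

rng = random.Random(2024)  # fixed seed (arbitrary) so reruns give the same Ds
simulated = [simulate_d(60, rng) for _ in range(50)]

table = tabulate(simulated)
for (low, high), freq in table.items():
    print(f"{low:.1f} to {high:.1f}: {freq}")
```

Each printed row is one interval and the number of simulated Ds that fell in it, which is the same kind of frequency table the spreadsheet produces.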

Or you could use this example that has been done for you using the spreadsheet.

The numbers in the table are the frequencies of the randomly generated Ds in each interval. If your D is in an interval below the intervals that contain the ten highest values for D, then it is not significant and your die is probably a fair one. If it is in the top ten or higher, then it is significant and your die is probably bogus. Look at the following graph of the table.

If your D value is on the far right side of the graph, then your die is probably not fair. If it is in the middle or to the left, then it is probably fair and the differences you saw are a random occurrence.
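The "top ten" rule can also be sketched in Python. This is one possible reading of the rule, not the spreadsheet's exact method: a D counts as significant when it is at least as large as the tenth-highest of 50 simulated Ds. The function names, the seed, and the 60-roll counts (hypothetical values giving D = 1.2) are assumptions for illustration. Fractions keep the arithmetic exact so that a tie with the cutoff is compared correctly.

```python
import random
from fractions import Fraction

def d_statistic(counts):
    """Sum over faces of |observed - expected| / expected, for a fair die."""
    expected = Fraction(sum(counts), len(counts))
    return sum(abs(c - expected) / expected for c in counts)

def simulate_ds(trials, num_rolls, rng, sides=6):
    """Return a list of D values from `trials` simulated fair dice."""
    ds = []
    for _ in range(trials):
        counts = [0] * sides
        for _ in range(num_rolls):
            counts[rng.randrange(sides)] += 1
        ds.append(d_statistic(counts))
    return ds

def is_significant(d_observed, simulated, top=10):
    """Compare d_observed with the smallest of the `top` largest simulated Ds.

    Returns (significant, cutoff).
    """
    cutoff = sorted(simulated, reverse=True)[top - 1]
    return d_observed >= cutoff, cutoff

rng = random.Random(1)  # fixed seed (arbitrary) so reruns give the same Ds
simulated = simulate_ds(50, 60, rng)

# Hypothetical counts for 60 rolls; differences from 10 are 1, 4, 0, 2, 3, 2.
my_d = d_statistic([11, 6, 10, 12, 13, 8])
significant, cutoff = is_significant(my_d, simulated)
print(float(my_d), float(cutoff), significant)
```

With 50 simulated values, the top ten are the highest 20 percent, so by this rule a fair die would still be called bogus about one time in five. Generating more Ds and using a smaller share at the top would make the rule stricter.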

If you understand the D statistic you are ready to move on to Chi-Square!