Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Contingency Coefficient

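If we look at the contingency table of two uncorrelated nominal variables, we can calculate the expected frequency $h_{ik}$ of a particular combination of features (row $i$, column $k$) as

$$h_{ik} = \frac{h_i h_k}{N}$$

where $h_i$ and $h_k$ are the marginal totals of row $i$ and column $k$, and $N$ is the total number of observations.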

If the two variables are correlated, the actual frequencies $H_{ik}$ will deviate from the ideal uncorrelated frequencies $h_{ik}$. The difference $D_{ik}$ between actual and ideal (uncorrelated) frequencies is thus

$$D_{ik} = H_{ik} - h_{ik} = H_{ik} - \frac{h_i h_k}{N}$$
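The two formulas above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the 2x3 table of counts are made up for the example and do not come from the original text.

```python
def expected_frequencies(table):
    """Return the ideal (uncorrelated) frequencies h_ik = h_i * h_k / N."""
    row_totals = [sum(row) for row in table]        # h_i, one per row
    col_totals = [sum(col) for col in zip(*table)]  # h_k, one per column
    n = sum(row_totals)                             # N, total number of observations
    return [[r * c / n for c in col_totals] for r in row_totals]


def differences(table):
    """Return D_ik = H_ik - h_ik for every cell of the table."""
    expected = expected_frequencies(table)
    return [[obs - exp for obs, exp in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


# Hypothetical observed counts H_ik (illustrative numbers only).
observed = [[10, 20, 30],
            [20, 10, 10]]

expected = expected_frequencies(observed)
diffs = differences(observed)
print(expected)
print(diffs)
```

Within every row and every column the differences sum to zero, because the ideal frequencies are built from the same marginal totals as the observed table.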

For uncorrelated variables the difference of frequencies will be close to zero for each cell of the table. Thus the correlation of the two variables can be measured by squaring the differences, dividing each square by the corresponding ideal frequency, and summing over all cells:

$$\chi^2 = \sum_{i}\sum_{k} \frac{D_{ik}^2}{h_{ik}} = \sum_{i}\sum_{k} \frac{(H_{ik} - h_{ik})^2}{h_{ik}}$$

The resulting $\chi^2$ coefficient, however, has the disadvantage that its value depends both on the dimension of the contingency table and on the size of the sample. After eliminating the dependence on the sample size, we get Pearson's contingency coefficient $C$:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
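The sample-size dependence can be checked numerically. The sketch below assumes the usual definitions, chi-square as the sum of squared differences divided by the ideal frequencies and C = sqrt(chi2 / (chi2 + N)). It also assumes that no row or column total is zero. The function names and the table are hypothetical.

```python
import math


def chi_square(table):
    """Sum of (H_ik - h_ik)^2 / h_ik over all cells (assumes non-zero marginals)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_totals[i] * col_totals[k] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    chi2 = chi_square(table)
    n = sum(sum(row) for row in table)
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical counts; the second table has the same proportions
# but twice as many observations.
observed = [[10, 20, 30],
            [20, 10, 10]]
doubled = [[2 * h for h in row] for row in observed]

print(chi_square(observed), chi_square(doubled))
print(contingency_coefficient(observed), contingency_coefficient(doubled))
```

Doubling every count doubles chi-square, while C stays the same, which is the dependence on sample size that the coefficient C removes.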

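As this coefficient $C$ still depends on the dimension of the contingency table, it is normalized so that its range extends from 0.0 to 1.0:

$$C_{corr} = C \sqrt{\frac{m_{min}}{m_{min} - 1}}$$

with $m_{min} = \min(q, p)$, where $q$ and $p$ are the two dimensions (numbers of rows and columns) of the table.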
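A minimal sketch of the corrected coefficient follows, assuming the common normalization C_corr = C * sqrt(m_min / (m_min - 1)), with m_min the smaller of the two table dimensions. The function name and the example tables are hypothetical. The code assumes at least two rows and two columns and no zero marginal totals.

```python
import math


def corrected_contingency_coefficient(table):
    """Return C_corr = C * sqrt(m_min / (m_min - 1)) for a table of counts."""
    m_min = min(len(table), len(table[0]))  # smaller table dimension
    if m_min < 2:
        raise ValueError("table needs at least two rows and two columns")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square: squared differences relative to the ideal frequencies.
    chi2 = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_totals[i] * col_totals[k] / n
            chi2 += (observed - expected) ** 2 / expected

    c = math.sqrt(chi2 / (chi2 + n))        # Pearson's C
    return c * math.sqrt(m_min / (m_min - 1))


# Hypothetical tables: perfectly associated and perfectly independent.
perfect_2x2 = [[5, 0], [0, 5]]
perfect_3x3 = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
independent = [[10, 20], [20, 40]]

print(corrected_contingency_coefficient(perfect_2x2))
print(corrected_contingency_coefficient(perfect_3x3))
print(corrected_contingency_coefficient(independent))
```

For the two perfectly associated tables the uncorrected C would differ with the table size, whereas the corrected value reaches the upper end of the range in both cases. The independent table gives zero.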

Hint: In contrast to the correlation coefficient, the corrected contingency coefficient $C_{corr}$ does not indicate the direction of the correlation but only its strength.