Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Frequency

Frequencies of occurrence form the basis of many statistical procedures and approaches for interpreting data, because they reflect physical, social, political, biological or other realities. When describing frequencies we have to distinguish two cases:

  • discrete values, which mostly exhibit nominal or ordinal levels of measurement (categories), and
  • continuous values, which are in many cases of interval or ratio scale level.
In the case of discrete measurements, the number of observed levels is often much smaller than the number of observations, while continuous values may take any number of different levels (limited only by the resolution of the measurement). An example of discrete values is the math grades of the students at a school: if the school has 120 students, there are 120 observations at only 6 levels (grades A to F).

On the other hand, for continuous variables the number of different measurement values may be of the same order as the number of observations. For example, the body weights of the 120 students will most probably yield 120 different values if we determine the weight with a precision of one gram, since the chance of finding two persons with exactly the same weight (down to the gram) is very low. In this case we have to assign categories (classes) to the weights by specifying ranges along the weight scale. The counts of observations falling into these classes are then the frequencies.(1)
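The classification of continuous values into ranges can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are randomly generated (hypothetical data, not from the text), and the helper name class_frequencies, the class width of 10 kg and the lower bound of 40 kg are arbitrary choices made for this example.

```python
import random

def class_frequencies(values, lower, width, n_classes):
    """Count the values falling into each class
    [lower + k*width, lower + (k+1)*width) for k = 0 .. n_classes-1."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)  # index of the class containing v
        if 0 <= k < n_classes:         # values outside all classes are ignored
            counts[k] += 1
    return counts

random.seed(0)
# Hypothetical data: 120 body weights in kg, recorded to the gram
weights = [round(random.uniform(45.0, 85.0), 3) for _ in range(120)]

distinct = len(set(weights))  # number of different values before classification
counts = class_frequencies(weights, lower=40.0, width=10.0, n_classes=5)

print("distinct values:", distinct)
for k, n in enumerate(counts):
    print(f"{40 + 10 * k}-{50 + 10 * k} kg: {n}")
```

The choice of class boundaries is up to the analyst; different widths give different frequencies for the same data.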

Now, what do we mean by frequency?

The absolute frequency n_i is the number of observations belonging to a category a_i or falling into a particular class c_i. The sum of the frequencies over all categories or classes equals N, the total number of observations:

Σ_i n_i = N

Relative frequencies f_i are obtained by normalizing the individual frequencies to a total sum of 1.0 (or 100%), i.e. f_i = n_i / N. This makes the frequencies independent of the sample size, so that frequencies from samples of different sizes can be compared with each other.
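As a minimal sketch, absolute and relative frequencies could be computed from raw observations as follows. The function name frequency_table and the ten grades are hypothetical and serve only as an illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (absolute frequency n_i, relative frequency f_i)}."""
    counts = Counter(observations)   # n_i for each category
    total = sum(counts.values())     # N, the total number of observations
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical math grades of ten students
grades = ["A", "C", "B", "B", "F", "C", "B", "A", "D", "C"]
table = frequency_table(grades)

for cat in sorted(table):
    n, f = table[cat]
    print(f"{cat}  {n}  {f:.3f}")
```

By construction the absolute frequencies add up to N and the relative frequencies add up to 1 (apart from floating-point rounding).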

Frequencies are usually listed in a frequency table or displayed as a histogram. The frequency table contains the absolute and relative frequencies n_i and f_i for all categories a_i:

a_1  n_1  f_1
a_2  n_2  f_2
a_3  n_3  f_3
...  ...  ...

Example: 28 persons have been asked for their eye colors, resulting in the following frequencies:

eye color   abs. frequency   rel. frequency
brown       14               0.500 (50%)
gray         2               0.071 (7.1%)
blue         9               0.321 (32.1%)
green        3               0.107 (10.7%)

The dataset with 28 observations and one variable exhibits four categories which differ in their frequencies.
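The relative frequencies of the eye color example can be checked with a short calculation. The sketch below takes the four counts from the example; the variable names are chosen for illustration only.

```python
# Absolute frequencies from the eye color example
eye_colors = {"brown": 14, "gray": 2, "blue": 9, "green": 3}

N = sum(eye_colors.values())  # total number of observations
rel = {color: n / N for color, n in eye_colors.items()}  # relative frequencies

for color, n in eye_colors.items():
    print(f"{color:<6} {n:>3}  {rel[color]:.3f} ({rel[color]:.1%})")
```

Note that the relative frequencies rounded to three decimals add up to 0.999 rather than 1.000; this is a rounding effect, since the unrounded values sum to 1.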



(1) If we did not classify continuous variables, the relative frequencies would all be equal to 1/N whenever the number of distinct values equals the number of observations. This would effectively prevent us from making any statement about the distribution of the data.