Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics and data analysis.

Cluster Analysis


"Cluster analysis" is the generic term for multivariate methods which attempt to find structures ("clusters") in the data. These methods are mostly based on calculating the distances between the observations in the multidimensional data space. Basically, cluster analysis answers one or more of the following three questions:

  • How many classes (clusters) can be observed in a data set?
  • Which objects belong to which classes?
  • How consistent are the classes?
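
Since most of these methods start from the distances between observations, the following minimal sketch shows how a matrix of pairwise Euclidean distances could be computed. The function name `distance_matrix` and the three observations are made up for illustration, and Euclidean distance is only one of several possible distance measures.

```python
import math

def distance_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical observations in a three-dimensional data space
observations = [
    (1.0, 2.0, 2.0),
    (1.0, 2.0, 3.0),
    (4.0, 6.0, 2.0),
]

d = distance_matrix(observations)

# Print the matrix row by row; small entries indicate similar observations
for row in d:
    print(["%.2f" % value for value in row])
```

The first two observations are only 1.0 apart, while the third lies at a distance of 5.0 or more from both, so a distance-based method would tend to group the first two together.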

The plot at right shows 150 observations of three different kinds of flowers (50 each); it clearly shows two clusters. Cluster analysis can help to find such clusters automatically.
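
One simple way to find such groups automatically can be sketched as follows: observations are put into the same cluster whenever they are linked by a chain of neighbours that are no more than a chosen distance apart. This corresponds to cutting a single-linkage hierarchy at that distance and is only one of many possible approaches. The function name `threshold_clusters`, the parameter `max_gap` and the data points are assumptions made for this illustration; they are not the flower data of the plot.

```python
import math

def threshold_clusters(points, max_gap):
    """Assign a cluster label to every point.

    Two points end up in the same cluster if they are connected by a
    chain of points in which consecutive members are at most max_gap apart.
    """
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # point already belongs to a cluster
        labels[start] = current
        stack = [start]
        # Grow the cluster by repeatedly collecting close, unlabelled neighbours
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= max_gap:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical two-dimensional data forming two well-separated groups
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
]

labels = threshold_clusters(points, max_gap=1.0)
n_clusters = len(set(labels))
print(labels, n_clusters)
```

The result addresses the first two questions above: the number of distinct labels is the number of clusters, and each label states which cluster an object belongs to. The outcome depends strongly on the chosen `max_gap`, which has to be selected with some knowledge of the data.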

The results of a cluster analysis are often displayed as dendrograms, which show the multidimensional relationships as a two-dimensional line plot. In general, cluster analysis methods can be grouped into several categories: