Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Combination of Several Distributions

What happens if several samples are considered first as individual distributions and then as one pooled sample (combining all observations of the various samples into one big sample)? Is the distribution of the combined sample different from the individual distributions? If so, how do they differ? The answer to this question is important for understanding the analysis of variance (ANOVA).

First, let's look at a concrete example. Suppose we have four normally distributed samples about which we know only that their variances are equal; we know neither the sample sizes nor anything about the means of the four samples. What does the distribution look like if we combine all objects of these four samples into a single, larger sample?

For the further discussion we have to distinguish two cases: (1) the means of the individual samples are equal, and (2) the means of the individual samples differ from each other (to be specific, the mean of at least one sample differs from the other means).

Case 1: Equal Means of the Individual Samples

If the means of the individual samples are equal (according to our assumptions, the variances are equal anyway), the parameters of the combined distribution are the same as the parameters of the individual ones. In particular, the variance of the combined sample is equal to the variance of the individual samples.

Figure: The distribution of the combined sample (black) is exactly equal to the distributions of the individual samples (all five curves, the four samples and the combined sample, overlap perfectly).
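Case 1 can be checked with a small simulation. The following is only a sketch under assumed values: the common mean, the common standard deviation, the sample sizes and the random seed are all hypothetical choices and do not come from the text. Because the data are random, the estimated variances agree only approximately.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

MEAN = 10.0                        # common mean of all four samples (assumed)
SIGMA = 2.0                        # common standard deviation (assumed)
SIZES = [4000, 5000, 6000, 7000]   # hypothetical sample sizes

# Draw four normally distributed samples with identical mean and variance.
samples = [[random.gauss(MEAN, SIGMA) for _ in range(n)] for n in SIZES]

# Pool all observations into one big sample.
combined = [x for sample in samples for x in sample]

# Estimate the variance of each sample and of the pooled sample.
individual_variances = [statistics.variance(s) for s in samples]
combined_variance = statistics.variance(combined)

print("individual variances:", [round(v, 2) for v in individual_variances])
print("combined variance:   ", round(combined_variance, 2))
```

All printed values should lie close to SIGMA squared (here 4), with small differences caused by random sampling.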

Case 2: Different Means of the Individual Samples

If we combine the individual samples (whose means are not equal) into a larger collective sample, the variance of the new sample will be greater than the variance of the individual samples (which is the same for all of them), because the spread of the sample means adds to the spread within each sample.

Figure: The distribution of the combined sample (black) is considerably broader than the distribution of any of the individual samples. This increase of variance is the basis of the ANOVA (analysis of variance).
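The size of this increase can be made explicit: the variance of the pooled sample splits into the (size-weighted) average variance within the samples plus the (size-weighted) variance of the sample means around the overall mean. The sketch below checks this on four tiny made-up samples; the data values and the helper name `decompose` are purely illustrative, and population variances (division by n) are used so that the identity holds exactly.

```python
import statistics

def decompose(samples):
    """Split the pooled (population) variance into within and between parts."""
    combined = [x for s in samples for x in s]
    n_total = len(combined)
    grand_mean = statistics.mean(combined)

    # Size-weighted average of the variances inside each sample.
    within = sum(len(s) * statistics.pvariance(s) for s in samples) / n_total

    # Size-weighted variance of the sample means around the grand mean.
    between = sum(
        len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples
    ) / n_total

    total = statistics.pvariance(combined)
    return within, between, total

# Hypothetical samples: identical spread, shifted means.
samples = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
    [7.0, 8.0, 9.0],
]

within, between, total = decompose(samples)
print("within: ", within)
print("between:", between)
print("total:  ", total)
```

The between part is zero exactly when all sample means coincide, which is Case 1; any difference among the means makes the pooled variance larger than the variance within the samples.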

This increase in the variance of the combined sample gives us the opportunity to test for the equality of the means of the individual samples, simply by comparing the variances of the individual samples with the variance of the collective sample. Remember: the variance of the collective sample is the same as the variance of the individual samples if and only if the means are equal. This principle, testing for the equality of means by investigating the collective variance, is the basis of ANOVA.
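In practice the comparison is usually expressed as a ratio of two variance estimates, the F ratio of one-way ANOVA, which the text above does not spell out. The following is a minimal sketch of that ratio; the function name `f_ratio` and the data are hypothetical, and the sketch stops short of a full test, which would compare the result with a critical value of the F distribution.

```python
import statistics

def f_ratio(samples):
    """Return between-group mean square divided by within-group mean square."""
    k = len(samples)                          # number of samples
    combined = [x for s in samples for x in s]
    n_total = len(combined)
    grand_mean = statistics.mean(combined)

    # Variation of the sample means around the grand mean.
    ss_between = sum(
        len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples
    )
    # Variation of the observations around their own sample mean.
    ss_within = sum(
        sum((x - statistics.mean(s)) ** 2 for x in s) for s in samples
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: same spread in every sample.
equal_means = [[1.0, 2.0, 3.0]] * 4
shifted_means = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0], [7.0, 8.0, 9.0]]

print("F (equal means):  ", f_ratio(equal_means))
print("F (shifted means):", f_ratio(shifted_means))
```

A ratio near or below 1 is compatible with equal means, while a large ratio indicates that the means contribute extra variance to the collective sample.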