Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics and data analysis.

Confidence Interval

A common task in statistics is the (preferably correct) estimation of a particular parameter (e.g. the mean). While calculating a population parameter is quite simple in most cases, the situation becomes more complicated when we only have a random sample.

If we know the population, the procedure is simple and clear: the parameter is calculated using the appropriate formula, and the accuracy of the result depends solely on the accuracy of the measurements and the (floating-point) precision of the calculation. If we deal with samples, however, the calculated parameter (normally called a statistic, to indicate that it has been calculated from a sample) will vary within certain limits when we calculate it from different samples of the same population. The calculated statistic fluctuates around the true value of the parameter, which we do not know because we do not know the population.
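This fluctuation of a statistic can be made visible with a small simulation. The following Python sketch assumes a hypothetical, normally distributed population with a true mean of 140 cm and a standard deviation of 9 cm (both values are invented for illustration), draws ten random samples of 17 values each and prints the mean of every sample. The helper name `sample_means` is likewise hypothetical.

```python
import random
import statistics

# Hypothetical population: normally distributed heights with an assumed
# true mean of 140 cm and standard deviation of 9 cm (illustrative values only).
TRUE_MEAN = 140.0
TRUE_SD = 9.0


def sample_means(n_samples: int, sample_size: int, seed: int = 1) -> list[float]:
    """Draw repeated random samples and return the mean (the statistic) of each."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means = sample_means(n_samples=10, sample_size=17)
for m in means:
    print(f"{m:.1f}")
print(f"smallest {min(means):.1f}, largest {max(means):.1f}")
```

Every sample yields a different mean, and the means scatter around the assumed true value of 140 cm; changing the seed gives yet another set of values.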

The question which now arises is: within which interval will the true value of the parameter lie, if we only know the value of the corresponding statistic calculated from a single sample? This question cannot be answered with absolute certainty, only with a certain probability, because the sample at hand may happen to deviate strongly from the population, resulting in a calculated statistic far off the true value of the parameter.

In order to estimate a parameter from a sample we therefore have to use probabilities. We calculate the corresponding statistic of the sample and additionally specify a range around it which is constructed so that it includes the (unknown) true value of the parameter with a certain probability p; in other words, if the procedure were repeated with many samples, a fraction p of the resulting ranges would contain the true value. Usually p is set to 95%, but 99% and 99.9% are equally good choices for most practical purposes. This probability p is called the confidence coefficient, and the range around the sample statistic is called the confidence interval.
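The meaning of the confidence coefficient can be checked by simulation: draw many samples from a known population, compute an interval from each one and count how often the interval contains the true mean. The sketch below is a simplification that assumes a normal population whose standard deviation is known (the invented values 140 cm and 9 cm), so it uses the normal-distribution (z) interval x̄ +/- z·σ/√n instead of the t-based interval needed when σ has to be estimated from the sample. The helper name `coverage` is hypothetical.

```python
import random
import statistics

# Assumed, invented population parameters (normal distribution, sigma known).
TRUE_MEAN = 140.0
TRUE_SD = 9.0


def coverage(p: float, n: int, trials: int, seed: int = 2) -> float:
    """Fraction of z-based intervals, one per random sample, that contain TRUE_MEAN."""
    z = statistics.NormalDist().inv_cdf((1 + p) / 2)  # about 1.96 for p = 0.95
    half_width = z * TRUE_SD / n ** 0.5
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = statistics.mean(rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))
        if xbar - half_width <= TRUE_MEAN <= xbar + half_width:
            hits += 1
    return hits / trials


for p in (0.95, 0.99):
    print(f"p = {p:.0%}: observed coverage {coverage(p, n=17, trials=4000):.3f}")
```

The observed coverage should come out close to 0.95 and 0.99 respectively; the small deviations from these values are themselves random. Because both runs use the same seed and therefore the same samples, the wider 99% intervals always catch at least as many true means as the 95% ones.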


Example: Let's assume that we have measured the body heights of 17 ten-year-old girls (in cm):
    124.5, 136.8, 147.6, 142.6, 154.2, 126.6,
    152.0, 145.4, 147.6, 136.1, 135.5, 137.1,
    139.7, 149.1, 142.5, 134.5, 155.9
If we calculate the mean, we arrive at 141.6 cm. The corresponding confidence interval at a confidence coefficient of 95% is calculated to be +/-4.6 cm. Thus, with 95% confidence, the true mean lies within the interval [141.6-4.6, 141.6+4.6] = [137.0, 146.2]. If we raise the confidence coefficient to 99%, the interval becomes broader (because the probability of including the true value is higher): at 99% it is already +/-6.4 cm. If we wanted absolute certainty about the true value of the mean, the interval would become infinitely broad; we would then only know that the mean is located "anywhere", which is true in 100% of all cases.
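The text does not state which formula produced the +/-4.6 cm and +/-6.4 cm. The numbers are consistent with the usual interval for a mean when the population standard deviation is unknown, x̄ +/- t·s/√n, where s is the sample standard deviation and t is the quantile of Student's t distribution with n - 1 = 16 degrees of freedom at probability (1 + p)/2. The sketch below assumes that formula and uses only the Python standard library, so it computes the t quantile itself by numerically integrating the t density (Simpson's rule) and bisecting; all function names are hypothetical.

```python
import math
import statistics

# Body heights (cm) of the 17 girls from the example above.
heights = [
    124.5, 136.8, 147.6, 142.6, 154.2, 126.6,
    152.0, 145.4, 147.6, 136.1, 135.5, 137.1,
    139.7, 149.1, 142.5, 134.5, 155.9,
]


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: 0.5 plus the density integrated over [0, x] (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(q: float, df: int) -> float:
    """Value x with t_cdf(x, df) = q, found by bisection (valid for 0.5 <= q < 1)."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(data: list[float], p: float) -> tuple[float, float]:
    """Return (mean, half-width) of the t-based confidence interval at coefficient p."""
    n = len(data)
    mean = statistics.mean(data)
    sem = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    half_width = t_quantile((1 + p) / 2, n - 1) * sem
    return mean, half_width


for p in (0.95, 0.99):
    mean, hw = confidence_interval(heights, p)
    print(f"{p:.0%}: {mean:.1f} +/- {hw:.1f} cm -> [{mean - hw:.1f}, {mean + hw:.1f}]")
```

The half-widths come out at about 4.6 cm (95%) and 6.4 cm (99%), matching the text. The printed upper bound of the 95% interval is 146.3 rather than 146.2 because the script adds the unrounded mean and half-width (about 141.63 + 4.63) instead of the rounded figures. Where SciPy is available, `scipy.stats.t.ppf(0.975, 16)` returns the same critical value (about 2.12) without the hand-written integration.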