Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


List of Data Sets

The following table lists all data sets needed for the various exercises. Most of them contain real data obtained from various sources (see the reference section at the end of this page). A few are simulated data sets which have been generated with a background story in mind. The file names of the simulated data sets are indicated by the symbol in the right column. All simulated data sets have been generated using DataLab.
 
 
Filename Description Ref.
ALCOHOL Subset of the data set WINE containing only the alcohol content of two brands.
data courtesy of M. Forina
BANANAS Properties of 38 bananas. The bananas have been weighed, measured and eaten by the author.
BENZ500 Spectroscopic data (NMR) on various brands of gasoline, and the relative octane number.
data courtesy of R. Meusinger
BODYFAT Percentage of body fat, age, weight, height, and ten body circumference measurements (e.g., abdomen) are recorded for 252 men. Body fat, a measure of health, is estimated through an underwater weighing technique. Fitting body fat to the other measurements using multiple regression provides a convenient way of estimating body fat for men using only a scale and a measuring tape.
data courtesy of Garth Fisher
BOILPTS Boiling points and topological descriptors of 185 chemical substances.
CANCER Number of intestinal cancer cases in West Germany in the period between 1955 and 1995.
CIGART Artificial data set for classification, created by INSPECT. The data points are arranged in such a way that only non-linear methods are able to classify the data correctly.
COINS Weight of 114 coins (Austrian 1 Schilling pieces) of different ages.
Coins have been collected and weighed by H. Lohninger and A. Satzinger
ETHANOL NOx concentration in the exhaust gases of an experimental ethanol motor. 
EXMPL-A Artificial data set which shows a few simple relationships among variables.
FISH1SPECIES Subset of data set FISHCATCH showing the relationship between length and weight of fish.
FISHCATCH Body measurements of different species of perch.
FLURIEDW This data set comprises geometric measures of 100 authentic and 100 counterfeit bank notes.
data courtesy of H. Riedwyl, Bern, Switzerland
FREEFALL Simulated data to show variability in data. A steel ball is released from different heights; for each height the experiment is repeated 100 times.
HENRYSEM Henry's constant of chemical substances together with molecular descriptors. The physical data have been obtained from Hine et al.; the molecular descriptors have been calculated using TOPIX.
HUMIDIT2 Average relative humidity (%) at 264 places in the USA. The data set contains the data for June and September, morning and afternoon each. In addition, the annual averages are in the last two columns.
IRIS Three types of iris plants. The plants are described by four variables.
METHANE This data set contains the concentration of atmospheric methane measured monthly during the period from September 1980 to September 1988.
MINWATER Chemical analysis of different brands of mineral water.
MINWATER2 Subset of MINWATER
MOTE9603 Climate data obtained from Mote weather station, Florida, USA. The data set contains measurements of 9 meteorological variables over a period of ten days in March 1996.
data courtesy of Don Hayward
MOTETIDES Water level at the Mote weather station, Florida, USA, during July 1998. Data was obtained every 15 minutes.
data courtesy of Don Hayward
MULTIEST Artificial data used in an interactive example on multidimensional models.
POLYFIT Artificial data showing a polynomial relationship of the third order.
PRECIPITATION Normal monthly precipitation (inches) in the period 1961-90.
REACTTEST The reaction times to visual stimuli were recorded for 9 persons. The experiment was repeated on two different days; one series was obtained before a two-hour lecture, the other series after a two-hour lecture.
STRONTIUM Simulated data to demonstrate the two-sample t-test.
SUNSPOTS Average monthly sunspot areas between 1874 and 1998.
data courtesy of David H. Hathaway
TRAIN Simulated data to show a skewed distribution. 
TWOCLASS Artificial data set containing two classes of observations.
WATERRESID Subset of MINWATER
WINE Chemical analysis of three kinds of Italian red wines (Barolo, Grignolino, Barbera).
data courtesy of M. Forina
WINEGER Chemical analysis of various kinds of German wines.
data courtesy of Klaus Danzer, Friedrich-Schiller-Universität Jena, Germany
WORLDPOP Demographic, sociological and economic data on the world's nations (1988).
data source: various publications of the UN, the World Bank and the CIA