List of Data Sets
The following table lists all data sets supplied with Teach/Me - Data Analysis.
Most of them are real data obtained from various sources (see the references
at the end of this page). A few are simulated data sets, generated with a
background story in mind. The file names of the simulated data sets are
displayed in brown.
| Filename | Description | Ref. |
|---|---|---|
| ALCOHOL | Subset of the data set WINE containing only the alcohol content of two brands. | [7] |
| BENZ | Spectroscopic data (NMR) on various brands of gasoline, together with the relative octane number. | [10] |
| BODYFAT | Percentage of body fat, age, weight, height, and ten body circumference measurements (e.g., abdomen) recorded for 252 men. Body fat, a measure of health, is estimated by underwater weighing. Fitting body fat to the other measurements by multiple regression provides a convenient way of estimating body fat for men using only a scale and a measuring tape. | [1] |
| BOILPTS | Boiling points and topological descriptors of 185 chemical substances. | [3, 13] |
| CANCER | Number of intestinal cancer cases in West Germany in the period from 1955 to 1995. | [24] |
| CIGART | Artificial data set for classification, created by INSPECT. The data points are arranged so that only non-linear methods can classify them correctly. | [2] |
| COINS | Weights of 114 coins (Austrian 1-Schilling pieces) of different ages. | [5] |
| ETHANOL | NOx concentration in the exhaust gases of an experimental ethanol engine. | [25] |
| EXMPL-A | Artificial data set showing a few simple relationships among variables. | - |
| FISH1SPECIES | Subset of the data set FISHCATCH showing the relationship between length and weight of fish. | [22] |
| FISHCATCH | Body measurements of different species of perch. | [22] |
| FLURIEDW | Geometric measurements of 100 genuine and 100 counterfeit bank notes. | [12] |
| FREEFALL | Simulated data to show variability in data. A steel ball is released from different heights; for each height the experiment is repeated 100 times. | - |
| HENRYSEM | Henry's law constants of chemical substances together with molecular descriptors. The physical data were obtained from [17]; the molecular descriptors were calculated using TOPIX [18]. | [17, 18] |
| HUMIDIT2 | Average relative humidity (%) at 264 locations in the USA. The data set contains the values for June and September, each for morning and afternoon. The annual averages are given in the last two columns. | [8] |
| IRIS | Three types of iris plants, each described by four variables. | [14] |
| METHANE | Concentration of atmospheric methane, measured monthly from September 1980 to September 1988. | [15] |
| MINWATER | Chemical analysis of different brands of mineral water. | [20] |
| MINWATER2 | Subset of MINWATER. | [20] |
| MOTE9603 | Climate data from the Mote weather station, Florida, USA. The data set contains measurements of 9 meteorological variables over a period of ten days in March 1996. | [4] |
| MOTETIDES | Water level at the Mote weather station, Florida, USA, during July 1998, recorded every 15 minutes. | [4] |
| MULTIEST | Artificial data used in an interactive example on multidimensional models. | - |
| POLYFIT | Artificial data showing a third-order polynomial relationship. | - |
| PRECIPITATION | Normal monthly precipitation (inches) for the period 1961-90. | [8] |
| RABBITS | Fluctuations of a rabbit population. | [21] |
| REACTTEST | Reaction times to visual stimuli recorded for 9 persons. The experiment was repeated on two different days; one series was obtained before a two-hour lecture, the other after a two-hour lecture. | [9] |
| STRONTIUM | Simulated data to demonstrate the two-sample t-test. | - |
| SUNSPOTS | Average monthly sunspot areas between 1874 and 1998. | [19] |
| TERPBIC | Two classes of chemical substances described by two spectral parameters. This data set cannot be classified correctly by linear methods. | [11] |
| TRAIN | Simulated data to demonstrate a skewed distribution. | - |
| TWOCLASS | Artificial data set containing two classes of observations. | - |
| WATERRESID | Subset of MINWATER. | [20] |
| WINE | Chemical analysis of three kinds of Italian red wine (Barolo, Grignolino, Barbera). | [7] |
| WINEGER | Chemical analysis of various kinds of German wine. | [23] |
| WORLDPOP | Demographic, sociological, and economic data on the world's nations (1988). | [6] |
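The BODYFAT entry mentions estimating body fat from the other measurements by multiple regression. The following is a minimal pure-Python sketch of an ordinary least-squares fit via the normal equations. It is an illustration only, not the procedure used by Teach/Me, and the function names `fit_linear_regression` and `predict` are hypothetical. The example data in the usage note are made up so that they satisfy y = 1 + 2*x1 + 3*x2 exactly.

```python
def fit_linear_regression(X, y):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X: list of rows, each row a list of k predictor values.
    y: list of response values (same length as X).
    Returns the coefficient list [b0, b1, ..., bk].
    (Hypothetical helper written for illustration.)
    """
    # Prepend a column of ones so that b0 acts as the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Build the normal equations (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Augmented matrix [A^T A | A^T y], solved by Gaussian elimination
    # with partial pivoting.
    M = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution yields the coefficients from last to first.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))
        b[i] = s / M[i][i]
    return b


def predict(b, row):
    """Evaluate the fitted model for one row of predictor values."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))
```

For example, `fit_linear_regression([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8])` recovers coefficients close to `[1, 2, 3]`. The normal equations keep the sketch short but can be numerically ill-conditioned when predictors are strongly correlated; an SVD-based solver such as `numpy.linalg.lstsq` is the more robust choice in practice.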
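The STRONTIUM entry is intended to demonstrate the two-sample t-test. The source does not say which variant Teach/Me uses; the sketch below assumes two independent samples with equal variances (Student's pooled-variance form), with pooled variance s_p^2 = ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2), statistic t = (mean1 - mean2) / sqrt(s_p^2 * (1/n1 + 1/n2)), and n1 + n2 - 2 degrees of freedom. The function name `pooled_t_statistic` is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic assuming equal variances.

    Returns (t, degrees_of_freedom). (Hypothetical helper for illustration.)
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    # statistics.variance is the sample variance (divisor n - 1).
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Difference of means divided by its estimated standard error.
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

For the made-up samples `[1, 2, 3]` and `[4, 5, 6]`, the function returns t = -3*sqrt(1.5), about -3.674, with 4 degrees of freedom; the value is then compared with a critical value of the t distribution. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` computes the same statistic together with a p-value, and `equal_var=False` switches to Welch's test.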
References to the sources of the data sets:
| Ref. | Source |
|---|---|
| [1] | K. Penrose, A. Nelson, A.G. Fisher: Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques. Medicine and Science in Sports and Exercise 17(2) (1985) 189. Data set courtesy of Garth Fisher. |
| [2] | H. Lohninger: INSPECT - A program system for scientific and engineering data analysis. Springer, Berlin, Heidelberg, New York 1996. |
| [3] | H. Lohninger: Evaluation of Neural Networks Based on Radial Basis Functions and Their Application to the Prediction of Boiling Points from Structural Parameters. J. Chem. Inf. Comput. Sci. 33 (1993) 736-744. |
| [4] | Mote Weather Station, Florida, USA. Data courtesy of Don Hayward, Mote Marine Laboratory, 1600 Ken Thompson Parkway, Sarasota, FL 34236, USA. http://www.mote.org/ |
| [5] | Coins collected and weighed by H. Lohninger and A. Satzinger, Vienna University of Technology, Vienna, Austria. |
| [6] | Compiled from a variety of public sources, including the United Nations (http://www.un.org/), the World Bank (http://www.worldbank.org/), and the CIA Factbook (http://www.odci.gov/cia/publications/factbook/). |
| [7] | M. Forina, E. Tiscornia: Ann. Chim. 72 (1982) 143. Data set courtesy of M. Forina, Università di Genova, Italy. |
| [8] | Published by the National Climatic Data Center on its web site: http://www.ncdc.noaa.gov/ |
| [9] | H. Lohninger: Reaction measurements to visual stimuli. Vienna University of Technology, 1998. |
| [10] | R. Meusinger, R. Moros: Application of Genetic Algorithms and Neural Networks in Analysis of Multicomponent Mixtures by NMR-Spectroscopy. In: J. Gasteiger (Ed.), "Software Development in Chemistry, 10", Gesellschaft Deutscher Chemiker, Frankfurt 1996, p. 209. Data set courtesy of R. Meusinger. |
| [11] | H. Lohninger: Data computed from mass spectral data by means of MSLIB (http://www.lohninger.com/mslib.html). |
| [12] | B. Flury, H. Riedwyl: Angewandte multivariate Statistik. G. Fischer Verlag, Stuttgart 1983. Data set courtesy of H. Riedwyl, Bern, Switzerland. |
| [13] | A.T. Balaban, L.B. Kier, N. Joshi: Correlations between chemical structure and normal boiling points of acyclic ethers, peroxides, acetals and their sulfur analogues. J. Chem. Inf. Comput. Sci. 32 (1992) 237-244. |
| [14] | R.A. Fisher: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (1936), Part II, 179-188. |
| [15] | M.A.K. Khalil, R.A. Rasmussen: Atmospheric Methane: Recent Global Trends. Environ. Sci. Technol. 24 (1990) 549-553. |
| [16] | H. Lohninger: Estimation of Soil Partition Coefficients of Pesticides from their Chemical Structure. Chemosphere 29 (1994) 1611. |
| [17] | J. Hine, P.K. Mookerjee: The intrinsic hydrophilic character of organic compounds. Correlations in terms of structural contributions. J. Org. Chem. 40 (1975) 292-298. |
| [18] | D. Svozil, H. Lohninger: TOPIX - A program to calculate topological indices. http://www.lohninger.com/topix.html |
| [19] | Royal Greenwich Observatory/USAF/NOAA, NASA/Marshall Space Flight Center. Data set courtesy of David H. Hathaway. http://science.nasa.gov/ssl/pad/solar/ |
| [20] | H. Lohninger: Data on different brands of mineral water, collected by the author from the labels of the water bottles. |
| [21] | Rabbits |
| [22] | P. Brofeldt: Bidrag till kännedom om fiskbeståndet i våra sjöar. Längelmävesi. In: T.H. Järvi: Finlands Fiskeriet Band 4, Meddelanden utgivna av fiskeriföreningen i Finland. Helsingfors 1917. |
| [23] | G. Thiel, K. Danzer: Direct analysis of mineral components in wine by inductively coupled plasma optical emission spectrometry (ICP-OES). Fresenius J. Anal. Chem. 357 (1997) 553-557. Data set courtesy of Klaus Danzer, Friedrich-Schiller-Universität Jena, Germany. |
| [24] | N. Becker, J. Wahrendorf: Atlas of cancer mortality in the Federal Republic of Germany. Springer, Berlin, Heidelberg 1998. |
| [25] | N.D. Brinkman: Ethanol Fuel - A Single-Cylinder Engine Study of Efficiency and Exhaust Emissions. SAE Transactions 90 (1981), No. 810345, 1410-1424. |