Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.

## Neural Networks - Extrapolation

Neural networks exhibit a major drawback when compared to linear methods of function approximation: they cannot extrapolate reliably. A neural network can map virtually any function by adjusting its parameters to fit the presented training data, but this same flexibility means that nothing constrains its behavior where no data was presented. For regions of the variable space where no training data is available, the output of a neural network is therefore not reliable.

Basically, the data space which can be processed by a trained neural network is split into two regions:

1. the region where the density of the training data is greater than zero, and
2. any other part of the data space, where the density of the training data is zero (or near zero).

For unknown data points that fall into the first region, the network interpolates; all other points have to be estimated by extrapolation.

In order to overcome this problem, one should record, in some form, the range of the variable space where training data is available. In principle, this could be done by calculating the convex hull of the training data set. If unknown data presented to the net lie within this hull, the output of the net can be considered reliable. However, the concept of the convex hull is not satisfactory, since the hull is complicated to calculate and provides no solution for problems where the region covered by the input data is concave (not convex). A better method, proposed by Leonard et al., is to estimate the local density of the training data by using Parzen windows. This approach is suitable for all types of networks. Radial basis function networks provide another elegant yet simple method of detecting extrapolation regions.
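The density-based check can be illustrated with a minimal sketch. It is not the exact procedure of Leonard et al.; it only assumes a Gaussian Parzen window with a hand-picked width `h` and a hypothetical cut-off `threshold`, below which a point is treated as lying in an extrapolation region. The function names and the toy training set are likewise illustrative assumptions.

```python
import math

def parzen_density(x, train, h):
    """Gaussian Parzen-window estimate of the training-data density at point x.

    x     : tuple of input coordinates
    train : list of training inputs (tuples of the same length as x)
    h     : window width (assumed value, must be tuned for real data)
    """
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)
    total = 0.0
    for t in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return norm * total / len(train)

def is_extrapolating(x, train, h, threshold):
    """Flag x as unreliable if the local training density is below threshold."""
    return parzen_density(x, train, h) < threshold

# Toy training set: 1-D inputs spread evenly between 0.0 and 1.0 (assumption).
train = [(0.1 * i,) for i in range(11)]
h = 0.1           # assumed window width
threshold = 0.05  # hypothetical cut-off, chosen only for this illustration

for query in [(0.5,), (3.0,)]:
    density = parzen_density(query, train, h)
    print(query, density, is_extrapolating(query, train, h, threshold))
```

In practice both the window width and the cut-off have to be chosen for the data at hand; a window that is too wide hides gaps in the training data, and one that is too narrow flags almost every new point.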

Example: Let's have a look at an example concerning the performance of neural networks under extrapolating conditions. In order to simplify the set-up, we look at one-dimensional input data, which is related to a single response variable. The training data is shown in the bottom part of the figure below. Applying 15 trained networks to unknown data results in responses that are consistent in the regions where training data was available, but arbitrary everywhere else.
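A comparable experiment can be sketched in a few lines. This is not the set-up behind the figure: the target function `sin(3x)`, the training range [-1, 1], and the network size are assumptions. To keep the code short, each network has a random, fixed hidden layer, and only its output weights are fitted by ridge-regularized least squares, which is a simplification of full network training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed training data: 1-D inputs restricted to [-1, 1], one response variable.
x_train = np.linspace(-1.0, 1.0, 40)
y_train = np.sin(3.0 * x_train)

def fit_network(x, y, n_hidden, rng):
    """One-hidden-layer tanh network: random hidden weights,
    output weights fitted by ridge-regularized least squares."""
    w = rng.normal(0.0, 3.0, n_hidden)            # input-to-hidden weights
    b = rng.normal(0.0, 2.0, n_hidden)            # hidden biases
    hidden = np.tanh(np.outer(x, w) + b)          # shape (n_samples, n_hidden)
    design = np.column_stack([hidden, np.ones_like(x)])
    gram = design.T @ design + 1e-6 * np.eye(n_hidden + 1)
    beta = np.linalg.solve(gram, design.T @ y)    # output weights and bias
    return w, b, beta

def predict(net, x):
    w, b, beta = net
    hidden = np.tanh(np.outer(x, w) + b)
    return np.column_stack([hidden, np.ones_like(x)]) @ beta

# 15 networks that differ only in their random hidden layers.
nets = [fit_network(x_train, y_train, 20, rng) for _ in range(15)]

x_inside = np.linspace(-1.0, 1.0, 50)   # covered by training data
x_outside = np.linspace(2.0, 3.0, 50)   # no training data here

# Average disagreement (standard deviation) between the 15 networks.
spread_inside = np.std([predict(n, x_inside) for n in nets], axis=0).mean()
spread_outside = np.std([predict(n, x_outside) for n in nets], axis=0).mean()
print("spread inside training range: ", spread_inside)
print("spread outside training range:", spread_outside)
```

The disagreement between the networks is itself a rough indicator of extrapolation: where the networks were all constrained by the same training data they should agree closely, while outside that range each one follows its own arbitrary continuation.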