Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics and data analysis.

Common Questions about ANNs

The following paragraphs contain some commonly asked questions about ANNs.

When should neural networks be used?

In order to avoid solving simple tasks with complex models, traditional statistical methods dealing with linear mappings should be exploited first. To check whether non-linear methods provide better results and reveal more information, experiments with simple standard versions of neural networks should be conducted. Later, the model can be improved by adding more layers, more units, or feedback loops.
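As a minimal sketch of the "linear first" step, the following fits a straight line by ordinary least squares and measures its error on held-out points. The data, the held-out split, and the function names `fit_line` and `mse` are illustrative assumptions and not part of the original text. The resulting error is the reference value that a neural network would have to beat to justify its extra complexity.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the line y = a*x + b on the given points."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: a clearly non-linear relationship y = x^2.
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]

# Every third point is held out for testing; the rest is used for fitting.
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 3 != 0]
test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 3 == 0]

a, b = fit_line([x for x, _ in train], [y for _, y in train])
baseline_error = mse(a, b, [x for x, _ in test], [y for _, y in test])
print(f"slope={a:.3f} intercept={b:.3f} held-out MSE={baseline_error:.3f}")
```

A non-linear model is worth pursuing only if its error on the same held-out points is clearly lower than this baseline.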

What about pre-processing the data?

Inexperienced users of neural networks tend to overestimate the capabilities of ANNs. Actually, pre-processing is very important and may be the key to success (also see the section about the transformation of data space).
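One common pre-processing step is to standardize each input variable so that all inputs have comparable scales. The sketch below is only an illustration of that idea; the text does not prescribe a particular method, and the function name `standardize` and the example values are assumptions.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale one input variable to zero mean and unit standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


# Hypothetical inputs measured on very different scales.
temperature_kelvin = [293.0, 298.0, 303.0, 308.0]
concentration_mol = [0.001, 0.004, 0.002, 0.003]

scaled_temperature = standardize(temperature_kelvin)
scaled_concentration = standardize(concentration_mol)
print(scaled_temperature)
print(scaled_concentration)
```

The mean and standard deviation should be computed from the training data only and then reused unchanged for test data.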

Is it true that neural networks can approximate almost any function?

Even though it can be proven that certain types of models can approximate almost any given function, this does not imply that a particular network can always be trained appropriately. Training often gets caught in a local minimum of the error function, so the network may not come close to the intended mapping.
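The effect of local minima can be illustrated with a toy example. The sketch below runs plain gradient descent on a one-dimensional function with two minima, which stands in for the error surface of a network. The function, the learning rate, and the starting values are assumptions chosen for illustration only.

```python
def error(w):
    """Toy error surface with two minima (a stand-in for a network's training error)."""
    return w ** 4 - 3 * w ** 2 + w


def gradient(w):
    """Derivative of the toy error surface."""
    return 4 * w ** 3 - 6 * w + 1


def gradient_descent(w, learning_rate=0.01, steps=2000):
    """Plain gradient descent from the starting value w."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_from_left = gradient_descent(-2.0)
w_from_right = gradient_descent(2.0)
print(f"start -2.0 -> w={w_from_left:.3f}, error={error(w_from_left):.3f}")
print(f"start +2.0 -> w={w_from_right:.3f}, error={error(w_from_right):.3f}")
```

Both runs converge, but only the run started at -2.0 reaches the lower of the two minima. Repeating the training from several different initial values is one simple way to reduce this risk.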

How many data points are necessary to train a neural network?

Since neural networks are trained through examples, large data sets are required. Before starting the experiments, try to collect as many examples as possible. Models with many degrees of freedom (e.g. many connections in the network) in particular require a large number of examples. There are heuristics for finding the maximum number of degrees of freedom for a given number of examples, or the minimum number of required examples for a given number of degrees of freedom. However, these criteria are hardly ever met in practice. When the available data set is not large enough, the results are not reliable. In that case:

  • If it is impossible to get more data, restrict yourself to small models with a few degrees of freedom (e.g. use only a small number of units per layer, so that the number of connections remains small).
  • Use n-fold cross-validation to obtain more reliable results with the same data set.
  • Use noise addition to test the reliability of a trained network.
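The last two suggestions can be sketched as follows. The function names `k_fold_splits` and `add_noise`, the Gaussian noise model, and the example values are assumptions made for illustration; training and evaluating the network itself is left out.

```python
import random


def k_fold_splits(n_examples, n_folds, seed=0):
    """Yield (train_indices, test_indices) pairs; each example is tested exactly once."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_indices in enumerate(folds):
        train_indices = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_indices, test_indices


def add_noise(example, noise_level, rng):
    """Return a copy of the input vector with Gaussian noise added to each value."""
    return [value + rng.gauss(0.0, noise_level) for value in example]


splits = list(k_fold_splits(n_examples=10, n_folds=5))
for train_indices, test_indices in splits:
    # A real experiment would train on train_indices and evaluate on test_indices here.
    print(sorted(test_indices))

rng = random.Random(1)
noisy = add_noise([0.2, 0.5, 0.9], noise_level=0.05, rng=rng)
print(noisy)
```

Averaging the test error over all folds gives a more stable estimate than a single split. For the noise test, a reliable network should give similar outputs for an example and for its slightly perturbed copies.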

Is the number of examples per class important?

The relative number of examples per class influences the resulting network. The more often a type of pattern is presented, the better it is learned. If you want all classes to be learned equally well, use the same number of examples per class. If the numbers of examples per class are similar, this issue can usually be ignored. But if 90% of the examples belong to one class, the network may not learn to recognize the other class, because only 10% of the examples belong to it.

Warning: After the number of examples per class has been changed, the performance of the net may no longer be adequate for the original problem. If the distribution of classes in the natural environment differs from that in the training set, test the trained network with an independent test set taken from the natural environment to find out how it performs there.
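One possible way to equalize the class sizes is to repeat examples of the smaller classes (oversampling). The sketch below is an assumption about how this could be done, not a method taken from the text; the function name `oversample_to_balance` and the data are hypothetical.

```python
import random
from collections import Counter


def oversample_to_balance(examples, labels, seed=0):
    """Repeat randomly chosen examples of smaller classes until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for example, label in zip(examples, labels):
        by_class.setdefault(label, []).append(example)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend((example, label) for example in members + extra)
    rng.shuffle(balanced)
    return balanced


# Hypothetical training set: 90% of the examples belong to class "A".
examples = list(range(20))
labels = ["A"] * 18 + ["B"] * 2

balanced = oversample_to_balance(examples, labels)
print(Counter(labels))
print(Counter(label for _, label in balanced))
```

Only the training set should be balanced in this way. In line with the warning above, the test set should keep the class distribution of the natural environment.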

What about special cases in the training data set?

Usually, the neural network does not learn to treat the special cases correctly, because they are not presented to the network often enough. The neural network takes the statistical distribution of the data into account, and tends to neglect outliers. Here are a few tips on how to handle special cases:

  • Collect many examples of this kind. When many similar examples of a special case are presented, the network learns to handle it.
  • Use a neural network model that can deal with special cases.
  • Use case-based reasoning.
  • Treat the special cases separately.

Can ANNs be re-trained with new examples?

Whether this is possible depends on the neural network model. The multi-layer perceptron, for example, is not well suited to re-training; starting from scratch is usually faster and provides better results. Other models may allow additional examples to be integrated later.

Can a neural network handle several tasks at once?

Whether this is reasonable depends on the tasks. If the tasks are closely related, this can improve the performance, because the weights leading to the hidden units pre-structure the task appropriately. However, tasks that are too different usually interfere. In general, use a separate neural network with a single output unit for each task. This provides a better overview and allows smaller networks to be used.

How many hidden units should be used?

The hidden units pre-structure the inputs so that they are useful for solving the task. You should try to use as few hidden units as possible. When there are many hidden units, the network tends to adapt too well to the training set. Thus, it is less suited to generalizing. The removal of a single hidden unit considerably reduces the size of the network (and thus the number of degrees of freedom), because a single hidden unit is connected with all the input units and all the output units.
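The effect of removing one hidden unit can be checked by counting weights. The sketch below assumes a fully connected network with a single hidden layer and one bias per hidden and output unit; the function name `count_weights` and the layer sizes are hypothetical.

```python
def count_weights(n_inputs, n_hidden, n_outputs, with_bias=True):
    """Number of adjustable weights in a fully connected network with one hidden layer."""
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    if with_bias:
        # One bias per hidden unit and per output unit (an assumption of this sketch).
        weights += n_hidden + n_outputs
    return weights


# Hypothetical network sizes.
full = count_weights(n_inputs=20, n_hidden=10, n_outputs=3)
pruned = count_weights(n_inputs=20, n_hidden=9, n_outputs=3)
print(full, pruned, full - pruned)
```

With 20 inputs and 3 outputs, each hidden unit accounts for 20 + 3 connections plus its own bias, so removing one unit reduces the count by 24 (from 243 to 219).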