Abstract
Neural networks can be regarded as statistical models, and can be analysed in a Bayesian framework. Generalisation is measured by performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; using information divergence guarantees invariance with respect to representation.
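To make this measure concrete, here is a hedged sketch of the quantity the abstract describes; the symbols $p$ (true distribution), $q$ (model distribution), $x$ (training data) and $D$ (information divergence) are illustrative assumptions, not notation fixed by this abstract:

```latex
% Generalisation performance of an estimate q after seeing data x:
% the posterior average of the information divergence between the
% (unknown) true distribution p and the model distribution q.
% All symbols here are assumed for illustration.
\mathrm{Gen}(q \mid x)
  \;=\; \mathbb{E}_{p \sim P(p \mid x)} \bigl[ D(p, q) \bigr]
  \;=\; \int D(p, q) \, \mathrm{d}P(p \mid x)
```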
The theory generalises the least mean squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1) the ideal optimal estimate is always given by the average over the posterior; (2) the optimal estimate within a computational model is given by the projection of the ideal estimate onto the model. This incidentally shows that some currently popular methods for dealing with hyperpriors are in general unnecessary and misleading.
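Under the same assumed notation, the two main results can be written schematically; reading the projection in (2) as divergence minimisation over the model is a hedged interpretation of the abstract, not the report's exact statement:

```latex
% (1) Ideal optimal estimate: the posterior average of p.
\hat{p}(x) \;=\; \mathbb{E}_{p \sim P(p \mid x)}[\, p \,]

% (2) Optimal estimate within a computational model Q:
%     the divergence projection of the ideal estimate onto Q.
\hat{q}(x) \;=\; \operatorname*{arg\,min}_{q \in Q} \; D\bigl(\hat{p}(x), q\bigr)
```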
The extension of information divergence to positive normalisable measures reveals a remarkable relation between the $\delta$ dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces $L^{1/\delta}$ and $L^{1/(1-\delta)}$. It thereby offers a conceptual simplification of information geometry.
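One standard form of such a divergence family, extended from probability distributions to positive measures, is sketched below; the exact parametrisation used in the report may differ. The pairing of $L^{1/\delta}$ with $L^{1/(1-\delta)}$ is natural because $\delta + (1-\delta) = 1$ makes $1/\delta$ and $1/(1-\delta)$ Hölder-conjugate exponents.

```latex
% A delta-divergence between positive measures p and q (0 < delta < 1),
% reducing to the extended Kullback-Leibler divergence as delta -> 1.
% This parametrisation is an assumption for illustration.
D_\delta(p, q) \;=\; \frac{1}{\delta(1-\delta)}
  \int \Bigl( \delta\, p + (1-\delta)\, q - p^{\delta} q^{1-\delta} \Bigr) \mathrm{d}\mu

% Hoelder conjugacy underlying the Banach-space duality:
\frac{1}{\,1/\delta\,} + \frac{1}{\,1/(1-\delta)\,}
  \;=\; \delta + (1-\delta) \;=\; 1
```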
The general conclusion on the issue of evaluating neural network learning rules and other statistical inference methods is that such evaluations are only meaningful under three assumptions: the prior $P(p)$, describing the environment of all the problems; the divergence $D_\delta$, specifying the requirements of the task; and the model $Q$, specifying the available computing resources.
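Schematically, and again as a hedged sketch rather than the report's exact formulation, these three ingredients combine into an evaluation of a learning rule $A$ (which maps data $x$ to an estimate $A(x) \in Q$) as an average over the assumed environment:

```latex
% Expected performance of a learning rule A: average over problems p
% drawn from the prior P(p), and data x drawn from p, of the divergence
% D_delta between the truth p and the estimate A(x) in the model Q.
% The rule A and this composition are assumptions for illustration.
\mathrm{Eval}(A) \;=\; \mathbb{E}_{p \sim P(p)} \;
  \mathbb{E}_{x \sim p} \bigl[ D_\delta\bigl(p, A(x)\bigr) \bigr],
\qquad A(x) \in Q
```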
| Original language | English |
| --- | --- |
| Place of Publication | Birmingham, UK |
| Publisher | Aston University |
| Number of pages | 37 |
| ISBN (Print) | NCRG/95/005 |
| Publication status | Published - 1995 |
Keywords
- Neural networks
- Bayesian framework
- internal coherence
- statistical estimation
- information geometry
- statistical inference
- computing resources