Abstract
An adaptive back-propagation algorithm parameterized by an inverse temperature 1/T is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, we analyse these learning algorithms in both the symmetric and the convergence phase for finite learning rates in the case of uncorrelated teachers of similar but arbitrary length T. These analyses show that adaptive back-propagation generally results in faster training than gradient descent, both by breaking the symmetry between hidden units more efficiently and by converging faster to optimal generalization.
Original language | English |
---|---|
Pages (from-to) | 3426-3445 |
Number of pages | 20 |
Journal | Physical Review E |
Volume | 56 |
Issue number | 3 |
DOIs | |
Publication status | Published - Sept 1997 |
Bibliographical note
Copyright of the American Physical Society
Keywords
- adaptive back-propagation
- algorithm
- inverse temperature
- gradient descent
- on-line learning
- neural networks
- learning algorithms