TY - CHAP
T1 - Incorporating curvature information into on-line learning
AU - Rattray, Magnus
AU - Saad, David
N1 - Copyright of the Institute of Electrical and Electronics Engineers (IEEE)
PY - 1999/1
Y1 - 1999/1
AB - We analyse the dynamics of a number of second-order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line, and we therefore consider matrix momentum using a single-example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm, and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.
KW - multi-layer neural networks
KW - statistical mechanics
KW - optimal asymptotic performance
KW - asymptotic generalization error decay
KW - Matrix momentum
UR - http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521652634
U2 - 10.2277/0521652634
DO - 10.2277/0521652634
M3 - Chapter
SN - 0521652634
T3 - Publications of the Newton Institute
SP - 183
EP - 207
BT - On-line learning in neural networks
A2 - Saad, David
PB - Cambridge University Press
CY - Isaac Newton Institute, Cambridge
T2 - Proceedings of the on-line learning themed week
Y2 - 1 January 1999 through 1 January 1999
ER -