### Abstract

We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line and we therefore consider matrix momentum using a single example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.

Original language | English |
---|---|
Title of host publication | On-line learning in neural networks |
Editors | David Saad |
Place of Publication | Isaac Newton Institute, Cambridge |
Publisher | Cambridge University Press |
Pages | 183-207 |
Number of pages | 25 |
ISBN (Print) | 0521652634 |
DOIs | 10.2277/0521652634 |
Publication status | Published - Jan 1999 |
Event | Proceedings of the on-line learning themed week, 1 Jan 1999 → 1 Jan 1999 |

### Publication series

Name | Publications of the Newton Institute |
---|---|
Publisher | Cambridge University Press |

### Other

Other | Proceedings of the on-line learning themed week |
---|---|
Period | 1/01/99 → 1/01/99 |

### Bibliographical note

Copyright of the Institute of Electrical and Electronics Engineers (IEEE)

### Keywords

- multi-layer neural networks
- statistical mechanics
- optimal asymptotic performance
- asymptotic generalization error decay
- Matrix momentum
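
The "matrix momentum" keyword refers to a second-order scheme analysed in the chapter, in which the running weight update converges towards a Newton-like direction while only ever computing Hessian-vector products, never an explicit matrix inverse. As a minimal sketch of that idea on a fixed quadratic loss (the setup, step size, and all names here are hypothetical illustrations, not the authors' algorithm or code):

```python
import numpy as np

# Hypothetical quadratic loss 0.5 * (w - w_star)^T H (w - w_star)
# with a known, ill-conditioned Hessian H (for illustration only).
H = np.diag([0.5, 1.0, 2.0, 4.0])
w_star = np.array([1.0, -2.0, 0.5, 3.0])

eta = 0.9 / 4.0          # step size below the stability limit 1 / lambda_max
w = np.zeros(4)          # weights
dw = np.zeros(4)         # running update (the "momentum" term)

for _ in range(500):
    g = H @ (w - w_star)              # gradient of the quadratic loss
    # Matrix-momentum recursion: at its fixed point the running update
    # satisfies H dw = -g, i.e. dw is the Newton step, yet only the
    # Hessian-vector product H @ dw is ever computed.
    dw = -eta * g + dw - eta * (H @ dw)
    w = w + dw

# w is now close to the minimiser w_star
```

In a network setting the exact Hessian-vector product would be replaced by a single-example estimate, which (as the abstract notes) keeps the cost low but makes the method sensitive to parameter choice through noise in the Hessian estimate.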

### Cite this

Rattray, M., & Saad, D. (1999). Incorporating curvature information into on-line learning. In D. Saad (Ed.), *On-line learning in neural networks* (pp. 183-207). (Publications of the Newton Institute). Isaac Newton Institute, Cambridge: Cambridge University Press. https://doi.org/10.2277/0521652634

Rattray, M & Saad, D 1999, 'Incorporating curvature information into on-line learning', in D Saad (ed.), *On-line learning in neural networks*, Publications of the Newton Institute, Cambridge University Press, Isaac Newton Institute, Cambridge, pp. 183-207, Proceedings of the on-line learning themed week, 1/01/99. https://doi.org/10.2277/0521652634

**Incorporating curvature information into on-line learning.** / Rattray, Magnus; Saad, David.

Research output: Chapter in Book/Report/Conference proceeding › Chapter

TY - CHAP

T1 - Incorporating curvature information into on-line learning

AU - Rattray, Magnus

AU - Saad, David

N1 - Copyright of the Institute of Electrical and Electronics Engineers (IEEE)

PY - 1999/1

Y1 - 1999/1

N2 - We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line and we therefore consider matrix momentum using a single example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.

AB - We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line and we therefore consider matrix momentum using a single example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.

KW - multi-layer neural networks

KW - statistical mechanics

KW - optimal asymptotic performance

KW - asymptotic generalization error decay

KW - Matrix momentum

UR - http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521652634

U2 - 10.2277/0521652634

DO - 10.2277/0521652634

M3 - Chapter

SN - 0521652634

T3 - Publications of the Newton Institute

SP - 183

EP - 207

BT - On-line learning in neural networks

A2 - Saad, David

PB - Cambridge University Press

CY - Isaac Newton Institute, Cambridge

ER -
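
The abstract contrasts plain on-line gradient descent with Amari's natural gradient, which preconditions each update by the inverse Fisher matrix. A toy teacher-student simulation of that contrast, with anisotropic inputs so the preconditioning actually matters (a sketch under hypothetical parameter choices, not the chapter's soft-committee-machine analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy on-line teacher-student scenario: a linear student learns fixed
# "teacher" weights w_star from a stream of examples. Inputs are
# anisotropic, so the Fisher matrix E[x x^T] is far from the identity.
d = 5
scales = np.linspace(0.1, 2.0, d)     # per-dimension input spread
w_star = rng.normal(size=d)           # teacher weights
w_gd = np.zeros(d)                    # student trained by gradient descent
w_ng = np.zeros(d)                    # student trained by natural gradient
eta = 0.05
F = np.eye(d)                         # running estimate of E[x x^T]

for t in range(1, 4001):
    x = rng.normal(size=d) * scales
    y = w_star @ x                    # noiseless teacher output
    F += (np.outer(x, x) - F) / (t + 1)   # running Fisher-matrix estimate
    # Plain on-line gradient descent on the squared error:
    # crawls in directions where the input variance is small.
    w_gd -= eta * (w_gd @ x - y) * x
    # Natural gradient: precondition the same stochastic gradient by the
    # inverse (estimated) Fisher matrix, equalising the learning speed
    # across directions.
    w_ng -= eta * np.linalg.solve(F, (w_ng @ x - y) * x)
```

With the same example stream and step size, the natural-gradient student ends much closer to the teacher than the plain gradient-descent student, illustrating the reduction in learning time the abstract describes.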