Incorporating curvature information into on-line learning

Magnus Rattray, David Saad

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

We analyse the dynamics of a number of second-order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line, and we therefore consider matrix momentum using a single-example approximation to the Hessian. In this case, good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is Amari's natural gradient learning algorithm, and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.
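As a point of reference for the algorithms named above, the sketch below is a minimal, hedged illustration (not taken from the chapter) of the per-example update rules being compared: standard on-line gradient descent, on-line Newton's method, which preconditions the gradient with the inverse Hessian, and Amari's natural gradient, which preconditions it with the inverse Fisher information matrix. The callables grad, hessian and fisher, and the learning-rate values, are hypothetical placeholders rather than anything specified in the chapter.

```python
import numpy as np

# Schematic per-example updates only; the chapter analyses the dynamics of these
# algorithm families using statistical mechanics, not this particular implementation.
# grad(w, x, y), hessian(w) and fisher(w) are assumed, model-specific callables.

def gradient_descent_step(w, grad, x, y, eta=0.1):
    """Standard on-line gradient descent on a single example (x, y)."""
    return w - eta * grad(w, x, y)

def online_newton_step(w, grad, hessian, x, y, eta=1.0):
    """On-line Newton's method: gradient preconditioned by the inverse Hessian."""
    g = grad(w, x, y)
    return w - eta * np.linalg.solve(hessian(w), g)

def natural_gradient_step(w, grad, fisher, x, y, eta=0.1):
    """Amari's natural gradient: gradient preconditioned by the inverse Fisher information."""
    g = grad(w, x, y)
    return w - eta * np.linalg.solve(fisher(w), g)
```
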
Original language: English
Title of host publication: On-line learning in neural networks
Editors: David Saad
Place of Publication: Isaac Newton Institute, Cambridge
Publisher: Cambridge University Press
Pages: 183-207
Number of pages: 25
ISBN (Print): 0521652634
DOI: https://doi.org/10.2277/0521652634
Publication status: Published - Jan 1999
Event: Proceedings of the on-line learning themed week
Duration: 1 Jan 1999 - 1 Jan 1999

Publication series

Name: Publications of the Newton Institute
Publisher: Cambridge University Press

Bibliographical note

Copyright of the Institute of Electrical and Electronics Engineers (IEEE)

Keywords

  • multi-layer neural networks
  • statistical mechanics
  • optimal asymptotic performance
  • asymptotic generalization error decay
  • Matrix momentum

Cite this

Rattray, M., & Saad, D. (1999). Incorporating curvature information into on-line learning. In D. Saad (Ed.), On-line learning in neural networks (pp. 183-207). (Publications of the Newton Institute). Isaac Newton Institute, Cambridge: Cambridge University Press. https://doi.org/10.2277/0521652634