Projected sequential Gaussian processes: a C++ tool for interpolation of large datasets with heterogeneous noise

Remi Barillec*, Ben Ingram, Dan Cornford, Lehel Csató

*Corresponding author for this work

Research output: Contribution to journal › Article

Abstract

Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed.
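As a rough illustration of the "one observation at a time" idea in the abstract, the sketch below conditions a Gaussian process, evaluated on a small fixed set of locations, on observations that each carry their own noise variance. It is not the gptk API: the function names, the squared-exponential covariance, and the toy data are all assumptions made for this example. It also omits the projection (reduced-rank) step, generic observation operators, and non-Gaussian errors, covering only the exact Gaussian case.

```python
import math


def sq_exp_cov(xs, variance=1.0, length_scale=1.0):
    """Squared-exponential prior covariance over the locations xs (assumed kernel)."""
    return [
        [variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2) for b in xs]
        for a in xs
    ]


def assimilate(mean, cov, idx, y, noise_var):
    """Condition the Gaussian (mean, cov) on one noisy observation y at location idx.

    Only scalar operations are needed: the update is a rank-one correction
    driven by the column of the covariance at the observed location.
    """
    n = len(mean)
    k = [cov[i][idx] for i in range(n)]      # covariance with the observed site
    denom = cov[idx][idx] + noise_var        # predictive variance of y
    gain = (y - mean[idx]) / denom           # scaled innovation
    new_mean = [mean[i] + k[i] * gain for i in range(n)]
    new_cov = [
        [cov[i][j] - k[i] * k[j] / denom for j in range(n)] for i in range(n)
    ]
    return new_mean, new_cov


def run(observations, xs):
    """Start from a zero-mean prior and process the observations in the given order."""
    mean = [0.0] * len(xs)
    cov = sq_exp_cov(xs)
    for idx, y, noise_var in observations:
        mean, cov = assimilate(mean, cov, idx, y, noise_var)
    return mean, cov


# Toy data (hypothetical): (location index, observed value, noise variance).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
observations = [(0, 0.2, 0.01), (2, 1.0, 0.5), (4, -0.3, 0.05)]

mean_fwd, cov_fwd = run(observations, xs)
mean_rev, cov_rev = run(list(reversed(observations)), xs)
```

In this exact Gaussian setting the posterior does not depend on the order in which the observations are processed, so `mean_fwd` and `mean_rev` agree up to rounding. The posterior variance at the precisely observed location (index 0) also ends up smaller than at the noisily observed one (index 2).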

Original language: English
Pages (from-to): 295-309
Number of pages: 15
Journal: Computers and Geosciences
Volume: 37
Issue number: 3
Early online date: 4 Aug 2010
DOI: 10.1016/j.cageo.2010.05.008
Publication status: Published - Mar 2011

Fingerprint

Interpolation
Sensors
Data fusion
Covariance matrix
Web services
Learning systems
Statistics
matrix
method
library

Keywords

  • heterogeneous data
  • low-rank approximations
  • sensor fusion

Cite this

Barillec, Remi; Ingram, Ben; Cornford, Dan; Csató, Lehel. Projected sequential Gaussian processes: a C++ tool for interpolation of large datasets with heterogeneous noise. In: Computers and Geosciences. 2011; Vol. 37, No. 3. pp. 295-309.
@article{a80163c071a64463b4ce5d6b59a4fcfa,
title = "Projected sequential Gaussian processes: a C++ tool for interpolation of large datasets with heterogeneous noise",
abstract = "Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed.",
keywords = "heterogeneous data, low-rank approximations, sensor fusion",
author = "Remi Barillec and Ben Ingram and Dan Cornford and Lehel Csat{\'o}",
year = "2011",
month = "3",
doi = "10.1016/j.cageo.2010.05.008",
language = "English",
volume = "37",
pages = "295--309",
journal = "Computers and Geosciences",
issn = "0098-3004",
publisher = "Elsevier",
number = "3",

}

TY - JOUR

T1 - Projected sequential Gaussian processes

T2 - a C++ tool for interpolation of large datasets with heterogeneous noise

AU - Barillec, Remi

AU - Ingram, Ben

AU - Cornford, Dan

AU - Csató, Lehel

PY - 2011/3

Y1 - 2011/3

AB - Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed.

KW - heterogeneous data

KW - low-rank approximations

KW - sensor fusion

UR - http://www.scopus.com/inward/record.url?scp=79952196502&partnerID=8YFLogxK

U2 - 10.1016/j.cageo.2010.05.008

DO - 10.1016/j.cageo.2010.05.008

M3 - Article

AN - SCOPUS:79952196502

VL - 37

SP - 295

EP - 309

JO - Computers and Geosciences

JF - Computers and Geosciences

SN - 0098-3004

IS - 3

ER -