Distributed Machine Learning

  • P. Latouche

Student thesis: Master's Thesis, Master of Science (by Research)

Abstract

In the last ten years, there has been an ever-increasing use of databases to store information and of machine learning methods to manipulate, extract, and analyse data. More and more problems are being tackled in science, health and engineering. As a consequence, there has been a concurrent increase in the use of highly distributed computing to store and manipulate data.

In this thesis, we work on regression problems, which consist of approximating the underlying processes that map input variables to target variables. We introduce the concept of a distributed learning environment, in which local agents train on distributed data, and we show that two critical applications can be tackled using such architectures. First, in Chapter 3, we consider a situation where the data is physically distributed across nodes from the outset. The agents do not agree to share their data, for privacy and security reasons, but do agree to share their models. In this environment, the issue is how to combine the learned information in order to build a more accurate predictive model. For our experiments, we consider multilayer perceptrons and radial basis function networks. We test several model combination methods using a toy dataset and scatterometry data.
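
The model-sharing setting can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes 1-D inputs, radial basis function networks with Gaussian basis functions on centres agreed in advance by all agents, ridge-regularised least squares for the output weights, and plain averaging of predictions as the combination rule. The function names, the width and the ridge value are all hypothetical choices made for the example.

```python
import numpy as np

def rbf_design(x, centres, width):
    """Gaussian basis activations for 1-D inputs, plus a bias column."""
    phi = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)
    return np.hstack([phi, np.ones((len(x), 1))])

def fit_rbf_network(x, y, centres, width=0.5, ridge=1e-6):
    """Fit the output weights of an RBF network on one agent's local data."""
    phi = rbf_design(x, centres, width)
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(gram, phi.T @ y)

def predict_rbf_network(weights, x, centres, width=0.5):
    """Evaluate a fitted RBF network at new inputs."""
    return rbf_design(x, centres, width) @ weights

def combine_by_averaging(local_weights, x, centres, width=0.5):
    """Combine shared models by averaging their predictions (assumed rule)."""
    preds = [predict_rbf_network(w, x, centres, width) for w in local_weights]
    return np.mean(preds, axis=0)
```

Each agent calls fit_rbf_network on its own data and shares only the resulting weight vector, so no raw data leaves a node. Because the centres are shared and the output layer is linear, averaging predictions here is equivalent to averaging the weight vectors; that shortcut would not hold for multilayer perceptrons.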

Then, in Chapter 4, we tackle Gaussian processes, which are known to scale poorly with large data sets because they require matrix inversions whose computational cost and memory requirement are of order O(N^3) and O(N^2) respectively, where N is the number of training data points. We investigate techniques that consist of splitting the data and then distributing it across nodes. We show that the Bayesian committee machine can be applied to estimate Gaussian process predictions, while a factorized hyperposterior makes it possible to run optimization procedures over the whole training data set even when N is large. We experiment with these approximations using the scatterometry data.
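
As a rough illustration of the splitting idea, the sketch below applies the standard Bayesian committee machine combination rule to Gaussian process predictions computed on separate data partitions. It is a sketch under stated assumptions, not the thesis's code: inputs are 1-D, the kernel is a squared-exponential with fixed hyperparameters, the noise variance is known, and all function names are hypothetical. With M partitions of roughly N/M points each, every local inversion costs O((N/M)^3) instead of O(N^3).

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs (assumed, fixed hyperparameters)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=0.01):
    """Exact GP posterior mean and covariance of the latent function at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, Kss - v.T @ v

def bcm_predict(x_parts, y_parts, x_test, noise_var=0.01, jitter=1e-6):
    """Combine per-partition GP predictions with the committee rule.

    precision = -(M - 1) * Kss^{-1} + sum_i C_i^{-1}
    mean      = precision^{-1} @ sum_i C_i^{-1} @ m_i
    """
    M = len(x_parts)
    eye = np.eye(len(x_test))
    # Jitter keeps the small test-point matrices safely invertible.
    Kss = rbf_kernel(x_test, x_test) + jitter * eye
    precision = -(M - 1) * np.linalg.inv(Kss)
    weighted_mean = np.zeros(len(x_test))
    for x_i, y_i in zip(x_parts, y_parts):
        m_i, C_i = gp_predict(x_i, y_i, x_test, noise_var)
        C_i_inv = np.linalg.inv(C_i + jitter * eye)
        precision += C_i_inv
        weighted_mean += C_i_inv @ m_i
    cov = np.linalg.inv(precision)
    return cov @ weighted_mean, cov
```

With a single partition the rule reduces to the ordinary Gaussian process prediction, which is a convenient sanity check. The matrices inverted in bcm_predict have the size of the test set, so in practice test points would be processed in small batches.
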
Date of Award: 2007
Original language: English
Awarding Institution
  • Aston University

Keywords

  • machine learning
  • information engineering
