Gaussian Processes With Categorical Inputs

  • Sean-Michael Tulloch

    Student thesis: Master's Thesis, Master of Science (by Research)

    Abstract

    Regression problems arise in a variety of contexts, including the development of Gaussian process models for computer simulators. Many approaches already exist for Gaussian process regression with continuous-valued inputs; however, many simulators (and observational data sets) contain both continuous and discrete-valued inputs. There are relatively few approaches for addressing Gaussian process regression with mixed continuous and categorical inputs. These include treed Gaussian processes, Dirichlet processes with Generalized Linear Models, and Gaussian processes which use a hypersphere parameterization.

    The aim of this work is to extend Gaussian process models so that they can use categorical inputs, e.g. someone's occupation {Student, Lecturer, ...}, alongside the usual continuous inputs. A naïve approach would be to fit an independent Gaussian process for each category, but this quickly becomes inefficient as the number of categories, and in particular the number of categorical inputs, increases. In this work we propose to model the categorical inputs by including a mapping from each categorical element to a continuous real value. We propose to learn the categorical mapping using likelihood-based methods. The posterior distributions of the categorical mappings, and the relations between them, are expected to reflect their relative influence on the output. Using examples, we illustrate the learning dynamics of our method. We explore the strongly multi-modal nature of the posterior distributions for the mappings of the categorical data into real values. We contrast the plug-in estimators obtained using likelihood methods with a Bayesian approach using MCMC. Comparisons between our approach and other existing methods for categorical inputs are made on simple data sets.
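
    The idea of mapping each category to a real value and choosing that value by likelihood can be illustrated with a minimal sketch. This is not the thesis implementation: the squared-exponential kernel, the single shared length-scale, the fixed noise level, the toy data, and the grid search (standing in for a proper optimiser) are all assumptions made for illustration, and the names `rbf_kernel`, `neg_log_marginal_likelihood`, and `z` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between the rows of A and B (assumed kernel)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def neg_log_marginal_likelihood(x, cats, y, z, noise=1e-2):
    """GP negative log marginal likelihood with categories mapped to reals.

    z[k] is the hypothetical latent real value assigned to category k.
    """
    X = np.column_stack([x, z[cats]])  # continuous input plus mapped category
    K = rbf_kernel(X, X) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha
            + np.log(np.diag(L)).sum()
            + 0.5 * len(y) * np.log(2 * np.pi))

# Toy data (assumed): two categories sharing a shape but offset in level.
x = np.tile(np.linspace(0.0, 1.0, 10), 2)
cats = np.repeat([0, 1], 10)
y = np.sin(2 * np.pi * x) + 1.0 * cats

# Fix z[0] = 0 for identifiability and scan z[1] on a grid; the minimiser
# is a plug-in (maximum-likelihood) estimate of the mapping for category 1.
grid = np.linspace(-3.0, 3.0, 61)
nll_values = np.array([
    neg_log_marginal_likelihood(x, cats, y, np.array([0.0, z1]))
    for z1 in grid
])
best_z1 = grid[np.argmin(nll_values)]
print("plug-in estimate of z[1]:", best_z1)
```

    Because a stationary kernel depends only on the distance between mapped values, the likelihood in this sketch is unchanged when the sign of `z[1]` is flipped, which gives a simple instance of the multi-modality mentioned above.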
    Date of Award: Oct 2012
    Original language: English
    Awarding Institution
    • Aston University
