Modelling Inference Strategies and Robust Clustering Topologies

  • Adam Farooq

Student thesis: Doctoral Thesis, Doctor of Philosophy


Latent variable models are used extensively in unsupervised learning within the Bayesian paradigm; these include (but are not limited to) mixture models, which can be used for clustering, and linear Gaussian models, which can be used for dimensionality reduction. Clustering aims to find underlying groups within a data set, where data points belonging to the same group (also called a cluster) are more ‘similar’ to one another than data points belonging to different groups. Dimensionality reduction aims to reduce the dimension of a data set while minimising information loss: for example, if two data points are relatively ‘close’ to one another in the observed data space, then they should also be relatively ‘close’ in the reduced-dimension data space.
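
As a concrete illustration of these two goals (using hypothetical toy data and plain SVD-based PCA, not any model proposed in this thesis), the sketch below reduces two well-separated groups from five dimensions to two and checks that points in the same group remain closer to one another than points in different groups:

```python
import numpy as np

# Hypothetical toy data: two well-separated groups in a 5-dimensional space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 5)),
               rng.normal(4.0, 0.5, size=(20, 5))])
labels = np.array([0] * 20 + [1] * 20)

# Dimensionality reduction: project onto the top two principal components,
# obtained from the SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T  # 40 points, reduced from 5 to 2 dimensions

def mean_pairwise_dist(A, B):
    """Mean Euclidean distance over all pairs drawn from A and B."""
    return np.mean(np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1))

# Relative closeness is preserved: same-group distances stay small,
# different-group distances stay large in the reduced space.
within = mean_pairwise_dist(Z[labels == 0], Z[labels == 0])
between = mean_pairwise_dist(Z[labels == 0], Z[labels == 1])
```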

The Bayesian paradigm offers rules for learning from data and integrating out uncertainty; however, it can be a curse within latent variable models. For example, any misspecification of the likelihood within a mixture model will result in incorrect clustering. To combat this, we propose novel techniques that help latent variable models learn meaningful information.

We first propose a mixture model for clustering and density estimation of count data which, unlike other mixture models from the exponential family of distributions, does not make a strong a priori assumption about the dispersion of the observed data. The proposed model uses a mixture of Panjer distributions, which learns the dispersion of the observed data in a data-driven manner; we call this the Panjer mixture model. We study practical inference with the Panjer mixture model, propose an efficient maximisation-maximisation scheme for training it, and demonstrate its utility on different data sets.
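
The Panjer mixture model itself is not reproduced here, but the Panjer class it builds on is easy to sketch: it contains the count distributions whose probabilities satisfy the recursion p_k = (a + b/k) p_{k−1}, with the binomial, Poisson, and negative binomial as special cases, spanning under-, equi-, and over-dispersion. The check below confirms that with a = 0 and b = λ the recursion recovers the Poisson pmf exactly:

```python
import math

def panjer_pmf(a, b, p0, k_max):
    """Probabilities p_0 .. p_{k_max} of a Panjer-class distribution,
    defined by the recursion p_k = (a + b / k) * p_{k-1}."""
    p = [p0]
    for k in range(1, k_max + 1):
        p.append((a + b / k) * p[-1])
    return p

# Special case: a = 0, b = lam, p0 = exp(-lam) gives the (equidispersed)
# Poisson(lam) pmf; other (a, b) choices give the binomial (underdispersed)
# and negative binomial (overdispersed) members of the class.
lam = 3.0
pmf = panjer_pmf(0.0, lam, math.exp(-lam), 15)
poisson_5 = lam**5 * math.exp(-lam) / math.factorial(5)
```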

We then propose an approach that robustifies the likelihood of a model with respect to likelihood misspecification. Unlike much of the existing work, the proposed model does not attempt to infer the parameters of a misspecified model in a robust manner; instead, it aims to learn the correct data-generating distribution. This is done using pseudo-points in the data space whose empirical density is ‘close’ to the true data-generating density, where closeness is measured by a statistical distance called the maximum mean discrepancy (MMD), which compares the mean embeddings of two distributions in a reproducing kernel Hilbert space (RKHS). The proposed approach is applied to mixture models in which each component is represented using pseudo-points, and the advantage of the resulting mixture model is demonstrated on a variety of data sets.
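
The maximum mean discrepancy itself is a standard quantity and can be sketched directly; the following computes the biased empirical MMD² estimate with a Gaussian (RBF) kernel between two samples (the pseudo-point construction of this thesis is not reproduced). Samples from the same distribution give a value near zero, while a shifted distribution gives a clearly larger value:

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    """Biased empirical MMD^2 between samples X and Y with RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2):
    mean k(X, X) + mean k(Y, Y) - 2 * mean k(X, Y)."""
    def k(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y_near = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as X
Y_far = rng.normal(3.0, 1.0, size=(200, 2))   # mean-shifted distribution
```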

We also propose two discrete-continuous latent feature models which can be used for dimensionality reduction in tasks such as exploratory analysis, preprocessing, and data visualisation. A constrained feature allocation prior is placed on the discrete component of the proposed models; we call these the adaptive factor analysis model and the adaptive probabilistic principal component analysis model, and we derive an efficient inference scheme for each. The usefulness of the proposed models is demonstrated on tasks such as feature learning, data visualisation, and data whitening using different data sets.
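
The adaptive models add a constrained feature allocation prior on top of classical continuous latent variable models; only the classical ingredient is sketched here, namely the closed-form maximum-likelihood solution of probabilistic PCA (Tipping & Bishop, 1999), fitted to hypothetical data with a known two-dimensional latent structure:

```python
import numpy as np

def ppca_ml(X, q):
    """Maximum-likelihood probabilistic PCA: model X ~ N(mu, W W^T + sigma2 I)
    with a q-dimensional latent space.  W spans the top-q principal subspace
    and sigma2 is the mean of the discarded eigenvalues."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                        # sample covariance
    evals, evecs = np.linalg.eigh(S)         # eigh returns ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    sigma2 = evals[q:].mean()                # noise = mean discarded variance
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(2)
# Hypothetical data: 2-dimensional latent structure embedded in 6 dimensions
# plus isotropic noise with variance 0.01.
Z = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 6))
X = Z @ A + 0.1 * rng.normal(size=(500, 6))
W, sigma2 = ppca_ml(X, q=2)
```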

Bayesian nonparametric priors assume that the parameter space is infinite, which allows for flexible modelling; for example, the Dirichlet process (DP) can be used in mixture models to learn the number of clusters in a data-driven manner. However, the existing discrete Bayesian nonparametric priors assume that the latent space is discrete. We propose two novel discrete Bayesian nonparametric priors which generalise existing Bayesian nonparametric priors such as the beta-Bernoulli process; we call these the discrete marked beta-binomial process and the marked beta-negative-binomial process. Furthermore, marginal processes for special cases of the proposed processes have also been derived, which allow for efficient sampling; we call these the multi-scoop Indian buffet process and the infinite-scoop Indian buffet process.
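
The proposed multi-scoop and infinite-scoop processes are not reproduced here; for orientation, the sketch below samples from the classical (single-scoop) Indian buffet process, the well-known marginal of the beta-Bernoulli process that the proposed priors generalise. Customer i takes each previously sampled dish k with probability m_k / i (where m_k is the number of previous takers) and then tries Poisson(α / i) new dishes:

```python
import numpy as np

def sample_ibp(n_customers, alpha, rng):
    """Draw a binary feature-allocation matrix Z from the classical Indian
    buffet process.  Rows are customers (data points), columns are dishes
    (latent features); the number of columns is learned from the draw."""
    dishes = []  # dishes[k] = list of customers (1-indexed) who took dish k
    for i in range(1, n_customers + 1):
        # Take each existing dish with probability m_k / i.
        for takers in dishes:
            if rng.random() < len(takers) / i:
                takers.append(i)
        # Try a Poisson(alpha / i) number of brand-new dishes.
        for _ in range(rng.poisson(alpha / i)):
            dishes.append([i])
    Z = np.zeros((n_customers, len(dishes)), dtype=int)
    for k, takers in enumerate(dishes):
        for i in takers:
            Z[i - 1, k] = 1
    return Z

Z = sample_ibp(10, alpha=2.0, rng=np.random.default_rng(3))
```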
Date of Award: Jun 2022
Original language: English
Supervisors: David Saad & Helen Higson