Data visualisation and exploration with prior knowledge

Research output: Chapter in Book/Report/Conference proceeding, Chapter (peer-reviewed)


Abstract

Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example by finding common patterns and relationships between samples as well as between variables. Typically, visualisation methods such as principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data, and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important for the high-dimensional, sparse datasets typical of geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as on a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
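The two ingredients named in the abstract, a Gaussian noise model whose covariance is block-structured rather than isotropic and the handling of missing values, can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes the GTM projection has already been fitted, so the K mixture centres (the images of the latent grid points in data space) are simply given; it uses equal mixing weights as in standard GTM; and it uses a hypothetical parameterisation (`block_covariance`, `rho`, `sigma2`) in which every variable has the same variance and all variables in one block share a single correlation. Missing entries (NaN) are marginalised out of each Gaussian component, and imputation uses the responsibility-weighted conditional Gaussian mean.

```python
import numpy as np


def block_covariance(block_sizes, rho, sigma2):
    """Hypothetical block-structured covariance: variables in the same block
    share correlation `rho`, variables in different blocks are uncorrelated,
    and every variable has variance `sigma2`."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("this sketch assumes 0 <= rho < 1 (positive definite)")
    d = sum(block_sizes)
    corr = np.zeros((d, d))
    start = 0
    for size in block_sizes:
        stop = start + size
        corr[start:stop, start:stop] = rho  # within-block correlation
        start = stop
    np.fill_diagonal(corr, 1.0)
    return sigma2 * corr


def log_gaussian_observed(x, mean, cov):
    """log N(x_obs | mean_obs, cov_obs): NaN entries of x are marginalised out
    by keeping only the observed rows/columns of the covariance."""
    obs = ~np.isnan(x)
    diff = x[obs] - mean[obs]
    c_oo = cov[np.ix_(obs, obs)]
    _, logdet = np.linalg.slogdet(c_oo)
    quad = diff @ np.linalg.solve(c_oo, diff)
    return -0.5 * (obs.sum() * np.log(2.0 * np.pi) + logdet + quad)


def responsibilities(x, centres, cov):
    """Posterior over mixture components with equal priors (as in GTM)."""
    logp = np.array([log_gaussian_observed(x, y, cov) for y in centres])
    logp -= logp.max()  # stabilise before exponentiating
    r = np.exp(logp)
    return r / r.sum()


def impute(x, centres, cov):
    """Fill NaNs with the responsibility-weighted conditional Gaussian mean."""
    obs = ~np.isnan(x)
    mis = ~obs
    r = responsibilities(x, centres, cov)
    c_oo = cov[np.ix_(obs, obs)]
    c_mo = cov[np.ix_(mis, obs)]
    # gain = C_mo C_oo^{-1}, computed via a linear solve (C_oo is symmetric)
    gain = np.linalg.solve(c_oo, c_mo.T).T
    estimate = np.zeros(mis.sum())
    for r_k, y in zip(r, centres):
        estimate += r_k * (y[mis] + gain @ (x[obs] - y[obs]))
    filled = x.copy()
    filled[mis] = estimate
    return filled


# Toy example: two blocks of two strongly correlated variables each.
cov = block_covariance([2, 2], rho=0.9, sigma2=0.5)
centres = np.array([[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]])
x = np.array([3.1, np.nan, 2.9, np.nan])
print(impute(x, centres, cov))
```

In this toy case the observation lies almost entirely under the second centre, and each missing variable is filled from its observed partner in the same block, giving approximately [3.1, 3.09, 2.9, 2.91]. With an isotropic (diagonal) covariance the gain would be zero and the missing entries would fall back to the centre values; the block structure is what lets strongly correlated variables inform each other.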

Documents

  • Schroeder2009EANN.pdf

    Rights statement: The original publication is available at www.springerlink.com

    2 MB, PDF-document

Details

Publication date: 19 Aug 2009
Publication title: Engineering applications of neural networks
Place of Publication: Berlin (DE)
Publisher: Springer
Pages: 113-142
Number of pages: 30
Volume: 43 CCIS
ISBN (Print): 978-3-642-03969-0
Original language: English

Publication series

Name: Communications in computer and information science
Publisher: Springer
Volume: 43
ISSN (Print): 1865-0929


Keywords

  • visualising data, exploratory analysis, principal component analysis, multi-dimensional scaling, sparse datasets, geochemistry, block-structured correlation matrix, non-linear probabilistic visualisation model, Generative Topographic Mapping, geochemical dataset, oil exploration

DOI
