Data visualisation and exploration with prior knowledge

Dan Cornford, Martin Schroeder, Ian T. Nabney

Research output: Chapter in Book/Report/Conference proceeding > Chapter (peer-reviewed)

Abstract

Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
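
As a rough illustration of what a block-structured correlation matrix looks like, the sketch below builds one in plain Python. It is not the chapter's GTM modification: the function names `block_correlation` and `impute_from_partner`, the shared within-block correlation, and the zero correlation between blocks are all assumptions made for this example. The second helper shows why such structure helps with missing data: for two standardised, jointly Gaussian variables, the conditional mean of the missing one is the correlation times the observed value.

```python
from typing import List, Sequence


def block_correlation(block_sizes: Sequence[int],
                      rhos: Sequence[float]) -> List[List[float]]:
    """Build a block-diagonal correlation matrix (hypothetical helper).

    Variables inside the same block share one correlation value;
    variables in different blocks are treated as uncorrelated.
    Positive-definiteness of each block is not checked here.
    """
    if len(block_sizes) != len(rhos):
        raise ValueError("need one correlation value per block")
    d = sum(block_sizes)
    corr = [[0.0] * d for _ in range(d)]
    start = 0
    for size, rho in zip(block_sizes, rhos):
        for i in range(start, start + size):
            for j in range(start, start + size):
                corr[i][j] = 1.0 if i == j else rho
        start += size
    return corr


def impute_from_partner(corr: List[List[float]], missing: int,
                        observed: int, x_observed: float) -> float:
    """Conditional mean of one standardised Gaussian variable given
    a single observed one (zero means, unit variances assumed)."""
    return corr[missing][observed] * x_observed


# Two blocks: variables 0-1 strongly correlated, variables 2-4 moderately.
corr = block_correlation([2, 3], [0.9, 0.5])
estimate = impute_from_partner(corr, missing=0, observed=1, x_observed=2.0)
```

A variable in a different block contributes nothing to the estimate in this sketch, because its correlation entry is zero.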
Original language: English
Title of host publication: Engineering applications of neural networks
Place of Publication: Berlin (DE)
Publisher: Springer
Pages: 113-142
Number of pages: 30
Volume: 43 CCIS
ISBN (Print): 978-3-642-03969-0
DOIs: https://doi.org/10.1007/978-3-642-03969-0_13
Publication status: Published - 19 Aug 2009

Publication series

Name: Communications in computer and information science
Publisher: Springer
Volume: 43
ISSN (Print): 1865-0929

Fingerprint

Data visualization
Visualization
Geochemistry
Principal component analysis

Bibliographical note

The original publication is available at www.springerlink.com

Keywords

  • visualising data
  • exploratory analysis
  • principal component analysis
  • multi-dimensional scaling
  • sparse datasets
  • geochemistry
  • block-structured correlation matrix
  • non-linear probabilistic visualisation model
  • Generative Topographic Mapping
  • geochemical dataset
  • oil exploration

Cite this

Cornford, D., Schroeder, M., & Nabney, I. T. (2009). Data visualisation and exploration with prior knowledge. In Engineering applications of neural networks (Vol. 43 CCIS, pp. 113-142). (Communications in computer and information science; Vol. 43). Berlin (DE): Springer. https://doi.org/10.1007/978-3-642-03969-0_13
Cornford, Dan; Schroeder, Martin; Nabney, Ian T. / Data visualisation and exploration with prior knowledge. Engineering applications of neural networks. Vol. 43 CCIS. Berlin (DE): Springer, 2009. pp. 113-142 (Communications in computer and information science).
@inbook{6c85ff85b5b6453f91dfabfc4e14a917,
title = "Data visualisation and exploration with prior knowledge",
abstract = "Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13{\%}.",
keywords = "visualising data, exploratory analysis, principal component analysis, multi-dimensional scaling, sparse datasets, geochemistry, block-structured correlation matrix, non-linear probabilistic visualisation model, Generative Topographic Mapping, geochemical dataset, oil exploration",
author = "Dan Cornford and Martin Schroeder and Nabney, {Ian T.}",
note = "The original publication is available at www.springerlink.com",
year = "2009",
month = "8",
day = "19",
doi = "10.1007/978-3-642-03969-0_13",
language = "English",
isbn = "978-3-642-03969-0",
volume = "43 CCIS",
series = "Communications in computer and information science",
publisher = "Springer",
pages = "113--142",
booktitle = "Engineering applications of neural networks",
address = "Germany",

}

Cornford, D, Schroeder, M & Nabney, IT 2009, Data visualisation and exploration with prior knowledge. in Engineering applications of neural networks. vol. 43 CCIS, Communications in computer and information science, vol. 43, Springer, Berlin (DE), pp. 113-142. https://doi.org/10.1007/978-3-642-03969-0_13

Data visualisation and exploration with prior knowledge. / Cornford, Dan; Schroeder, Martin; Nabney, Ian T.

Engineering applications of neural networks. Vol. 43 CCIS. Berlin (DE): Springer, 2009. p. 113-142 (Communications in computer and information science; Vol. 43).

TY - CHAP

T1 - Data visualisation and exploration with prior knowledge

AU - Cornford, Dan

AU - Schroeder, Martin

AU - Nabney, Ian T.

N1 - The original publication is available at www.springerlink.com

PY - 2009/8/19

Y1 - 2009/8/19

AB - Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.

KW - visualising data

KW - exploratory analysis

KW - principal component analysis

KW - multi-dimensional scaling

KW - sparse datasets

KW - geochemistry

KW - block-structured correlation matrix

KW - non-linear probabilistic visualisation model

KW - Generative Topographic Mapping

KW - geochemical dataset

KW - oil exploration

UR - http://www.scopus.com/inward/record.url?scp=78049377088&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-03969-0_13

DO - 10.1007/978-3-642-03969-0_13

M3 - Chapter (peer-reviewed)

SN - 978-3-642-03969-0

VL - 43 CCIS

T3 - Communications in computer and information science

SP - 113

EP - 142

BT - Engineering applications of neural networks

PB - Springer

CY - Berlin (DE)

ER -

Cornford D, Schroeder M, Nabney IT. Data visualisation and exploration with prior knowledge. In Engineering applications of neural networks. Vol. 43 CCIS. Berlin (DE): Springer. 2009. p. 113-142. (Communications in computer and information science). https://doi.org/10.1007/978-3-642-03969-0_13