Data visualisation and exploration with prior knowledge

Dan Cornford; Martin Schroeder; Ian T. Nabney

doi:10.1007/978-3-642-03969-0_13

Computer Science Research Group

Research output: Chapter in Book/Published conference output › Chapter (peer-reviewed)

Abstract

Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
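
The two ideas named in the abstract, a block-structured correlation matrix and imputation of missing values, can be illustrated with a small sketch. This is not the chapter's GTM modification: it assumes a single Gaussian with a block-diagonal covariance, and the function names (block_covariance, impute_missing), the block sizes and the 0.9 correlation are hypothetical choices made only for illustration. Missing entries are filled with their conditional Gaussian mean given the observed entries.

```python
import numpy as np


def block_covariance(block_sizes, within_corr, variance=1.0):
    """Build a block-diagonal covariance matrix (illustrative helper).

    Variables inside a block share the correlation `within_corr`;
    variables in different blocks are treated as uncorrelated.
    """
    dim = sum(block_sizes)
    cov = np.zeros((dim, dim))
    start = 0
    for size in block_sizes:
        block = np.full((size, size), within_corr * variance)
        np.fill_diagonal(block, variance)
        cov[start:start + size, start:start + size] = block
        start += size
    return cov


def impute_missing(x, mean, cov):
    """Fill NaN entries of `x` with their conditional Gaussian mean.

    Assumes x ~ N(mean, cov); this is a single-Gaussian stand-in,
    not the mixture-based GTM described in the abstract.
    """
    missing = np.isnan(x)
    observed = ~missing
    filled = x.copy()
    if not missing.any():
        return filled
    if not observed.any():
        # Nothing observed: fall back to the unconditional mean.
        filled[missing] = mean[missing]
        return filled
    cov_mo = cov[np.ix_(missing, observed)]
    cov_oo = cov[np.ix_(observed, observed)]
    residual = x[observed] - mean[observed]
    filled[missing] = mean[missing] + cov_mo @ np.linalg.solve(cov_oo, residual)
    return filled


# Two blocks of two strongly correlated variables each (assumed values).
cov = block_covariance(block_sizes=[2, 2], within_corr=0.9)
mean = np.zeros(4)
sample = np.array([1.0, np.nan, -2.0, np.nan])
imputed = impute_missing(sample, mean, cov)
print(imputed)
```

In this toy setting each missing value is predicted only from the observed variable in its own block, which is the sense in which a block structure lets strongly correlated variables inform one another directly.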

Original language	English
Title of host publication	Engineering applications of neural networks
Place of Publication	Berlin (DE)
Publisher	Springer
Pages	113-142
Number of pages	30
Volume	43 CCIS
ISBN (Print)	978-3-642-03969-0
DOIs	https://doi.org/10.1007/978-3-642-03969-0_13
Publication status	Published - 19 Aug 2009

Publication series

Name	Communications in computer and information science
Publisher	Springer
Volume	43
ISSN (Print)	1865-0929

Bibliographical note

The original publication is available at www.springerlink.com

Keywords

visualising data
exploratory analysis
principal component analysis
multi-dimensional scaling
sparse datasets
geochemistry
block-structured correlation matrix
non-linear probabilistic visualisation model
Generative Topographic Mapping
geochemical dataset
oil exploration

Access to Document

10.1007/978-3-642-03969-0_13

Schroeder2009EANN.pdf

http://www.springerlink.com/content/p63133k8561714q4/

Cite this

@inbook{6c85ff85b5b6453f91dfabfc4e14a917,

title = "Data visualisation and exploration with prior knowledge",

abstract = "Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13\%.",

keywords = "visualising data, exploratory analysis, principal component analysis, multi-dimensional scaling, sparse datasets, geochemistry, block-structured correlation matrix, non-linear probabilistic visualisation model, Generative Topographic Mapping, geochemical dataset, oil exploration",

author = "Cornford, Dan and Schroeder, Martin and Nabney, {Ian T.}",

note = "The original publication is available at www.springerlink.com",

year = "2009",

month = aug,

day = "19",

doi = "10.1007/978-3-642-03969-0_13",

language = "English",

isbn = "978-3-642-03969-0",

volume = "43 CCIS",

series = "Communications in computer and information science",

publisher = "Springer",

pages = "113--142",

booktitle = "Engineering applications of neural networks",

address = "Germany",

}

TY - CHAP

T1 - Data visualisation and exploration with prior knowledge

AU - Cornford, Dan

AU - Schroeder, Martin

AU - Nabney, Ian T.

N1 - The original publication is available at www.springerlink.com

PY - 2009/8/19

Y1 - 2009/8/19

N2 - Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.

AB - Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.

KW - visualising data

KW - exploratory analysis

KW - principal component analysis

KW - multi-dimensional scaling

KW - sparse datasets

KW - geochemistry

KW - block-structured correlation matrix

KW - non-linear probabilistic visualisation model

KW - Generative Topographic Mapping

KW - geochemical dataset

KW - oil exploration

UR - http://www.scopus.com/inward/record.url?scp=78049377088&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-03969-0_13

DO - 10.1007/978-3-642-03969-0_13

M3 - Chapter (peer-reviewed)

SN - 978-3-642-03969-0

VL - 43 CCIS

T3 - Communications in computer and information science

SP - 113

EP - 142

BT - Engineering applications of neural networks

PB - Springer

CY - Berlin (DE)

ER -
