Addressing missing data in geochemistry: a non-linear approach

Martin Schroeder, Dan Cornford, Paul Farrimond, Chris Cornford

Research output: Contribution to journal (Article)

Abstract

Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
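The imputation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes only the density form that generative topographic mapping uses: an equal-weight mixture of isotropic Gaussians with a shared inverse variance `beta`. It also assumes the mixture centres have already been fitted. The function name, the argument names and the toy numbers are hypothetical. Each sample is compared with the centres on its observed variables only, and each missing value is replaced by the responsibility-weighted average of the centres.

```python
import numpy as np

def impute_isotropic_mixture(T, centers, beta):
    """Fill NaNs in T using an equal-weight isotropic Gaussian mixture.

    T       : (N, D) data matrix, NaN marks a missing value
    centers : (K, D) mixture centres, assumed to be fitted already
    beta    : shared inverse noise variance (a positive scalar)
    """
    T = np.asarray(T, dtype=float)
    centers = np.asarray(centers, dtype=float)
    observed = ~np.isnan(T)                      # (N, D) mask of known entries
    T0 = np.where(observed, T, 0.0)              # placeholders are masked below

    # Squared distance to every centre, summed over observed variables only.
    diff = T0[:, None, :] - centers[None, :, :]  # (N, K, D)
    sq_dist = np.sum(diff ** 2 * observed[:, None, :], axis=2)  # (N, K)

    # Responsibilities: softmax of -beta/2 * distance. Equal priors and the
    # per-sample normalising constant are the same for all centres and cancel.
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max(axis=1, keepdims=True)    # numerical stability
    resp = np.exp(log_r)
    resp /= resp.sum(axis=1, keepdims=True)

    # Posterior-mean prediction for every variable; keep observed values.
    predicted = resp @ centers                   # (N, D)
    return np.where(observed, T, predicted), resp

# Toy usage with two hypothetical centres and one variable missing per sample.
toy_centers = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_data = np.array([[0.1, np.nan], [9.8, np.nan], [np.nan, np.nan]])
filled, resp = impute_isotropic_mixture(toy_data, toy_centers, beta=1.0)
```

A sample with no observed variables receives equal responsibilities, so its imputed values fall back to the average of the centres. In a full model the centres and `beta` would be re-estimated iteratively from the incomplete data, which this sketch does not attempt.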
Original language: English
Pages (from-to): 1162-1169
Number of pages: 8
Journal: Organic Geochemistry
Volume: 39
Issue number: 8
DOI: https://doi.org/10.1016/j.orggeochem.2008.02.016
Publication status: Published - Aug 2008

Bibliographical note

Advances in Organic Geochemistry 2007 — Proceedings of the 23rd International Meeting on Organic Geochemistry

Keywords

  • petroleum geochemical
  • range of molecular and isotopic properties
  • spatially and temporally unrepresentative sampling pattern
  • linked plots
  • brushing
  • non-linear probabilistic model
  • Generative topographic mapping

Cite this

Schroeder, M., Cornford, D., Farrimond, P., & Cornford, C. (2008). Addressing missing data in geochemistry: a non-linear approach. Organic Geochemistry, 39(8), 1162-1169. https://doi.org/10.1016/j.orggeochem.2008.02.016
@article{85776a6d99114505a3f70a7f51d6dd7f,
title = "Addressing missing data in geochemistry: a non-linear approach",
keywords = "petroleum geochemical, range of molecular and isotopic properties, spatially and temporally unrepresentative sampling pattern, linked plots, brushing, non-linear probabilistic model, Generative topographic mapping",
author = "Martin Schroeder and Dan Cornford and Paul Farrimond and Chris Cornford",
note = "Advances in Organic Geochemistry 2007 — Proceedings of the 23rd International Meeting on Organic Geochemistry",
year = "2008",
month = "8",
doi = "10.1016/j.orggeochem.2008.02.016",
language = "English",
volume = "39",
pages = "1162--1169",
journal = "Organic Geochemistry",
issn = "0146-6380",
publisher = "Elsevier",
number = "8",

}
