Outlier detection with partial information: Application to emergency mapping

Davide D'Alimonte*, Dan Cornford

*Corresponding author for this work

Research output: Contribution to journal (Article)

Abstract

This paper addresses the problem of novelty detection in the case that the observed data is a mixture of a known 'background' process contaminated with an unknown other process, which generates the outliers, or novel observations. The framework we describe here is quite general, employing univariate classification with incomplete information, based on knowledge of the distribution (the 'probability density function', 'pdf') of the data generated by the 'background' process. The relative proportion of this 'background' component (the 'prior' 'background' probability), the 'pdf' and the 'prior' probabilities of all other components are all assumed unknown. The main contribution is a new classification scheme that identifies the maximum proportion of observed data following the known 'background' distribution. The method exploits the Kolmogorov-Smirnov test to estimate the proportions, and afterwards data are Bayes optimally separated. Results, demonstrated with synthetic data, show that this approach can produce more reliable results than a standard novelty detection scheme. The classification algorithm is then applied to the problem of identifying outliers in the SIC2004 data set, in order to detect the radioactive release simulated in the 'joker' data set. We propose this method as a reliable means of novelty detection in the emergency situation, which can also be used to identify outliers prior to the application of a more general automatic mapping algorithm. © Springer-Verlag 2007.
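The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the two steps it names, under stated assumptions, and should not be read as the authors' implementation. It assumes the background CDF and pdf are known, scans candidate background proportions from 1 downwards, and keeps the largest one for which the residual between the empirical CDF and the scaled background CDF stays within a Kolmogorov-Smirnov-style tolerance. The posterior step then uses a kernel density estimate as a stand-in for the unknown mixture density. The function names, the asymptotic 5% constant 1.36, the grid step and the 0.5 posterior threshold are all illustrative assumptions.

```python
import numpy as np
from scipy import stats


def max_background_proportion(x, background_cdf, c_alpha=1.36, step=0.01):
    """Largest proportion pi for which pi * F_b is compatible with the data.

    If F_n = pi * F_b + (1 - pi) * F_o for some valid CDF F_o, then the
    residual F_n - pi * F_b must stay inside [0, 1 - pi]. Violations are
    tolerated up to c_alpha / sqrt(n); for pi = 1 this reduces to the
    usual one-sample Kolmogorov-Smirnov check. (Hypothetical helper.)
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    fb = background_cdf(x)
    ecdf_before = np.arange(0, n) / n      # empirical CDF just below each point
    ecdf_at = np.arange(1, n + 1) / n      # empirical CDF at each point
    tol = c_alpha / np.sqrt(n)
    for pi in np.arange(1.0, 0.0, -step):
        below = np.max(pi * fb - ecdf_before)          # residual < 0
        above = np.max(ecdf_at - pi * fb) - (1.0 - pi)  # residual > 1 - pi
        if max(below, above) <= tol:
            return float(pi)
    return 0.0


def background_posterior(x, pi_b, background_pdf):
    """Posterior probability of 'background' for each observation.

    Bayes' rule with the estimated prior pi_b; the mixture density is
    approximated by a Gaussian kernel density estimate (an assumption).
    """
    x = np.asarray(x, dtype=float)
    f_obs = stats.gaussian_kde(x)(x)
    return np.clip(pi_b * background_pdf(x) / f_obs, 0.0, 1.0)
```

With a standard normal background, for example, one could call `max_background_proportion(x, stats.norm(0, 1).cdf)` and then flag observations whose `background_posterior` falls below 0.5 as outliers. Because of the tolerance term, the estimated proportion tends to sit slightly above the true one.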

Original language: English
Pages (from-to): 613-620
Number of pages: 8
Journal: Stochastic environmental research and risk assessment
Volume: 22
Issue number: 5
Early online date: 30 Jun 2007
DOI: 10.1007/s00477-007-0164-8
Publication status: Published - Aug 2008

Bibliographical note

The original publication is available at www.springerlink.com

Keywords

  • novelty detection
  • known background process
  • contamination
  • unknown process
  • outliers
  • novel observations
  • knowledge of the distribution
  • probability density function
  • pdf
  • prior ‘background’ probability
  • prior probabilities
  • Kolmogorov–Smirnov test
  • proportions
  • afterwards data
  • Bayes optimally separated
  • classification algorithm
  • SIC2004 data set
  • detection
  • radioactive release
  • ‘joker’ data set
  • emergency situation
  • automatic mapping algorithm

Cite this

@article{f5bb17680571452dbdd6d34231eb6231,
title = "Outlier detection with partial information: Application to emergency mapping",
abstract = "This paper addresses the problem of novelty detection in the case that the observed data is a mixture of a known 'background' process contaminated with an unknown other process, which generates the outliers, or novel observations. The framework we describe here is quite general, employing univariate classification with incomplete information, based on knowledge of the distribution (the 'probability density function', 'pdf') of the data generated by the 'background' process. The relative proportion of this 'background' component (the 'prior' 'background' probability), the 'pdf' and the 'prior' probabilities of all other components are all assumed unknown. The main contribution is a new classification scheme that identifies the maximum proportion of observed data following the known 'background' distribution. The method exploits the Kolmogorov-Smirnov test to estimate the proportions, and afterwards data are Bayes optimally separated. Results, demonstrated with synthetic data, show that this approach can produce more reliable results than a standard novelty detection scheme. The classification algorithm is then applied to the problem of identifying outliers in the SIC2004 data set, in order to detect the radioactive release simulated in the 'joker' data set. We propose this method as a reliable means of novelty detection in the emergency situation, which can also be used to identify outliers prior to the application of a more general automatic mapping algorithm. {\textcopyright} Springer-Verlag 2007.",
keywords = "novelty detection, known background process, contamination, unknown process, outliers, novel observations, knowledge of the distribution, probability density function, pdf, prior ‘background’ probability, prior probabilities, Kolmogorov–Smirnov test, proportions, afterwards data, Bayes optimally separated, classification algorithm, SIC2004 data set, detection, radioactive release, ‘joker’ data set, emergency situation, automatic mapping algorithm",
author = "Davide D'Alimonte and Dan Cornford",
note = "The original publication is available at www.springerlink.com",
year = "2008",
month = "8",
doi = "10.1007/s00477-007-0164-8",
language = "English",
volume = "22",
pages = "613--620",
journal = "Stochastic environmental research and risk assessment",
issn = "1436-3240",
publisher = "Springer",
number = "5",
}

Outlier detection with partial information: Application to emergency mapping. / D'Alimonte, Davide; Cornford, Dan.

In: Stochastic environmental research and risk assessment, Vol. 22, No. 5, 08.2008, p. 613-620.

TY - JOUR

T1 - Outlier detection with partial information

T2 - Application to emergency mapping

AU - D'Alimonte, Davide

AU - Cornford, Dan

N1 - The original publication is available at www.springerlink.com

PY - 2008/8

Y1 - 2008/8

N2 - This paper addresses the problem of novelty detection in the case that the observed data is a mixture of a known 'background' process contaminated with an unknown other process, which generates the outliers, or novel observations. The framework we describe here is quite general, employing univariate classification with incomplete information, based on knowledge of the distribution (the 'probability density function', 'pdf') of the data generated by the 'background' process. The relative proportion of this 'background' component (the 'prior' 'background' probability), the 'pdf' and the 'prior' probabilities of all other components are all assumed unknown. The main contribution is a new classification scheme that identifies the maximum proportion of observed data following the known 'background' distribution. The method exploits the Kolmogorov-Smirnov test to estimate the proportions, and afterwards data are Bayes optimally separated. Results, demonstrated with synthetic data, show that this approach can produce more reliable results than a standard novelty detection scheme. The classification algorithm is then applied to the problem of identifying outliers in the SIC2004 data set, in order to detect the radioactive release simulated in the 'joker' data set. We propose this method as a reliable means of novelty detection in the emergency situation, which can also be used to identify outliers prior to the application of a more general automatic mapping algorithm. © Springer-Verlag 2007.

AB - This paper addresses the problem of novelty detection in the case that the observed data is a mixture of a known 'background' process contaminated with an unknown other process, which generates the outliers, or novel observations. The framework we describe here is quite general, employing univariate classification with incomplete information, based on knowledge of the distribution (the 'probability density function', 'pdf') of the data generated by the 'background' process. The relative proportion of this 'background' component (the 'prior' 'background' probability), the 'pdf' and the 'prior' probabilities of all other components are all assumed unknown. The main contribution is a new classification scheme that identifies the maximum proportion of observed data following the known 'background' distribution. The method exploits the Kolmogorov-Smirnov test to estimate the proportions, and afterwards data are Bayes optimally separated. Results, demonstrated with synthetic data, show that this approach can produce more reliable results than a standard novelty detection scheme. The classification algorithm is then applied to the problem of identifying outliers in the SIC2004 data set, in order to detect the radioactive release simulated in the 'joker' data set. We propose this method as a reliable means of novelty detection in the emergency situation, which can also be used to identify outliers prior to the application of a more general automatic mapping algorithm. © Springer-Verlag 2007.

KW - novelty detection

KW - known background process

KW - contamination

KW - unknown process

KW - outliers

KW - novel observations

KW - knowledge of the distribution

KW - probability density function

KW - pdf

KW - prior ‘background’ probability

KW - prior probabilities

KW - Kolmogorov–Smirnov test

KW - proportions

KW - afterwards data

KW - Bayes optimally separated

KW - classification algorithm

KW - SIC2004 data set

KW - detection

KW - radioactive release

KW - ‘joker’ data set

KW - emergency situation

KW - automatic mapping algorithm

UR - http://www.scopus.com/inward/record.url?scp=45749124041&partnerID=8YFLogxK

U2 - 10.1007/s00477-007-0164-8

DO - 10.1007/s00477-007-0164-8

M3 - Article

VL - 22

SP - 613

EP - 620

JO - Stochastic environmental research and risk assessment

JF - Stochastic environmental research and risk assessment

SN - 1436-3240

IS - 5

ER -