Handling varying amounts of missing data when classifying mental-health risk levels

Sherine Nagy Saleh; Christopher D. Buckingham

doi:10.3233/978-1-61499-474-9-92

Sherine Nagy Saleh*, Christopher D. Buckingham

*Corresponding author for this work

Computer Science Research Group

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing of missing values or neglecting records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is effected by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring explanations of risk are based only on the data given, not imputed data. This is important for clinical decision support systems using human expertise for modelling and explaining predictions.
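The two steps described in the abstract, sequential correlation-based feature selection and a model built per test case from the features that case actually has, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the scoring rule or the classifier, so the score used here (absolute Pearson correlation with the target minus the largest absolute correlation with any feature already chosen) and the 1-nearest-neighbour classifier are stand-ins, and the names `pearson`, `select_features` and `classify_case` are hypothetical. Missing values are represented as `None` and are never imputed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / sqrt(sxx * syy)


def select_features(columns, target, candidates, k):
    """Greedily pick up to k features: relevant to the target, not redundant.

    Assumed score (not taken from the paper): |corr(feature, target)| minus
    the largest |corr(feature, already selected feature)|.
    """
    selected = []
    remaining = list(candidates)

    def score(name):
        relevance = abs(pearson(columns[name], target))
        redundancy = max(
            (abs(pearson(columns[name], columns[s])) for s in selected),
            default=0.0,
        )
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def classify_case(rows, labels, case, k=2):
    """Build a case-specific model from the features present in `case`.

    Returns (predicted_label, chosen_features), or None if nothing is usable.
    The 1-nearest-neighbour step is a placeholder classifier.
    """
    # Only features the test case actually provides are candidates.
    available = [name for name, value in case.items() if value is not None]
    columns = {name: [row.get(name) for row in rows] for name in available}
    chosen = select_features(columns, labels, available, k)
    if not chosen:
        return None
    # Keep training rows that have real values for every chosen feature.
    usable = [
        (row, label)
        for row, label in zip(rows, labels)
        if all(row.get(name) is not None for name in chosen)
    ]
    if not usable:
        return None

    def squared_distance(row):
        return sum((row[name] - case[name]) ** 2 for name in chosen)

    _, label = min(usable, key=lambda pair: squared_distance(pair[0]))
    return label, chosen


# Toy data: three features with gaps, risk labels on the 0 to 10 scale.
rows = [
    {"a": 0, "b": 0, "c": 5},
    {"a": 1, "b": 1, "c": None},
    {"a": 2, "b": 2, "c": 1},
    {"a": 3, "b": None, "c": 4},
    {"a": 4, "b": 4, "c": 0},
]
labels = [0, 2, 4, 6, 8]
print(classify_case(rows, labels, {"a": 3.2, "b": None, "c": 1}))
```

Because the model for each case is restricted to features that case supplies, any explanation of the predicted risk refers only to recorded data. The keywords also mention partial correlation, which this sketch does not use.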

Original language	English
Title of host publication	Innovation in Medicine and healthcare 2014
Editors	Manuel Graña, Carlos Toro, Robert J. Howlett, Lakhmi C. Jain
Publisher	IOS
Pages	92-101
Number of pages	10
ISBN (Electronic)	978-1-61499-474-9
ISBN (Print)	978-1-61499-473-2
DOIs	https://doi.org/10.3233/978-1-61499-474-9-92
Publication status	Published - 31 Dec 2014
Event	2nd KES international conference on Innovation in Medicine and healthcare, San Sebastian, Spain, 9 Jul 2014 → 11 Jul 2014

Publication series

Name	Studies in health technology and informatics
Publisher	IOS Press
Volume	207
ISSN (Print)	0926-9630
ISSN (Electronic)	1879-8365

Conference

Conference	2nd KES international conference on Innovation in Medicine and healthcare
Abbreviated title	InMed-14
Country/Territory	Spain
City	San Sebastian
Period	9/07/14 → 11/07/14

Keywords

correlation
feature selection
mental health
missing data
partial correlation
risk prediction

Cite this

@inproceedings{f25efee713434c82b959daec455bad85,

title = "Handling varying amounts of missing data when classifying mental-health risk levels",

abstract = "One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing of missing values or neglecting records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is effected by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring explanations of risk are based only on the data given, not imputed data. This is important for clinical decision support systems using human expertise for modelling and explaining predictions.",

keywords = "correlation, feature selection, mental health, missing data, partial correlation, risk prediction",

author = "Saleh, {Sherine Nagy} and Buckingham, {Christopher D.}",

year = "2014",

month = dec,

day = "31",

doi = "10.3233/978-1-61499-474-9-92",

language = "English",

isbn = "978-1-61499-473-2",

series = "Studies in health technology and informatics",

volume = "207",

publisher = "IOS",

pages = "92--101",

editor = "Gra{\~n}a, Manuel and Toro, Carlos and Howlett, {Robert J.} and Jain, {Lakhmi C.}",

booktitle = "Innovation in Medicine and healthcare 2014",

address = "Netherlands",

note = "2nd KES international conference on Innovation in Medicine and healthcare, InMed-14 ; Conference date: 09-07-2014 Through 11-07-2014",

}

Saleh, SN & Buckingham, CD 2014, Handling varying amounts of missing data when classifying mental-health risk levels. in M Graña, C Toro, RJ Howlett & LC Jain (eds), Innovation in Medicine and healthcare 2014. Studies in health technology and informatics, vol. 207, IOS, pp. 92-101, 2nd KES international conference on Innovation in Medicine and healthcare, San Sebastian, Spain, 9/07/14. https://doi.org/10.3233/978-1-61499-474-9-92

Handling varying amounts of missing data when classifying mental-health risk levels. / Saleh, Sherine Nagy; Buckingham, Christopher D.
Innovation in Medicine and healthcare 2014. ed. / Manuel Graña; Carlos Toro; Robert J. Howlett; Lakhmi C. Jain. IOS, 2014. p. 92-101 (Studies in health technology and informatics; Vol. 207).

TY - GEN

T1 - Handling varying amounts of missing data when classifying mental-health risk levels

AU - Saleh, Sherine Nagy

AU - Buckingham, Christopher D.

PY - 2014/12/31

Y1 - 2014/12/31

N2 - One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing of missing values or neglecting records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is effected by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring explanations of risk are based only on the data given, not imputed data. This is important for clinical decision support systems using human expertise for modelling and explaining predictions.

AB - One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing of missing values or neglecting records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is effected by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring explanations of risk are based only on the data given, not imputed data. This is important for clinical decision support systems using human expertise for modelling and explaining predictions.

KW - correlation

KW - feature selection

KW - mental health

KW - missing data

KW - partial correlation

KW - risk prediction

UR - http://www.scopus.com/inward/record.url?scp=84918786571&partnerID=8YFLogxK

UR - http://ebooks.iospress.nl/publication/38627

U2 - 10.3233/978-1-61499-474-9-92

DO - 10.3233/978-1-61499-474-9-92

M3 - Conference publication

AN - SCOPUS:84918786571

SN - 978-1-61499-473-2

T3 - Studies in health technology and informatics

SP - 92

EP - 101

BT - Innovation in Medicine and healthcare 2014

A2 - Graña, Manuel

A2 - Toro, Carlos

A2 - Howlett, Robert J.

A2 - Jain, Lakhmi C.

PB - IOS

T2 - 2nd KES international conference on Innovation in Medicine and healthcare

Y2 - 9 July 2014 through 11 July 2014

ER -
