Handling varying amounts of missing data when classifying mental-health risk levels

Sherine Nagy Saleh*, Christopher D. Buckingham

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing of missing values or neglecting records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is effected by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring explanations of risk are based only on the data given, not imputed data. This is important for clinical decision support systems using human expertise for modelling and explaining predictions.
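The two steps described in the abstract (sequential feature selection, then a model built per test case from the features that case actually has) can be illustrated with a minimal sketch. This is not the authors' implementation. The scoring rule (relevance minus redundancy), the nearest-neighbour vote used as the classifier, the use of `None` for a missing value, and the function names `abs_corr`, `greedy_select` and `predict_case` are all assumptions made for illustration. The keywords mention partial correlation, which this sketch does not attempt.

```python
import math
from collections import Counter


def abs_corr(a, b):
    """Absolute Pearson correlation over positions where both values are present."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if len(pairs) < 3:
        return 0.0
    mean_u = sum(u for u, _ in pairs) / len(pairs)
    mean_v = sum(v for _, v in pairs) / len(pairs)
    cov = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    if var_u == 0 or var_v == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(cov) / math.sqrt(var_u * var_v)


def column(rows, j):
    """Extract feature j from every training row."""
    return [row[j] for row in rows]


def greedy_select(rows, labels, candidates, k):
    """Pick up to k features one at a time.

    Assumed score: correlation with the label minus the highest
    correlation with any feature already selected.
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(j):
            relevance = abs_corr(column(rows, j), labels)
            redundancy = max(
                (abs_corr(column(rows, j), column(rows, s)) for s in selected),
                default=0.0,
            )
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def predict_case(rows, labels, case, k=3, n_neighbors=5):
    """Build a model for one test case from the features it actually has."""
    # Only features present in this test case are candidates.
    available = [j for j, v in enumerate(case) if v is not None]
    feats = greedy_select(rows, labels, available, k)
    # Keep training rows that have a value for every selected feature.
    usable = [
        (row, lab)
        for row, lab in zip(rows, labels)
        if all(row[j] is not None for j in feats)
    ]
    if not usable:
        return None, feats
    target = [case[j] for j in feats]
    usable.sort(key=lambda rl: math.dist([rl[0][j] for j in feats], target))
    # Stand-in classifier (an assumption): majority vote of the nearest rows.
    votes = Counter(lab for _, lab in usable[:n_neighbors])
    return votes.most_common(1)[0][0], feats
```

Because nothing is imputed, the returned feature list names exactly the observed values that drove the prediction, which is the property the abstract highlights for explaining risk judgements.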

Original language: English
Title of host publication: Innovation in Medicine and healthcare 2014
Editors: Manuel Graña, Carlos Toro, Robert J. Howlett, Lakhmi C. Jain
Publisher: IOS
Pages: 92-101
Number of pages: 10
ISBN (Electronic): 978-1-61499-474-9
ISBN (Print): 978-1-61499-473-2
DOIs: https://doi.org/10.3233/978-1-61499-474-9-92
Publication status: Published - 31 Dec 2014
Event: 2nd KES international conference on Innovation in Medicine and healthcare - San Sebastian, Spain
Duration: 9 Jul 2014 → 11 Jul 2014

Publication series

Name: Studies in health technology and informatics
Publisher: IOS Press
Volume: 207
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Conference

Conference: 2nd KES international conference on Innovation in Medicine and healthcare
Abbreviated title: InMed-14
Country: Spain
City: San Sebastian
Period: 9/07/14 → 11/07/14

Keywords

  • correlation
  • feature selection
  • mental health
  • missing data
  • partial correlation
  • risk prediction


  • Research Output

    • 2 Conference contribution

  • Representing human expertise by the OWL web ontology language to support knowledge engineering in decision support systems

    Ramzan, A., Wang, H. & Buckingham, C., 1 Jan 2014, Innovation in Medicine and healthcare 2014. Graña, M., Toro, C., Howlett, R. J. & Jain, L. C. (eds.). IOS, p. 290-299 10 p. (Studies in health technology and informatics; vol. 207).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Understanding data collection behaviour of mental health practitioners

    Rezaei-Yazdi, A. & Buckingham, C. D., 31 Dec 2014, Innovation in Medicine and healthcare 2014. Graña, M., Toro, C., Howlett, R. J. & Jain, L. C. (eds.). IOS, p. 193-202 10 p. (Studies in health technology and informatics; vol. 207).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Cite this

    Saleh, S. N., & Buckingham, C. D. (2014). Handling varying amounts of missing data when classifying mental-health risk levels. In M. Graña, C. Toro, R. J. Howlett, & L. C. Jain (Eds.), Innovation in Medicine and healthcare 2014 (pp. 92-101). (Studies in health technology and informatics; Vol. 207). IOS. https://doi.org/10.3233/978-1-61499-474-9-92