Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms

Victor Chang; Jozeene Bailey; Qianwen Ariel Xu; Zhili Sun

doi:10.1007/s00521-022-07049-z

Corresponding author for this work: Victor Chang

Research output: Contribution to journal › Article › peer-review

Abstract

This paper proposes an e-diagnosis system based on machine learning (ML) algorithms to be implemented in the Internet of Medical Things (IoMT) environment, particularly for diagnosing diabetes mellitus (type 2 diabetes). However, ML applications tend to be mistrusted because of their inability to show the internal decision-making process, resulting in slow uptake by end-users within certain healthcare sectors. This research describes three interpretable supervised ML models (a Naïve Bayes classifier, a random forest classifier, and a J48 decision tree) that are trained and tested on the Pima Indians diabetes dataset in the R programming language. The performance of each algorithm is analyzed to determine the one with the best accuracy, precision, sensitivity, and specificity. An assessment of the decision process is also made to improve the model. It is concluded that a Naïve Bayes model works well with a more fine-tuned selection of features for binary classification, while random forest works better with more features.
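
The four evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in Python (the paper itself works in R), not the authors' code: the helper names `confusion_counts` and `binary_metrics` and the toy labels are hypothetical and chosen only for illustration, with 1 assumed to mean "diabetic" and 0 "non-diabetic".

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (positive class = 1)."""
    tp: int  # predicted 1, actually 1
    fp: int  # predicted 1, actually 0
    tn: int  # predicted 0, actually 0
    fn: int  # predicted 0, actually 1


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally the four confusion-matrix cells from 0/1 label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, precision, sensitivity (recall) and specificity."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Toy labels for illustration only; these are NOT results from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for name, value in binary_metrics(confusion_counts(y_true, y_pred)).items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for a screening task of this kind, because a classifier can score high accuracy on an imbalanced dataset while still missing many positive cases.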

Original language	English
Journal	Neural Computing and Applications
Early online date	24 Mar 2022
DOIs	https://doi.org/10.1007/s00521-022-07049-z
Publication status	E-pub ahead of print - 24 Mar 2022

Bibliographical note

© 2022, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use [https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms], but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s00521-022-07049-z

Funding Information:
This research is partly supported by VC Research (VCR 0000159) for Prof Chang.

Keywords

Diabetes mellitus
Interpretable artificial intelligence
Machine learning
The Internet of Medical Things (IoMT)

Access to Document

10.1007/s00521-022-07049-z

VC_ML_NCAA2021_final
Accepted author manuscript, 991 KB. Licence: Other

Cite this

@article{27bc8b3f9d6e44a3a8123e961b465fcb,
  title = "Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms",
  abstract = "This paper proposes an e-diagnosis system based on machine learning (ML) algorithms to be implemented on the Internet of Medical Things (IoMT) environment, particularly for diagnosing diabetes mellitus (type 2 diabetes). However, the ML applications tend to be mistrusted because of their inability to show the internal decision-making process, resulting in slow uptake by end-users within certain healthcare sectors. This research delineates the use of three interpretable supervised ML models: Na{\"i}ve Bayes classifier, random forest classifier, and J48 decision tree models to be trained and tested using the Pima Indians diabetes dataset in R programming language. The performance of each algorithm is analyzed to determine the one with the best accuracy, precision, sensitivity, and specificity. An assessment of the decision process is also made to improve the model. It can be concluded that a Na{\"i}ve Bayes model works well with a more fine-tuned selection of features for binary classification, while random forest works better with more features.",
  keywords = "Diabetes mellitus, Interpretable artificial intelligence, Machine learning, The Internet of Medical Things (IoMT)",
  author = "Victor Chang and Jozeene Bailey and Xu, {Qianwen Ariel} and Zhili Sun",
  note = "{\textcopyright} 2022, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature{\textquoteright}s AM terms of use [https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms], but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s00521-022-07049-z Funding Information: This research is partly supported by VC Research (VCR 0000159) for Prof Chang.",
  year = "2022",
  month = mar,
  day = "24",
  doi = "10.1007/s00521-022-07049-z",
  language = "English",
  journal = "Neural Computing and Applications",
  issn = "0941-0643",
  publisher = "Springer",
}

TY  - JOUR
T1  - Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms
AU  - Chang, Victor
AU  - Bailey, Jozeene
AU  - Xu, Qianwen Ariel
AU  - Sun, Zhili
N1  - © 2022, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use [https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms], but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s00521-022-07049-z Funding Information: This research is partly supported by VC Research (VCR 0000159) for Prof Chang.
PY  - 2022/3/24
Y1  - 2022/3/24
N2  - This paper proposes an e-diagnosis system based on machine learning (ML) algorithms to be implemented on the Internet of Medical Things (IoMT) environment, particularly for diagnosing diabetes mellitus (type 2 diabetes). However, the ML applications tend to be mistrusted because of their inability to show the internal decision-making process, resulting in slow uptake by end-users within certain healthcare sectors. This research delineates the use of three interpretable supervised ML models: Naïve Bayes classifier, random forest classifier, and J48 decision tree models to be trained and tested using the Pima Indians diabetes dataset in R programming language. The performance of each algorithm is analyzed to determine the one with the best accuracy, precision, sensitivity, and specificity. An assessment of the decision process is also made to improve the model. It can be concluded that a Naïve Bayes model works well with a more fine-tuned selection of features for binary classification, while random forest works better with more features.
AB  - This paper proposes an e-diagnosis system based on machine learning (ML) algorithms to be implemented on the Internet of Medical Things (IoMT) environment, particularly for diagnosing diabetes mellitus (type 2 diabetes). However, the ML applications tend to be mistrusted because of their inability to show the internal decision-making process, resulting in slow uptake by end-users within certain healthcare sectors. This research delineates the use of three interpretable supervised ML models: Naïve Bayes classifier, random forest classifier, and J48 decision tree models to be trained and tested using the Pima Indians diabetes dataset in R programming language. The performance of each algorithm is analyzed to determine the one with the best accuracy, precision, sensitivity, and specificity. An assessment of the decision process is also made to improve the model. It can be concluded that a Naïve Bayes model works well with a more fine-tuned selection of features for binary classification, while random forest works better with more features.
KW  - Diabetes mellitus
KW  - Interpretable artificial intelligence
KW  - Machine learning
KW  - The Internet of Medical Things (IoMT)
UR  - http://www.scopus.com/inward/record.url?scp=85127576654&partnerID=8YFLogxK
UR  - https://link.springer.com/article/10.1007/s00521-022-07049-z
U2  - 10.1007/s00521-022-07049-z
DO  - 10.1007/s00521-022-07049-z
M3  - Article
C2  - 35345556
AN  - SCOPUS:85127576654
SN  - 0941-0643
JO  - Neural Computing and Applications
JF  - Neural Computing and Applications
ER  -
