On the Feature Selection Methods and Reject Option Classifiers for Robust Cancer Prediction

Muhammad Hammad Waseem; Malik Sajjad Ahmed Nadeem; Assad Abbas; Aliya Shaheen; Wajid Aziz; Adeel Anjum; Umar Manzoor; Muhammad A. Balubaid; Seong O. Shim

doi:10.1109/ACCESS.2019.2944295

Muhammad Hammad Waseem, Malik Sajjad Ahmed Nadeem, Assad Abbas^*, Aliya Shaheen, Wajid Aziz, Adeel Anjum, Umar Manzoor, Muhammad A. Balubaid, Seong O. Shim

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Cancer is the second leading cause of mortality across the globe. Approximately 9.6 million people are estimated to have died of cancer in 2019. Accurate and early prediction of cancer can help healthcare professionals devise timely therapeutic interventions to reduce suffering and the risk of mortality. Generally, a machine learning (ML) based predictive system in healthcare uses data (genetic profiles or clinical parameters) and learning algorithms to predict target values for cancer detection. However, optimizing predictive accuracy is important for accurate decision making. Reject Option (RO) classifiers have been used to improve the predictive accuracy of classifiers on complex problems such as cancer. Not all features in a gene profile are important, and the unimportant ones should be removed. ML offers different feature selection (FS) techniques, each with its own methodology, and classification results depend on the datasets, each of which has its own distribution and features. Therefore, both FS methods and ML algorithms with RO need to be considered for robust classification. The main objective of this study is to optimize three parameters (learning algorithm, FS method, and rejection rate) for robust cancer prediction, rather than only the two traditional parameters (learning algorithm and rejection rate). Different FS methods (including t-Test, Las Vegas Filter (LVF), Relief, and Information Gain (IG)) and RO classifiers are analyzed at different rejection thresholds to investigate the robust predictability of cancer. Three cancer datasets (Colon cancer, Leukemia, and Breast cancer) were reduced using the different FS methods, and each reduced dataset was used to analyze the predictability of cancer using different RO classifiers. The results reveal that, for each dataset, the predictive accuracies of the RO classifiers differed across FS methods. The findings based on the proposed scheme indicate that ML algorithms, along with their dependence on suitable FS methods, need to be taken into consideration for accurate prediction.
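
To make the three parameters named in the abstract concrete (FS method, learning algorithm, rejection threshold), here is a minimal Python sketch. It is not the authors' implementation. It assumes a t-test filter for FS, a nearest-centroid classifier standing in for the learning algorithm, and a simple distance-ratio confidence as the reject rule. The synthetic data and every function name are hypothetical and exist only for illustration.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of one feature."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X, y, k):
    """Indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((welch_t(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, features):
    """Per-class mean vector over the selected features (assumed learner)."""
    return {
        c: [mean(row[j] for row, label in zip(X, y) if label == c) for j in features]
        for c in (0, 1)
    }


def predict_with_reject(x, centroids, features, threshold):
    """Return 0 or 1, or None (reject) when confidence is below threshold."""
    v = [x[j] for j in features]
    d = {c: math.dist(v, mu) for c, mu in centroids.items()}
    total = d[0] + d[1]
    # Assumed confidence: the closer a centroid, the higher its class score.
    conf = {0: d[1] / total, 1: d[0] / total} if total > 0 else {0: 0.5, 1: 0.5}
    best = max(conf, key=conf.get)
    return best if conf[best] >= threshold else None


def evaluate(X, y, centroids, features, threshold):
    """Accuracy on accepted samples and the fraction of samples rejected."""
    preds = [predict_with_reject(x, centroids, features, threshold) for x in X]
    accepted = [(p, t) for p, t in zip(preds, y) if p is not None]
    rejected = 1 - len(accepted) / len(X)
    acc = sum(p == t for p, t in accepted) / len(accepted) if accepted else float("nan")
    return acc, rejected


def make_data(n_per_class=20, n_features=20, shift=5.0):
    """Synthetic stand-in for a gene profile: only features 0 and 1 are informative."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [random.gauss(0, 1) for _ in range(n_features)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


random.seed(0)
X_train, y_train = make_data()
X_test, y_test = make_data()

# Feature selection uses the training data only, to avoid leaking test labels.
selected = select_features(X_train, y_train, k=2)
centroids = fit_centroids(X_train, y_train, selected)

for threshold in (0.5, 0.7, 0.9):
    acc, rej = evaluate(X_test, y_test, centroids, selected, threshold)
    print(f"threshold={threshold}: accuracy on accepted={acc:.2f}, rejected={rej:.2f}")
```

Swapping `select_features` for another filter, or `fit_centroids` for another learner, and then sweeping `threshold` gives the kind of three-way comparison the abstract describes, under the assumptions stated above.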

Original language	English
Pages (from-to)	141072-141082
Number of pages	11
Journal	IEEE Access
Volume	7
Early online date	27 Sept 2019
DOIs	https://doi.org/10.1109/ACCESS.2019.2944295
Publication status	Published - 9 Oct 2019

Keywords

Cancer
classification
feature selection
genetic profile
machine learning
reject option

Access to Document

10.1109/ACCESS.2019.2944295

On_the_Feature_Selection_Methods_and_Reject_Option_Classifiers_for_Robust_Cancer_Prediction
© 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/.
Final published version, 7.84 MB
Licence: CC BY 4.0

Cite this

@article{d163166dfd9b4e8eaab9c9ac87c9dcd4,

title = "On the Feature Selection Methods and Reject Option Classifiers for Robust Cancer Prediction",

abstract = "Cancer is the second leading cause of mortality across the globe. Approximately 9.6 million people are estimated to have died of cancer in 2019. Accurate and early prediction of cancer can help healthcare professionals devise timely therapeutic interventions to reduce suffering and the risk of mortality. Generally, a machine learning (ML) based predictive system in healthcare uses data (genetic profiles or clinical parameters) and learning algorithms to predict target values for cancer detection. However, optimizing predictive accuracy is important for accurate decision making. Reject Option (RO) classifiers have been used to improve the predictive accuracy of classifiers on complex problems such as cancer. Not all features in a gene profile are important, and the unimportant ones should be removed. ML offers different feature selection (FS) techniques, each with its own methodology, and classification results depend on the datasets, each of which has its own distribution and features. Therefore, both FS methods and ML algorithms with RO need to be considered for robust classification. The main objective of this study is to optimize three parameters (learning algorithm, FS method, and rejection rate) for robust cancer prediction, rather than only the two traditional parameters (learning algorithm and rejection rate). Different FS methods (including t-Test, Las Vegas Filter (LVF), Relief, and Information Gain (IG)) and RO classifiers are analyzed at different rejection thresholds to investigate the robust predictability of cancer. Three cancer datasets (Colon cancer, Leukemia, and Breast cancer) were reduced using the different FS methods, and each reduced dataset was used to analyze the predictability of cancer using different RO classifiers. The results reveal that, for each dataset, the predictive accuracies of the RO classifiers differed across FS methods. The findings based on the proposed scheme indicate that ML algorithms, along with their dependence on suitable FS methods, need to be taken into consideration for accurate prediction.",

keywords = "Cancer, classification, feature selection, genetic profile, machine learning, reject option",

author = "Waseem, {Muhammad Hammad} and Nadeem, {Malik Sajjad Ahmed} and Assad Abbas and Aliya Shaheen and Wajid Aziz and Adeel Anjum and Umar Manzoor and Balubaid, {Muhammad A.} and Shim, {Seong O.}",

note = "{\textcopyright} 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ .",

year = "2019",

month = oct,

day = "9",

doi = "10.1109/ACCESS.2019.2944295",

language = "English",

volume = "7",

pages = "141072--141082",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "IEEE",

}

TY - JOUR

T1 - On the Feature Selection Methods and Reject Option Classifiers for Robust Cancer Prediction

AU - Waseem, Muhammad Hammad

AU - Nadeem, Malik Sajjad Ahmed

AU - Abbas, Assad

AU - Shaheen, Aliya

AU - Aziz, Wajid

AU - Anjum, Adeel

AU - Manzoor, Umar

AU - Balubaid, Muhammad A.

AU - Shim, Seong O.

PY - 2019/10/9

Y1 - 2019/10/9

N2 - Cancer is the second leading cause of mortality across the globe. Approximately 9.6 million people are estimated to have died of cancer in 2019. Accurate and early prediction of cancer can help healthcare professionals devise timely therapeutic interventions to reduce suffering and the risk of mortality. Generally, a machine learning (ML) based predictive system in healthcare uses data (genetic profiles or clinical parameters) and learning algorithms to predict target values for cancer detection. However, optimizing predictive accuracy is important for accurate decision making. Reject Option (RO) classifiers have been used to improve the predictive accuracy of classifiers on complex problems such as cancer. Not all features in a gene profile are important, and the unimportant ones should be removed. ML offers different feature selection (FS) techniques, each with its own methodology, and classification results depend on the datasets, each of which has its own distribution and features. Therefore, both FS methods and ML algorithms with RO need to be considered for robust classification. The main objective of this study is to optimize three parameters (learning algorithm, FS method, and rejection rate) for robust cancer prediction, rather than only the two traditional parameters (learning algorithm and rejection rate). Different FS methods (including t-Test, Las Vegas Filter (LVF), Relief, and Information Gain (IG)) and RO classifiers are analyzed at different rejection thresholds to investigate the robust predictability of cancer. Three cancer datasets (Colon cancer, Leukemia, and Breast cancer) were reduced using the different FS methods, and each reduced dataset was used to analyze the predictability of cancer using different RO classifiers. The results reveal that, for each dataset, the predictive accuracies of the RO classifiers differed across FS methods. The findings based on the proposed scheme indicate that ML algorithms, along with their dependence on suitable FS methods, need to be taken into consideration for accurate prediction.

AB - Cancer is the second leading cause of mortality across the globe. Approximately 9.6 million people are estimated to have died of cancer in 2019. Accurate and early prediction of cancer can help healthcare professionals devise timely therapeutic interventions to reduce suffering and the risk of mortality. Generally, a machine learning (ML) based predictive system in healthcare uses data (genetic profiles or clinical parameters) and learning algorithms to predict target values for cancer detection. However, optimizing predictive accuracy is important for accurate decision making. Reject Option (RO) classifiers have been used to improve the predictive accuracy of classifiers on complex problems such as cancer. Not all features in a gene profile are important, and the unimportant ones should be removed. ML offers different feature selection (FS) techniques, each with its own methodology, and classification results depend on the datasets, each of which has its own distribution and features. Therefore, both FS methods and ML algorithms with RO need to be considered for robust classification. The main objective of this study is to optimize three parameters (learning algorithm, FS method, and rejection rate) for robust cancer prediction, rather than only the two traditional parameters (learning algorithm and rejection rate). Different FS methods (including t-Test, Las Vegas Filter (LVF), Relief, and Information Gain (IG)) and RO classifiers are analyzed at different rejection thresholds to investigate the robust predictability of cancer. Three cancer datasets (Colon cancer, Leukemia, and Breast cancer) were reduced using the different FS methods, and each reduced dataset was used to analyze the predictability of cancer using different RO classifiers. The results reveal that, for each dataset, the predictive accuracies of the RO classifiers differed across FS methods. The findings based on the proposed scheme indicate that ML algorithms, along with their dependence on suitable FS methods, need to be taken into consideration for accurate prediction.

KW - Cancer

KW - classification

KW - feature selection

KW - genetic profile

KW - machine learning

KW - reject option

UR - https://ieeexplore.ieee.org/document/8851142

U2 - 10.1109/ACCESS.2019.2944295

DO - 10.1109/ACCESS.2019.2944295

M3 - Article

AN - SCOPUS:85077753500

SN - 2169-3536

VL - 7

SP - 141072

EP - 141082

JO - IEEE Access

JF - IEEE Access

ER -
