Reducing False Negatives in Ransomware Detection: A Critical Evaluation of Machine Learning Algorithms

Robert Bold; Haider Al-Khateeb; Nikolaos Ersotelos

doi:10.3390/app122412941

Reducing False Negatives in Ransomware Detection: A Critical Evaluation of Machine Learning Algorithms

Robert Bold, Haider Al-Khateeb, Nikolaos Ersotelos

Operations & Information Management

Research output: Contribution to journal › Article › peer-review

Abstract

Technological achievement and cybercriminal methodology are two parallel growing paths; protocols such as Tor and i2p (designed to offer confidentiality and anonymity) are being utilised to run ransomware companies operating under a Ransomware as a Service (RaaS) model. RaaS enables criminals with a limited technical ability to launch ransomware attacks. Several recent high-profile cases, such as the Colonial Pipeline attack and JBS Foods, involved forcing companies to pay enormous amounts of ransom money, indicating the difficulty for organisations of recovering from these attacks using traditional means, such as restoring backup systems. Hence, this is the benefit of intelligent early ransomware detection and eradication. This study offers a critical review of the literature on how we can use state-of-the-art machine learning (ML) models to detect ransomware. However, the results uncovered a tendency of previous works to report precision while overlooking the importance of other values in the confusion matrices, such as false negatives. Therefore, we also contribute a critical evaluation of ML models using a dataset of 730 malware and 735 benign samples to evaluate their suitability to mitigate ransomware at different stages of a detection system architecture and what that means in terms of cost. For example, the results have shown that an Artificial Neural Network (ANN) model will be the most suitable as it achieves the highest precision of 98.65%, a Youden’s index of 0.94, and a net benefit of 76.27%, however, the Random Forest model (lower precision of 92.73%) offered the benefit of having the lowest false-negative rate (0.00%). The risk of a false negative in this type of system is comparable to the unpredictable but typically large cost of ransomware infection, in comparison with the more predictable cost of the resources needed to filter false positives.

Original language	English
Article number	12941
Number of pages	22
Journal	Applied Sciences
Volume	12
Issue number	24
Early online date	16 Dec 2022
DOIs	https://doi.org/10.3390/app122412941
Publication status	Published - Dec 2022

Bibliographical note

© 2022 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Keywords

artificial intelligence
incident response
cyber kill chain
destructive malware

Access to Document

10.3390/app122412941Licence: CC BY 4.0

Reducing_False_Negatives_in_Ransomware_Detection
© 2022 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Final published version, 2.64 MBLicence: CC BY 4.0

Cite this

@article{2940d81e0ceb4fc1b9156350ba16d646,

title = "Reducing False Negatives in Ransomware Detection: A Critical Evaluation of Machine Learning Algorithms",

abstract = "Technological achievement and cybercriminal methodology are two parallel growing paths; protocols such as Tor and i2p (designed to offer confidentiality and anonymity) are being utilised to run ransomware companies operating under a Ransomware as a Service (RaaS) model. RaaS enables criminals with a limited technical ability to launch ransomware attacks. Several recent high-profile cases, such as the Colonial Pipeline attack and JBS Foods, involved forcing companies to pay enormous amounts of ransom money, indicating the difficulty for organisations of recovering from these attacks using traditional means, such as restoring backup systems. Hence, this is the benefit of intelligent early ransomware detection and eradication. This study offers a critical review of the literature on how we can use state-of-the-art machine learning (ML) models to detect ransomware. However, the results uncovered a tendency of previous works to report precision while overlooking the importance of other values in the confusion matrices, such as false negatives. Therefore, we also contribute a critical evaluation of ML models using a dataset of 730 malware and 735 benign samples to evaluate their suitability to mitigate ransomware at different stages of a detection system architecture and what that means in terms of cost. For example, the results have shown that an Artificial Neural Network (ANN) model will be the most suitable as it achieves the highest precision of 98.65%, a Youden{\textquoteright}s index of 0.94, and a net benefit of 76.27%, however, the Random Forest model (lower precision of 92.73%) offered the benefit of having the lowest false-negative rate (0.00%). The risk of a false negative in this type of system is comparable to the unpredictable but typically large cost of ransomware infection, in comparison with the more predictable cost of the resources needed to filter false positives.",

keywords = "artificial intelligence, incident response, cyber kill chain, destructive malware",

author = "Robert Bold and Haider Al-Khateeb and Nikolaos Ersotelos",

note = "{\textcopyright} 2022 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).",

year = "2022",

month = dec,

doi = "10.3390/app122412941",

language = "English",

volume = "12",

journal = "Applied Sciences",

issn = "2076-3417",

publisher = "MDPI AG",

number = "24",

}

TY - JOUR

T1 - Reducing False Negatives in Ransomware Detection: A Critical Evaluation of Machine Learning Algorithms

AU - Bold, Robert

AU - Al-Khateeb, Haider

AU - Ersotelos, Nikolaos

N1 - © 2022 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

PY - 2022/12

Y1 - 2022/12

N2 - Technological achievement and cybercriminal methodology are two parallel growing paths; protocols such as Tor and i2p (designed to offer confidentiality and anonymity) are being utilised to run ransomware companies operating under a Ransomware as a Service (RaaS) model. RaaS enables criminals with a limited technical ability to launch ransomware attacks. Several recent high-profile cases, such as the Colonial Pipeline attack and JBS Foods, involved forcing companies to pay enormous amounts of ransom money, indicating the difficulty for organisations of recovering from these attacks using traditional means, such as restoring backup systems. Hence, this is the benefit of intelligent early ransomware detection and eradication. This study offers a critical review of the literature on how we can use state-of-the-art machine learning (ML) models to detect ransomware. However, the results uncovered a tendency of previous works to report precision while overlooking the importance of other values in the confusion matrices, such as false negatives. Therefore, we also contribute a critical evaluation of ML models using a dataset of 730 malware and 735 benign samples to evaluate their suitability to mitigate ransomware at different stages of a detection system architecture and what that means in terms of cost. For example, the results have shown that an Artificial Neural Network (ANN) model will be the most suitable as it achieves the highest precision of 98.65%, a Youden’s index of 0.94, and a net benefit of 76.27%, however, the Random Forest model (lower precision of 92.73%) offered the benefit of having the lowest false-negative rate (0.00%). The risk of a false negative in this type of system is comparable to the unpredictable but typically large cost of ransomware infection, in comparison with the more predictable cost of the resources needed to filter false positives.

AB - Technological achievement and cybercriminal methodology are two parallel growing paths; protocols such as Tor and i2p (designed to offer confidentiality and anonymity) are being utilised to run ransomware companies operating under a Ransomware as a Service (RaaS) model. RaaS enables criminals with a limited technical ability to launch ransomware attacks. Several recent high-profile cases, such as the Colonial Pipeline attack and JBS Foods, involved forcing companies to pay enormous amounts of ransom money, indicating the difficulty for organisations of recovering from these attacks using traditional means, such as restoring backup systems. Hence, this is the benefit of intelligent early ransomware detection and eradication. This study offers a critical review of the literature on how we can use state-of-the-art machine learning (ML) models to detect ransomware. However, the results uncovered a tendency of previous works to report precision while overlooking the importance of other values in the confusion matrices, such as false negatives. Therefore, we also contribute a critical evaluation of ML models using a dataset of 730 malware and 735 benign samples to evaluate their suitability to mitigate ransomware at different stages of a detection system architecture and what that means in terms of cost. For example, the results have shown that an Artificial Neural Network (ANN) model will be the most suitable as it achieves the highest precision of 98.65%, a Youden’s index of 0.94, and a net benefit of 76.27%, however, the Random Forest model (lower precision of 92.73%) offered the benefit of having the lowest false-negative rate (0.00%). The risk of a false negative in this type of system is comparable to the unpredictable but typically large cost of ransomware infection, in comparison with the more predictable cost of the resources needed to filter false positives.

KW - artificial intelligence

KW - incident response

KW - cyber kill chain

KW - destructive malware

UR - https://www.mdpi.com/2076-3417/12/24/12941

UR - http://www.scopus.com/inward/record.url?scp=85144902750&partnerID=8YFLogxK

U2 - 10.3390/app122412941

DO - 10.3390/app122412941

M3 - Article

SN - 2076-3417

VL - 12

JO - Applied Sciences

JF - Applied Sciences

IS - 24

M1 - 12941

ER -

Reducing False Negatives in Ransomware Detection: A Critical Evaluation of Machine Learning Algorithms

Abstract

Bibliographical note

Keywords

Access to Document

Other files and links

Fingerprint

Cite this