Abstract
In the cybersecurity industry, where legitimate transactions far outnumber fraudulent ones, detecting fraud is of paramount significance. In order to evaluate the accuracy of detecting fraudulent transactions in imbalanced real datasets, this study compares the efficacy of two approaches, random under-sampling and oversampling, using the synthetic minority over-sampling technique (SMOTE). Random under-sampling aims for fairness by excluding examples from the majority class, but this compromises precision in favor of recall. To strike a balance and ensure statistical significance, SMOTE was used instead to produce artificial examples of the minority class. Based on the data obtained, it is clear that random under-sampling achieves high recall (92.86%) at the expense of low precision, whereas SMOTE achieves a higher accuracy (86.75%) and a more even F1 score (73.47%) at the expense of a slightly lower recall. As true fraudulent transactions require at least two methods for verification, we investigated different machine learning methods and made suitable balances between accuracy, F1 score, and recall. Our comparison sheds light on the subtleties and ramifications of each approach, allowing professionals in the field of cybersecurity to better choose the approach that best meets the needs of their own firm. This research highlights the need to resolve class imbalances for effective fraud detection in cybersecurity, as well as the need for constant monitoring and the investigation of new approaches to increase applicability.
Original language | English |
---|---|
Article number | 478 |
Number of pages | 20 |
Journal | Information |
Volume | 15 |
Issue number | 8 |
Early online date | 12 Aug 2024 |
DOIs | |
Publication status | Published - Aug 2024 |
Bibliographical note
Copyright © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).Keywords
- machine learning
- fraud detection
- synthetic minority over-sampling technique (SMOTE)
- under-sampling