Credit Card Fraud Detection for Contemporary Financial Management Using XGBoost-Driven Machine Learning and Data Augmentation Techniques

Teuku Rizky Noviandy; Ghalieb Mutig Idroes; Aga Maulana; Irsan Hardi; Edi Saputra Ringga; Rinaldi Idroes

doi:10.60084/ijma.v1i1.78

Authors

Teuku Rizky Noviandy: Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
Ghalieb Mutig Idroes: Energy and Green Economics Unit, Graha Primera Saintifika, Aceh Besar 23371, Indonesia
Aga Maulana: Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
Irsan Hardi: Economic Modeling and Data Analytics Unit, Rumoh Riset Indonesia, Aceh Besar 23371, Indonesia
Edi Saputra Ringga: Department of Economics, Faculty of Business, Economics and Social Development, Universiti Malaysia Terengganu, Terengganu 21030, Malaysia
Rinaldi Idroes: School of Mathematics and Applied Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia

DOI:

https://doi.org/10.60084/ijma.v1i1.78

Keywords:

Financial management, Imbalanced dataset, Tabular machine learning, SMOTE

Abstract

The rise of digital transactions and electronic payment systems in modern financial management has brought convenience, but also a growing problem of credit card fraud. Traditional fraud detection methods struggle to keep pace with the complexity of contemporary fraud strategies. This study explores the potential of machine learning, specifically the XGBoost (eXtreme Gradient Boosting) algorithm combined with data augmentation techniques, to enhance credit card fraud detection. The research demonstrates the effectiveness of these techniques in addressing imbalanced datasets and improving fraud detection accuracy. By leveraging historical transaction data and applying resampling techniques such as the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN), the study achieves a balance between precision and recall in fraud detection. These findings have significant implications for contemporary financial management: they offer the potential to bolster financial integrity, allocate resources effectively, and strengthen customer trust in the face of evolving fraud tactics.
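
To illustrate the oversampling idea behind SMOTE mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `smote_oversample`, the toy feature values, and the parameters `k` and `seed` are assumptions made for illustration. The sketch covers only the interpolation step of SMOTE; the Edited Nearest Neighbors cleaning step and the XGBoost classifier are omitted.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical): 8 legitimate vs 3 fraudulent transactions,
# each described by two made-up numeric features.
legit = [(float(i), float(i % 3)) for i in range(8)]
fraud = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic fraud samples to match the majority class size.
new_fraud = smote_oversample(fraud, n_new=len(legit) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(legit), len(balanced_fraud))
```

Each synthetic sample lies on the line segment between two existing minority samples, so the minority class grows without exact duplication. In practice a maintained library implementation would normally be preferred over a hand-written routine like this one.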
