Advanced Anemia Classification Using Comprehensive Hematological Profiles and Explainable Machine Learning Approaches

Authors

  • Teuku Rizky Noviandy, Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Ghifari Maulana Idroes, Department of Nuclear Engineering and Engineering Physics, Universitas Gadjah Mada, Yogyakarta 55281, Indonesia
  • Rivansyah Suhendra, Department of Information Technology, Faculty of Engineering, Universitas Teuku Umar, Aceh Barat 23681, Indonesia
  • Tedy Kurniawan Bakri, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
  • Rinaldi Idroes, School of Mathematics and Applied Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia

DOI:

https://doi.org/10.60084/ijds.v2i2.237

Keywords:

Hematological analysis, Data imbalance, Predictive algorithms, Clinical diagnostics, Health informatics

Abstract

Anemia is a common health problem with serious clinical consequences, so timely and accurate diagnosis is essential to prevent complications. This study explores machine learning (ML) methods for classifying anemia and its subtypes from comprehensive hematological data. Six ML models were compared: Gradient Boosting, Random Forest, Naive Bayes, Logistic Regression, Support Vector Machine, and K-Nearest Neighbors. The dataset was preprocessed with feature standardization, and the Synthetic Minority Oversampling Technique (SMOTE) was used to address class imbalance. Gradient Boosting achieved the highest accuracy, sensitivity, and F1-score, making it the top-performing model. SHapley Additive exPlanations (SHAP) analysis was then applied to improve model interpretability by identifying the key predictive features. This study highlights the potential of explainable ML for developing efficient, accurate, and scalable anemia diagnostic tools that can help improve healthcare outcomes globally.
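The preprocessing described above (feature standardization followed by SMOTE) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it assumes z-score standardization fitted on the training split only, then SMOTE applied to the standardized training data, with every minority class oversampled up to the majority-class count. The helper names `standardize` and `smote` are hypothetical; in practice one would more typically use scikit-learn's `StandardScaler` and imbalanced-learn's `SMOTE`. The six classifiers and the SHAP step are not sketched here.

```python
import numpy as np


def standardize(X_train, X_test):
    """Z-score features using mean/std from the training split only (assumption)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X_train - mean) / std, (X_test - mean) / std


def smote(X, y, k=5, rng=None):
    """Minimal SMOTE: oversample every minority class up to the majority count.

    Each synthetic sample is x + u * (x_nn - x), where x is a random sample of
    the class, x_nn is one of its k nearest same-class neighbours, and u ~ U(0, 1).
    Original samples are kept first; synthetic samples are appended after them.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        deficit = target - n
        if deficit == 0:
            continue  # majority class: nothing to add
        X_cls = X[y == cls]
        if len(X_cls) < 2:
            raise ValueError(f"class {cls!r} needs at least 2 samples for SMOTE")
        k_eff = min(k, len(X_cls) - 1)
        # Pairwise Euclidean distances within the class; exclude self-matches.
        d = np.linalg.norm(X_cls[:, None, :] - X_cls[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        neighbours = np.argsort(d, axis=1)[:, :k_eff]
        # Pick a base sample and one of its k nearest neighbours for each new point.
        base = rng.integers(0, len(X_cls), size=deficit)
        nn = neighbours[base, rng.integers(0, k_eff, size=deficit)]
        gap = rng.random((deficit, 1))
        synthetic = X_cls[base] + gap * (X_cls[nn] - X_cls[base])
        X_parts.append(synthetic)
        y_parts.append(np.full(deficit, cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    # Toy, randomly generated data standing in for hematological features.
    gen = np.random.default_rng(0)
    X_train, X_test = gen.normal(size=(70, 4)), gen.normal(size=(20, 4))
    y_train = np.array([0] * 50 + [1] * 15 + [2] * 5)
    X_train_s, X_test_s = standardize(X_train, X_test)
    X_bal, y_bal = smote(X_train_s, y_train, k=3, rng=42)
    print(np.unique(y_bal, return_counts=True))
```

Oversampling only the training split keeps synthetic points, which are interpolations between real records, out of the evaluation data; when cross-validation is used, the same resampling would be repeated inside each training fold.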

Published

2024-11-29

How to Cite

Noviandy, T. R., Idroes, G. M., Suhendra, R., Bakri, T. K., & Idroes, R. (2024). Advanced Anemia Classification Using Comprehensive Hematological Profiles and Explainable Machine Learning Approaches. Infolitika Journal of Data Science, 2(2), 72–81. https://doi.org/10.60084/ijds.v2i2.237