Assessing LightGBM Performance in Automated Leukemia Cell Classification
DOI: https://doi.org/10.60084/ijds.v4i1.351
Keywords: LightGBM, Leukemia, Image classification, Machine learning, Blood cell subtypes
Abstract
Leukemia is a type of blood cancer that requires fast and accurate diagnosis for effective treatment. Manual identification of leukemia blood cell subtypes is often challenging, time-consuming, and prone to observer variability, making automated image-based classification essential. This study evaluates the performance of the Light Gradient-Boosting Machine (LightGBM) as a computationally efficient and interpretable alternative to deep learning models for classifying leukemia subtypes. The dataset includes 3,000 microscopic images representing five classes: acute lymphocytic, acute myelogenous, chronic lymphocytic, chronic myelogenous, and healthy blood cells. Images were preprocessed using bilinear interpolation to balance quality and efficiency, and 90 statistical features were extracted across 13 distinct color spaces. The model was trained on an 80% subset and validated on a 20% hold-out set after hyperparameter optimization. LightGBM achieved robust performance with an accuracy of 93.3%, precision of 99.1%, recall of 94.9%, and an F-measure of 96.8%. Feature importance analysis revealed that texture variance in the YIQ color space (STD_YIQ_I) was the most critical predictor, highlighting the biological relevance of chromatin texture in classification. These results indicate that LightGBM is an effective, lightweight, and reliable approach for leukemia subtype classification, holding strong potential for implementation in resource-constrained automated diagnostic systems.
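The abstract names STD_YIQ_I as the most important feature but does not show how such a statistic is computed. The sketch below is one plausible reading, not the authors' implementation: it converts RGB pixels to the YIQ color space with the standard-library colorsys module and takes the per-channel mean and standard deviation. The function name channel_stats, the MEAN_YIQ_* feature names, the [0, 1] pixel scaling, and the use of the population standard deviation are all assumptions made for illustration.

```python
import colorsys
import statistics


def channel_stats(pixels):
    """Per-channel mean and standard deviation in YIQ space.

    `pixels` is an iterable of (r, g, b) tuples scaled to [0, 1].
    Feature names follow the STD_YIQ_I pattern quoted in the abstract;
    the MEAN_* names and the population std are assumptions.
    """
    yiq = [colorsys.rgb_to_yiq(r, g, b) for r, g, b in pixels]
    feats = {}
    for idx, name in enumerate("YIQ"):
        values = [p[idx] for p in yiq]
        feats[f"MEAN_YIQ_{name}"] = statistics.fmean(values)
        feats[f"STD_YIQ_{name}"] = statistics.pstdev(values)
    return feats


# Toy 2x2 "images", flattened to pixel lists (hypothetical data).
flat_gray = [(0.5, 0.5, 0.5)] * 4
mixed = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5), (0.0, 1.0, 0.0)]

gray_feats = channel_stats(flat_gray)
mixed_feats = channel_stats(mixed)

# A uniform patch has no chromatic variation; a varied patch does.
print(gray_feats["STD_YIQ_I"], mixed_feats["STD_YIQ_I"])
```

Under these assumptions, a uniform patch yields a zero STD_YIQ_I while a patch with varied colors yields a positive value. That matches the abstract's reading of the feature as a measure of texture variation. In a full pipeline, statistics of this kind from each color space would form the feature vector passed to the classifier.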
License
Copyright (c) 2026 Rara Syifa Qaisa, Hayatun Maghfirah, Suryadi Suryadi, Noviana Husdayanti, Rivansyah Suhendra

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.