An Interpretable Machine Learning Framework for Predicting Advanced Tumor Stages

Teuku Rizky Noviandy; Mohsina Patwekar; Faheem Patwekar; Rinaldi Idroes

doi:10.60084/ijds.v3i2.364

Authors

Teuku Rizky Noviandy Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
Mohsina Patwekar Department of Pharmacology, Luqman College of Pharmacy, Karnataka 585102, India
Faheem Patwekar Department of Pharmacognosy, Luqman College of Pharmacy, Karnataka 585102, India
Rinaldi Idroes School of Mathematics and Applied Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia

Keywords:

Tumor stage prediction, Machine learning, Clinical prediction, Medical machine learning

Abstract

Accurate identification of advanced tumor stages is essential for timely clinical decision-making and personalized treatment planning. This study proposes an explainable machine learning framework for predicting advanced tumor stage from a dataset of 10,000 samples with 18 clinical and radiological features. Four models, namely Logistic Regression, Naïve Bayes, AdaBoost, and LightGBM, were evaluated using stratified train–test splits and standard performance metrics. LightGBM achieved the highest performance, with an accuracy of 86.05% and an F1-score of 76.61%, outperforming the linear (Logistic Regression) and probabilistic (Naïve Bayes) classifiers. ROC–AUC and precision–recall analyses further confirmed the superior discriminative ability of the ensemble methods (AdaBoost and LightGBM). SHAP explainability analysis highlighted mitotic count, Ki-67 index, enhancement, and necrosis as the most influential predictors of advanced stage. The proposed framework demonstrates strong predictive capability and provides clinically interpretable insights, underscoring its potential as a decision-support tool in oncological diagnostics. Future work will involve external validation and the integration of additional multimodal data to improve generalizability.
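
The evaluation setup described in the abstract (a stratified train–test split scored with accuracy and F1) can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption made for illustration: the 80/20 split ratio, the 30% positive rate, the synthetic labels, and the toy confusion counts are hypothetical and are not taken from the study, and the helper names `stratified_split` and `binary_metrics` are not the authors' code. The sketch mainly shows why accuracy and F1 can differ noticeably when the advanced-stage class is the minority.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Synthetic labels (hypothetical): 1 = advanced stage, 0 = not advanced.
labels = [1] * 300 + [0] * 700
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_labels = [labels[i] for i in test_idx]

# Toy predictions (hypothetical): 45 TP, 15 FN, 10 FP, 130 TN.
y_true = [1] * 60 + [0] * 140
y_pred = [1] * 45 + [0] * 15 + [1] * 10 + [0] * 130
metrics = binary_metrics(y_true, y_pred)

print("test size:", len(test_idx), "positives in test:", sum(test_labels))
print({name: round(value, 4) for name, value in metrics.items()})
```

In this toy case the accuracy is higher than the F1-score because the many correctly classified negatives raise accuracy but do not enter the F1 computation, which depends only on true positives, false positives, and false negatives.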
