Interpretable Machine Learning QSAR Models for Classification and Screening of VEGFR-2 Inhibitors in Anticancer Drug Discovery

Teuku Rizky Noviandy; Rinaldi Idroes

doi:10.60084/mp.v3i2.339

Authors

Teuku Rizky Noviandy, Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
Rinaldi Idroes, School of Mathematics and Applied Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia

Keywords:

VEGFR-2 inhibitors, Machine learning, QSAR modeling, Drug discovery

Abstract

Cancer remains a major global health burden, and angiogenesis plays a central role in tumor growth and progression. Vascular endothelial growth factor receptor-2 (VEGFR-2) is a key mediator of angiogenesis and an attractive therapeutic target, but existing inhibitors are limited by reduced efficacy, toxicity, and resistance, which creates a need for more effective predictive models in drug discovery. In this study, an interpretable machine-learning-based quantitative structure-activity relationship (QSAR) approach was developed using a curated dataset of 10,221 VEGFR-2 inhibitors from ChEMBL, each represented by 164 molecular descriptors. Four algorithms, k-nearest neighbors (kNN), AdaBoost, Random Forest, and XGBoost, were compared. XGBoost achieved the best results, with an accuracy of 83.67%, sensitivity of 91.38%, specificity of 71.73%, F1-score of 87.17%, and AUC of 0.9009. Model interpretation with Local Interpretable Model-agnostic Explanations (LIME) identified molecular descriptors related to hydrogen bonding, electrostatics, and lipophilicity as key contributors to activity. These results indicate that interpretable ensemble models can combine strong predictive performance with mechanistic insight, supporting the rational design and optimization of novel VEGFR-2 inhibitors for anticancer therapy.
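The classification metrics reported in the abstract (accuracy, sensitivity, specificity, and F1-score) all derive from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: active compounds predicted active/inactive.
    tn/fp: inactive compounds predicted inactive/active.
    Assumes every denominator is non-zero (both classes present and at
    least one positive prediction).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total          # fraction of all predictions that are correct
    sensitivity = tp / (tp + fn)          # recall on the active class
    specificity = tn / (tn + fp)          # recall on the inactive class
    precision = tp / (tp + fp)            # fraction of predicted actives that are active
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=90, fp=30, tn=70, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A sensitivity that is noticeably higher than the specificity, as in the reported XGBoost results, means a model recovers most active compounds while misclassifying a larger share of inactive ones as active.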
