Interpretable Machine Learning QSAR Models for Classification and Screening of VEGFR-2 Inhibitors in Anticancer Drug Discovery
DOI: https://doi.org/10.60084/mp.v3i2.339
Keywords: VEGFR-2 inhibitors, Machine learning, QSAR modeling, Drug discovery
Abstract
Cancer remains a major global health burden, with angiogenesis playing a central role in tumor growth and progression. Vascular Endothelial Growth Factor Receptor-2 (VEGFR-2) is a key mediator of angiogenesis and an attractive therapeutic target, but existing inhibitors are limited by reduced efficacy, toxicity, and resistance, creating a need for more effective predictive models in drug discovery. In this study, an interpretable machine-learning-based quantitative structure-activity relationship (QSAR) approach was developed using a curated dataset of 10,221 VEGFR-2 inhibitors from ChEMBL, each represented by 164 molecular descriptors. Four algorithms (kNN, AdaBoost, Random Forest, and XGBoost) were compared. XGBoost achieved the best results, with an accuracy of 83.67 percent, sensitivity of 91.38 percent, specificity of 71.73 percent, F1-score of 87.17 percent, and AUC of 0.9009. Model interpretation with LIME identified molecular descriptors related to hydrogen bonding, electrostatics, and lipophilicity as key contributors to activity. These results indicate that interpretable ensemble models can combine strong predictive performance with mechanistic insight, supporting the rational design and optimization of novel VEGFR-2 inhibitors for anticancer therapy.
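The metrics reported in the abstract (accuracy, sensitivity, specificity, and F1-score) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions in Python. The class and function names are hypothetical, and the counts are toy values chosen only for illustration; they are not the study's confusion matrix and the code is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (active = positive class)."""
    tp: int  # actives predicted active
    fn: int  # actives predicted inactive
    tn: int  # inactives predicted inactive
    fp: int  # inactives predicted active


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, specificity, and F1 as fractions."""
    total = c.tp + c.fn + c.tn + c.fp
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the active class
    specificity = c.tn / (c.tn + c.fp)   # recall on the inactive class
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Toy counts (assumed for illustration, NOT taken from the paper).
toy = ConfusionCounts(tp=90, fn=10, tn=70, fp=30)
metrics = classification_metrics(toy)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A gap between sensitivity and specificity, such as the one reported for XGBoost, means the classifier recovers actives more reliably than it rejects inactives. Reading the two values together is more informative than accuracy alone.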
License
Copyright (c) 2025 Teuku Rizky Noviandy, Rinaldi Idroes

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.