Interpretable Machine Learning QSAR Models for Classification and Screening of VEGFR-2 Inhibitors in Anticancer Drug Discovery

Authors

  • Teuku Rizky Noviandy, Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Rinaldi Idroes, School of Mathematics and Applied Sciences, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia

DOI:

https://doi.org/10.60084/mp.v3i2.339

Keywords:

VEGFR-2 inhibitors, Machine learning, QSAR modeling, Drug discovery

Abstract

Cancer remains a major global health burden, with angiogenesis playing a central role in tumor growth and progression. Vascular Endothelial Growth Factor Receptor-2 (VEGFR-2) is a key mediator of angiogenesis and an attractive therapeutic target. Existing inhibitors, however, are limited by reduced efficacy, toxicity, and resistance, which creates a need for more effective predictive models in drug discovery. In this study, an interpretable machine learning-based quantitative structure-activity relationship (QSAR) approach was developed using a curated dataset of 10,221 VEGFR-2 inhibitors from ChEMBL, each represented by 164 molecular descriptors. Four algorithms were compared: k-nearest neighbors (kNN), AdaBoost, Random Forest, and XGBoost. XGBoost achieved the best results, with an accuracy of 83.67 percent, sensitivity of 91.38 percent, specificity of 71.73 percent, F1-score of 87.17 percent, and AUC of 0.9009. Model interpretation with LIME identified molecular descriptors related to hydrogen bonding, electrostatics, and lipophilicity as key contributors to activity. These results indicate that interpretable ensemble models can combine strong predictive performance with mechanistic insight, supporting the rational design and optimization of novel VEGFR-2 inhibitors for anticancer therapy.
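For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and F1-score follow from the four confusion-matrix counts of a binary active/inactive classifier. It is an illustration only, not the authors' evaluation code. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: actives predicted active      fp: inactives predicted active
    tn: inactives predicted inactive  fn: actives predicted inactive
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # recall on the active class
    specificity = tn / (tn + fp)   # recall on the inactive class
    precision = tp / (tp + fp)     # fraction of predicted actives that are active
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts chosen for easy arithmetic (not the study's data).
metrics = classification_metrics(tp=90, fp=30, tn=70, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A sensitivity well above the specificity, as in the XGBoost results reported here, means the model misses few active compounds but flags a larger share of inactive ones as active. AUC is not included in the sketch because it depends on the ranked prediction scores rather than on a single confusion matrix.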

Published

2025-09-28

How to Cite

Noviandy, T. R., & Idroes, R. (2025). Interpretable Machine Learning QSAR Models for Classification and Screening of VEGFR-2 Inhibitors in Anticancer Drug Discovery. Malacca Pharmaceutics, 3(2), 58–66. https://doi.org/10.60084/mp.v3i2.339