Comparative Analysis of Ensemble Machine Learning Models for QSAR-Based Prediction of Anticoagulant Activity in Thrombotic Disorders

Authors

  • Teuku Rizky Noviandy Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Rahmat Sufri Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Ryan Setiawan Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Anisah Anisah Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia

DOI:

https://doi.org/10.60084/hjas.v4i1.393

Keywords:

Thrombin inhibitors, Molecular descriptors, Hyperparameter tuning

Abstract

Thrombotic disorders remain a major cause of global morbidity and mortality, and dysregulation of blood coagulation pathways plays a central role in disease progression. Thrombin in particular is a key therapeutic target for anticoagulant drug development, so accurate prediction of inhibitory activity is highly relevant for accelerating discovery efforts. Despite advances in computational drug discovery, machine learning approaches for quantitative structure-activity relationship (QSAR) prediction of anticoagulant activity still need systematic evaluation. Many existing studies focus on single models or lack a consistent comparison framework, which limits insight into the relative performance of different ensemble techniques. To address this gap, this study applies multiple ensemble machine learning methods, namely Random Forest, XGBoost, Gradient Boosting, and Extra Trees, combined with hyperparameter optimization by random search. The main objective of this work is to compare these ensemble models for predicting the pIC50 values of thrombin inhibitors from molecular descriptors derived from their chemical structures. The results show that the Extra Trees model achieved the best overall performance after tuning, with an R² of 0.697, an RMSE of 0.851, and an MAE of 0.615. Gradient Boosting and XGBoost also improved markedly after hyperparameter optimization, highlighting the importance of model tuning in QSAR tasks. Overall, the study confirms that ensemble learning methods yield reliable, accurate predictions of anticoagulant activity, with Extra Trees emerging as the most effective approach for this dataset.
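The abstract reports model performance on pIC50 using R², RMSE, and MAE. The following is a minimal sketch, not the authors' code, of how pIC50 is commonly derived from IC50 and how these three metrics are computed. It assumes IC50 values are given in nanomolar (the abstract does not state units), and the function names `ic50_nm_to_pic50` and `regression_metrics` are hypothetical. Descriptor calculation, model training, and the random search step are not shown.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L).

    Assumption: inputs are in nM. Since 1 nM = 1e-9 M,
    -log10(ic50_nm * 1e-9) simplifies to 9 - log10(ic50_nm).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted pIC50 values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae
```

For example, an IC50 of 10 nM corresponds to a pIC50 of 8.0, and a higher pIC50 indicates a more potent inhibitor. Because RMSE squares the residuals, it penalizes large errors more heavily than MAE, which is why the two can rank models differently.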



Published

2026-03-31

How to Cite

Noviandy, T. R., Sufri, R., Setiawan, R. and Anisah, A. (2026) “Comparative Analysis of Ensemble Machine Learning Models for QSAR-Based Prediction of Anticoagulant Activity in Thrombotic Disorders”, Heca Journal of Applied Sciences, 4(1), pp. 88–95. doi: 10.60084/hjas.v4i1.393.

Section

Articles