Enhanced Thyroid Disorder Classification Through XGBoost-Based Machine Learning Techniques
DOI:
https://doi.org/10.60084/ijds.v3i2.361
Keywords:
Endocrine diagnostics, Clinical laboratory analytics, Gradient-boosting classifiers, Decision-support modeling, Medical data preprocessing
Abstract
Thyroid disorders are common endocrine conditions whose diagnosis often requires integrating multiple clinical and laboratory indicators. This study proposes a machine learning framework for multiclass classification of thyroid diseases using XGBoost combined with an automated preprocessing and feature-engineering pipeline. A dataset of 9,167 patient records and 30 clinical and biochemical features was processed using a structured pipeline that included imputation, encoding, scaling, and hyperparameter optimization with RandomizedSearchCV and GridSearchCV. The optimized XGBoost model achieved 95.20% test accuracy, a high weighted F1-score (0.94), and consistent cross-validated performance. Classification results showed excellent discrimination for major thyroid conditions and reliable identification of healthy individuals. Feature importance analysis revealed that TBG-related measurements, thyroxine therapy status, and key hormone indices (TSH, TT4, FTI) were the most influential predictors. Overall, the findings demonstrate that the proposed XGBoost-based framework provides accurate and robust support for multiclass thyroid disease diagnosis and can serve as a practical foundation for clinical decision-support applications.
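The abstract reports both test accuracy and a weighted F1-score. As a minimal sketch of how those two metrics relate in a multiclass setting, the plain-Python example below computes each from a list of true and predicted labels. The function names and the toy labels are assumptions made for illustration only; they are not the study's code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's true support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and number of predictions for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


# Hypothetical labels for illustration only; not data from the study.
y_true = ["healthy"] * 6 + ["hypothyroid"] * 3 + ["hyperthyroid"] * 1
y_pred = ["healthy"] * 5 + ["hypothyroid"] * 4 + ["healthy"] * 1

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

In this toy case the single "hyperthyroid" sample is misclassified, so that class contributes an F1 of zero, yet it carries only one tenth of the weight. This is why a weighted F1 is usually read alongside per-class results when classes are imbalanced.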
License
Copyright (c) 2025 Aga Maulana

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.