Infolitika Journal of Data Science https://heca-analitika.com/ijds <p><strong>Infolitika Journal of Data Science</strong> is a peer-reviewed international scientific publication dedicated to showcasing exceptional original research articles and review papers in the field of data science. Infolitika Journal of Data Science centers its focus on fostering interdisciplinary research endeavors that bridge scientific and technological advancements with real-world applications and their societal implications. The journal maintains a biannual publication schedule (May and November).</p> <p>Infolitika Journal of Data Science cordially invites submissions from a diverse array of researchers, practitioners, and scholars worldwide. The journal enthusiastically encourages the submission of pioneering research that unveils novel insights and propels the data science field forward. With an unwavering commitment to excellence, pertinence, and influence, Infolitika Journal of Data Science is devoted to disseminating articles that not only uphold the highest quality standards but also facilitate knowledge dissemination and collaboration among the global research community.</p> Heca Sentra Analitika en-US Infolitika Journal of Data Science 3025-8618 Comparison of Spatial Interpolation Methods: Inverse Distance Weighted and Kriging for Earthquake Intensity Mapping in Aceh, Indonesia https://heca-analitika.com/ijds/article/view/347 <p>Aceh Province, located in the Sumatra megathrust zone of Indonesia, is one of the most seismically active regions in Southeast Asia. Understanding the spatial distribution of earthquake magnitudes is essential for disaster mitigation and risk management. This study compares two spatial interpolation methods, Inverse Distance Weighted (IDW) and Kriging, to determine the most accurate approach for mapping earthquake intensity in Aceh Province. 
A total of 2,255 earthquake events with magnitudes of M 2.5 and above, recorded between 1990 and 2024 by the United States Geological Survey (USGS), were analyzed. IDW was tested using five power parameters (p = 1–5), while Kriging applied three semivariogram models (spherical, exponential, and Gaussian). The interpolation accuracy was assessed through Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Percentage Error (MAPE). Results indicated that Kriging with the exponential semivariogram achieved the highest accuracy, with RMSE = 0.0848, MSE = 0.0072, and MAPE = 1.14%, outperforming IDW (RMSE = 0.2288, MSE = 0.0523, MAPE = 1.24%). The Kriging model effectively represented the gradual spatial decay of seismic energy, identifying Aceh Singkil and northern Simeulue as the most earthquake-prone zones, consistent with regional tectonic patterns. These findings confirm that incorporating spatial autocorrelation enhances interpolation accuracy and geophysical interpretation. The study establishes Kriging as a reliable tool for seismic hazard mapping and provides valuable insights for disaster preparedness, infrastructure planning, and future geostatistical applications in earthquake risk assessment.</p> Latifah Rahayu Cut Chairilla Yolanda Utami Rahmatul Fauzi Novi Reandy Sasmita Copyright (c) 2025 Latifah Rahayu, Cut Chairilla Yolanda Utami, Rahmatul Fauzi, Novi Reandy Sasmita https://creativecommons.org/licenses/by-nc/4.0 2025-11-28 2025-11-28 3 2 50 60 10.60084/ijds.v3i2.347 An Interpretable Machine Learning Framework for Predicting Advanced Tumor Stages https://heca-analitika.com/ijds/article/view/364 <p>Accurate identification of advanced tumor stages is essential for timely clinical decision-making and personalized treatment planning. This study proposes an explainable ensemble learning framework for predicting advanced tumor stage using a dataset containing 10,000 samples with 18 clinical and radiological features. 
Four machine learning models, namely Logistic Regression, Naïve Bayes, AdaBoost, and LightGBM, were evaluated using stratified train–test splits along with standard performance metrics. LightGBM achieved the highest performance, with an accuracy of 86.05% and an F1-score of 76.61%, outperforming linear and probabilistic classifiers. ROC–AUC and precision–recall analyses further confirmed the superior discriminative ability of ensemble methods. SHAP explainability techniques highlighted mitotic count, Ki-67 index, enhancement, and necrosis as the most influential predictors of advanced stage. The proposed framework demonstrates strong predictive capability and provides clinically interpretable insights, underscoring its potential as a decision-support tool in oncological diagnostics. Future work will involve external validation and integration of additional multimodal data to enhance generalizability.</p> Teuku Rizky Noviandy Mohsina Patwekar Faheem Patwekar Rinaldi Idroes Copyright (c) 2025 Teuku Rizky Noviandy, Mohsina Patwekar, Faheem Patwekar, Rinaldi Idroes https://creativecommons.org/licenses/by-nc/4.0 2025-11-29 2025-11-29 3 2 61 69 10.60084/ijds.v3i2.364 Enhanced Thyroid Disorder Classification Through XGBoost-Based Machine Learning Techniques https://heca-analitika.com/ijds/article/view/361 <p>Thyroid disorders are common endocrine conditions whose diagnosis often requires integrating multiple clinical and laboratory indicators. This study proposes a machine learning framework for multiclass classification of thyroid diseases using XGBoost combined with an automated preprocessing and feature-engineering pipeline. A dataset of 9,167 patient records and 30 clinical and biochemical features was processed using a structured pipeline that included imputation, encoding, scaling, and hyperparameter optimization with RandomizedSearchCV and GridSearchCV. 
The optimized XGBoost model achieved 95.20% test accuracy, a high weighted F1-score (0.94), and consistent cross-validated performance. Classification results showed excellent discrimination for major thyroid conditions and reliable identification of healthy individuals. Feature importance analysis revealed that TBG-related measurements, thyroxine therapy status, and key hormone indices (TSH, TT4, FTI) were the most influential predictors. Overall, the findings demonstrate that the proposed XGBoost-based framework provides accurate and robust support for multiclass thyroid disease diagnosis and can serve as a practical foundation for clinical decision-support applications.</p> Aga Maulana Copyright (c) 2025 Aga Maulana https://creativecommons.org/licenses/by-nc/4.0 2025-11-30 2025-11-30 3 2 70 84 10.60084/ijds.v3i2.361 A Convolutional Neural Network Model for Mushroom Toxicity Recognition https://heca-analitika.com/ijds/article/view/359 <p>Mushroom poisoning remains a public health concern, often caused by misidentifying toxic species that visually resemble edible ones. This study investigates the feasibility of using a Convolutional Neural Network (CNN) to classify five mushroom species, <em>Amanita caesarea</em>, <em>Amanita phalloides</em>, <em>Cantharellus cibarius</em>, <em>Omphalotus olearius</em>, and <em>Volvariella volvacea</em>, into toxic and non-toxic categories based on image data. A dataset of 137 images was collected and preprocessed through resizing, normalization, and data augmentation. A modified AlexNet-based CNN was trained and evaluated using accuracy, precision, recall, and F1-score. The best-performing model achieved a validation accuracy of 0.40, indicating limited discriminative capability. These findings highlight that the dataset size is insufficient for training a CNN from scratch and that the model cannot reliably distinguish species with subtle morphological differences. 
The study concludes that larger datasets, improved image quality, and transfer learning approaches are essential for achieving practical and deployable mushroom classification performance.</p> Irvanizam Irvanizam Muhammad Subianto Muhammad Salsabila Jamil Copyright (c) 2025 Irvanizam Irvanizam, Muhammad Subianto, Muhammad Salsabila Jamil https://creativecommons.org/licenses/by-nc/4.0 2025-11-30 2025-11-30 3 2 85 94 10.60084/ijds.v3i2.359 Assessing the Performance of Ensemble and Regularized Models for Daily Rainfall Forecasting in Singapore https://heca-analitika.com/ijds/article/view/360 <p>This study benchmarks ensemble and regularized machine learning models for daily rainfall forecasting using meteorological data from forty-four observation stations across Singapore. The country’s highly variable tropical climate and frequent short-duration rainfall events pose major challenges for urban flood mitigation and operational forecasting. To address this, three algorithms—Lasso Regression, XGBoost Regression, and Gradient Boosting Regression—were developed and evaluated through a systematic comparison of predictive performance. Each model was trained using data from 1980–2023 and validated on independent observations from 2024–2025. The input variables included sub-hourly rainfall intensity, temperature, and wind-related parameters processed through a standardized data-cleaning and imputation pipeline. Results show that XGBoost achieved the most consistent and accurate predictions, with superior performance under both normal and heavy rainfall conditions. Statistical tests confirmed that the improvement was significant compared to Lasso and Gradient Boosting. 
These findings demonstrate the effectiveness of ensemble-based approaches for enhancing the reliability of data-driven rainfall forecasting in tropical urban environments and support their integration into early warning and hydrological risk management systems.</p> Musliadi Musliadi Muhammad Zulkarnaini Asalul Musaffa Yolanda Yolanda Copyright (c) 2025 Musliadi Musliadi, Muhammad Zulkarnaini, Asalul Musaffa, Yolanda Yolanda https://creativecommons.org/licenses/by-nc/4.0 2025-11-30 2025-11-30 3 2 95 102 10.60084/ijds.v3i2.360