Infolitika Journal of Data Science

Comparison of Spatial Interpolation Methods: Inverse Distance Weighted and Kriging for Earthquake Intensity Mapping in Aceh, Indonesia

2025-12-02T22:50:13+07:00

Aceh Province, located in the Sumatra megathrust zone of Indonesia, is one of the most seismically active regions in Southeast Asia. Understanding the spatial distribution of earthquake magnitudes is essential for disaster mitigation and risk management. This study compares two spatial interpolation methods, Inverse Distance Weighted (IDW) and Kriging, to determine the most accurate approach for mapping earthquake intensity in Aceh Province. A total of 2,255 earthquake events with magnitudes of M 2.5 and above, recorded between 1990 and 2024 by the United States Geological Survey (USGS), were analyzed. IDW was tested with five power parameters (p = 1–5), while Kriging was tested with three semivariogram models (spherical, exponential, and Gaussian). Interpolation accuracy was assessed using the Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Percentage Error (MAPE). Kriging with the exponential semivariogram achieved the highest accuracy (RMSE = 0.0848, MSE = 0.0072, MAPE = 1.14%), outperforming IDW (RMSE = 0.2288, MSE = 0.0523, MAPE = 1.24%). The Kriging model effectively represented the gradual spatial decay of seismic energy and identified Aceh Singkil and northern Simeulue as the most earthquake-prone zones, consistent with regional tectonic patterns. These findings confirm that incorporating spatial autocorrelation improves both interpolation accuracy and geophysical interpretation. The study establishes Kriging as a reliable tool for seismic hazard mapping and provides insights for disaster preparedness, infrastructure planning, and future geostatistical applications in earthquake risk assessment.

An Interpretable Machine Learning Framework for Predicting Advanced Tumor Stages

2025-12-02T22:50:12+07:00

Accurate identification of advanced tumor stages is essential for timely clinical decision-making and personalized treatment planning. This study proposes an explainable ensemble learning framework for predicting advanced tumor stages from a dataset of 10,000 samples with 18 clinical and radiological features. Four machine learning models, namely Logistic Regression, Naïve Bayes, AdaBoost, and LightGBM, were evaluated using stratified train–test splits and standard performance metrics. LightGBM achieved the highest performance, with an accuracy of 86.05% and an F1-score of 76.61%, outperforming the linear (Logistic Regression) and probabilistic (Naïve Bayes) classifiers. ROC–AUC and precision–recall analyses further confirmed the superior discriminative ability of the ensemble methods. SHAP explainability analysis identified mitotic count, Ki-67 index, enhancement, and necrosis as the most influential predictors of advanced stage. The proposed framework demonstrates strong predictive capability and provides clinically interpretable insights, underscoring its potential as a decision-support tool in oncological diagnostics. Future work will involve external validation and the integration of additional multimodal data to improve generalizability.
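The evaluation protocol above (a stratified train–test split, then accuracy and F1) can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: the 0/1 label encoding with 1 = advanced stage, the 20% test fraction, the seed, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly its share.

    Hypothetical defaults: test_size=0.2 and seed=42 are illustrative only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class test count
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def accuracy_f1(y_true, y_pred, positive=1):
    """Accuracy over all samples and F1 for the positive (advanced) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

if __name__ == "__main__":
    labels = [0] * 70 + [1] * 30
    train_idx, test_idx = stratified_split(labels)
    print(len(test_idx), sum(labels[i] for i in test_idx))  # 20 samples, 6 positives
```

Stratification keeps the minority class represented in the test set. Because F1 only credits correct positive predictions, it can sit well below accuracy, as in the figures reported above.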

Enhanced Thyroid Disorder Classification Through XGBoost-Based Machine Learning Techniques

2025-12-02T22:50:06+07:00

Thyroid disorders are common endocrine conditions whose diagnosis often requires integrating multiple clinical and laboratory indicators. This study proposes a machine learning framework for multiclass classification of thyroid diseases using XGBoost combined with an automated preprocessing and feature-engineering pipeline. A dataset of 9,167 patient records with 30 clinical and biochemical features was processed through a structured pipeline comprising imputation, encoding, scaling, and hyperparameter optimization with RandomizedSearchCV and GridSearchCV. The optimized XGBoost model achieved 95.20% test accuracy, a weighted F1-score of 0.94, and consistent cross-validated performance. Classification results showed excellent discrimination for the major thyroid conditions and reliable identification of healthy individuals. Feature importance analysis revealed that TBG-related measurements, thyroxine therapy status, and key hormone indices (TSH, TT4, FTI) were the most influential predictors. Overall, the findings demonstrate that the proposed XGBoost-based framework provides accurate and robust support for multiclass thyroid disease diagnosis and can serve as a practical foundation for clinical decision-support applications.
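The abstract reports a weighted F1-score for a multiclass problem. A minimal stdlib sketch of that metric follows, written to mirror the definition behind scikit-learn's `f1_score(..., average="weighted")`: per-class F1 scores averaged with weights proportional to each class's support in the true labels. The function name and class labels are hypothetical.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (multiclass)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn)  # n > 0 for every class counted in support
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight by the class's share of true labels
    return score

if __name__ == "__main__":
    # Hypothetical labels: per-class F1 is 0.8 (healthy) and 2/3 (hypo).
    y_true = ["healthy", "healthy", "healthy", "hypo"]
    y_pred = ["healthy", "healthy", "hypo", "hypo"]
    print(round(weighted_f1(y_true, y_pred), 4))
```

Since the weights follow class frequency, a high weighted F1 can coexist with weaker performance on rare diagnoses, which is why per-class results are worth reporting alongside it.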

A Convolutional Neural Network Model for Mushroom Toxicity Recognition

2025-12-02T22:50:10+07:00

Mushroom poisoning remains a public health concern, often caused by misidentifying toxic species that visually resemble edible ones. This study investigates the feasibility of using a Convolutional Neural Network (CNN) to classify five mushroom species (Amanita caesarea, Amanita phalloides, Cantharellus cibarius, Omphalotus olearius, and Volvariella volvacea) into toxic and non-toxic categories based on image data. A dataset of 137 images was collected and preprocessed through resizing, normalization, and data augmentation. A modified AlexNet-based CNN was trained and evaluated using accuracy, precision, recall, and F1-score. The best-performing model achieved a validation accuracy of only 0.40, indicating limited discriminative capability. These findings show that the dataset is too small to train a CNN from scratch and that the model cannot reliably distinguish species with subtle morphological differences. The study concludes that larger datasets, improved image quality, and transfer learning approaches are essential for achieving practical, deployable mushroom classification performance.
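The three preprocessing steps named above can be sketched on a grayscale image stored as a list of pixel rows. This is only an illustration: the abstract does not state the interpolation used for resizing, the target size, or which augmentations were applied, so nearest-neighbour resizing, scaling by 255, a random horizontal flip, and all function names are assumptions. The `TOXIC` mapping reflects the commonly known edibility of the five species and is likewise an assumed labelling, not the study's label file.

```python
import random

# Assumed binary labels (common edibility knowledge, not taken from the study).
TOXIC = {
    "Amanita caesarea": False,
    "Amanita phalloides": True,
    "Cantharellus cibarius": False,
    "Omphalotus olearius": True,
    "Volvariella volvacea": False,
}

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize(img):
    """Scale 8-bit pixel values (0..255) into [0, 1]."""
    return [[px / 255.0 for px in row] for row in img]

def hflip(img):
    """Mirror each row: a label-preserving augmentation for these photos."""
    return [row[::-1] for row in img]

def preprocess(img, size=4, augment=False, rng=None):
    """Resize, normalize and (optionally) flip with probability 0.5."""
    out = normalize(resize_nearest(img, size, size))
    if augment and (rng or random).random() < 0.5:
        out = hflip(out)
    return out

if __name__ == "__main__":
    tiny = [[0, 255], [255, 0]]
    for row in preprocess(tiny, size=4):
        print(row)
```

With only 137 images, augmentation of this kind enlarges the effective training set only modestly, which is consistent with the study's recommendation of larger datasets and transfer learning.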

Assessing the Performance of Ensemble and Regularized Models for Daily Rainfall Forecasting in Singapore

2025-12-02T22:50:08+07:00

This study benchmarks ensemble and regularized machine learning models for daily rainfall forecasting using meteorological data from forty-four observation stations across Singapore. The country’s highly variable tropical climate and frequent short-duration rainfall events pose major challenges for urban flood mitigation and operational forecasting. To address this, three algorithms (Lasso Regression, XGBoost Regression, and Gradient Boosting Regression) were developed and their predictive performance was compared systematically. Each model was trained on data from 1980–2023 and validated on independent observations from 2024–2025. The input variables included sub-hourly rainfall intensity, temperature, and wind-related parameters, processed through a standardized data-cleaning and imputation pipeline. XGBoost produced the most consistent and accurate predictions, performing best under both normal and heavy rainfall conditions. Statistical tests confirmed that its improvement over Lasso and Gradient Boosting was significant. These findings demonstrate the effectiveness of ensemble-based approaches for improving the reliability of data-driven rainfall forecasting in tropical urban environments and support their integration into early warning and hydrological risk management systems.
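The validation design above (train on 1980–2023, validate on 2024–2025) is a chronological hold-out, which avoids leaking future weather into training. The sketch below shows that split plus an error breakdown into normal and heavy rainfall days. It assumes records are tuples whose first element is a `datetime.date`; the function names and the `heavy_mm` threshold are hypothetical, since the abstract does not define heavy rainfall.

```python
import math
from datetime import date

def chronological_split(records, train_end_year=2023, valid_start_year=2024):
    """Split records by calendar year without shuffling.

    Each record is assumed to be a tuple whose first element is a datetime.date.
    """
    train = [r for r in records if r[0].year <= train_end_year]
    valid = [r for r in records if r[0].year >= valid_start_year]
    return train, valid

def rmse_by_regime(observed, predicted, heavy_mm):
    """RMSE computed separately for normal and heavy rainfall days.

    heavy_mm is a hypothetical threshold on observed daily rainfall (mm).
    """
    def rmse(pairs):
        if not pairs:
            return float("nan")  # no days fall in this regime
        return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))

    pairs = list(zip(observed, predicted))
    normal = [(o, p) for o, p in pairs if o < heavy_mm]
    heavy = [(o, p) for o, p in pairs if o >= heavy_mm]
    return {"normal": rmse(normal), "heavy": rmse(heavy)}

if __name__ == "__main__":
    records = [(date(1980, 1, 1), 0.0), (date(2023, 12, 31), 1.0),
               (date(2024, 1, 1), 2.0), (date(2025, 6, 1), 3.0)]
    train, valid = chronological_split(records)
    print(len(train), len(valid))  # 2 2
    print(rmse_by_regime([0.0, 10.0, 60.0], [1.0, 10.0, 57.0], heavy_mm=50.0))
```

Reporting the two regimes separately prevents the many dry or light-rain days from masking large errors on the rare heavy-rain days that matter most for flood warning.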