Ensemble Variable Importance: Combining Random Forest, Neural Network, and Support Vector Machine via Genetic Algorithm (Case Study: Student Productivity)

Asep Rusyana; Marzuki Marzuki; Siti Rusdiana; Fitriana AR; Nurhasanah Nurhasanah; Nany Salwa; Mahmudi Mahmudi

doi:10.60084/ijds.v4i1.424

Authors

Asep Rusyana, Department of Statistics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
Marzuki Marzuki, Department of Statistics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia; Department of Mathematics, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
Siti Rusdiana, Department of Mathematics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
Fitriana AR, Department of Statistics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
Nurhasanah Nurhasanah, Department of Statistics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
Nany Salwa, Department of Statistics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
Mahmudi Mahmudi, Department of Mathematics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia

Keywords:

Ensemble variable importance, Genetic algorithm, Random forest, Neural network, Student productivity

Abstract

This study proposes and evaluates an ensemble variable-importance framework that uses a genetic algorithm (GA) to combine permutation-based importance scores from three supervised learning algorithms: Random Forest, Neural Network, and Support Vector Machine. The approach addresses the well-known problem that algorithm-specific importance diagnostics can yield divergent feature rankings, complicating substantive interpretation and downstream decision-making. In a large publicly available student-productivity dataset (N = 20,000), predictors describing study behavior, digital-media use, lifestyle, and academic indicators were normalized with Min–Max scaling, and permutation variable importance (PVI) was estimated repeatedly within each model to obtain stable mean PVI values and standard errors. The GA was then used to search the space of ensemble weightings (rank-aggregation solutions) that maximize a chosen fitness criterion, Spearman rank concordance with out-of-sample predictive relevance, thereby producing a consensus ranking of predictors. Empirical results indicate rapid GA convergence (fitness ≈ 0.82 within 20–30 generations) and strong cross-model agreement for a small core of predictors: study hours (X3) and focus score (X15) consistently emerged as the most salient features across individual models and in the ensemble ranking. A secondary set of variables (e.g., sleep hours, phone usage, attendance, and stress level) displayed moderate importance, while several features exhibited model-dependent variability in ranks. The ensemble procedure thereby yields stable, model-agnostic importance estimates that enhance interpretability and reduce dependence on any single algorithm's idiosyncrasies.
We discuss implications for educational analytics and recommend external validation, targeted feature engineering, and sensitivity analyses (alternate scalings and GA settings) to assess robustness and to support reliable, actionable inferences from machine‑learning models in applied settings.
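
The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GA operators, so the averaging crossover, Gaussian mutation, elitist selection, population size, and the helper names (`ga_weights`, `ensemble`, `spearman`) are all assumptions made for illustration. The PVI values and the reference relevance vector are invented toy numbers, not results from the study. The sketch Min–Max scales each model's PVI vector, forms a weighted sum, and lets a small GA search for non-negative weights summing to one that maximize the Spearman correlation with the reference vector.

```python
import random

def ranks(values):
    """Average ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def min_max(xs):
    """Min-Max scale a list to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def normalize(w):
    """Rescale weights to sum to 1 (uniform fallback if all are zero)."""
    s = sum(w)
    return [x / s for x in w] if s > 0 else [1 / len(w)] * len(w)

def ensemble(weights, pvi):
    """Weighted sum of the per-model importance vectors."""
    return [sum(w * m[j] for w, m in zip(weights, pvi))
            for j in range(len(pvi[0]))]

def ga_weights(pvi, reference, pop_size=30, generations=40, mut_sd=0.1, seed=0):
    """Assumed GA: keep the best half, breed children by averaging two
    parents, add Gaussian noise, clip at zero, and renormalize."""
    rng = random.Random(seed)
    fitness = lambda w: spearman(ensemble(w, pvi), reference)
    pop = [normalize([rng.random() for _ in pvi]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = [(a + b) / 2 + rng.gauss(0, mut_sd) for a, b in zip(p1, p2)]
            children.append(normalize([max(0.0, c) for c in child]))
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Hypothetical mean PVI values for five features (toy numbers only).
pvi_raw = [
    [0.02, 0.05, 0.30, 0.08, 0.25],  # stand-in for Random Forest
    [0.04, 0.03, 0.22, 0.10, 0.28],  # stand-in for Neural Network
    [0.06, 0.02, 0.27, 0.05, 0.20],  # stand-in for Support Vector Machine
]
# Hypothetical out-of-sample relevance scores used as the fitness target.
reference = [0.05, 0.02, 0.28, 0.07, 0.27]

pvi = [min_max(m) for m in pvi_raw]
weights, fit = ga_weights(pvi, reference)
consensus = ensemble(weights, pvi)
print("weights:", [round(w, 3) for w in weights], "fitness:", round(fit, 3))
```

Because the best half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next. Sorting `consensus` in descending order gives the consensus feature ranking under the chosen weights.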
