A Model-Agnostic Interpretability Approach to Predicting Customer Churn in the Telecommunications Industry

Teuku Rizky Noviandy; Ghalieb Mutig Idroes; Irsan Hardi; Mohd Afjal; Samrat Ray

doi:10.60084/ijds.v2i1.199

Authors

Teuku Rizky Noviandy, Interdisciplinary Innovation Research Unit, Graha Primera Saintifika, Aceh Besar, 23771, Indonesia
Ghalieb Mutig Idroes, Interdisciplinary Innovation Research Unit, Graha Primera Saintifika, Aceh Besar, 23771, Indonesia
Irsan Hardi, Interdisciplinary Innovation Research Unit, Graha Primera Saintifika, Aceh Besar, 23771, Indonesia
Mohd Afjal, VIT Business School, Vellore Institute of Technology, Vellore 632014, India
Samrat Ray, Business Analytics, International Institute of Management Studies, Pune 411002, India

Keywords:

Machine learning, Model interpretability, Customer retention, SHAP analysis, Predictive analytics

Abstract

Customer churn is a critical concern for businesses across industries, especially in telecommunications, where high churn rates can significantly reduce revenue and growth. Understanding the factors that lead to churn is essential for developing effective retention strategies. Despite the predictive power of machine learning models, there is growing demand for model interpretability to ensure trust and transparency in decision-making. This study addresses that gap by applying five machine learning models (Naïve Bayes, Random Forest, AdaBoost, XGBoost, and LightGBM) to predict customer churn in a telecommunications dataset. We enhanced model interpretability using SHapley Additive exPlanations (SHAP), which attributes each prediction to the contributions of individual features. LightGBM achieved the highest performance among the models, with an accuracy of 80.70%, precision of 84.35%, recall of 90.54%, and an F1-score of 87.34%. SHAP analysis revealed that tenure, contract type, and monthly charges are significant predictors of customer churn. These results indicate that combining predictive analytics with interpretability methods can give telecom companies actionable insights for tailoring retention strategies. The study highlights the importance of understanding customer behavior through transparent and accurate models, which can support improved customer satisfaction and loyalty. Future research should focus on validating these findings with real-world data, exploring more sophisticated models, and incorporating temporal dynamics to enhance the predictive power and applicability of churn prediction models.
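The two quantitative ideas in the abstract can be illustrated with a short Python sketch. The first part recomputes the F1-score from the reported precision and recall as their harmonic mean. The second part computes exact Shapley values, the quantity that SHAP estimates, for a small scoring function. Everything in the second part is an assumption made for illustration: `toy_churn_score`, its coefficients, and the example customer and baseline are hypothetical and are not the study's model or data.

```python
from itertools import permutations


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for LightGBM in the abstract, as fractions.
reported_f1 = f1_score(0.8435, 0.9054)


def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features switch from baseline to x."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_churn_score(f):
    """Hypothetical scoring function (NOT the paper's model)."""
    score = 0.5 - 0.01 * f["tenure"] + 0.004 * f["monthly_charges"]
    if f["month_to_month"]:
        # Interaction term: charges matter more on a month-to-month contract.
        score += 0.2 + 0.002 * f["monthly_charges"]
    return score


# Hypothetical customer and reference point, chosen only for illustration.
customer = {"tenure": 2, "monthly_charges": 90.0, "month_to_month": 1}
baseline = {"tenure": 30, "monthly_charges": 60.0, "month_to_month": 0}

phi = shapley_values(toy_churn_score, customer, baseline)
print(f"F1 from reported precision and recall: {reported_f1:.4f}")
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The recomputed F1 rounds to the reported 87.34%. The Shapley contributions sum to the difference between the customer's score and the baseline score, which is the additivity property that makes SHAP attributions readable as a per-feature breakdown of a prediction. Exhaustive enumeration is feasible only for a handful of features; in practice the `shap` library relies on faster model-specific or sampling-based estimators.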
