Implementation of Smote, Random Oversampling and Random Undersampling in Random Forest and Lasso Logistic Regression for Customer Churn Prediction

  • Muhamad Hilman Rizaldi, Universitas Brawijaya
Keywords: Customer Churn, LASSO Logistic Regression, Machine Learning, Random Forest, Resampling Technique

Abstract

Predicting customer churn remains a critical challenge in the banking industry, as retaining existing customers is generally more cost-effective than acquiring new ones. This study addresses class imbalance in churn prediction by applying three resampling techniques: Synthetic Minority Oversampling Technique (SMOTE), Random Over-Sampling (ROS), and Random Under-Sampling (RUS). The dataset comprises 10,000 customer records with 14 attributes and was modeled with Random Forest and LASSO Logistic Regression. Model performance was evaluated using accuracy, precision, recall, and F1-score. For Random Forest, ROS achieved the highest accuracy (86%), although its recall on the churn class remained low (0.50). SMOTE yielded an accuracy of 82% with a more balanced recall (0.62), while RUS achieved an accuracy of 79% and the highest recall (0.78), at the expense of precision. For LASSO Logistic Regression, SMOTE gave the highest accuracy (73%) with a recall of 0.64, whereas ROS and RUS each achieved 71% accuracy with a recall of 0.72. The findings highlight the effectiveness of oversampling techniques in enhancing churn detection, providing practical insights for banking institutions to improve customer retention strategies.
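
To illustrate the resampling step and the evaluation metrics named above, the following is a minimal sketch in plain Python of random over-sampling, random under-sampling, and the accuracy, precision, recall, and F1 computation for the churn class. It is not the study's code. The function names (random_oversample, random_undersample, churn_metrics), the toy data, and the convention that label 1 means "churn" are all assumptions made for illustration. SMOTE is omitted because it synthesizes new minority points by interpolating between neighbors rather than copying rows; in practice a library such as imbalanced-learn would typically be used for all three techniques.

```python
import random
from collections import Counter


def random_oversample(X, y, minority=1, seed=0):
    """ROS sketch: duplicate randomly chosen minority rows until both classes match in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Draw (with replacement) as many extra minority rows as are needed to balance.
    n_extra = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X, y, minority=1, seed=0):
    """RUS sketch: keep a random subset of majority rows the same size as the minority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n_keep = min(len(majority_idx), len(minority_idx))
    kept_majority = rng.sample(majority_idx, n_keep)  # without replacement
    keep = kept_majority + minority_idx
    return [X[i] for i in keep], [y[i] for i in keep]


def churn_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up data (NOT the study's dataset): 8 retained customers, 2 churners.
X_toy = [[i] for i in range(10)]
y_toy = [0] * 8 + [1] * 2

_, y_ros = random_oversample(X_toy, y_toy)
_, y_rus = random_undersample(X_toy, y_toy)
print("original:", Counter(y_toy))
print("after ROS:", Counter(y_ros))
print("after RUS:", Counter(y_rus))
```

Resampling of this kind would normally be applied to the training split only, so that the test set keeps the original class distribution and the reported recall reflects real-world imbalance. The sketch also makes the trade-off in the results easier to see: RUS discards majority rows, which tends to raise churn recall while lowering precision.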

Published
2025-06-09
How to Cite
Muhamad Hilman Rizaldi. (2025). Implementation of Smote, Random Oversampling and Random Undersampling in Random Forest and Lasso Logistic Regression for Customer Churn Prediction. Mortalita: Journal of Mathematics and Its Applications, 2(1), 1-14. https://doi.org/10.61159/mortalita.v2i1.709