Implementation of SMOTE, Random Oversampling, and Random Undersampling in Random Forest and LASSO Logistic Regression for Customer Churn Prediction
Abstract
Predicting customer churn remains a critical challenge in the banking industry, as retaining existing customers is generally more cost-effective than acquiring new ones. This study addresses class imbalance in churn prediction by applying three resampling techniques: Synthetic Minority Oversampling Technique (SMOTE), Random Over-Sampling (ROS), and Random Under-Sampling (RUS). The dataset comprises 10,000 customer records with 14 attributes and was analyzed using Random Forest and LASSO Logistic Regression. Model performance was evaluated using accuracy, precision, recall, and F1-score. For Random Forest, ROS achieved the highest accuracy (86%), although churn recall remained low (0.50). SMOTE yielded an accuracy of 82% with a more balanced recall (0.62), while RUS achieved an accuracy of 79% and the highest recall (0.78), albeit at the expense of precision. For LASSO Logistic Regression, SMOTE gave the highest accuracy (73%) with a recall of 0.64, whereas ROS and RUS each reached 71% accuracy with a higher recall of 0.72. The findings highlight the effectiveness of oversampling techniques in enhancing churn detection, providing practical insights for banking institutions to improve customer retention strategies.
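To make the three resampling techniques concrete, the following is a minimal sketch in plain Python, not the study's actual code. The toy data (80 non-churn and 20 churn rows with two numeric features), the function names, and the neighbour count k=5 are all assumptions chosen for illustration; the abstract does not state the class ratio or the software used. In practice, resampling is usually applied to the training split only, so that the test set keeps the original class distribution.

```python
import random
from collections import Counter

def random_oversample(X, y, minority, rng):
    """ROS: duplicate randomly chosen minority rows until both classes are equal in size."""
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(n_needed)]
    return X + extra, y + [minority] * len(extra)

def random_undersample(X, y, minority, rng):
    """RUS: keep every minority row plus an equally sized random subset of majority rows."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]

def smote(X, y, minority, rng, k=5):
    """SMOTE: create synthetic minority rows by interpolating between a minority
    row and one of its k nearest minority neighbours."""
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours by squared Euclidean distance (base excluded)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)

# Hypothetical toy data: label 1 = churn (minority), label 0 = no churn (majority).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)] + \
    [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

resampled = {}
for name, fn in [("ROS", random_oversample), ("RUS", random_undersample), ("SMOTE", smote)]:
    resampled[name] = fn(X, y, 1, rng)
    print(name, Counter(resampled[name][1]))
```

The sketch also hints at the trade-offs reported above: ROS only repeats existing churn rows, RUS discards most of the majority class, and SMOTE adds new points that lie between existing churn rows. For real work, the imbalanced-learn library offers maintained versions of all three (`RandomOverSampler`, `RandomUnderSampler`, and `SMOTE`).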
Copyright (c) 2025 Mortalita: Journal of Mathematics and Its Applications

This work is licensed under a Creative Commons Attribution 4.0 International License.