Article type: Research article
Authors
1 Professor, Department of Industrial Management, Faculty of Management, University of Tehran, Tehran, Iran.
2 PhD student, Department of Industrial Management, Faculty of Management, University of Tehran, Tehran, Iran.
3 PhD student, Department of Industrial Management, Kish International Campus, University of Tehran, Tehran, Iran.
Abstract
Keywords
Subjects
Article title [English]
Authors [English]
Objective: Accurate prediction of customer value in the banking industry is one of the fundamental challenges that can contribute to optimal decision-making in customer management and resource allocation. This study aims to develop a comprehensive approach for predicting the value of banking customers. The primary focus of this research is on addressing the challenge of imbalanced data, improving the performance of machine learning models, and selecting key features that are effective in predicting customer value for real-world applications in banking environments.
Methodology: This study used data from a bank covering 2,000 customers and 14 features related to transactions and customer activity. After data preprocessing, feature selection was carried out and class imbalance was addressed with the ADASYN technique. Feature selection combined correlation analysis between the variables with the Feature Importance method based on the Random Forest algorithm; highly correlated features were identified and the final set of features was selected. Eleven machine learning algorithms, including CatBoost, XGBoost, Random Forest, LightGBM, and linear and nonlinear models, were then used to predict customer value. Optuna was used for hyperparameter tuning, and five-fold cross-validation was applied to obtain reliable performance estimates. Model performance was evaluated using four metrics: accuracy, precision, recall, and F1 score.
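To illustrate the oversampling step, the following is a minimal from-scratch sketch of the ADASYN idea in plain Python. It is not the authors' implementation (in practice a library class such as `ADASYN` from imbalanced-learn would normally be used). The function name `adasyn`, its parameters, and the toy two-dimensional data are assumptions made for illustration only. As a simplification, interpolation partners are drawn from the k nearest minority points.

```python
import math
import random


def adasyn(X, y, minority=1, k=3, beta=1.0, seed=0):
    """Return synthetic minority samples generated in the ADASYN style.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    total = (n_maj - n_min) * beta  # number of synthetic samples to aim for

    def nearest(i, candidates):
        # Indices of the k candidates closest to sample i (excluding i itself).
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Step 1: difficulty of each minority point = share of majority points
    # among its k nearest neighbours in the full dataset.
    ratios = []
    for i in min_idx:
        nbrs = nearest(i, range(len(X)))
        ratios.append(sum(y[j] != minority for j in nbrs) / len(nbrs))
    norm = sum(ratios)
    if norm == 0:  # no minority point borders the majority: spread uniformly
        ratios, norm = [1.0] * n_min, float(n_min)

    # Step 2: allocate synthetic samples in proportion to difficulty and
    # interpolate between each minority point and a nearby minority point.
    synthetic = []
    for i, r in zip(min_idx, ratios):
        count = round(r / norm * total)
        min_nbrs = nearest(i, min_idx)
        if not min_nbrs:
            continue
        for _ in range(count):
            j = rng.choice(min_nbrs)
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy, made-up data: 20 majority points and 5 minority points in 2-D.
data_rng = random.Random(42)
majority_pts = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
minority_pts = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(5)]
X = majority_pts + minority_pts
y = [0] * 20 + [1] * 5

synthetic = adasyn(X, y)
print(len(synthetic), "synthetic minority samples generated")
```

Minority points surrounded by more majority neighbours receive more synthetic samples, which is what distinguishes ADASYN from uniform oversampling. When combined with cross-validation, resampling is usually applied to the training folds only, so that synthetic points do not leak into the validation data.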
Findings: The results showed that ensemble learning-based algorithms provided the best performance in predicting customer value. The CatBoost model, with an F1 Score of 0.9324 and an accuracy of 0.909, was identified as the best-performing model. This model achieved a proper balance between precision and recall, with a precision of 0.9677 and a recall of 0.8998 in predicting valuable customers. The XGBoost and Random Forest models also demonstrated similar performance to CatBoost, with F1 Scores of 0.9322 and 0.932, respectively. The use of a combined approach for feature selection and the application of the ADASYN method for data balancing played a significant role in improving the performance of these models.
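As a quick consistency check on the reported CatBoost figures, the F1 score can be recomputed from the stated precision and recall using the standard harmonic-mean definition. This is only a sanity check under the assumption that the reported values are rounded averages; it is not part of the original analysis.

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 0.9677 \times 0.8998}{0.9677 + 0.8998}
    = \frac{1.7415}{1.8675}
    \approx 0.9325
```

This agrees with the reported F1 score of 0.9324 up to rounding; a small gap of this size is also expected if the metrics were averaged across cross-validation folds.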
Conclusion: These results show that combining ADASYN-based data preprocessing with modern machine learning methods can improve the effectiveness of models that predict customer value. Selecting features through correlation analysis and Random Forest-based feature importance was important in improving the overall performance of the models. This approach strengthened the models by eliminating features and information that had less impact on the final decisions, making the predictions more precise. Based on the evaluation results, ensemble learning models, particularly CatBoost, XGBoost, and Random Forest, are the most appropriate for banking settings because of their efficiency and effectiveness in dealing with large-scale, complex, and imbalanced datasets. This paper extends previous research by addressing the issues of imbalanced data and feature selection to enhance customer management in the banking sector, contributing an efficient approach to this challenge. The results are useful for defining criteria to identify banks' high-value customer base and for formulating improved policies for their retention and servicing.
Keywords [English]