Implementasi Feature Selection Menggunakan Boruta untuk Peningkatan Akurasi Model Lapser Prediction: Implementation of Feature Selection Using Boruta to Improve the Accuracy of the Lapser Prediction Model

Mochamad Gilang Saputra; Bagus Jati Santoso

doi:10.57152/malcom.v5i3.1992

Authors

Mochamad Gilang Saputra Institut Teknologi Sepuluh Nopember (ITS)
Bagus Jati Santoso Institut Teknologi Sepuluh Nopember (ITS)

Keywords:

Boruta, Feature Selection, Gradient Boosting, Lapser, Machine Learning

Abstract

Predicting lapsed customers (lapsers) is a major challenge in the competitive data services sector, where acquiring new customers is costly. This study proposes feature selection with Boruta, a wrapper technique built on Random Forest, to improve the accuracy of a lapser prediction model. Lapser prediction is modeled with the Gradient Boosting machine learning algorithm, which is analyzed before and after Boruta feature selection. Experimental results on the data show that Boruta improves the main metrics (accuracy, recall, and AUC). After Boruta, the Gradient Boosting model reached an accuracy of 75.10%, a recall of 74.42%, and an AUC of 82.18%; before Boruta, accuracy was 71.74%, recall 68.74%, and AUC 77.77%. These findings indicate that the proposed approach can identify lapsers earlier and help policymakers design more effective customer retention strategies, thereby minimizing potential losses and strengthening competitiveness in the market.
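
The shadow-feature idea behind Boruta can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a simplified importance score (absolute Pearson correlation with the label) in place of Random Forest importance, a fixed hit-rate threshold in place of Boruta's statistical test, and synthetic toy data. The names `abs_correlation`, `shadow_feature_selection`, `usage_drop`, `noise_a`, and `noise_b` are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; a simple stand-in for Random Forest importance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_feature_selection(features, labels, rounds=20, min_hit_rate=0.9, seed=0):
    """Keep features whose importance beats the best shuffled ("shadow") copy
    in at least `min_hit_rate` of the rounds."""
    rng = random.Random(seed)
    importance = {name: abs_correlation(col, labels) for name, col in features.items()}
    hits = {name: 0 for name in features}
    for _ in range(rounds):
        best_shadow = 0.0
        for col in features.values():
            shadow = col[:]       # copy the column, then break its link to the label
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_correlation(shadow, labels))
        for name in features:
            if importance[name] > best_shadow:
                hits[name] += 1
    return [name for name in features if hits[name] / rounds >= min_hit_rate]


# Synthetic toy data (not the paper's dataset): one informative column, two noise columns.
data_rng = random.Random(42)
labels = [data_rng.randint(0, 1) for _ in range(300)]
features = {
    "usage_drop": [y + data_rng.gauss(0, 0.3) for y in labels],  # hypothetical, tied to the label
    "noise_a": [data_rng.gauss(0, 1) for _ in labels],
    "noise_b": [data_rng.gauss(0, 1) for _ in labels],
}
selected = shadow_feature_selection(features, labels)
print(selected)
```

In a setup like the one the abstract describes, the selected columns would then be passed to a Gradient Boosting classifier, and accuracy, recall, and AUC would be compared against a model trained on all columns.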
