ENHANCING EMPLOYEE ATTRITION PREDICTION: THE IMPACT OF DATA PREPROCESSING ON MACHINE LEARNING MODEL PERFORMANCE
Abstract
Employee attrition is a serious problem for organizations: it raises costs and reduces productivity. This study examines how data preprocessing can help machine learning models predict employee turnover more accurately. Seven machine learning algorithms (Random Forest, k-Nearest Neighbors (k-NN), XGBoost, Gradient Boosting, Linear Discriminant Analysis (LDA), LightGBM, and Logistic Regression) were applied to the 1,470 records of the International Business Machines (IBM) HR Employee Attrition dataset. SimpleImputer was used to handle missing values, StandardScaler to standardize numerical features, and SelectFromModel to select important features. These steps were essential to improving model accuracy: LDA achieved the highest accuracy, 87.38%, followed by LightGBM and Logistic Regression, both at 87%. Preprocessing substantially improved the performance metrics of all models; k-NN had the lowest accuracy, at 85.33%. These results show how important preprocessing is to predictive analytics and how HR management can use it to identify at-risk employees and design effective retention plans.
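The preprocessing steps named in the abstract can be chained in a single scikit-learn pipeline. The following is a minimal sketch, not the authors' implementation: it assumes synthetic data from `make_classification` in place of the IBM HR dataset, median imputation, a Random Forest as the estimator inside SelectFromModel, and an 80/20 stratified split. None of these choices is stated in the abstract, and the accuracy it prints is not comparable to the reported figures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in with the same number of rows (1,470)
# as the dataset described in the abstract; features are all numeric.
X, y = make_classification(
    n_samples=1470, n_features=30, n_informative=8, random_state=0
)

# Assumption: blank out about 2% of the values so the imputer has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.02] = np.nan

# Assumption: 80/20 stratified hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Fill missing values (median strategy is an assumption).
    ("impute", SimpleImputer(strategy="median")),
    # Standardize features to zero mean and unit variance.
    ("scale", StandardScaler()),
    # Keep features whose importance exceeds the mean importance
    # (the default threshold for tree-based estimators).
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0)
    )),
    # Final classifier; LDA is one of the seven models in the study.
    ("clf", LinearDiscriminantAnalysis()),
])

# Fitting inside the pipeline keeps test data out of the preprocessing fit.
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
n_selected = int(pipeline.named_steps["select"].get_support().sum())

print(f"selected features: {n_selected} of {X.shape[1]}")
print(f"hold-out accuracy: {accuracy:.4f}")
```

Swapping the `clf` step for another estimator (for example `LogisticRegression` or `KNeighborsClassifier`) would let the same preprocessing be reused across the compared models.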
Copyright (c) 2025 FUDMA JOURNAL OF SCIENCES
This work is licensed under a Creative Commons Attribution 4.0 International License.