ENHANCING EMPLOYEE ATTRITION PREDICTION: THE IMPACT OF DATA PREPROCESSING ON MACHINE LEARNING MODEL PERFORMANCE
DOI:
https://doi.org/10.33003/fjs-2025-0901-3030
Keywords:
Employee Attrition, Machine Learning, Predictive Analytics, Data Preprocessing, HR Management
Abstract
Employee attrition is a serious problem for organizations because it raises costs and reduces productivity. This study examines how data preprocessing can help machine learning models predict employee turnover more accurately. Seven machine learning algorithms, namely Random Forest, k-Nearest Neighbors (k-NN), XGBoost, Gradient Boosting, Linear Discriminant Analysis (LDA), LightGBM, and Logistic Regression, were applied to the 1,470 records of the International Business Machines Human Resources (IBM HR) Employee Attrition dataset. SimpleImputer was used to handle missing values, StandardScaler to standardize numerical features, and SelectFromModel to select important features. These steps were essential to improving model accuracy: LDA achieved the highest accuracy at 87.38%, followed by LightGBM and Logistic Regression at 87% each, while k-NN had the lowest at 85.33%. Preprocessing substantially improved the performance metrics of all models. These results demonstrate the importance of preprocessing in predictive analytics and show how HR management can use it to identify at-risk employees and design effective retention plans.
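The three preprocessing steps named in the abstract can be chained in a single scikit-learn pipeline. The following is a minimal sketch only, not the authors' code: the synthetic data stands in for the IBM HR dataset, and the median imputation strategy, the Random Forest used inside SelectFromModel, the 80/20 split, and all random seeds are assumptions made for illustration. The real dataset also contains categorical columns that would need encoding first, which this sketch omits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the HR data (assumption): 1,470 rows, 30 numeric
# features, and an assumed class imbalance between stayers and leavers.
X, y = make_classification(
    n_samples=1470, n_features=30, n_informative=8,
    weights=[0.84, 0.16], random_state=0,
)

# Blank out roughly 2% of the values so the imputer has something to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.02] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Impute -> scale -> select features -> classify. Fitting inside a Pipeline
# keeps every preprocessing step from seeing the held-out test rows.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # assumed strategy
    ("scale", StandardScaler()),
    ("select", SelectFromModel(                          # assumed selector model
        RandomForestClassifier(n_estimators=100, random_state=0)
    )),
    ("model", LinearDiscriminantAnalysis()),
])
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
n_selected = int(pipeline.named_steps["select"].get_support().sum())
print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"test accuracy: {accuracy:.4f}")
```

Swapping the final "model" step for any of the other six classifiers leaves the preprocessing unchanged, which is one way to compare algorithms on equal footing. Accuracy on this synthetic data says nothing about the figures reported above.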
FUDMA Journal of Sciences