EFFECTS OF DATA DELETION AND WEIGHTING ON FISHER’S LINEAR CLASSIFICATION METHOD: A ROBUSTIFICATION APPROACH
DOI:
https://doi.org/10.33003/fjs-2025-0903-3245
Keywords:
Influential observations, Mahalanobis distance, F-weight, Fisher classification, Robustness
Abstract
The performance of classical supervised classification models is hampered by influential observations (IOs). Whether IOs are deleted, weighted, Winsorized, truncated, or retained has a large effect on the reproducibility of inferences across classification models. Methods such as IO deletion and IO weighting have therefore been introduced to reduce the influence of IOs on classical supervised classification models, but some of these reduction or deletion methods cause varying degrees of information loss. In this study, we investigated the effects of IO deletion and IO weighting, using the Mahalanobis distance as a plug-in to enhance the robustness of the Fisher linear classification method (FLCM). We also proposed an F-weight plug-in method to robustify the FLCM. We compared the performance of these methods to determine whether IO deletion or IO weighting degrades or enhances the classification accuracy of the FLCM. The study found that IO weighting with the F-weight loses less information than IO deletion based on the Mahalanobis distance. We conclude that the F-weight variant of the FLCM achieves better classification accuracy and efficiency than the Mahalanobis-distance-based FLCM.
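To make the deletion-versus-weighting comparison concrete, the following is a minimal Python sketch of how a Mahalanobis-distance plug-in might feed a two-group Fisher linear rule. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the F-weight, so `soft_weights` is a hypothetical stand-in down-weighting scheme; the 0.975 chi-square cutoff, the use of classical (non-robust) group means and covariances to compute distances, and the `sum(w) - 1` covariance denominator are likewise assumptions.

```python
import numpy as np

# Assumed cutoff: 0.975 quantile of chi-square with 2 degrees of freedom,
# which has the closed form -2*ln(1 - q). Valid for 2-dimensional data only.
CUTOFF = -2.0 * np.log(1.0 - 0.975)

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean` under `cov`."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def deletion_weights(X, cutoff):
    """Weight 0 for rows flagged as influential (effectively deleted), 1 otherwise."""
    d2 = mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False))
    return (d2 <= cutoff).astype(float)

def soft_weights(X, cutoff):
    """Hypothetical down-weighting (NOT the paper's F-weight): w = min(1, cutoff / d^2)."""
    d2 = mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False))
    return np.minimum(1.0, cutoff / np.maximum(d2, 1e-12))

def weighted_mean_cov(X, w):
    """Weighted mean and covariance; with all-ones weights this is the classical estimate."""
    w = np.asarray(w, dtype=float)
    mean = (w[:, None] * X).sum(axis=0) / w.sum()
    diff = X - mean
    cov = (w[:, None] * diff).T @ diff / (w.sum() - 1.0)
    return mean, cov

def fisher_rule(X1, X2, w1, w2):
    """Return (a, c) such that x is assigned to group 1 when a @ x >= c."""
    m1, S1 = weighted_mean_cov(X1, w1)
    m2, S2 = weighted_mean_cov(X2, w2)
    n1, n2 = np.sum(w1), np.sum(w2)
    pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    a = np.linalg.solve(pooled, m1 - m2)   # discriminant direction
    c = 0.5 * a @ (m1 + m2)                # midpoint threshold
    return a, c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data (illustrative only): group 1 is contaminated with three planted IOs.
    X1 = np.vstack([rng.normal(size=(50, 2)), [[10, 10], [11, 9], [9, 11]]])
    X2 = rng.normal(loc=3.0, size=(50, 2))
    for name, scheme in [("deletion", deletion_weights), ("weighting", soft_weights)]:
        a, c = fisher_rule(X1, X2, scheme(X1, CUTOFF), scheme(X2, CUTOFF))
        print(name, "direction:", a, "threshold:", c)
```

Expressing deletion as a 0/1 weight lets both strategies share the same estimation code, so any difference in the fitted rule comes from the weights alone. Deletion discards a flagged observation entirely, whereas weighting keeps a fraction of its contribution, which is the sense in which weighting retains more information.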