IMPROVED ELECTRONIC MAIL CLASSIFICATION USING HYBRIDIZED ROOT WORD EXTRACTIONS

Authors

  • A. O. Okunade

Keywords:

Spam, Ham, Email, Suspicious terms, Stemming, Filter, Spammer

Abstract

Content-based spam filters use a Bayesian probability approach to prevent spam mail from being delivered to the targeted host. Unfortunately, spammers have found sophisticated ways to circumvent the detection patterns of these filters: they manipulate and rearrange the suspicious terms in spam mail, exploiting the fact that content-based filters work effectively only when the suspicious terms are lexically and grammatically correct. This paper proposes combining word stemming with the Bayesian probability approach to restore a spam-free inbox in the electronic mail infrastructure. The hybridized technique detects modified suspicious terms by examining the root of each misspelled or manipulated term, converting it back to the correct or near-correct token, and evaluating it as such. When tested with both direct and manipulated spam mail content, the implementation successfully identified spam mail containing manipulated suspicious terms: 99% of the tested spam mails with known manipulated suspicious terms were identified and classified as spam. Manipulated spam mail is therefore ineffective against a spam filter that combines word stemming with Bayesian probability. The algorithm is effective and accurate, prevents false classification, and negates spammers' innovations.
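
The general idea described in the abstract, reducing manipulated terms to a root form before Bayesian scoring, can be illustrated with a minimal sketch. This is not the paper's implementation: the substitution table, the suffix list, the function and class names, and the Laplace-smoothed naive Bayes scoring are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical character substitutions a spammer might use (assumption).
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)
# Toy suffix list standing in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(token):
    """Map a possibly manipulated token to an approximate root form."""
    token = token.lower().translate(SUBSTITUTIONS)
    token = re.sub(r"[^a-z]", "", token)  # drop inserted punctuation, e.g. "fr.ee"
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Split on whitespace and keep the non-empty normalized tokens."""
    return [t for t in (normalize(w) for w in text.split()) if t]


class StemmedBayesFilter:
    """Naive Bayes over normalized tokens with add-one (Laplace) smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        spam_total = sum(self.counts["spam"].values()) + vocab_size
        ham_total = sum(self.counts["ham"].values()) + vocab_size
        # Smoothed prior log-odds, then add per-token likelihood log-ratios.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for token in tokenize(text):
            p_spam = (self.counts["spam"][token] + 1) / spam_total
            p_ham = (self.counts["ham"][token] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny made-up training set, for demonstration only.
spam_filter = StemmedBayesFilter()
spam_filter.train("free money winner claim prize", "spam")
spam_filter.train("cheap pills free offer", "spam")
spam_filter.train("meeting agenda attached for monday", "ham")
spam_filter.train("lunch tomorrow with the project team", "ham")

print(spam_filter.spam_probability("Fr.ee m0ney w1nner"))
print(spam_filter.spam_probability("meeting agenda for monday"))
```

Because normalization is applied to both training and incoming text, a manipulated token such as "m0ney" is scored with the statistics learned for "money". A production filter would likely replace the toy suffix list with an established stemmer and train on a real corpus.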

Published

2023-03-30

How to Cite

Okunade, A. O. (2023). IMPROVED ELECTRONIC MAIL CLASSIFICATION USING HYBRIDIZED ROOT WORD EXTRACTIONS. FUDMA JOURNAL OF SCIENCES, 3(1), 56 - 64. Retrieved from https://fjs.fudutsinma.edu.ng/index.php/fjs/article/view/1427