ENHANCED SMS SPAM DETECTION USING BERNOULLI NAIVE BAYES WITH TF-IDF
Abstract
Mobile text messaging is increasingly widespread, and use of the Short Message Service (SMS) has grown significantly over the last decade. This growth has been accompanied by a concerning rise in SMS spam, which presents substantial challenges for users and service providers. This study proposes a novel method for detecting SMS spam that combines Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction with the Bernoulli Naïve Bayes (BNB) algorithm for classification. In experimental validation, the model achieved high accuracy (98.36%), precision (99.19%), and a Matthews Correlation Coefficient (MCC) of 0.93, outperforming existing benchmarks. The proposed model also shows an efficient processing time (0.22 seconds). By combining the strengths of TF-IDF and BNB, the approach offers effective SMS spam detection and surpasses the traditional and deep learning classifiers it was compared against. This research contributes insights towards enhancing SMS security, thereby increasing trust between users and service providers.
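The abstract does not give implementation details, so the following is only a minimal pure-Python sketch of how a TF-IDF plus Bernoulli Naïve Bayes pipeline and the reported metrics (accuracy, precision, MCC) could be computed. The toy messages, the function names, the smoothed IDF formula, and the Laplace smoothing value are all assumptions made for illustration, not the authors' code or data. One further assumption is worth flagging: a Bernoulli model works on binary term presence, so the sketch thresholds the TF-IDF weights at zero before classification (scikit-learn's BernoulliNB applies a similar default binarization); whether the study did the same is not stated.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency per term (assumed formula)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized message; unseen terms are ignored."""
    counts = Counter(t for t in doc if t in idf)
    return {t: c / len(doc) * idf[t] for t, c in counts.items()}


def binarize(weights, threshold=0.0):
    """Reduce TF-IDF weights to the set of terms considered present."""
    return {t for t, w in weights.items() if w > threshold}


def train_bnb(features, labels, vocab, alpha=1.0):
    """Estimate log priors and P(term present | class) with Laplace smoothing."""
    model = {}
    for c in sorted(set(labels)):
        rows = [f for f, y in zip(features, labels) if y == c]
        prior = math.log(len(rows) / len(labels))
        p = {t: (sum(t in r for r in rows) + alpha) / (len(rows) + 2 * alpha)
             for t in vocab}
        model[c] = (prior, p)
    return model


def predict_bnb(model, present):
    """Pick the class with the highest Bernoulli log-likelihood plus log prior."""
    def score(c):
        prior, p = model[c]
        return prior + sum(
            math.log(p[t]) if t in present else math.log(1 - p[t]) for t in p
        )
    return max(model, key=score)


def evaluate(y_true, y_pred, positive="spam"):
    """Return (accuracy, precision, MCC) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, mcc


# Hypothetical toy training data, not taken from the SMS Spam Collection.
train = [
    ("win free prize now", "spam"),
    ("free cash claim now", "spam"),
    ("see you at lunch", "ham"),
    ("call me after lunch", "ham"),
]
docs = [message.split() for message, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
features = [binarize(tfidf(doc, idf)) for doc in docs]
model = train_bnb(features, labels, vocab=sorted(idf))


def classify(message):
    """Label a raw message using the fitted IDF table and BNB model."""
    return predict_bnb(model, binarize(tfidf(message.split(), idf)))
```

Calling `classify` on a new message returns a label, and `evaluate` takes the true and predicted labels for a held-out set. MCC is included alongside accuracy and precision because it uses all four confusion-matrix cells, which makes it more informative than accuracy alone when spam is much rarer than legitimate messages.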
Copyright (c) 2025 FUDMA JOURNAL OF SCIENCES

This work is licensed under a Creative Commons Attribution 4.0 International License.