ENHANCED SMS SPAM DETECTION USING BERNOULLI NAIVE BAYES WITH TF-IDF

Abdullahi Burhanuddeen Ahmed; Khalid Haruna

doi:10.33003/fjs-2025-0901-3226

Abdullahi Burhanuddeen Ahmed, Bayero University Kano
Khalid Haruna, Bayero University Kano


Keywords: SMS Spam Detection, TF-IDF, Bernoulli Naïve Bayes, Machine Learning, Text Classification, Feature Extraction

Abstract

Mobile text messaging is increasingly widespread, and use of the Short Message Service (SMS) has grown significantly over the last decade. This growth has brought a concerning rise in SMS spam, which poses substantial challenges for users and service providers. This study proposes a novel method for detecting SMS spam that combines Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction with the Bernoulli Naïve Bayes (BNB) algorithm for classification. In experimental validation, the model achieves an accuracy of 98.36%, a precision of 99.19%, and a Matthews Correlation Coefficient (MCC) of 0.93, outperforming existing benchmarks, including traditional and deep learning classifiers. The model is also efficient, with a processing time of 0.22 seconds. By combining the strengths of TF-IDF and BNB, the approach offers effective SMS spam detection. This research contributes toward improving SMS security and, in turn, trust between users and service providers.
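The pipeline named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation. The tokenizer, the smoothed IDF formula, the Laplace smoothing value `alpha`, the zero binarization threshold, and the four toy training messages are all assumptions made for illustration. Because Bernoulli Naïve Bayes models each term as present or absent, the sketch thresholds the TF-IDF weights before classification; the abstract does not say how the paper handles this step (scikit-learn's `BernoulliNB`, for instance, thresholds its inputs through its `binarize` parameter).

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every training token."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one message; tokens unseen in training are ignored."""
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    return {tok: tf * idf[tok] for tok, tf in counts.items()}

def binarize(weights, threshold=0.0):
    """Reduce TF-IDF weights to the set of terms counted as present."""
    return {tok for tok, w in weights.items() if w > threshold}

def train_bnb(docs, labels, idf, alpha=1.0):
    """Estimate a log prior and per-term presence probabilities per class."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [binarize(tfidf(d, idf)) for d, y in zip(docs, labels) if y == cls]
        log_prior = math.log(len(rows) / len(docs))
        prob = {tok: (sum(tok in r for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for tok in idf}
        model[cls] = (log_prior, prob)
    return model

def predict(doc, idf, model):
    """Pick the class with the highest Bernoulli log-likelihood plus log prior."""
    present = binarize(tfidf(doc, idf))
    scores = {}
    for cls, (log_prior, prob) in model.items():
        score = log_prior
        for tok, p in prob.items():
            # Absent terms also contribute, unlike in multinomial Naive Bayes.
            score += math.log(p) if tok in present else math.log(1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data, invented for this sketch; not drawn from the paper's dataset.
train_docs = ["win free prize now", "free cash prize claim now",
              "are we meeting for lunch today", "see you at lunch today"]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = ["claim your free prize", "lunch today at noon"]
test_labels = ["spam", "ham"]

idf = fit_idf(train_docs)
model = train_bnb(train_docs, train_labels, idf)
predictions = [predict(d, idf, model) for d in test_docs]

pairs = list(zip(test_labels, predictions))
tp = sum(t == "spam" and p == "spam" for t, p in pairs)
tn = sum(t == "ham" and p == "ham" for t, p in pairs)
fp = sum(t == "ham" and p == "spam" for t, p in pairs)
fn = sum(t == "spam" and p == "ham" for t, p in pairs)

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp) if (tp + fp) else 0.0
print(predictions, accuracy, precision, mcc(tp, tn, fp, fn))
```

With a zero threshold, every term that occurs in a message counts as present, so the TF-IDF weights act only as a filter here. A positive threshold would instead drop low-weight terms before classification. Scores on this two-message toy test say nothing about the figures reported above.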

