MACHINE LEARNING ALGORITHMS FOR TELEGRAM SPAM FILTERING

  • Abubakar Hassan, Department of Computer Engineering, University of Maiduguri
  • Yusuf Ayuba, Department of Computer Engineering, University of Maiduguri
  • Mohammed Aji Wajiro, Directorate of Information and Communication Centre, Ramat Polytechnic Maiduguri
  • Muhammad Zaharadeen Ahmad, Department of Computer Engineering, University of Maiduguri
Keywords: Extreme Gradient Boosting, Light Gradient Boosting Machine, CatBoost, Support Vector Machine, K-Nearest Neighbour

Abstract

The unprecedented use of social media applications for interaction in virtual communities allows malicious actors to spread spam, hate speech, and phishing to very large audiences. Telegram, a new cloud messenger established by Pavel Durov in 2013, is particularly suited to such activity because it is highly popular among bloggers and media outlets around the world. Social media platforms therefore need algorithms that filter this malicious content. This paper employs machine learning algorithms to filter spam messages in Telegram, using a dataset obtained from Kaggle for the experiments. Five machine learning models were applied: Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGBM), CatBoost, Support Vector Machine (SVM), and K-Nearest Neighbours (KNN). Experimental results showed that SVM outperformed the other models in the study, with a classification accuracy of 94%. This indicates that SVM is a promising algorithm for spam filtering in Telegram.
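As an illustration of how one of the five models labels a message and how classification accuracy is scored, the following is a minimal sketch of K-Nearest Neighbours over bag-of-words vectors with cosine similarity, written in plain Python. The abstract does not state the feature extraction, hyperparameters, or dataset split used in the study, so the tokenizer, the value k=3, and the toy messages below are all assumptions made for illustration; this is not the authors' pipeline or the Kaggle dataset.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a message into a bag-of-words count vector (assumed feature set)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Label a message by majority vote among its k most similar training messages."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy messages, invented for this sketch (not from the study's dataset).
TRAIN_MESSAGES = [
    ("win free crypto now click the link", "spam"),
    ("free prize claim your reward now", "spam"),
    ("click this link to earn money fast", "spam"),
    ("are we still meeting for lunch tomorrow", "ham"),
    ("please send me the lecture notes", "ham"),
    ("see you at the meeting tomorrow morning", "ham"),
]
TEST_MESSAGES = [
    ("claim your free crypto prize now", "spam"),
    ("can you send the notes before the meeting", "ham"),
]

train = [(vectorize(text), label) for text, label in TRAIN_MESSAGES]
predictions = [knn_predict(train, text) for text, _ in TEST_MESSAGES]
labels = [label for _, label in TEST_MESSAGES]
test_accuracy = accuracy(labels, predictions)
print(predictions, test_accuracy)
```

The same accuracy function would apply unchanged to predictions from any of the other four models, which is what makes a single figure such as 94% comparable across them.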

Published
2024-12-05
How to Cite
Hassan, A., Ayuba, Y., Wajiro, M. A., & Ahmad, M. Z. (2024). MACHINE LEARNING ALGORITHMS FOR TELEGRAM SPAM FILTERING. FUDMA JOURNAL OF SCIENCES, 8(6), 170–176. https://doi.org/10.33003/fjs-2024-0806-2799