MACHINE LEARNING ALGORITHMS FOR TELEGRAM SPAM FILTERING
Abstract
With the unprecedented use of social media applications for interaction in virtual communities, malicious actors can now exploit these platforms to spread spam, hate speech, and even phishing attempts to a very large population. Telegram, a new cloud messenger established by Pavel Durov in 2013, is particularly attractive for such activity because it is highly popular among bloggers and media outlets around the world. It is therefore necessary for social media platforms to develop algorithms that filter this malicious content. This paper employs machine learning algorithms to filter spam messages in Telegram. A dataset obtained from Kaggle was used for the experiments. Five machine learning models were applied: Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGBM), CatBoost, Support Vector Machine (SVM), and K-Nearest Neighbours (KNN). Experimental results showed that SVM outperformed the other models in the study, with a classification accuracy of 94%. This indicates that SVM is a promising algorithm for spam filtering in Telegram.
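To make the classification task concrete, the following is a minimal sketch of one of the five model families named above, K-Nearest Neighbours, applied to spam filtering. It is not the pipeline used in the paper: the abstract does not state the feature representation, the value of k, or the evaluation split, so the bag-of-words counts, cosine similarity, k = 3, and the tiny hand-written messages below are all assumptions made purely for illustration. The function names are hypothetical as well.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def vectorize(text):
    """Represent a message as a sparse bag-of-words (token -> count)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, message, k=3):
    """Majority vote over the k training messages most similar to `message`."""
    query = vectorize(message)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test messages whose predicted label matches the true label."""
    correct = sum(knn_predict(train, text, k) == label for text, label in test)
    return correct / len(test)


# Toy data invented for this sketch; it is NOT the Kaggle dataset from the paper.
TRAIN = [
    ("win a free prize now click the link", "spam"),
    ("free crypto giveaway click here to claim", "spam"),
    ("claim your free prize money today", "spam"),
    ("are we still meeting for lunch tomorrow", "ham"),
    ("please send me the report before the meeting", "ham"),
    ("see you at lunch tomorrow then", "ham"),
]
TEST = [
    ("click here to win free money", "spam"),
    ("can you send the report tomorrow", "ham"),
]

if __name__ == "__main__":
    for text, label in TEST:
        print(f"{text!r}: predicted={knn_predict(TRAIN, text)}, actual={label}")
    print("accuracy:", accuracy(TRAIN, TEST))
```

Classification accuracy here is the same metric the abstract reports: the share of messages assigned the correct label. A real comparison of the five models would need the full dataset, a held-out test split, and tuned hyperparameters, none of which this sketch attempts.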
Copyright (c) 2024 FUDMA JOURNAL OF SCIENCES
This work is licensed under a Creative Commons Attribution 4.0 International License.
FUDMA Journal of Sciences