A FILTER MODEL FOR TEXT CATEGORIZATION AGAINST ONLINE HATE SPEECHES
Abstract
Text classification is the task of assigning a text document to one or more predefined categories. It has been applied in areas such as the classification of scientific articles, spam filtering, and document genre identification, and it is a popular data mining task because it is accurate and easy to apply. The Internet is a common medium for exchanging messages: billions of messages circulate daily through platforms such as e-mail, Facebook, and Twitter. Some of these messages are sent with harmful motives, so it has become imperative to design a model that uses data mining algorithms to filter unwanted messages out of circulation. In light of this, this paper applied three data mining techniques, namely Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbour (KNN), to develop models that can filter messages from Facebook and e-mail and so counter the circulation of online hate speech on these platforms. The performance of the three models was compared on collected data to identify the best-performing text classifier. The Naïve Bayes algorithm performed better than the other two, with an accuracy of 61.5% and an ROC of 0.66.
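As an illustration of the kind of filter described above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation: the abstract does not state the tooling, features, or data used, so the class name `NaiveBayesFilter`, the whitespace tokenizer, the labels `flagged` and `ok`, and the four toy messages are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not given.
    return text.lower().split()


class NaiveBayesFilter:
    """Hypothetical multinomial Naive Bayes classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant
        self.class_counts = Counter()           # documents seen per label
        self.word_counts = defaultdict(Counter) # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (not from the paper's dataset).
train_texts = [
    "we hate those people",
    "those people should leave",
    "have a nice day",
    "see you at lunch today",
]
train_labels = ["flagged", "flagged", "ok", "ok"]

model = NaiveBayesFilter().fit(train_texts, train_labels)
print(model.predict("those people"))
print(model.predict("nice lunch"))
```

A deployed filter would score each incoming message with `predict` and hold back those assigned the unwanted label. Comparing classifiers as the paper does would additionally require a held-out test set and metrics such as accuracy and ROC, which this sketch omits.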
Copyright (c) 2023 FUDMA JOURNAL OF SCIENCES
This work is licensed under a Creative Commons Attribution 4.0 International License.