MACHINE LEARNING APPROACHES FOR CYBER BULLYING DETECTION IN HAUSA LANGUAGE SOCIAL MEDIA: A COMPREHENSIVE REVIEW AND ANALYSIS
Abstract
This study evaluated the performance of Support Vector Machine (SVM), Naive Bayes, and Logistic Regression classifiers in detecting cyberbullying among Hausa-language users on Twitter. Data were collected from the Kaggle Twitter database, focusing on interactions in the Hausa language. The dataset comprises 20,094 instances, of which 12,322 are labeled as cyberbullying (positive) and 7,772 as non-cyberbullying (negative). The Synthetic Minority Over-sampling Technique (SMOTE) was used to address this class imbalance. Python libraries such as Pandas, scikit-learn, and NLTK were employed for data cleaning, transformation, integration, and reduction. The results underscore the usefulness of machine learning algorithms for cyberbullying detection, particularly in the context of the Hausa language. Naive Bayes was the top-performing algorithm across precision, recall, F1-score, and accuracy. Logistic Regression also performed well, while SVM achieved competitive metrics but was limited in recall. The study further highlights the impact of preprocessing on model effectiveness: TF-IDF transformation and SMOTE played a crucial role in improving recall and overall accuracy. Cyberbullying is nonetheless a multifaceted issue shaped by cultural, contextual, and technological factors. Future research should therefore explore advanced techniques, such as deep learning and cross-lingual approaches, to further improve cyberbullying detection frameworks.
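The oversampling step named in the abstract can be illustrated with a minimal pure-Python sketch of the SMOTE idea: each synthetic sample is an interpolation between a minority-class point and one of its nearest minority-class neighbours. This is not the authors' code and not the implementation of any particular library; the function name `smote_sketch`, the toy feature vectors, and the parameter defaults are assumptions made for illustration only. The class counts are the ones reported in the abstract, and the calculation assumes the classes are to be fully balanced.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; expects at least two minority points.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Class counts reported in the abstract; the smaller class is the minority.
n_positive, n_negative = 12322, 7772
needed = abs(n_positive - n_negative)  # synthetic samples for a 1:1 balance

# Toy 2-D feature vectors (made up; real inputs would be TF-IDF vectors).
minority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.4, 0.2]]
new_points = smote_sketch(minority, n_synthetic=3, k=2)
```

Because every synthetic point lies on a line segment between two existing minority points, oversampling of this kind is normally applied to the training split only, so that interpolated samples do not leak into the evaluation data.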
Copyright (c) 2024 FUDMA JOURNAL OF SCIENCES
This work is licensed under a Creative Commons Attribution 4.0 International License.