HAUSA HATE SPEECH DETECTION USING LOGISTIC REGRESSION

Authors

  • Idris Sulaiman Bashir
  • Aminu Ahmad Muhammad

DOI:

https://doi.org/10.33003/fjs-2026-1001-4541

Keywords:

Hausa, Hate Speech Detection, Logistic Regression, Random Forest, TF-IDF, Low-resource Languages, African NLP

Abstract

Hate speech on social media poses serious risks to social cohesion, particularly in multilingual and politically sensitive regions such as West Africa. While Natural Language Processing techniques have achieved strong performance for high-resource languages, African languages remain under-represented due to limited annotated data and linguistic complexity. This study investigates hate speech detection in Hausa using traditional machine learning approaches, focusing on interpretability and efficiency in low-resource settings. Experiments are conducted on the AFRIHATE Hausa corpus using Logistic Regression as the primary classifier and Random Forest as a comparative model. Text is represented using Term Frequency-Inverse Document Frequency (TF-IDF) and Bag-of-Words features. Model performance is evaluated using accuracy, precision, recall, and F1-score under stratified cross-validation. Results show that Logistic Regression with TF-IDF features achieves the best overall performance, with an accuracy of 94% and an F1-score of 93%, outperforming Random Forest across feature representations. The findings indicate that simple, interpretable models remain strong baselines for Hausa hate speech detection and offer practical value for content moderation in low-resource African language contexts.
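
As an illustration of the TF-IDF plus Logistic Regression setup described above, the following is a minimal standard-library sketch, not the authors' implementation. The documents are made-up placeholder tokens rather than AFRIHATE data. The smoothed IDF formula, the L2 normalisation, the learning rate, and the epoch count are all assumptions, since the abstract does not specify them. Stratified cross-validation and the Random Forest comparison are omitted, and the metrics here are computed on the training examples only to show how they are derived.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Whitespace-tokenise docs and return (vocab, L2-normalised TF-IDF rows)."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


def train_logreg(rows, labels, lr=0.5, epochs=2000):
    """Fit binary logistic regression with per-example gradient steps."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(rows, weights, bias):
    """Label a row 1 when its linear score is non-negative, else 0."""
    return [1 if sum(w * v for w, v in zip(weights, x)) + bias >= 0 else 0
            for x in rows]


def scores(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical placeholder documents: 1 = hateful, 0 = not hateful.
docs = [
    "bad_a bad_b topic",
    "bad_b topic_x bad_a",
    "bad_a other",
    "good_a topic",
    "good_b good_a other",
    "good_b topic_x",
]
labels = [1, 1, 1, 0, 0, 0]

vocab, rows = tfidf_vectors(docs)
weights, bias = train_logreg(rows, labels)
preds = predict(rows, weights, bias)
metrics = scores(labels, preds)
print(metrics)
```

Because the model is linear, pairing `vocab` with `weights` shows which tokens push a prediction toward either class, which is the kind of interpretability the study emphasises. A realistic evaluation would score held-out folds rather than the training examples.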

References

Adelani, D. I., Abbott, J., Neubig, G., Dossou, B. F. P., Kreutzer, J., Lignos, C., Palen-Michel, C., Buzaaba, H., Rijhwani, S., Ruder, S., & Adewumi, T. (2022). MasakhaNER 2.0: Africa-centric transfer learning for named entity recognition. Transactions of the Association for Computational Linguistics, 10(1), 1467–1484. https://doi.org/10.1162/tacl_a_00524

Adewumi, T. O., Adebara, I., & Adelani, D. I. (2022). Towards benchmark datasets for African language hate speech detection. In Proceedings of LREC (pp. 1678–1686).

Conneau, A., Bapna, A., Zhang, Y., Ma, M., von Platen, P., Lozhkov, A., ... & Johnson, M. (2022). XTREME-S: Evaluating cross-lingual speech representations. arXiv preprint arXiv:2203.10752.

Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. In Proceedings of ICWSM (pp. 512–515). https://doi.org/10.1609/icwsm.v11i1.14955

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT (pp. 4171–4186). https://doi.org/10.48550/arXiv.1810.04805

Maikano, F. A. (2024). Machine learning approaches for cyber bullying detection in Hausa language social media: A comprehensive review and analysis. FUDMA Journal of Sciences, 8(3), 344–348.

Masakhane NLP Community. (2021). Building open, community-driven resources for African languages. In Proceedings of the ACL Workshop on African NLP.

Mozafari, M., Farahbakhsh, R., & Crespi, N. (2020). A BERT-based transfer learning approach for hate speech detection in online social media. Complexity, 2020, 1–12. https://doi.org/10.1155/2020/8828421

Muhammad, S. H., Abdulmumin, I., Ayele, A. A., et al. (2025). AFRIHATE: A multilingual collection of hate speech and abusive language datasets for African languages. In Proceedings of NAACL (pp. 1705–1720).

PeaceTech Lab. (2017). Social media hate speech lexicons: Nigeria. Washington, DC: PeaceTech Lab.

Röttger, P., Vidgen, B., Nguyen, D., & Derczynski, L. (2021). HateCheck: Functional tests for hate speech detection models. In Proceedings of ACL (pp. 41–58). https://doi.org/10.48550/arXiv.2012.15606

Sosimi, A. A., Ipinnimo, O., Folorunso, C. O., Adim, B. A., & Onoyom-Ita, E. (2024). Hate speech identification in West Africa, using machine-learning techniques. Arid Zone Journal of Engineering, Technology & Environment, 20(7), 55–68.

Vidgen, B., & Derczynski, L. (2020). Directions in abusive language training data: Garbage in, garbage out. PLOS ONE, 15(12), e0243300. https://doi.org/10.1371/journal.pone.0243300

Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of NAACL-HLT (pp. 88–93). https://doi.org/10.18653/v1/N16-2013

Zhang, Z., Robinson, D., & Tepper, J. (2018). Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In Proceedings of the European Semantic Web Conference (pp. 745–760). https://doi.org/10.1007/978-3-319-93417-4_48

Published

21-01-2026

How to Cite

Bashir, I. S., & Muhammad, A. A. (2026). HAUSA HATE SPEECH DETECTION USING LOGISTIC REGRESSION. FUDMA JOURNAL OF SCIENCES, 10(1), 249-252. https://doi.org/10.33003/fjs-2026-1001-4541