ENHANCING SENTIMENT ANALYSIS FOR HAUSA LANGUAGE WITH IMPROVED HAUSA TEXT STEMMER (HTS) AND MACHINE LEARNING MODELS

Authors

  • Nasiru Mahadi, Jigawa State College of Education, Gumel
  • Salisu M. Borodo, Bayero University, Kano

DOI:

https://doi.org/10.33003/fjs-2025-0912-3898

Keywords:

Sentiment analysis, Hausa language, Stemming algorithm, Low resource language, AfriSenti Dataset

Abstract

This study enhances sentiment analysis for Hausa, a low-resource language spoken by over 86 million people, by introducing an improved Hausa Text Stemmer (HTS). The proposed algorithm addresses the language’s complex morphology—including prefixes, suffixes, infixes, and confixes—while also expanding common abbreviations and removing stop words. These steps improve text consistency and reduce noise, enabling more accurate feature extraction for sentiment classification. Using the AfriSenti dataset, the study evaluates four classical machine learning models—Support Vector Machine, Naïve Bayes, Gradient Boosting, and Random Forest—with performance measured by accuracy, precision, recall, and F1-score. Comparative tests against two existing stemmers demonstrate the superiority of the proposed HTS, with Gradient Boosting achieving 93.75% accuracy, significantly outperforming baseline accuracies of 86.7% and 73.1%. The findings confirm that the HTS effectively handles key linguistic challenges in Hausa, such as confixes, abbreviations, and stop words, leading to more robust sentiment classification. This work contributes valuable NLP resources for low-resource languages and underscores the importance of tailored preprocessing in sentiment analysis.
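The preprocessing order described in the abstract (abbreviation expansion, stop-word removal, then affix stripping) can be illustrated with a minimal sketch. This is not the authors' HTS: the lookup tables, the affix lists, the `MIN_STEM` threshold, and the function names `strip_affixes` and `preprocess` are all hypothetical placeholders chosen for illustration, and infix handling is omitted because the abstract does not specify its rules.

```python
import re

# Hypothetical lookup tables: placeholders for illustration only,
# not the rule sets used by the HTS described in the abstract.
ABBREVIATIONS = {"dn": "don", "wnn": "wannan"}
STOP_WORDS = {"da", "a", "ne", "ce", "ta", "ya"}
PREFIXES = ("ma", "ba")
SUFFIXES = ("wa", "n", "r")
CONFIXES = (("ma", "i"),)  # (prefix, suffix) pairs removed together
MIN_STEM = 3               # assumed lower bound on stem length


def strip_affixes(word):
    """Remove one confix, or else at most one prefix and one suffix."""
    # Try confixes first so a paired prefix and suffix go as one unit.
    for pre, suf in CONFIXES:
        if word.startswith(pre) and word.endswith(suf):
            stem = word[len(pre):len(word) - len(suf)]
            if len(stem) >= MIN_STEM:
                return stem
    # Otherwise strip a single prefix if enough of the word remains.
    for pre in PREFIXES:
        if word.startswith(pre) and len(word) - len(pre) >= MIN_STEM:
            word = word[len(pre):]
            break
    # Then strip a single suffix under the same length guard.
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= MIN_STEM:
            word = word[:-len(suf)]
            break
    return word


def preprocess(text):
    """Lowercase, tokenize, expand abbreviations, drop stop words, stem."""
    # [^\W\d_]+ matches runs of Unicode letters, so ɓ, ɗ and ƙ are kept.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [strip_affixes(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("Manomi ya zo dn karantawa"))
```

Abbreviations are expanded before stop-word filtering so that an expanded form can still be filtered, and the length guard is one simple way to limit over-stemming of short words. The stemmed tokens would then feed whatever feature extraction the classifiers use.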

Research Methodology Flowchart

Published

31-12-2025

How to Cite

Mahadi, N., & Borodo, S. M. (2025). ENHANCING SENTIMENT ANALYSIS FOR HAUSA LANGUAGE WITH IMPROVED HAUSA TEXT STEMMER (HTS) AND MACHINE LEARNING MODELS. FUDMA JOURNAL OF SCIENCES, 9(12), 629-638. https://doi.org/10.33003/fjs-2025-0912-3898