AN IMPROVED HAUSA WORD STEMMING ALGORITHM

  • Sirajo Musa Musa
  • G. N. Obunadike
  • Muhammad Muntasir Yakubu
Keywords: Hausa Language, Information Retrieval, Natural Language Processing, Stemming

Abstract

The explosion of scientific publications across domains, coupled with the introduction and widespread adoption of the internet over the last few decades, has made information more available than ever before. Consequently, digital storage capacity has been doubling consistently to keep pace with this geometric increase in information. In view of this, Information Retrieval (IR), now considered the dominant form of information access, has become even more critical. However, the problems of using free text in indexing and retrieval, which arise from spelling mistakes, alternative spellings, affixes and abbreviations, continue to bedevil the field of IR. To mitigate these problems, stemming algorithms were introduced in the 1960s. Stemming is an automated process of stripping word derivatives of their inflectional affixes in order to obtain the stem of the word. Because stemming is language specific, stemming algorithms have been designed specifically for most of the major languages of the world. With a speaker population of about 150 million, the Hausa language stands in need of a better stemming algorithm. This research is an attempt to improve upon the existing Hausa word stemming algorithm. The affix stripping method of conflation with reference lookup was used. Using Sirsat’s evaluation method, this research achieved a Correctly Stemmed Word Factor (CSWF) of 96.9%, an Index Compression Factor of 74.76%, a Words Stemmed Factor (WSF) of 70.44% and an Average Word Conflation Factor of 59.47%.
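
The general idea of affix stripping with a reference lookup can be illustrated with a minimal Python sketch. It is not the algorithm evaluated in this research: the suffix list, the tiny lexicon, the function name `stem` and the `min_stem_len` threshold are all hypothetical, and the sketch assumes that the lookup serves to accept a stripped form only when it is a known root. Prefixes could be handled in the same way.

```python
# Hypothetical suffix list for illustration only; NOT the rule set from the paper.
SUFFIXES = ("nsa", "nta", "rsa", "rta")

# Tiny stand-in for the reference lookup; a real system would load a full lexicon.
LEXICON = {"gida", "mota"}


def stem(word, lexicon=LEXICON, suffixes=SUFFIXES, min_stem_len=3):
    """Strip one suffix from `word` if the remainder is a known root."""
    token = word.lower()
    # A word that is already a known root is returned unchanged.
    if token in lexicon:
        return token
    # Try longer suffixes first so the most specific rule is tested first.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix):
            candidate = token[:-len(suffix)]
            # Assumption: a strip is accepted only if the candidate is long
            # enough and is confirmed by the reference lookup.
            if len(candidate) >= min_stem_len and candidate in lexicon:
                return candidate
    # No rule applied, so the token is left as it is.
    return token


if __name__ == "__main__":
    for w in ["gidansa", "motarsa", "gida", "kansa"]:
        print(w, "->", stem(w))
```

In this sketch the lookup guards against over-stemming: "kansa" ends in a listed suffix, but the remainder is too short and is not in the lexicon, so the word is returned unchanged.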

Published
2022-04-05
How to Cite
Musa, S., Obunadike, G. N., & Yakubu, M. M. (2022). AN IMPROVED HAUSA WORD STEMMING ALGORITHM. FUDMA JOURNAL OF SCIENCES, 6(1), 291 - 295. https://doi.org/10.33003/fjs-2022-0601-899