PARTS OF SPEECH TAGGING: A REVIEW OF TECHNIQUES

  • Jamilu Awwalu Nigerian Defence Academy
  • Saleh El-Yakub Abdullahi Department of Computer Science, Nile University of Nigeria.
  • Abraham Eseoghene Evwiekpaefe Department of Computer Science, Nigerian Defence Academy, Kaduna
Keywords: Rule Based POS Tagging, Stochastic POS Tagging, Hybrid POS Tagging, Word Alignment, Code Switching

Abstract

Technology advances by the day, and computers have become valuable to almost every educated person. Among the most common uses of computers today are internet surfing and social networking, and computers in this context are not restricted to desktops and laptops. Internet surfing and social networking have made interaction between people and computers easy: people communicate in their own languages, which makes processing those languages a useful task for computers. Correct processing of these languages relies on the correct identification of parts of speech (POS) in sentences, which has long been an active area of research. This paper presents a review of parts of speech tagging, a comparison of different tagging techniques, their characteristics, difficulties, and limitations, and multilingual POS tagging approaches.


Published
2020-10-07
How to Cite
Awwalu, J., Abdullahi, S. E.-Y., & Evwiekpaefe, A. E. (2020). PARTS OF SPEECH TAGGING: A REVIEW OF TECHNIQUES. FUDMA JOURNAL OF SCIENCES, 4(2), 712–721. https://doi.org/10.33003/fjs-2020-0402-325
