AI-POWERED PLAGIARISM DETECTION: LEVERAGING FORENSIC LINGUISTICS AND NATURAL LANGUAGE PROCESSING

  • Anthony Nwohiri, University of Lagos
  • Opemipo JODA, Palantir Technologies, London, United Kingdom
  • Olasupo Ajayi, University of the Western Cape
Keywords: Academia, Forensic linguistics, Natural language processing, Plagiarism, Plagiarism checker

Abstract

Plagiarism of material from the Internet is nothing new to academia, and it remains rampant. It ranges from borrowing a particularly apt phrase without attribution, to paraphrasing someone else’s original idea without citation, to wholesale contract cheating. Plagiarized content can infringe copyright law and expose publishers and authors to hefty fines. Unintentional plagiarism mostly results from inaccurate citation, a fact that most plagiarism checkers ignore. Moreover, plagiarizers are becoming increasingly “smarter”, in the negative sense. All of this calls for a plagiarism detector that handles these challenges efficiently. Several plagiarism detectors have been developed, but each has its own limitations. This paper aims to develop an AI-driven plagiarism detector that can crawl the web to index articles and documents, generate a similarity score between two local documents, train users to format in-text citations properly, identify source code plagiarism, and use natural language processing and forensic linguistics to analyse the plagiarism index properly.

Published
2021-11-03
How to Cite
Nwohiri, A., Joda, O., & Ajayi, O. (2021). AI-POWERED PLAGIARISM DETECTION: LEVERAGING FORENSIC LINGUISTICS AND NATURAL LANGUAGE PROCESSING. FUDMA JOURNAL OF SCIENCES, 5(3), 207–218. https://doi.org/10.33003/fjs-2021-0503-700