Development of Longest-Match Based Stemmer for Texts of Wolaita Language
International Journal on Data Science and Technology
Volume 4, Issue 3, September 2018, Pages: 79-83
Received: May 19, 2018; Accepted: Jul. 5, 2018; Published: Jul. 30, 2018
Girma Yohannis Bade, Department of Computer Science, Wolaita Sodo University, Wolaita, Ethiopia
Hussien Seid, Department of Computer Science and IT, Arba-Minch University, Arba-Minch, Ethiopia
This research presents the design, development, and experimental evaluation of a longest-match based stemmer for Wolaita texts. The objective is to conflate the variants of Wolaita words into their stems with better accuracy, using a longest-match based approach. To compile the possible combinations of suffixes, a detailed analysis of Wolaita word morphology was carried out. The C# programming language was used for data preprocessing and implementation. After preprocessing, 12789 unique words were retained for the experiment. Of these unique words, 1200 were randomly selected in advance and kept separate for testing. The developed stemmer was then tested using Paice’s actual error counting method. On this test dataset, the stemmer achieved 91.84% accuracy against the manually stemmed words. The result shows that the rule-based longest-match approach is promising for stemming Wolaita language texts.
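The longest-match idea summarized above can be illustrated with a minimal sketch. It is written in Python for brevity rather than the C# used in the study, and it is not the authors' implementation. The suffix list, the minimum stem length, and the function names are all hypothetical placeholders (English-like suffixes, not the Wolaita suffix combinations compiled in this work), and the accuracy function is a plain exact-match count against manually stemmed words, which only approximates an error counting evaluation.

```python
# Hypothetical suffix list for illustration only; these are NOT the
# Wolaita suffix combinations compiled in the paper.
SUFFIXES = ["ations", "ation", "ings", "ing", "s"]

# Assumed lower bound on stem length, so short words are not over-stripped.
MIN_STEM_LENGTH = 2


def longest_match_stem(word, suffixes=SUFFIXES, min_stem=MIN_STEM_LENGTH):
    """Strip the longest listed suffix that leaves a long-enough stem."""
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    # No suffix applies: the word is returned unchanged.
    return word


def accuracy(pairs, stem=longest_match_stem):
    """Percentage of (word, manual_stem) pairs the stemmer reproduces exactly."""
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    # Toy test set with manually assigned stems (illustrative data only).
    test_pairs = [
        ("formations", "form"),
        ("sings", "sing"),
        ("cats", "cat"),
        ("bus", "bus"),
    ]
    for word, _ in test_pairs:
        print(word, "->", longest_match_stem(word))
    print("accuracy:", accuracy(test_pairs))
```

Sorting the suffixes by length before matching is what makes this a longest-match scheme: a word ending in "ations" loses the whole suffix rather than only the final "s". The toy word "bus" shows the typical over-stemming error that a manual comparison is meant to count.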
Stemmer, Natural Language Processing, Morphology, Longest-Match
To cite this article
Girma Yohannis Bade, Hussien Seid, Development of Longest-Match Based Stemmer for Texts of Wolaita Language, International Journal on Data Science and Technology. Vol. 4, No. 3, 2018, pp. 79-83. doi: 10.11648/j.ijdst.20180403.11
