Automatic Stemming for Amharic Text: An Experiment Using Successor Variety Approach

Fikremariam, Genet Mezemir (2009) Automatic Stemming for Amharic Text: An Experiment Using Successor Variety Approach. Masters thesis, Addis Ababa University.

PDF: Genet, Mezemir Fikremariam.pdf (Accepted Version, 661kB)
Restricted to Repository staff only

Abstract

The extensive use of the World Wide Web and the growing digital availability of information and documents have accelerated the demand for technologies and tools for online data retrieval and extraction applications. Natural language research aimed at fast and reliable online information searching and access is a major component of current advanced information technology development. In this research, an indexing system was developed using the Successor Variety Stemming Algorithm to find stems for Amharic words. The research set out to determine whether the Successor Variety Stemming Algorithm, with the peak and plateau, entropy, and complete word methods, can be applied to the Amharic language, and what its limitations would be. In addition, the peak and plateau method was compared with the entropy and complete word methods. Stemming is typically used in the hope of improving search accuracy and reducing the size of the index. A corpus of 6270 words was obtained from the Ethiopian News Agency (ENA) and Walta Information Center and used to train and test the methods. The experimental results showed that the peak and plateau method achieved 71.8% accuracy, while the entropy and complete word methods achieved 63.95% and 57.99% accuracy, respectively. Based on these results, the successor variety algorithm with the peak and plateau method performed better than with either the entropy method or the complete word method.
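
The abstract names three ways of turning successor varieties into stem boundaries. The Python sketch here illustrates them under stated assumptions: it follows a common textbook formulation of the successor variety algorithm (peak and plateau: cut after a prefix whose variety exceeds both neighbours; complete word: cut after a prefix that is itself a corpus word; entropy: cut where the entropy of the next-character distribution reaches a threshold). It is not the program developed in the thesis. The end-of-word marker `#`, the entropy threshold of 1.5, the "first segment is the stem" rule, and the small English corpus are all illustrative assumptions; an Amharic stemmer would run over words drawn from a corpus such as the ENA and Walta data.

```python
import math
from collections import Counter

END = "#"  # assumed end-of-word marker, counted as a possible successor


def successor_counts(prefix, corpus):
    """Count how often each character follows `prefix` in the corpus.

    A corpus word equal to `prefix` contributes the END marker.
    """
    counts = Counter()
    for word in corpus:
        if word.startswith(prefix):
            counts[word[len(prefix)] if len(word) > len(prefix) else END] += 1
    return counts


def successor_varieties(word, corpus):
    """Successor variety (distinct next characters) of every non-empty prefix."""
    return [len(successor_counts(word[:i], corpus)) for i in range(1, len(word) + 1)]


def entropy(counts):
    """Shannon entropy (bits) of a next-character distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cuts_peak_and_plateau(word, corpus):
    """Cut after a prefix whose variety exceeds both neighbouring prefixes."""
    sv = successor_varieties(word, corpus)
    # sv[i] belongs to the prefix of length i + 1, so the cut position is i + 1
    return [i + 1 for i in range(1, len(sv) - 1)
            if sv[i] > sv[i - 1] and sv[i] > sv[i + 1]]


def cuts_complete_word(word, corpus):
    """Cut after any proper prefix that is itself a word in the corpus."""
    words = set(corpus)
    return [i for i in range(1, len(word)) if word[:i] in words]


def cuts_entropy(word, corpus, threshold=1.5):
    """Cut after proper prefixes whose successor entropy reaches `threshold`."""
    return [i for i in range(1, len(word))
            if entropy(successor_counts(word[:i], corpus)) >= threshold]


def stem(word, cuts):
    """Assumed rule: the first segment is the stem; no cut keeps the word."""
    return word[:cuts[0]] if cuts else word


# Small illustrative corpus (not the thesis data).
corpus = ["able", "ape", "beatable", "fixable", "read", "readable",
          "reading", "reads", "red", "rope", "ripe"]

print(successor_varieties("readable", corpus))                 # [3, 2, 1, 4, 1, 1, 1, 1]
print(stem("readable", cuts_peak_and_plateau("readable", corpus)))  # read
print(stem("readable", cuts_complete_word("readable", corpus)))     # read
print(stem("readable", cuts_entropy("readable", corpus)))           # read
```

On this toy corpus all three methods place the boundary after "read", because the prefix "read" is followed by four different successors (`#`, `a`, `i`, `s`). On larger, real corpora the methods can disagree, which is why the thesis reports the accuracy of each method separately.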

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 12 Jul 2018 12:03
Last Modified: 12 Jul 2018 12:03
URI: http://thesisbank.jhia.ac.ke/id/eprint/7435
