Designing a Stemmer for Afaan Oromo Text: A Hybrid Approach

Tesfaye, Debela (2010) Designing a Stemmer for Afaan Oromo Text: A Hybrid Approach. Masters thesis, Addis Ababa University.

Abstract

Most natural language processing systems include a stemmer as a separate module in their architecture. Stemmers are especially important for machine translation, speech recognition, and search engines. In linguistic morphology, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form. This thesis presents a stemming system for Afaan Oromo. The system takes a word as input and removes its affixes according to a rule-based algorithm. Rules alone cannot capture every pattern of Afaan Oromo word formation, so the hybrid version of the stemmer integrates an N-gram component to handle cases the rules do not cover. The algorithm follows the well-known Porter algorithm for English and is developed according to the grammatical rules of Afaan Oromo as described in A Grammatical Sketch of Written Oromo (Mewis, 2001) and Caasluga Afaan Oromoo, Jildii-1 (Oromo, 1995). Afaan Oromo morphology was studied and described in order to model the language and develop an automatic procedure for conflation; both the inflectional and derivational morphology of the language are discussed. The result of the study is a prototype context-sensitive, iterative stemmer for Afaan Oromo. An error-counting technique was used to evaluate its performance. For testing, 198 sentences (2458 words in total) were collected from different public Afaan Oromo newspapers and bulletins so that the test set covers a variety of topics. The evaluation shows that the stemmer achieves 95.73 percent correct results, outperforming earlier stemming algorithms for Afaan Oromo. Finally, possible extensions of the proposed system and further evaluation methods are briefly reviewed.
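
The hybrid idea described in the abstract (iterative rule-based suffix removal, with an N-gram component for words the rules do not cover) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the suffix list, the minimum stem length, the use of the Dice coefficient over character bigrams, the 0.6 threshold, and every function name below are assumptions chosen for illustration only.

```python
# Toy suffix list, longest first. Hypothetical: NOT the rule set from the thesis.
SUFFIXES = ("oota", "tti", "een")
MIN_STEM_LEN = 3  # assumed guard against over-stemming


def strip_suffixes(word):
    """Iteratively strip the first matching suffix while the stem stays long enough."""
    stem = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
                stem = stem[: -len(suffix)]
                changed = True
                break
    return stem


def bigrams(text):
    """Return the set of character bigrams in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a, b):
    """Dice coefficient over character bigram sets (0.0 if either set is empty)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def hybrid_stem(word, known_stems, threshold=0.6):
    """Apply the rules first; if no rule fires, fall back to bigram similarity.

    known_stems is a hypothetical lexicon of stems; how the thesis actually
    combines rules and N-grams is not specified in the abstract.
    """
    lowered = word.lower()
    candidate = strip_suffixes(lowered)
    if candidate != lowered:
        return candidate  # at least one rule fired
    best = max(known_stems, key=lambda s: dice(lowered, s), default=None)
    if best is not None and dice(lowered, best) >= threshold:
        return best  # closest known stem by bigram overlap
    return lowered  # leave the word unchanged
```

With this toy configuration, `hybrid_stem("manatti", [])` is handled by the rule step, while `hybrid_stem("deeme", ["deem", "dhuf"])` matches no suffix and is resolved by the bigram fallback. A real system would need the full rule set and context conditions the thesis derives from Afaan Oromo grammar.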

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 02 Oct 2018 12:15
Last Modified: 02 Oct 2018 12:15
URI: http://thesisbank.jhia.ac.ke/id/eprint/5785
