Afaan Oromo Named Entity Recognition Using Hybrid Approach

Abdi, Sani Genemo (2015) Afaan Oromo Named Entity Recognition Using Hybrid Approach. Masters thesis, Addis Ababa University.

PDF (Afaan Oromo Named Entity Recognition Using Hybrid Approach): Abdi, Sani.pdf, Accepted Version (1MB). Restricted to repository staff only.

Abstract

Named Entity Recognition and Classification (NERC) is an essential and challenging task in Natural Language Processing (NLP), particularly for resource-scarce languages such as Afaan Oromo (AO). It seeks to classify words that represent names in text into predefined categories such as person, location, organization, date, and time. Most researchers have applied machine learning to Afaan Oromo Named Entity Recognition (AONER), whereas none have used handcrafted rules or a hybrid approach for the Named Entity Recognition (NER) task. This thesis addresses that gap with an AONER system based on a hybrid approach that combines machine learning (ML) and rule-based components.

The rule-based component comprises parsing, filtering, grammar rules, whitelist gazetteers, blacklist gazetteers, and exact matching. The ML component comprises an ML model and a classifier. We used the General Architecture for Text Engineering (GATE) Developer tool for the rule-based component and Weka for the ML component. Using the algorithms and rules we developed, the system identifies Named Entities (NEs) in Afaan Oromo text, including names of persons, organizations, and locations, as well as miscellaneous entities. Feature selection and rules are important factors in recognizing Afaan Oromo Named Entities (AONEs); the rules developed include prefix, suffix, clue-word, context, and first-name/last-name rules.

We used the AONER corpus developed by Mandefro [1], of size 27,588. From this corpus we used 23,000 for training and 4,588 for testing. The system achieved an average of 84.12% precision, 81.21% recall, and 82.52% F-score.
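As a rough illustration of how some of the rule-based pieces described above can fit together (whitelist and blacklist gazetteers with exact matching, plus a clue-word context rule) and how rule output might be merged with ML predictions, here is a minimal Python sketch. It is not the thesis implementation, which was built with GATE Developer and Weka. All gazetteer entries, clue words, and label names (PER, LOC) are hypothetical placeholders, not taken from the thesis or the AONER corpus; prefix, suffix, and first-name/last-name rules are omitted.

```python
from typing import Dict, List, Set

# Hypothetical whitelist gazetteer: exact token -> entity label.
WHITELIST: Dict[str, str] = {"Finfinnee": "LOC", "Jimmaa": "LOC", "Caalaa": "PER"}
# Hypothetical blacklist: capitalized words assumed never to be names.
BLACKLIST: Set[str] = {"Kanaafuu"}
# Hypothetical clue words: a capitalized token right after one of these gets its label.
CLUE_WORDS: Dict[str, str] = {"Obbo": "PER", "Aadde": "PER", "Magaalaa": "LOC"}


def rule_tag(tokens: List[str]) -> List[str]:
    """Label each token with a rule-based NE tag, or "O" when no rule fires."""
    labels = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in BLACKLIST:
            continue  # blacklist overrides every other rule
        if tok in WHITELIST:
            labels[i] = WHITELIST[tok]  # exact gazetteer match
        elif i > 0 and tokens[i - 1] in CLUE_WORDS and tok[:1].isupper():
            labels[i] = CLUE_WORDS[tokens[i - 1]]  # clue-word context rule
    return labels


def combine(tokens: List[str], rule_labels: List[str], ml_labels: List[str]) -> List[str]:
    """Assumed hybrid policy: blacklist vetoes all, then rules win, else keep the ML label."""
    if not len(tokens) == len(rule_labels) == len(ml_labels):
        raise ValueError("tokens and label sequences must have the same length")
    merged = []
    for tok, rule, ml in zip(tokens, rule_labels, ml_labels):
        if tok in BLACKLIST:
            merged.append("O")
        elif rule != "O":
            merged.append(rule)
        else:
            merged.append(ml)
    return merged
```

For the illustrative token sequence `["Obbo", "Tolasaa", "gara", "Finfinnee", "deeme"]`, `rule_tag` labels "Tolasaa" as PER through the clue-word rule and "Finfinnee" as LOC through the gazetteer. Letting rules take precedence and using the ML label only where no rule fires is just one possible merging policy; the abstract does not specify how the two components are combined.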
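To make the reported measures concrete, the following Python sketch computes precision, recall, and F-score for each entity class and then macro-averages them. Token-level scoring, the outside label "O", and macro-averaging are assumptions made for illustration; the abstract does not state how its averages were computed.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F-score)


def per_class_prf(gold: List[str], pred: List[str], outside: str = "O") -> Dict[str, Scores]:
    """Token-level precision, recall and F-score for every entity label except `outside`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted((set(gold) | set(pred)) - {outside})
    scores: Dict[str, Scores] = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f_score)
    return scores


def macro_average(scores: Dict[str, Scores]) -> Scores:
    """Unweighted mean of each measure over all entity classes."""
    if not scores:
        return (0.0, 0.0, 0.0)
    n = len(scores)
    p = sum(s[0] for s in scores.values()) / n
    r = sum(s[1] for s in scores.values()) / n
    f = sum(s[2] for s in scores.values()) / n
    return (p, r, f)
```

With the toy labels `gold = ["PER", "O", "LOC", "LOC", "O", "ORG"]` and `pred = ["PER", "O", "LOC", "O", "ORG", "ORG"]`, the macro-averaged precision and recall are both about 0.833, while the macro-averaged F-score is about 0.778. Averaging per-class F-scores therefore need not equal 2PR/(P+R) computed from the averaged precision and recall. This may be why the reported 82.52% differs slightly from the roughly 82.64% that formula gives for 84.12% and 81.21%, although the abstract does not say which averaging was used.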

Item Type: Thesis (Masters)
Uncontrolled Keywords: Named Entity Recognition, Named Entities, GATE Developer, Weka, Afaan Oromo.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
T Technology > T Technology (General)
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 13 Jun 2018 09:34
Last Modified: 13 Jun 2018 09:34
URI: http://thesisbank.jhia.ac.ke/id/eprint/4194