A Named Entity Recognition for Amharic

Alemu, Besufikad (2013) A Named Entity Recognition for Amharic. Masters thesis, Addis Ababa University.

Full text: PDF (Besufikad, Alemu final.pdf), Accepted Version, 984 kB. Restricted to repository staff only.

Abstract

This thesis describes the development of a Named Entity Recognition (NER) system for Amharic. NER is the process of identifying all named entities in a document and classifying them into predefined classes such as person, organization, location, time, and numeral expressions. The proposed Amharic Named Entity Recognition (ANER) system is based on a supervised machine learning method, Conditional Random Fields (CRF). An Amharic corpus of 13,538 words was developed and annotated with the Stanford tagging scheme. Since the objective of the work is to study the feature set for Amharic named entities (NEs), fifteen experiments were conducted using different combinations of features. The features used were the previous and next words, the named entity tag of a word, word pairs, word shape, and prefixes and suffixes; they were used to identify three NE classes: Person, Organization and Location. The results show that choosing the proper combination of features plays a major role in NER performance. The highest F-measure achieved in this work is 80.66%, obtained with a window size of two on both the left and right sides, the previous and next tags of the current word, and prefixes and suffixes of length four. The lowest F-measure, 61.97%, was obtained with a window of two words to the left of a word together with the previous and next words of the current token. Finally, the optimal features identified in the experiments are prefixes and suffixes of length four, the previous and next NE tags of a token, and the window size features.
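As an illustration of how features like these are commonly fed to a CRF, the following Python sketch turns a tokenized Amharic sentence into one feature dictionary per token: context words within a window of two on each side, word pairs, a coarse word shape, and prefixes and suffixes up to length four. It is a minimal sketch under stated assumptions, not the thesis's implementation (the abstract does not describe its feature code): the feature names, the `<PAD>` marker and the shape classes (including treating code points U+1200 to U+135F as Ethiopic syllables) are hypothetical. The previous and next NE tags are not input features here, because a linear-chain CRF models dependencies between adjacent labels through its transition weights.

```python
# Hypothetical feature extractor for a CRF-based Amharic NER tagger.
# Feature names, the <PAD> marker and the shape classes are illustrative
# assumptions, not the configuration used in the thesis.

# Assumed range for Ethiopic syllables and combining marks (approximate).
ETHIOPIC_START, ETHIOPIC_END = 0x1200, 0x135F


def word_shape(word):
    """Map each character to a coarse class and collapse repeated classes."""
    shape = []
    for ch in word:
        if "0" <= ch <= "9":
            cls = "d"  # ASCII digit
        elif ETHIOPIC_START <= ord(ch) <= ETHIOPIC_END:
            cls = "E"  # Ethiopic syllable (Amharic has no letter case)
        elif ch.isalpha():
            cls = "X" if ch.isupper() else "x"  # other letters, e.g. Latin
        else:
            cls = ch  # punctuation and symbols keep their own identity
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


def token_features(tokens, i, window=2, affix_len=4):
    """Build the feature dict for tokens[i]."""
    word = tokens[i]
    feats = {"bias": 1.0, "word": word, "shape": word_shape(word)}

    # Prefixes and suffixes of length 1..affix_len (in characters).
    for k in range(1, affix_len + 1):
        if len(word) >= k:
            feats[f"prefix{k}"] = word[:k]
            feats[f"suffix{k}"] = word[-k:]

    # Context words within +/- window positions, padded at sentence edges.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"word[{off:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"

    # Word-pair features joining the current word with its neighbours.
    if i > 0:
        feats["pair[-1,0]"] = tokens[i - 1] + "|" + word
    if i < len(tokens) - 1:
        feats["pair[0,+1]"] = word + "|" + tokens[i + 1]
    return feats


def sentence_features(tokens, window=2, affix_len=4):
    """Return one feature dict per token of a tokenized sentence."""
    return [token_features(tokens, i, window, affix_len) for i in range(len(tokens))]


if __name__ == "__main__":
    # "Abebe went to Addis Ababa"
    sentence = ["አበበ", "ወደ", "አዲስ", "አበባ", "ሄደ"]
    for feats in sentence_features(sentence):
        print(feats["word"], feats["shape"], feats["word[-1]"], feats["word[+1]"])
```

A list of such dictionaries per sentence is the input format accepted by Python CRF wrappers such as sklearn-crfsuite, so trying a different feature combination, as in the fifteen experiments, amounts to changing `window`, `affix_len`, or which keys are emitted.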

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Z Bibliography. Library Science. Information Resources > ZA Information resources
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 20 Sep 2018 12:34
Last Modified: 20 Sep 2018 12:34
URI: http://thesisbank.jhia.ac.ke/id/eprint/5328
