Alemu, Besufikad (2013) A Named Entity Recognition for Amharic. Masters thesis, Addis Ababa University.
PDF (A Named Entity Recognition for Amharic)
Besufikad, Alemu final.pdf - Accepted Version (984kB). Restricted to Repository staff only.
Abstract
This thesis describes the development of a Named Entity Recognition (NER) system for Amharic. NER is the process of identifying all named entities in a document and categorizing them into predefined classes such as person, organization, location, time, and numeral expressions. The proposed Amharic Named Entity Recognition (ANER) system is based on a supervised machine learning method, Conditional Random Fields (CRF). An Amharic corpus of 13,538 words was developed using the Stanford tagging scheme. Because the work focuses on the feature set for Amharic named entities (NEs), fifteen experiments were conducted with different combinations of features. Previous and next word, named entity tag of a word, word pairs, word shape, and prefix and suffix features were used to identify three NE types: Person, Organization, and Location. The results show that the choice of feature combination strongly affects NER performance. The highest F-measure achieved in this work is 80.66%, obtained with a window size of two on both (left and right) sides, the previous and next tag of the current word, and prefixes and suffixes of length four. The worst performance achieved is 61.97%, obtained with a window size of two on the left side of a word and the previous and next word of the current token. Finally, the optimal feature set identified by the experiments is: prefixes and suffixes of length four, the previous and next NE tag of a token, and window size features.
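To make the feature description in the abstract concrete, the following is a minimal sketch of per-token feature extraction of the kind usually fed to a CRF tagger. It is not the thesis's implementation: the function names `word_shape` and `token_features`, the `<PAD>` marker, the feature key names, and the transliterated example tokens are all assumptions made for illustration. The previous and next NE tag features mentioned in the abstract depend on the CRF toolkit's template mechanism and are not shown here.

```python
def word_shape(token):
    """Collapse a token to a coarse shape: letter runs -> 'x', digit runs -> 'd'."""
    shape = []
    for ch in token:
        if ch.isdigit():
            code = "d"
        elif ch.isalpha():
            code = "x"
        else:
            code = ch  # keep punctuation as-is
        # Merge consecutive characters of the same class into one symbol.
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)


def token_features(tokens, i, window=2, affix_len=4):
    """Build a feature dict for tokens[i] (hypothetical feature names).

    window    -- number of neighbouring words taken on each side
    affix_len -- length of the prefix and suffix features
    """
    token = tokens[i]
    feats = {
        "word": token,
        "shape": word_shape(token),
        "prefix": token[:affix_len],
        "suffix": token[-affix_len:],
    }
    # Context words within the window; positions outside the sentence are padded.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"word[{offset:+d}]"
        feats[key] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


# Hypothetical transliterated sentence, used only to exercise the function.
sentence = ["abebe", "wede", "addis", "ababa", "hede"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

With `window=2` and `affix_len=4` the sketch mirrors the best-performing settings reported in the abstract; changing those two parameters is one way to reproduce the kind of feature-combination experiments the thesis describes.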
Item Type: | Thesis (Masters) |
---|---|
Subjects: | P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania; Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Z Bibliography. Library Science. Information Resources > ZA Information resources |
Divisions: | Africana |
Depositing User: | Selom Ghislain |
Date Deposited: | 20 Sep 2018 12:34 |
Last Modified: | 20 Sep 2018 12:34 |
URI: | http://thesisbank.jhia.ac.ke/id/eprint/5328 |