A Generic Approach towards all Words Amharic Word Sense Disambiguation

Dureti, Siraj Bekeli (2017) A Generic Approach towards all Words Amharic Word Sense Disambiguation. Masters thesis, Addis Ababa University.

Abstract

Word sense disambiguation (WSD) is an “intermediate task” that supports other NLP tasks such as machine translation, information retrieval and hypertext navigation, content and thematic analysis, grammatical analysis, speech processing and text processing. This study explores a more general approach to developing a WSD system for the Amharic language. To this end, a WSD system is developed that identifies the sense of an ambiguous Amharic word using information from tagged example sentences and a WordNet. The system identifies the sense by measuring the similarity between the input sentence and the tagged example sentences. Two similarity measures are explored: Cosine similarity and the Jaccard coefficient. We collected 100 example sentences for each sense of the selected ambiguous Amharic words. The WordNet is composed of words with their synonyms and gloss definitions. The performance of the system was tested using 9 nouns, 3 verbs, 3 adjectives and 2 adverbs, a total of 17 randomly selected words. The experiments disambiguated one target word in a given text. They were designed as follows: first, Cosine similarity and the Jaccard coefficient were each evaluated individually for WSD; in the third experiment, the Lesk algorithm was tested; finally, each of the two similarity measures was evaluated in combination with the Lesk algorithm. The results showed that the Jaccard coefficient combined with the Lesk algorithm achieved the highest accuracy, 89.83%. The major challenge in the disambiguation process is that the system achieves its lowest accuracy on words that frequently collocate with similar words across their different senses.
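
The similarity-based step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the whitespace-tokenized input, the choice of the single best-matching example, and the English toy sentences are all assumptions made for illustration, and the combination with the Lesk algorithm is not reproduced because the abstract does not specify how the scores were combined.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context, tagged_examples, measure):
    """Return the sense label of the tagged example most similar to the context.

    tagged_examples is a list of (sense_label, tokens) pairs; this layout is
    hypothetical, not the data format used in the thesis.
    """
    best_sense, best_score = None, -1.0
    for sense, tokens in tagged_examples:
        score = measure(context, tokens)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy English data standing in for sense-tagged Amharic example sentences.
examples = [
    ("finance", ["she", "deposited", "money", "at", "the", "bank"]),
    ("river", ["they", "sat", "on", "the", "bank", "of", "the", "river"]),
]
context = ["he", "withdrew", "money", "from", "the", "bank"]

sense_by_jaccard = disambiguate(context, examples, jaccard)
sense_by_cosine = disambiguate(context, examples, cosine)
```

With this toy input both measures select the "finance" sense, because the input sentence shares more words with that tagged example. The sketch also makes the reported challenge visible: when two senses of a word tend to appear with the same surrounding words, their overlap scores are close and the choice becomes unreliable.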

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 27 Jun 2018 11:52
Last Modified: 27 Jun 2018 11:52
URI: http://thesisbank.jhia.ac.ke/id/eprint/6040
