Opinion Mining from Amharic Entertainment Texts

Getachew, Abreham (2014) Opinion Mining from Amharic Entertainment Texts. Masters thesis, Addis Ababa University.

Abreham, Getachew.pdf - Accepted Version
Restricted to Repository staff only

Abstract

The accumulation of vast amounts of unstructured opinion across many domains has made information acquisition difficult. Opinion mining is a preliminary technique for tackling this obstacle. Opinion mining is the automatic extraction of knowledge from other people's opinions about a particular topic or problem. We treated opinion mining as a text classification task and employed two simple feature sets: all unigrams, and a bag-of-words of the most informative words in the review. The Information Gain feature selection method was used to identify the most informative words in the documents, and three supervised classifiers from the Natural Language Toolkit were applied: Naïve Bayes, Decision Tree and Maximum Entropy. The process of opinion mining involves categorizing an opinionated text document into predefined categories such as positive and negative. In this research work, an opinion mining model is built for classifying Amharic opinionated text as positive or negative. The experiments were conducted using 616 Amharic opinionated texts collected from the Ethiopia Broadcasting Corporation, diretube.com and habesha.com sites. The experiments indicate that Information Gain feature selection performs best across all three algorithms (Naïve Bayes, Decision Tree and Maximum Entropy). In terms of classification accuracy, Naïve Bayes at 90.9% outperforms Maximum Entropy at 89.6% and Decision Tree at 83.1%. The result is encouraging. However, negation is not handled well because unigrams are used as the classification features. Further research should consider bigrams and trigrams to arrive at a better feature set for opinionated Amharic texts.
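As an illustration of the Information Gain step mentioned in the abstract, the sketch below ranks unigrams by how much knowing a word's presence reduces uncertainty about the review's polarity. It is a minimal, standard-library example and not the thesis's implementation: the function names (`entropy`, `information_gain`, `top_informative_words`) and the tiny English `toy_reviews` list are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
from collections import Counter


def entropy(label_counts):
    """Shannon entropy (in bits) of a label distribution given as counts."""
    total = sum(label_counts.values())
    return -sum(
        (count / total) * math.log2(count / total)
        for count in label_counts.values()
        if count
    )


def information_gain(docs, word):
    """Information gain of a word-presence feature.

    docs is a list of (tokens, label) pairs. The gain is the entropy of the
    labels minus the weighted entropy of the labels after splitting the
    documents into those that contain `word` and those that do not.
    """
    all_labels = Counter(label for _, label in docs)
    with_word = Counter(label for tokens, label in docs if word in tokens)
    without_word = Counter(label for tokens, label in docs if word not in tokens)

    conditional = 0.0
    for subset in (with_word, without_word):
        size = sum(subset.values())
        if size:
            conditional += (size / len(docs)) * entropy(subset)
    return entropy(all_labels) - conditional


def top_informative_words(docs, k):
    """Return the k words with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = sorted({word for tokens, _ in docs for word in tokens})
    scored = [(information_gain(docs, word), word) for word in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


# Hypothetical toy data (not from the thesis corpus).
toy_reviews = [
    (["great", "film"], "pos"),
    (["great", "music"], "pos"),
    (["boring", "film"], "neg"),
    (["boring", "music"], "neg"),
]

selected = top_informative_words(toy_reviews, 2)
print(selected)
```

In this toy set, "great" and "boring" each separate the two classes perfectly, while "film" and "music" appear equally often in both classes and carry no information, so only the first two would be kept as features. The selected words could then be turned into presence features for any of the three classifiers named in the abstract.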

Item Type: Thesis (Masters)
Subjects: G Geography. Anthropology. Recreation > GN Anthropology
P Language and Literature > P Philology. Linguistics
P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 18 Jun 2018 12:52
Last Modified: 18 Jun 2018 12:52
URI: http://thesisbank.jhia.ac.ke/id/eprint/4378
