A Semi-Supervised Approach for Amharic News Classification

Asres, Animut Belay (2012) A Semi-Supervised Approach for Amharic News Classification. Masters thesis, Addis Ababa University.

Abstract

Text classification is receiving growing attention, and there is an increasing need for techniques that classify text automatically, quickly, and accurately with minimal human intervention. Many supervised and unsupervised learning techniques for data classification exist in the literature. Semi-supervised learning sits halfway between the two: in addition to unlabeled data, the algorithm is given some supervision information, but not necessarily for every example. This study explored semi-supervised text classification applied to different types of vectors generated from Amharic text documents. A total of 3,154 news articles were used. To obtain good results, the documents were prepared and preprocessed, and the Weka package was used to classify the preprocessed data. The Expectation Maximization clustering algorithm was combined with the Naïve Bayes, Hyperpipe, and RBF Network classification algorithms to categorize the Amharic news items. The classifiers were more accurate when the number of classes was smaller. The best results were obtained with four classes, where the Naïve Bayes, Hyperpipe, and RBF Network classifiers reached 83.44%, 82.8%, and 82.4% respectively; the weakest results were on ten classes (55.42%, 57.26%, and 51.9% respectively). The research indicated that Naïve Bayes is the more applicable classifier for semi-supervised categorization of Amharic news items.

Keywords: Text categorization, semi-supervised machine learning, Naïve Bayes, Hyperpipe, RBF Networks
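
The general idea of combining Expectation Maximization with Naïve Bayes on partly labeled text can be illustrated with a minimal sketch. This is not the thesis's implementation, which used Weka; it is a plain-Python illustration of one common formulation (soft-label EM over a multinomial Naïve Bayes model). The function names, the toy English tokens, and the two class labels are all hypothetical, and no Amharic preprocessing is shown.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial Naive Bayes from soft class weights (Laplace smoothing)."""
    class_mass = {c: 0.0 for c in classes}
    word_mass = {c: Counter() for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            p = w.get(c, 0.0)
            if p == 0.0:
                continue
            class_mass[c] += p
            for t in tokens:
                word_mass[c][t] += p
    total = sum(class_mass.values())
    log_prior = {
        c: math.log((class_mass[c] + alpha) / (total + alpha * len(classes)))
        for c in classes
    }
    log_like = {}
    for c in classes:
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((word_mass[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_like


def posterior(tokens, model, classes):
    """Return P(class | tokens) as a dict; tokens outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in classes
    }
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def semi_supervised_nb(labeled, unlabeled, iterations=10):
    """Start from the labeled documents, then refine with EM over the unlabeled ones."""
    classes = sorted({y for _, y in labeled})
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in labeled]
    lab_w = [{y: 1.0} for _, y in labeled]  # labeled documents keep hard labels
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(iterations):
        # E-step: estimate soft labels for the unlabeled documents
        unl_w = [posterior(d, model, classes) for d in unlabeled]
        # M-step: refit on labeled plus softly labeled documents
        model = train_nb(lab_docs + unlabeled, lab_w + unl_w, classes, vocab)
    return model, classes


def predict(tokens, model, classes):
    post = posterior(tokens, model, classes)
    return max(classes, key=lambda c: post[c])


# Hypothetical toy data: two labeled and four unlabeled "news items".
labeled = [
    (["goal", "match", "team"], "sport"),
    (["election", "vote", "party"], "politics"),
]
unlabeled = [
    ["team", "coach", "goal"],
    ["party", "minister", "vote"],
    ["coach", "match"],
    ["minister", "election"],
]
model, classes = semi_supervised_nb(labeled, unlabeled)
```

In this toy setup the words "coach" and "minister" never appear in a labeled document, so any class association they acquire comes only from the unlabeled items, which is the effect semi-supervised training is meant to exploit.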

Item Type: Thesis (Masters)
Subjects: P Language and Literature > P Philology. Linguistics
P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 16 Aug 2018 08:57
Last Modified: 16 Aug 2018 08:57
URI: http://thesisbank.jhia.ac.ke/id/eprint/4788
