Automatic Classification of AFAAN Oromo News Text: The Case of Radio Fana

Abera, Diriba Gemechu (2009) Automatic Classification of AFAAN Oromo News Text: The Case of Radio Fana. Masters thesis, Addis Ababa University.

Abstract

The rapid growth of information and communication technology has produced a huge volume of information, a large part of which is stored as unstructured text. The presence of so much text in electronic form is a challenge for natural language processing. As the volume of electronic information increases, there is growing interest in developing tools to help people better find, filter, and manage these resources. Arguably, the only way for humans to cope with the information explosion is to exploit computational techniques that can sift through huge bodies of text. Currently, news agencies in Ethiopia, which process large amounts of news from all available sources every day, still classify news items manually in their daily activities, even though they use computerized systems to store and edit those items. Radio Fana is one of these agencies. The objective of this research is to develop and adopt processing tools for Afaan Oromo text classification and to investigate the application of machine learning techniques to the automatic classification of Afaan Oromo news items. The data source for this research is the Afaan Oromo news items obtained from Radio Fana Share Company. In this research, tools for pre-processing Afaan Oromo news items (tokenization, removal of extraneous characters, removal of stop-words, and removal of affixes from words) were prepared to facilitate experimentation with the automatic classifiers. Among the automatic classifiers applicable to high-dimensional data, four were tested on the final data: the Sequential Minimal Optimization (SMO) algorithm from Support Vector Machines, NaiveBayesMultinomial (NBM) from the Bayesian classifiers, the J48 algorithm from the decision trees, and K-Nearest Neighbor (KNN) from the lazy learners.
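The four pre-processing steps named above can be illustrated with a minimal Python sketch. It is not the thesis's tooling: the stop-word set, the suffix list, and the function names are hypothetical placeholders chosen for illustration, and only suffix stripping is shown, whereas the abstract speaks of affixes in general.

```python
import re

# Hypothetical placeholder lists, NOT taken from the thesis.
STOP_WORDS = {"fi", "kan", "akka"}
SUFFIXES = ("oota", "tti")  # checked in this order; illustrative only


def tokenize(text):
    """Lowercase the text and keep runs of Latin letters and apostrophes.

    Digits, punctuation and other extraneous characters are dropped here.
    """
    raw = re.findall(r"[a-z']+", text.lower())
    # Trim apostrophes used as quotation marks and discard empty leftovers.
    return [t.strip("'") for t in raw if t.strip("'")]


def strip_suffix(token, min_stem=3):
    """Remove the first matching suffix if at least min_stem letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop-words, then strip suffixes."""
    return [strip_suffix(t) for t in tokenize(text) if t not in STOP_WORDS]
```

Calling `preprocess` on a news item returns a list of reduced tokens that could then be counted to build the high-dimensional feature vectors the classifiers operate on.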
The pre-processed Afaan Oromo news items are organized into groupings of four classes, seven classes, and all eleven classes for experimentation, and the experiments use 10-fold stratified cross-validation to split the data into training and test sets.
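As a rough illustration of what 10-fold stratified cross-validation means, the following Python sketch assigns item indices to folds so that each class is spread as evenly as possible across them. The function names and the category labels in the usage note are assumptions made for this example; the sketch does not reproduce the toolkit or the data used in the thesis.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of item indices with each class spread evenly over folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # carries over between classes so leftovers do not pile up in fold 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For example, with 30 items labelled `"economy"` and 20 labelled `"sport"` (hypothetical category names), every one of the ten test folds holds three economy items and two sport items, and each item is used for testing exactly once.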

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > T Technology (General)
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 14 Jun 2018 09:58
Last Modified: 14 Jun 2018 09:58
URI: http://thesisbank.jhia.ac.ke/id/eprint/4292
