An Automatic Sentence Parser for Oromo Language Using Supervised Learning Technique

Diriba, Megersa (2002) An Automatic Sentence Parser for Oromo Language Using Supervised Learning Technique. Masters thesis, Addis Ababa University.

DIRIBA, MEGERSA.pdf - Accepted Version
Restricted to Repository staff only


Abstract

The goal of Information Retrieval has been to reduce the complexities of human language and, as a result, to serve users as efficiently as possible. The decisive tool for achieving this end is Natural Language Processing (NLP), which has many components. Parsing is one such component: it helps improve precision and recall, which is the goal of Information Retrieval systems. Parsing is also used in work on machine translation, which is at the heart of Natural Language Processing. Since the 1960s, different kinds of parsers have been developed for languages that are relatively widely used nationally and/or internationally. Unfortunately, Oromo has not benefited from such systems, even though it is the working language of the State Government of Oromiya and one of the major languages in Ethiopia and Africa (Abebe 2002): there are no systems (parsers of any sort) that parse written texts in this language. This study is therefore an attempt to develop a simple automatic sentence parser for the Oromo language. The chart algorithm was used with some modification. A morphological analyzer module, which splits words into their root forms and corresponding morphemes, was also developed to facilitate preparing the texts in a file to be parsed with appropriate lexical categories. In addition, an unsupervised learning algorithm was designed to guide the parser in predicting unknown and ambiguous words in a sentence. Grammar rules, a lexicon, morphological rules, and contextual information were designed on the basis of a review of the linguistic properties of Oromo grammatical categories. This system is the first of its kind for this language. The study adopts an intelligent approach (rule-based plus a learning module) to develop a prototype, a simple Oromo parser. In short, the thesis describes the processes of automated sentence parsing of free texts; that is, it aims at developing a prototype and conducting an experiment with it. The results obtained (95% on the training set and 88.5% on the test set) using a small set of manually parsed sentences encourage further research, especially with the aim of developing a full-fledged Oromo sentence parser.
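
To illustrate what chart parsing involves, the following is a minimal sketch of a bottom-up (CKY-style) chart recognizer. It is not the modified chart algorithm developed in the thesis, which the abstract does not specify. The grammar, the lexicon, the placeholder tokens (`subj`, `obj`, `verb`), and the function names are all hypothetical and chosen only to show how a chart is filled span by span; the verb-final rule order is an assumption made for the toy grammar.

```python
from collections import defaultdict

# Hypothetical toy grammar with binary rules only (assumed, not from the thesis).
# Verb-final order: S -> NP VP, VP -> NP V.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("NP", "V"): {"VP"},
}

# Hypothetical lexicon: placeholder tokens, not real Oromo words.
LEXICON = {
    "subj": {"NP"},
    "obj": {"NP"},
    "verb": {"V"},
}


def chart_parse(tokens):
    """Return a chart where chart[(i, j)] holds the categories spanning tokens[i:j]."""
    n = len(tokens)
    chart = defaultdict(set)
    # Width-1 spans come straight from the lexicon.
    for i, tok in enumerate(tokens):
        chart[(i, i + 1)] |= LEXICON.get(tok, set())
    # Wider spans are built by combining two adjacent, already-filled spans.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        chart[(start, end)] |= BINARY_RULES.get((left, right), set())
    return chart


def recognizes(tokens, goal="S"):
    """True if the whole token sequence can be analysed as the goal category."""
    return goal in chart_parse(tokens)[(0, len(tokens))]
```

Calling `recognizes(["subj", "obj", "verb"])` checks whether the chart contains an `S` spanning the whole input. A full parser would also store back-pointers in each chart cell so that the parse tree can be recovered, and would need a strategy for unknown or ambiguous words, which is the role the abstract assigns to the learning module.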

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA76 Computer software
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 27 Jun 2018 13:42
Last Modified: 27 Jun 2018 13:42
URI: http://thesisbank.jhia.ac.ke/id/eprint/6021
