Afaan Oromo – Amharic Cross Lingual Information Retrieval: A corpus Based Approach

Alemu, Eyob Nigussie (2013) Afaan Oromo – Amharic Cross Lingual Information Retrieval: A corpus Based Approach. Masters thesis, Addis Ababa University.

Abstract

Ethiopia is a multilingual country with over 80 distinct languages and a population of more than 73.9 million, as authorities estimated on the basis of the 2007 census (Bloor, 1995). In multilingual countries like Ethiopia, it is not uncommon to encounter language barriers when seeking information in a language other than one's mother tongue. Afaan Oromo (also known as "Oromiffa") is one of the widely spoken languages of Ethiopia, used by the Oromo people, who account for up to 36.7% of the total population (Commission, 2008). Afaan Oromo is currently an official language of Oromia regional state, while the official language of the Federal Democratic Republic of Ethiopia is Amharic. However, some people are not fluent enough to formulate Amharic query terms but still need Amharic documents for various reasons. An IR system capable of breaking the language barrier in information retrieval would clearly help such users. This study therefore aims to design and develop a corpus-based Afaan Oromo–Amharic cross-lingual information retrieval system that enables Afaan Oromo speakers to retrieve Amharic information using Afaan Oromo queries. The study follows a corpus-based approach, specifically one built on a parallel corpus. The parallel documents used include news articles, the Bible, legal documents, and proclamations from the customs authority. The system was tested with 50 queries and 50 randomly selected documents. Two experiments were conducted: the first allowed only one possible translation for each Afaan Oromo query term, and the second allowed all possible translations. Retrieval effectiveness was measured using recall and precision for both monolingual and bilingual runs. The first experiment returned a maximum average precision of 0.81 for the monolingual run (Afaan Oromo queries) and 0.45 for the bilingual run (translated Amharic queries).
The second experiment showed better recall and precision than the first. It achieved a maximum average precision of 0.60 for the bilingual run, while the result for the monolingual run remained the same. From these results, it can be concluded that cross-lingual information retrieval for two local languages, namely Afaan Oromo and Amharic, can be developed, and that the performance of the retrieval system could be increased with the use of larger and cleaner corpora.
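
The difference between the two experiments, and the precision and recall measures used to compare them, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the shape of the translation table, the placeholder terms (`src_a`, `tgt_a1`, and so on), and the probabilities are all assumptions made for illustration, and the table stands in for whatever term alignments a parallel corpus would yield.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical translation table: source term -> candidate target terms with
# illustrative scores. These are placeholders, not data from the thesis.
TranslationTable = Dict[str, List[Tuple[str, float]]]


def translate_query(terms: List[str],
                    table: TranslationTable,
                    all_translations: bool) -> List[str]:
    """Translate query terms, keeping either the best or all candidates."""
    translated: List[str] = []
    for term in terms:
        # Rank candidates from highest to lowest score.
        candidates = sorted(table.get(term, []),
                            key=lambda c: c[1], reverse=True)
        if not candidates:
            continue  # terms missing from the table are dropped in this sketch
        chosen = candidates if all_translations else candidates[:1]
        translated.extend(target for target, _ in chosen)
    return translated


def precision_recall(retrieved: List[str],
                     relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for one query."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


table: TranslationTable = {
    "src_a": [("tgt_a2", 0.3), ("tgt_a1", 0.7)],
    "src_b": [("tgt_b1", 1.0)],
}
query = ["src_a", "src_b", "src_c"]

one_best = translate_query(query, table, all_translations=False)
expanded = translate_query(query, table, all_translations=True)
p, r = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
```

Under these assumptions, `one_best` keeps a single target term per source term, while `expanded` keeps every candidate. The expanded query has more chances to match relevant target-language documents, which is one plausible reading of why the second experiment scored higher.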

Item Type: Thesis (Masters)
Uncontrolled Keywords: Afaan Oromo-Amharic Cross-Lingual Information Retrieval, Information Retrieval, Afaan Oromo, Amharic.
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 30 Oct 2018 12:46
Last Modified: 30 Oct 2018 12:46
URI: http://thesisbank.jhia.ac.ke/id/eprint/7152
