Probabilistic Information Retrieval System for Amharic Language

Amanuel, Hirpa Madessa (2012) Probabilistic Information Retrieval System for Amharic Language. Masters thesis, Addis Ababa University.

Abstract

Nowadays, a considerable amount of information is being produced in Ethiopia. This accumulation makes archiving and searching challenging, particularly for information written in the Amharic language. Developing an information retrieval (IR) system for Amharic therefore allows users to search for and retrieve relevant documents that satisfy their information needs. A few such IR systems have been developed. However, they have not registered promising performance because they are based on the vector space model, which has no mechanism for defining the user’s information need through relevance feedback and query reformulation unless other modules are integrated. Furthermore, that model does not account for the uncertainty that exists in IR systems. To address these issues, a probabilistic retrieval model, which can reweight query terms based on relevance feedback, can be used. In this research, a probabilistic IR system is developed for the Amharic language. Both indexing and searching modules were constructed. These modules include text operations such as tokenization, normalization, stemming and stop word removal. The retrieval system was then tested, and the experimental results show that the probabilistic IR system returned encouraging results even without controlling for the synonyms and polysemous terms that exist in Amharic text. On average, the system registered an F-measure of 73%. Nevertheless, the performance of the system is greatly affected by the synonyms and polysemous terms in the language, in addition to its rich morphology (variant word forms).
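
The abstract does not state which probabilistic weighting formula the thesis uses. As an illustration only, the sketch below assumes a binary independence model with Robertson/Sparck Jones term weights (0.5 smoothing), which is one common way to reweight query terms from relevance feedback. The function names, the toy document identifiers and the placeholder terms are all hypothetical and are not taken from the thesis; documents are assumed to be already tokenized, normalized, stemmed and stripped of stop words. An F-measure helper is included because that is the evaluation measure the abstract reports.

```python
import math


def rsj_weight(N, n, R=0, r=0):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing (assumed formula).

    N: documents in the collection, n: documents containing the term,
    R: documents judged relevant, r: judged-relevant documents containing the term.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def rank(query_terms, docs, relevant_ids=frozenset()):
    """Rank documents by the sum of weights of the query terms they contain.

    docs maps a document id to its set of index terms. relevant_ids holds the
    documents the user marked as relevant; when it is empty, the weights reduce
    to an IDF-like form.
    """
    N, R = len(docs), len(relevant_ids)
    weights = {}
    for term in set(query_terms):
        n = sum(1 for terms in docs.values() if term in terms)
        r = sum(1 for doc_id in relevant_ids if term in docs[doc_id])
        weights[term] = rsj_weight(N, n, R, r)
    scores = {doc_id: sum(w for t, w in weights.items() if t in terms)
              for doc_id, terms in docs.items()}
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall over two sets of document ids."""
    hits = len(set(retrieved) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(set(retrieved))
    recall = hits / len(set(relevant))
    return 2 * precision * recall / (precision + recall)


# Toy collection with placeholder terms (hypothetical, for illustration).
toy_docs = {
    "d1": {"a", "b"},
    "d2": {"a"},
    "d3": {"b", "c"},
    "d4": {"c"},
    "d5": {"d"},
}
initial_ranking = rank(["a", "b"], toy_docs)
feedback_ranking = rank(["a", "b"], toy_docs, relevant_ids={"d3"})
```

In this toy run, marking `d3` as relevant raises the weight of the term it contains and lowers the weight of the term it lacks, so `d3` moves ahead of `d1` in the second ranking. That is the reweighting behaviour the abstract attributes to the probabilistic model, shown here under the assumptions stated above.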

Item Type: Thesis (Masters)
Uncontrolled Keywords: Information Retrieval, Probabilistic Model, Amharic Language.
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 19 Jun 2018 14:26
Last Modified: 19 Jun 2018 14:26
URI: http://thesisbank.jhia.ac.ke/id/eprint/4698
