A Probabilistic Information Retrieval System for Tigrinya

Atalay, Luel (2014) A Probabilistic Information Retrieval System for Tigrinya. Masters thesis, Addis Ababa University.

Abstract

Many applications that handle information on the Internet or in other archives would be inadequate without the support of information retrieval (IR) technology. A considerable amount of information is now produced in Tigrinya, and this growing volume makes it difficult to find relevant material among the documents written in the language. An IR system for Tigrinya would let users search for and retrieve documents that satisfy their information needs. Accordingly, this research builds a Tigrinya IR system on the probabilistic model. Unlike the vector space model, the probabilistic model provides a mechanism for reweighting query terms through relevance feedback and query reformulation techniques. It also explicitly models the uncertainty inherent in IR systems.

This thesis is pioneering research on IR for Tigrinya text documents. It was initiated to test the effectiveness of a Tigrinya IR system that uses the rule-based Tigrinya stemmer developed by Yonas Fisseha in 2011. Yonas recommended that research be conducted on applying the stemmer in a Tigrinya IR system to measure its impact on recall and precision.

In this thesis, the potential of the probabilistic model for Tigrinya text retrieval is investigated, using 300 Tigrinya documents and 10 queries to test the approach. The researcher presents the design and a prototype implementation of a probabilistic IR system for Tigrinya documents, comprising both indexing and searching modules. The experimental results are encouraging: after stemming and pseudo relevance feedback, the system registered an average precision of 69.1%, a recall of 90%, and an F-measure of 74.4%. These results were achieved without handling the synonymy and polysemy of terms in Tigrinya text.
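
The abstract does not state the weighting formulas used in the thesis. As a minimal sketch, assuming the classic binary independence model with Robertson/Sparck Jones term weights (initial estimate p = 0.5, with 0.5 smoothing), the query reweighting, the pseudo relevance feedback loop, and the precision/recall/F-measure evaluation could look like this in Python. The function names, the cut-off `k`, and the toy documents are hypothetical. Documents are assumed to be already stemmed, and query expansion is not shown.

```python
import math

def bim_weights(query, docs, relevant=None):
    """Robertson/Sparck Jones term weights for the binary independence model.

    docs: list of sets of (stemmed) index terms.
    relevant: indices of documents treated as relevant, or None for the
    initial estimate made before any feedback is available.
    """
    N = len(docs)
    weights = {}
    for t in set(query):
        n = sum(1 for d in docs if t in d)  # document frequency of t
        if relevant is None:
            # Initial guess: p = 0.5, u estimated from the whole collection.
            p, u = 0.5, (n + 0.5) / (N + 1)
        else:
            # Re-estimate p and u from the (pseudo-)relevant set, 0.5-smoothed.
            V = len(relevant)
            v = sum(1 for i in relevant if t in docs[i])
            p, u = (v + 0.5) / (V + 1), (n - v + 0.5) / (N - V + 1)
        weights[t] = math.log(p * (1 - u) / (u * (1 - p)))
    return weights

def rank(docs, weights):
    """Score each document by summing the weights of the query terms it contains."""
    scores = [sum(w for t, w in weights.items() if t in d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: (-scores[i], i))

def pseudo_relevance_feedback(query, docs, k=2):
    """Rank once, assume the top k documents are relevant, reweight, rank again."""
    initial = rank(docs, bim_weights(query, docs))
    refined = bim_weights(query, docs, relevant=initial[:k])
    return rank(docs, refined)

def evaluate(retrieved, relevant):
    """Precision, recall and F-measure (harmonic mean) for a single query."""
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

docs = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"c", "d"},
        {"d", "e"}, {"e", "f"}, {"a", "f"}, {"c", "f"}]
ranking = pseudo_relevance_feedback(["a", "b"], docs, k=2)
print(ranking, evaluate(ranking[:3], {0, 1, 6}))
```

One arithmetic point when reading the reported figures: the harmonic mean of 69.1% precision and 90% recall is about 78.2%, not 74.4%. This suggests, though the abstract does not say so, that the reported F-measure is an average of per-query F-measures rather than the F-measure of the averaged precision and recall.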
The researcher recommends that further work examine the retrieval effectiveness of a Tigrinya IR system using either 1) a hybrid Tigrinya stemmer, that is, one combining rule-based and dictionary-based stemming, or 2) an ontology-based stemming algorithm that conflates words based on their meaning (the more strongly recommended option). There is also a need to build a hybrid system that uses the vector space model with non-binary term weighting to make the initial guess of relevant documents for a user query, and then applies probabilistic relevance feedback to improve performance. This would address a weakness of the probabilistic model, whose initial guess relies on Boolean matching.
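
The recommended hybrid can be sketched as follows. This sketch assumes tf-idf weighting with cosine similarity for the initial vector space ranking, and smoothed Robertson/Sparck Jones weights for the probabilistic feedback pass over the top `k` documents. All function names, `k`, and the toy data are hypothetical; this illustrates the idea and is not a design taken from the thesis.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Non-binary tf-idf vectors for documents given as lists of (stemmed) terms."""
    N = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(N / n) for t, n in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]
    return vecs, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def vsm_rank(query, docs):
    """Initial guess from the vector space model instead of Boolean matching."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(q, v) for v in vecs]
    return sorted(range(len(docs)), key=lambda i: (-scores[i], i))

def rsj_weight(term, docs, relevant):
    """Smoothed Robertson/Sparck Jones weight given a set of relevant documents."""
    N, V = len(docs), len(relevant)
    n = sum(1 for d in docs if term in d)
    v = sum(1 for i in relevant if term in docs[i])
    p = (v + 0.5) / (V + 1)
    u = (n - v + 0.5) / (N - V + 1)
    return math.log(p * (1 - u) / (u * (1 - p)))

def hybrid_search(query, docs, k=2):
    """VSM picks the pseudo-relevant set; probabilistic weights produce the final ranking."""
    top = vsm_rank(query, docs)[:k]
    w = {t: rsj_weight(t, docs, top) for t in set(query)}
    scores = [sum(wt for t, wt in w.items() if t in d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: (-scores[i], i))

docs = [["a", "a", "a", "b"], ["a", "c"], ["b", "b", "d"],
        ["c", "d"], ["e", "f"], ["e", "c"]]
print(hybrid_search(["a", "b"], docs, k=2))
```

The design choice here is that term frequency only influences which documents seed the feedback. The final scoring stays binary, as in the probabilistic model.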

Item Type: Thesis (Masters)
Subjects: H Social Sciences > HA Statistics
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Z Bibliography. Library Science. Information Resources > ZA Information resources
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 06 Sep 2018 13:28
Last Modified: 06 Sep 2018 13:28
URI: http://thesisbank.jhia.ac.ke/id/eprint/4969
