Automatic Thesaurus Construction from Wolaytta Text

Demewoz, Beldados (2013) Automatic Thesaurus Construction from Wolaytta Text. Masters thesis, Addis Ababa University.

Abstract

A thesaurus is a set of terms used to classify documents during indexing and to expand queries during searching, with the aim of enhancing retrieval effectiveness. Information retrieval systems face two major problems: on the one hand, users are required to describe their information need explicitly to the system; on the other, the system itself often retrieves irrelevant documents because of vocabulary mismatch between query terms and index terms. Because an information retrieval system compares query terms and index terms at the lexical level, this mismatch is pronounced enough to affect retrieval performance. A thesaurus addresses the problem by providing a precise, controlled vocabulary of terms for indexing and searching, thereby resolving the vocabulary mismatch. Wolaytta is an official language of literacy in Ethiopia. Since the introduction of the Latin script into its writing system in 1993, the language has evolved significantly, from purely verbal communication to a means of instruction and then to a source of information. For the language to serve as a source of information, a retrieval system should be designed with an enhanced capability to resolve whatever mismatches arise between query terms and index terms. This thesis develops an automatic association thesaurus from Wolaytta text, either as a starting point for an enhanced retrieval system or as a framework for the development of a cross-language retrieval system. The developed system constructs the association thesaurus automatically from document corpora on the basis of term-to-term co-occurrence. To obtain reasonably good performance, the system incorporates manually compiled stop-word and suffix lists, and it achieved better results in generating related concepts.
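
The approach outlined in the abstract (stop-word removal, suffix stripping, then term-to-term co-occurrence) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the association measure, so the Dice coefficient over document-level co-occurrence is assumed here, and the stop-word list, suffix list, function names, and thresholds are hypothetical placeholders (English toy entries stand in for the manually compiled Wolaytta lists).

```python
from collections import Counter
from itertools import combinations

# Hypothetical placeholders: the thesis compiles these lists manually for
# Wolaytta; the English entries below are illustrative only.
STOP_WORDS = {"the", "of", "and"}
SUFFIXES = ("ing", "s")


def normalize(token):
    """Lowercase a token and strip at most one listed suffix."""
    token = token.lower()
    # Try longer suffixes first and keep a stem of at least three characters.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Return the set of normalized, non-stop-word terms in one document."""
    return {normalize(t) for t in text.split() if t.lower() not in STOP_WORDS}


def build_thesaurus(documents, top_k=3, min_score=0.5):
    """Map each term to its most strongly associated terms.

    Association is the Dice coefficient over document-level co-occurrence
    (an assumption; the abstract does not specify the measure).
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for text in documents:
        terms = sorted(index_terms(text))
        doc_freq.update(terms)
        pair_freq.update(combinations(terms, 2))

    related = {}
    for (a, b), n_ab in pair_freq.items():
        score = 2 * n_ab / (doc_freq[a] + doc_freq[b])
        if score >= min_score:
            related.setdefault(a, []).append((b, score))
            related.setdefault(b, []).append((a, score))

    # Keep the top_k related terms per entry, strongest first.
    return {
        term: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_k]
        for term, pairs in related.items()
    }
```

In a retrieval setting, the entries returned by `build_thesaurus` could be used for query expansion by appending the related terms of each query term to the query before matching it against the index.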

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Z Bibliography. Library Science. Information Resources > Z719 Libraries (General)
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 01 Oct 2018 11:27
Last Modified: 01 Oct 2018 11:27
URI: http://thesisbank.jhia.ac.ke/id/eprint/5855
