Searching in Amharic Document Image Corpus

Teklu, Abreham Gebretsadik (2010) Searching in Amharic Document Image Corpus. Masters thesis, Addis Ababa University.

PDF (Searching in Amharic Document Image Corpus)
Abreham, Gebretsadik Teklu.pdf - Accepted Version
Restricted to Repository staff only

Abstract

The introduction of the World Wide Web has made access to digital information easier than ever before. Many information providers have therefore started to digitize existing paper materials to enable access through networked information services. Document retrieval, which searches for documents relevant to a user's query, has consequently become a central issue in information retrieval. There are two types of document retrieval: text retrieval and document image retrieval. Document image retrieval can be recognition-based or performed without explicit recognition. A considerable amount of research has been done on document image retrieval throughout the world, but little of it addresses Amharic document image retrieval. This study therefore deals with searching an Amharic document image corpus without explicit recognition. It aims at improving the efficiency and effectiveness of retrieval from a document image collection. To this end, an index file with an inverted file structure is created to store index terms after removing stopwords and grouping together variant words. The prefixes and suffixes of word variants are detected using a modified cosine similarity measure. Search results are displayed in ranked order based on TF*IDF weight, and performance evaluation of the system shows promising results. However, issues related to feature extraction, word variation detection, and noise detection and removal remain to be solved. Accordingly, further research is recommended.
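
The indexing and ranking steps named in the abstract (stopword removal, an inverted file, TF*IDF-ranked results) can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis indexes word images without recognition, whereas this sketch assumes each document has already been reduced to a list of term identifiers, shown here as plain strings. The stopword list, the function names `build_index` and `search`, the sample documents, and the exact weighting (raw term frequency times log(N/df)) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list; the list used in the thesis is not given here.
STOPWORDS = {"the", "of", "and"}

def build_index(docs):
    """Build an inverted file: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        # Count only non-stopword terms in this document.
        counts = Counter(t for t in terms if t not in STOPWORDS)
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return index

def search(index, n_docs, query_terms):
    """Rank documents by the sum of TF*IDF weights of the query terms."""
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue  # term absent from the collection
        # Assumed IDF form: log(N / document frequency).
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy collection standing in for term identifiers extracted from page images.
docs = {
    "d1": ["amharic", "document", "image", "retrieval"],
    "d2": ["the", "document", "index", "index"],
    "d3": ["amharic", "word", "image"],
}
index = build_index(docs)
ranked = search(index, len(docs), ["index", "document"])
```

In this toy collection, `d2` ranks above `d1` for the query because it contains the rarer term "index" twice, while `d3` matches neither query term and is not returned. Grouping word variants, which the abstract describes as a modified cosine similarity step, is omitted because its details are not stated in the abstract.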

Item Type: Thesis (Masters)
Subjects: P Language and Literature > P Philology. Linguistics
P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 18 Jun 2018 12:48
Last Modified: 18 Jun 2018 12:48
URI: http://thesisbank.jhia.ac.ke/id/eprint/4382
