Restoration and Retrieval of Historical Amharic Document Images

Mengistu, Biruk (2014) Restoration and Retrieval of Historical Amharic Document Images. Masters thesis, Addis Ababa University.


Abstract

Many historical document image collections are now being scanned and made available over the Internet or in digital libraries. However, effective access to these sources is limited by the lack of efficient retrieval schemes. Existing methods for searching and retrieving document images are recognition-based (Optical Character Recognition), recognition-free (Document Image Retrieval), or a combination of the two. These algorithms analyze the global or local layout structure of document images and estimate the similarity among them. A few studies have developed recognition-free document image retrieval systems that extract information from document images relying on image features only. Such systems are strongly affected by the degradation found in historical documents, which results from paper aging, folding, or scanning. This study integrates effective image restoration techniques to improve the system's effectiveness in searching within historical document images. It also improves the system's online search process by accepting N query terms for retrieving relevant documents, in addition to an image viewer, enhancing the interface of the Amharic Document Image Retrieval System. Several image restoration techniques are tested: the mathematical morphology operations Dilate and Erode and their combination, as well as Haar, Daubechies, and Symlet wavelet techniques. These techniques are tested on historical documents as well as real-life documents. Performance analysis shows that the best result is obtained by combining mathematical morphology with Otsu thresholding.
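As a rough illustration of what combining Otsu thresholding with mathematical morphology can look like, the following pure-Python sketch binarizes a grayscale image with an Otsu threshold and then applies a morphological opening (erode, then dilate) to remove isolated specks. The abstract does not state the exact pipeline, so the choice of opening, the 3x3 structuring element, and the names `otsu_threshold`, `erode`, `dilate`, and `restore` are assumptions made for this example, not the thesis's implementation.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximises between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of gray values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def _morph(img, combine):
    """Apply a 3x3 min/max filter; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = combine(window)
    return out


def erode(img):
    return _morph(img, min)


def dilate(img):
    return _morph(img, max)


def restore(gray):
    """Binarise (ink = 1) with Otsu, then open to drop isolated noise (assumed pipeline)."""
    t = otsu_threshold([p for row in gray for p in row])
    binary = [[1 if p <= t else 0 for p in row] for row in gray]
    return dilate(erode(binary))


# Toy 7x7 page: light background, one 3x3 ink block, one noise speck.
page = [[200] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        page[y][x] = 30
page[0][6] = 40  # isolated dark speck

restored = restore(page)
```

In this toy example the opening removes the single-pixel speck while leaving the 3x3 ink block intact. Real scans would normally be processed with an image library rather than nested lists; the sketch only shows the order of operations.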
Finally, the performance of the system is evaluated before and after integrating the selected restoration techniques. An average overall performance of 87.02% F-measure is registered on documents with low, medium, and high levels of degradation, an improvement in retrieval effectiveness of 4.65% F-measure. This performance is a promising result for designing an applicable Amharic document image retrieval system. The major challenges are the unavailability of a standardized corpus and the limited number of historical document images in the dataset. Therefore, a standardized corpus should be prepared and used for experimentation in similar future studies.
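For readers unfamiliar with the metric, the F-measure reported above is conventionally the harmonic mean of precision and recall. The sketch below computes it for a single query; the function name `f_measure` and the sample document IDs are hypothetical and are not taken from the thesis, which may average the metric differently across queries.

```python
def f_measure(retrieved, relevant):
    """Balanced F-measure (F1) for one query, given sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical query: 4 documents retrieved, 3 actually relevant, 2 in common.
score = f_measure(retrieved={1, 2, 3, 4}, relevant={2, 3, 5})
```

Here precision is 2/4 and recall is 2/3, so the score works out to 4/7.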

Item Type: Thesis (Masters)
Subjects: P Language and Literature > P Philology. Linguistics
P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 24 Sep 2018 12:43
Last Modified: 24 Sep 2018 12:43
URI: http://thesisbank.jhia.ac.ke/id/eprint/5529
