Ontology-based Semantic Indexing for Amharic Text in Football Domain

Gesese, Genet Asefa (2013) Ontology-based Semantic Indexing for Amharic Text in Football Domain. Masters thesis, Addis Ababa University.


Abstract

An enormous amount of data has been produced in electronic format in the Amharic language, leading to an information explosion. This poses a major challenge for information managers, who must process information and deliver it to users quickly and easily. Researchers have proposed several indexing methods for Amharic, but these methods do not adequately capture the semantics of documents. This research builds a semantic indexer for Amharic football news articles by applying a domain ontology. The main purpose of the study is to construct an index that is embedded with the ontology so as to minimize query processing time. Ontology development, document indexing, and query processing are the core components of the study. The document indexing component is composed of Concept Tagger, Information Extraction, Concept Weighting, and Ontology Population modules. The Concept Tagger module annotates documents with concepts from the ontology, whereas the Information Extraction module identifies new individuals and determines the relationships between concepts in the tagged documents. The Concept Weighting module calculates weights for concepts and individuals using the domain ontology, and the Ontology Population module adds these weights to the ontology. The query processing component is built to test the performance of the indexer with user queries. It has Query Caching, Individual Creator, Document Retrieval, and Document Ranking modules. Query caching registers pairs of original and tagged queries so that the preprocessing and tagging modules need not run again whenever users pose the same query. The Individual Creator module produces new individuals from queries and adds them to the ontology. Finally, the Document Retrieval and Document Ranking modules retrieve documents and rank them by their level of relevance. Concept reasoning (inferencing) is the main task in the document retrieval process. Precision, recall, and F-measure are used to evaluate the performance of the proposed system and of a classical IR system, based on relevance judgments provided by experts. The results show that the proposed semantic indexer performs better than the Lucene indexer used in the classical IR system.
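
The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name, the document identifiers, and the choice of the balanced F-measure (F1) are assumptions made for illustration, and the abstract does not say which F-measure weighting was used.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and balanced F-measure (F1).

    retrieved: document IDs returned by the system for one query.
    relevant:  document IDs judged relevant by experts for that query.
    Both names are hypothetical; F1 is an assumed weighting.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved

    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Comparing two indexers under this scheme would mean computing these values per query for each system against the same expert judgments and then averaging across queries.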

Item Type: Thesis (Masters)
Uncontrolled Keywords: Semantic indexing, Football domain ontology, Rule-based information extraction, Semantic information retrieval, Query processor, Concept tagging
Subjects: G Geography. Anthropology. Recreation > GV Recreation Leisure
P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 12 Jul 2018 11:56
Last Modified: 12 Jul 2018 11:56
URI: http://thesisbank.jhia.ac.ke/id/eprint/7430
