Kabeta, Asefa Bayisa (2015) Query-based Automatic Summarizer for Afaan Oromo Text. Masters thesis, Addis Ababa University.
PDF (Query-based Automatic Summarizer for Afaan Oromo Text): Asefa, Bayisa Kabeta.pdf - Accepted Version (1MB), restricted to repository staff only
Abstract
Text summarization is one of the most challenging tasks in information retrieval. It has grown out of the explosion of electronic documents and can be seen as the condensation of a document collection. Automatic text summarization can be generic or query-specific. In query-focused (query-oriented) summarization, a query is provided to the summarizer in addition to the source documents, and the summarizer is expected to construct a summary containing the information requested by the query. A document retrieval system combined with a query-oriented summarization system is potentially a very powerful combination, which might be much more effective than a document retrieval system alone. This thesis therefore investigates the possibility of developing a query-based, single-document, extractive summarization system. Two methods that create text summaries by extracting and ranking sentences from the original document are proposed. The first is based on the vector space model (VSM), one of the most commonly used IR models, and finds the sentences most relevant to the query given by the user. The second is the position method, which is used alongside VSM in an attempt to improve the quality of the summary. The sentence ranking algorithm orders sentences by their scores, and the summary is produced by selecting the top N sentences, where the value of N is set by the user.
Experiments were conducted using 40 Afaan Oromo news items contained in the corpus. Three language experts from the language department of the Oromia Culture and Tourism Bureau produced manual summaries, which served as the ideal summaries. An intrinsic evaluation technique was used, involving both objective and subjective evaluation. The objective evaluation measures the performance of the system using standard Information Retrieval (IR) evaluation metrics (precision, recall, and F-measure), and the subjective evaluation measures linguistic quality, namely informativeness and coherence, using scores given by human evaluators on a five-point scale. The results showed that the proposed system achieved F-measures of 82%, 78%, and 82% at summary extraction rates of 10%, 20%, and 30%, respectively, when VSM was used along with the position method. For informativeness and coherence, the system also performed best when both methods were used together, with average scores of 59%, 77%, and 91% on the five-point scale at extraction rates of 10%, 20%, and 30%, respectively. The main challenges in the study were the lack of query expansion tools, which would help obtain more clues for finding important sentences, and the presence of unresolved references in some final summaries, which may make them difficult to understand. Addressing these is a direction for future research that would contribute to improving the proposed system.
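The pipeline described in the abstract (query-to-sentence similarity under VSM, a position score, top-N selection, and precision/recall/F-measure against a reference summary) can be illustrated with a minimal sketch. This is not the thesis implementation: the tokenizer, the smoothed TF-IDF weighting, the linear position score, the `position_weight` value, and all function names are assumptions made for illustration, and no Afaan Oromo stemming or stop-word handling is included.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase word tokens, no stemming or stop-word removal.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-weight dictionaries.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, query, n, position_weight=0.3):
    """Return the n highest-scoring sentences, kept in document order."""
    counts = [Counter(tokenize(s)) for s in sentences]
    total = len(counts)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for c in counts for term in c)
    # Assumption: smoothed IDF, so terms found in every sentence keep some weight.
    idf = {t: math.log((1 + total) / (1 + k)) + 1.0 for t, k in df.items()}

    def weigh(term_counts):
        # TF-IDF vector; query terms unseen in the document are ignored.
        return {t: k * idf[t] for t, k in term_counts.items() if t in idf}

    query_vec = weigh(Counter(tokenize(query)))
    scored = []
    for i, c in enumerate(counts):
        similarity = cosine(weigh(c), query_vec)
        # Assumption: earlier sentences score higher, decreasing linearly.
        position = 1.0 - i / total
        score = (1 - position_weight) * similarity + position_weight * position
        scored.append((score, i))
    # Highest score first; ties are broken in favour of the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n]
    return [sentences[i] for i in sorted(i for _, i in top)]


def precision_recall_f1(system, reference):
    # Sentence-level overlap between a system summary and a reference summary.
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For a document of `k` sentences and an extraction rate of 10%, `n` could be set to something like `max(1, round(0.1 * k))`; how the thesis maps extraction rates to N is not stated in the abstract.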
Item Type: | Thesis (Masters) |
---|---|
Subjects: | P Language and Literature > P Philology. Linguistics; P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania; Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science; Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources |
Divisions: | Africana |
Depositing User: | Selom Ghislain |
Date Deposited: | 24 Aug 2018 12:51 |
Last Modified: | 24 Aug 2018 12:51 |
URI: | http://thesisbank.jhia.ac.ke/id/eprint/4858 |