Automatic Summarization for Amharic Text Using Open Text Summarizer

Teklewold, Addis Ashagre (2013) Automatic Summarization for Amharic Text Using Open Text Summarizer. Masters thesis, Addis Ababa University.

PDF: Addis, Ashagre.pdf (Accepted Version, 3MB)
Restricted to Repository staff only

Abstract

Information overload is a problem in the current information era because information is mass-produced in many formats, a trend accelerated by internet technology. Amharic text documents are part of this mass production. Automatic text summarization plays a decisive role in extracting the useful information from a given text document within a short time. Several studies have addressed Amharic text summarization, but more research is needed to match the results achieved for other languages such as English. The objective of this study is therefore to investigate the applicability of the Open Text Summarizer (OTS) to Amharic news text summarization. OTS is an open source, language-independent, single-document text summarization tool. It ranks the sentences of an article by combining term frequency and sentence position methods. Forty news articles on different issues were gathered from the EPA, WIC and RANP web pages, from which a corpus of 30 news articles was prepared for the experiments. Some modifications were made to the interface of the tool that was designed in the C# programming language. The OTS tool was customized in two ways, one for each of the two experiments. The first customization did not change the code of the tool significantly; it involved a few modifications to the punctuation rules and the preparation of a dictionary file that holds the Amharic language lexicons. The system uses language-specific lexicons, which include lists of affixes, abbreviations, stop words, synonyms and compound words, along with other rules. The second customization replaced the Porter stemmer of the tool with an Amharic stemmer. Both systems were tested by generating 90 summaries, one for each news article at each of the 10%, 20% and 30% extraction rates. The performance of the two systems was evaluated using subjective and objective evaluation. Subjective evaluation was carried out on 45 summaries extracted in experiment one, and a good result was obtained.
Objective evaluation was carried out on all the summaries generated in both experiments by comparing them with an ideal manual summary using the F-measure. In the first experiment, the highest score was 75.65%, at the 30% extraction rate for medium-sized articles, and the corpus average score was 66.23%. In the second experiment, the highest score was 72.83%, at the 30% extraction rate for large news articles, and the corpus average score was 72.37%. The system with the Amharic stemmer performed better than the other system at a given extraction rate regardless of the size of the original article, with better corpus average scores at 20% and 30%. This system also showed consistent performance improvement as the extraction rate increased.
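
The pipeline outlined in the abstract (rank sentences by term frequency and position, extract at a fixed rate, then score against a manual summary with the F-measure) can be illustrated with a minimal Python sketch. It is not the OTS implementation or the thesis code. The stop-word list, the `position_weight` parameter, the way the two scores are combined, and the representation of summaries as sets of sentence indices are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the thesis uses an Amharic lexicon file instead.
STOP_WORDS = {"the", "a", "of", "is", "in", "and", "to"}


def split_sentences(text):
    """Split after sentence-final punctuation, including the Amharic full stop."""
    parts = re.split(r"(?<=[.!?።])\s+", text.strip())
    return [p for p in parts if p]


def rank_sentences(sentences, position_weight=1.0):
    """Score = average term frequency of the words + a bonus for early position.

    The additive combination and the 1/(i+1) bonus are assumptions; the
    abstract only says both signals are used.
    """
    tokenized = [
        [w for w in re.findall(r"\w+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    tf = Counter(w for words in tokenized for w in words)
    scores = []
    for i, words in enumerate(tokenized):
        tf_score = sum(tf[w] for w in words) / len(words) if words else 0.0
        scores.append(tf_score + position_weight / (i + 1))
    return scores


def summarize(text, rate=0.3):
    """Return indices of the top-scoring sentences, in document order."""
    sentences = split_sentences(text)
    k = max(1, round(len(sentences) * rate))  # extraction rate, at least 1
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def f_measure(system, reference):
    """Harmonic mean of precision and recall over selected sentence indices."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, `f_measure(summarize(article, rate=0.2), manual_indices)` would score one system summary at the 20% extraction rate, where `article` and `manual_indices` are hypothetical inputs holding the article text and the sentence indices chosen by a human summarizer.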

Item Type: Thesis (Masters)
Subjects: H Social Sciences > H Social Sciences (General)
H Social Sciences > HE Transportation and Communications
P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 18 Jun 2018 12:03
Last Modified: 18 Jun 2018 12:03
URI: http://thesisbank.jhia.ac.ke/id/eprint/4436
