A Hybrid Approach to Amharic Base Phrase Chunking and Parsing

Abeba, Ibrahim (2013) A Hybrid Approach to Amharic Base Phrase Chunking and Parsing. Masters thesis, Addis Ababa University.

Abstract

Natural Language Processing (NLP) is concerned with the interaction between computers and human natural languages. The most difficult task in NLP is for the computer to learn natural languages. Enabling computers to understand natural language involves assigning words their part-of-speech tags, extracting phrases, extracting meaning, and so on from natural language sentences. Text chunking and sentence parsing are among these tasks. Text chunking, or shallow parsing, divides a stream of text into groups of syntactically correlated words. It is an intermediate step toward full parsing, and it can also serve as a precursor for many other natural language processing tasks, such as information retrieval, named entity extraction, and text summarization. The objective of this research is to extract different types of Amharic phrases by grouping syntactically correlated words found at different levels of the parser, using a Hidden Markov Model (HMM), and to transform the chunker into a parser. Some rules are also used to correct some outputs of the HMM-based chunker. A bottom-up approach with a transformation algorithm is used to transform the chunker into the parser. The IOB2 chunk specification is used to identify phrase boundaries. Sentences for the training and testing datasets were collected from Amharic grammar books and from news of the Walta Information Center (WIC). Unlike the data collected from WIC, the data collected from Amharic grammar books were not tagged at all, so they were analyzed and tagged manually before being used as a corpus for chunking. The entire dataset was then chunk-tagged manually for training and approved by linguistic professionals. Experiments were conducted on training and testing datasets prepared using 10-fold cross-validation.
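To illustrate the IOB2 chunk specification mentioned above, the following minimal Python sketch converts labeled phrases into per-word tags and back. In IOB2, the first word of every chunk is tagged B-<label>, the following words of the same chunk are tagged I-<label>, and words outside any chunk are tagged O. The phrase labels (NP, PP, VP), the placeholder tokens, and the function names are assumptions made for illustration; they are not taken from the thesis.

```python
from typing import List, Tuple

# (phrase label, words). The label "O" marks a word outside any chunk.
Chunk = Tuple[str, List[str]]


def chunks_to_iob2(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Flatten labeled chunks into (word, IOB2 tag) pairs."""
    tagged = []
    for label, words in chunks:
        for i, word in enumerate(words):
            if label == "O":
                tag = "O"            # outside any chunk
            elif i == 0:
                tag = "B-" + label   # first word of a chunk
            else:
                tag = "I-" + label   # continuation of the same chunk
            tagged.append((word, tag))
    return tagged


def iob2_to_chunks(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Group (word, IOB2 tag) pairs back into labeled chunks."""
    chunks: List[Chunk] = []
    for word, tag in tagged:
        if tag == "O":
            chunks.append(("O", [word]))
        elif tag.startswith("B-") or not chunks or chunks[-1][0] != tag[2:]:
            # A B- tag (or a stray I- tag) opens a new chunk.
            chunks.append((tag[2:], [word]))
        else:
            chunks[-1][1].append(word)
    return chunks


# Placeholder tokens w1..w5 stand in for the words of a sentence.
example = [("NP", ["w1", "w2"]), ("PP", ["w3", "w4"]), ("VP", ["w5"]), ("O", ["."])]
tags = chunks_to_iob2(example)
```

Because every chunk opens with a B- tag, two adjacent phrases of the same type stay distinguishable, which is what makes the scheme suitable for marking phrase boundaries.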
The experiments on Amharic sentence chunking showed an average accuracy of 85.31% on the test set before applying the correction rules and 93.75% after applying them. The experiment on Amharic sentence parsing also showed an average accuracy of 93.75%.
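As a rough illustration of how an average accuracy over 10-fold cross-validation can be computed, the sketch below splits a tagged corpus into ten folds, trains on nine, tests on the remaining one, and averages the per-fold scores. It assumes accuracy is measured per token as the fraction of correctly predicted chunk tags; the abstract does not state how the thesis measured it. The names `k_fold_splits`, `tag_accuracy`, `cross_validate`, and `train_fn` are hypothetical, and `train_fn` stands in for whatever chunker is being evaluated.

```python
from typing import Callable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]            # (token, gold chunk tag) pairs
Tagger = Callable[[List[str]], List[str]]   # tokens -> predicted chunk tags


def k_fold_splits(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds (assumes n_items >= k)."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def tag_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of positions where the predicted tag equals the gold tag."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def cross_validate(sentences: List[Sentence],
                   train_fn: Callable[[List[Sentence]], Tagger],
                   k: int = 10) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(sentences), k):
        tagger = train_fn([sentences[i] for i in train_idx])
        gold: List[str] = []
        pred: List[str] = []
        for i in test_idx:
            tokens = [tok for tok, _ in sentences[i]]
            gold.extend(tag for _, tag in sentences[i])
            pred.extend(tagger(tokens))
        scores.append(tag_accuracy(gold, pred))
    return sum(scores) / len(scores)


# Toy usage: a dummy "chunker" that tags every token B-NP, on placeholder data.
toy_corpus = [[("w1", "B-NP"), ("w2", "I-NP")] for _ in range(20)]
dummy_train = lambda train_sentences: (lambda tokens: ["B-NP"] * len(tokens))
average = cross_validate(toy_corpus, dummy_train, k=10)
```

Under this setup, comparing the chunker before and after rule-based correction amounts to running the same loop twice with two different `train_fn` implementations and comparing the two averages.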

Item Type: Thesis (Masters)
Uncontrolled Keywords: Amharic Text chunking, Amharic partial parsing, Amharic shallow parsing, Amharic Parsing
Subjects: P Language and Literature > P Philology. Linguistics
P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 22 Jun 2018 09:48
Last Modified: 22 Jun 2018 09:48
URI: http://thesisbank.jhia.ac.ke/id/eprint/4220
