Design and Development of Amharic Grammar Checker

Gebru, Aynadis Temesgen (2013) Design and Development of Amharic Grammar Checker. Masters thesis, Addis Ababa University.

Abstract

Most human knowledge is recorded in natural language. These records are kept on computers or on paper so that they can be manipulated and preserved for future use. Natural language processing (NLP) plays an important role in increasing computers' capability to understand natural languages; designing and implementing programs that can understand natural language is the aim of work in this area. Grammatical correctness is crucial for communicating through natural language, so NLP applications should be able to recognize grammatical errors in natural language text. This process is known as grammar checking. This work introduces the design and development of an Amharic grammar checker. Two grammar checking approaches are used. The first is rule-based and is tested on simple sentences: the rules are constructed manually and matched against the patterns of the sentence to be checked. The second is statistical and is tested on both simple and complex sentences. In the statistical Amharic grammar checker, n-gram and probabilistic methods are used to check Amharic sentences for grammatical errors. The patterns and their probabilities of occurrence are automatically extracted from the training corpus and stored in a repository, and the probability of a sentence is calculated from them. The sentence probability is then compared with a specified threshold to determine whether the sentence is correct. The corpus, for both the training and test sets, is prepared from manually part-of-speech-tagged text of the language. The evaluation is made in two test cases. The first is done on simple sentences. In this test case, the rule-based Amharic grammar checker obtains 92.45% precision and 94.03% recall. On the same test case, the statistical Amharic grammar checker (trigram) shows precision and recall of 67.14% and 90.38%, respectively. In the second test case, the statistical Amharic grammar checker is tested on complex sentences, and 63.76% of the errors are detected. The evaluation results show that each approach is capable of detecting multiple errors in a sentence. The false alarms are due to incomplete grammatical rules and the quality of the statistical data. The accuracy of the morphological analyzer also affects the grammar checking results in both approaches.
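The statistical approach described in the abstract (POS tag trigrams, a sentence probability, and a threshold) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy tag set (N, V, ADJ), the add-one smoothing, the length normalization, and the threshold value are all assumptions made here for illustration.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def train(tagged_sentences):
    """Count POS-tag trigrams and their two-tag contexts from a training corpus."""
    trigrams, contexts = Counter(), Counter()
    vocab = {END}
    for tags in tagged_sentences:
        vocab.update(tags)
        padded = [START, START] + list(tags) + [END]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            contexts[context] += 1
            trigrams[context + (padded[i],)] += 1
    return trigrams, contexts, len(vocab)


def score(tags, model, alpha=1.0):
    """Average log-probability per trigram, with add-alpha smoothing (an assumption)."""
    trigrams, contexts, vocab_size = model
    padded = [START, START] + list(tags) + [END]
    total = 0.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        numerator = trigrams[context + (padded[i],)] + alpha
        denominator = contexts[context] + alpha * vocab_size
        total += math.log(numerator / denominator)
    # Normalize by the number of trigrams so long sentences are not penalized.
    return total / (len(padded) - 2)


def is_grammatical(tags, model, threshold):
    """Accept the tag sequence if its score reaches the chosen threshold."""
    return score(tags, model) >= threshold


# Toy training corpus of POS tag sequences (illustrative, verb-final order).
corpus = [["N", "V"], ["ADJ", "N", "V"], ["N", "N", "V"]]
model = train(corpus)
threshold = -1.2  # hypothetical value; in practice tuned on held-out data

print(is_grammatical(["N", "V"], model, threshold))
print(is_grammatical(["V", "N"], model, threshold))
```

In this sketch a tag sequence seen often in training scores higher than an unseen ordering, and the threshold separates the two. Where the threshold is set trades precision against recall, which is consistent with the gap between the two figures reported for the trigram checker.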

Item Type: Thesis (Masters)
Uncontrolled Keywords: Statistical grammar checker, rule-based grammar checker, n-gram, POS tag sequence
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 07 Sep 2018 13:16
Last Modified: 07 Sep 2018 13:16
URI: http://thesisbank.jhia.ac.ke/id/eprint/5066
