Phoneme Level Automatic Speech Segmentation for Amharic Language Using HMM Approach

Emiru, Eshete Derb (2011) Phoneme Level Automatic Speech Segmentation for Amharic Language Using HMM Approach. Masters thesis, Addis Ababa University.

Abstract

Speech segmentation is the process of identifying the boundaries between meaningful units, such as phonemes, in continuous speech. Reliable, automatic determination of phoneme boundaries is needed in several areas of speech research: in automatic speech recognition (ASR), to improve recognizer performance; in speech synthesis, to improve output quality through a segmented database; and in language identification and speaker verification, to improve system performance. This study proposes an unsupervised method of automatic speech segmentation as a solution. A text corpus of 1000 Amharic sentences was collected from political, economic, sports and health news, fiction, the Bible, the penal code and the Federal Negarit Gazeta. These texts were recorded by one female and one male speaker to produce a parallel speech corpus. Both the text and speech corpora were split into training (90%) and test (10%) sets. A phoneme-based, speaker-dependent Hidden Markov Model (HMM) was chosen because it is the most widely used approach and because the HTK software suite is available. Each Amharic phoneme is modeled by its own left-to-right HMM with three emitting and two non-emitting states and no skip transitions. MFCC feature vectors, together with their first and second derivatives, were selected as features for the individual HMMs. Letters and phonemes were used as the basic modeling units in three configurations: context independent, context dependent with a single Gaussian mixture, and context dependent with multiple Gaussian mixtures. The system was evaluated in terms of the percentage of boundary deviations at tolerance values of 5 ms, 10 ms, 15 ms and 20 ms, with reference to manual segmentation results. The experiments show that the best performance, with the minimum percentage of time boundary deviations, is achieved using the phoneme-based approach in a context-dependent setting, with two Gaussian mixtures for the male speaker and four Gaussian mixtures for the female speaker.
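
The tolerance-based evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `boundary_accuracy`, the example boundary times, and the choice to report the share of automatic boundaries that fall within each tolerance of the manual reference are all assumptions made for illustration. It also assumes that the two boundary lists are already aligned one to one.

```python
from typing import Dict, Sequence


def boundary_accuracy(
    reference_ms: Sequence[float],
    automatic_ms: Sequence[float],
    tolerances_ms: Sequence[float] = (5, 10, 15, 20),
) -> Dict[float, float]:
    """Return, per tolerance, the percentage of automatic boundaries that
    lie within that many milliseconds of the matching manual boundary.

    Hypothetical helper: assumes both lists hold the same boundaries in
    the same order (one-to-one alignment).
    """
    if len(reference_ms) != len(automatic_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")

    # Absolute deviation of each automatic boundary from its reference.
    deviations = [abs(a - r) for r, a in zip(reference_ms, automatic_ms)]

    return {
        tol: 100.0 * sum(d <= tol for d in deviations) / len(deviations)
        for tol in tolerances_ms
    }


# Made-up boundary times in milliseconds, for illustration only.
manual = [120, 250, 400, 610]
automatic = [123, 262, 398, 635]
scores = boundary_accuracy(manual, automatic)
for tol, pct in scores.items():
    print(f"within {tol} ms: {pct:.1f}%")
```

With these invented values the deviations are 3, 12, 2 and 25 ms, so half of the boundaries fall within 5 ms and 10 ms, and three quarters within 15 ms and 20 ms. If the thesis instead reports the share of boundaries that exceed each tolerance, the corresponding figure is 100 minus each value.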

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 29 Jun 2018 12:22
Last Modified: 29 Jun 2018 12:22
URI: http://thesisbank.jhia.ac.ke/id/eprint/6265
