Spontaneous Speech Recognition for Amharic Using HMM

Deksiso, Adugna (2015) Spontaneous Speech Recognition for Amharic Using HMM. Masters thesis, Addis Ababa University.

Abstract

The ultimate goal of automatic speech recognition is to develop a model that automatically converts a speech utterance into a sequence of words. With the similar objective of transforming Amharic speech into its equivalent sequence of words, this study explored the possibility of developing an Amharic spontaneous speech recognition system using a hidden Markov model (HMM). The spontaneous, speaker-independent Amharic speech recognizer developed in this research was built using conversational speech between two or more speakers. The speech data were collected from the web and transcribed manually. For training, 2007 sentences uttered by 36 people of different age groups and sexes were used. This training data consists of 9460 unique words and amounts to around 3 hours and 10 minutes of speech. For testing, 104 utterances (sentences) containing 820 unique words, uttered by 14 speakers, were used. The collected conversational speech data contain various non-speech events, from both the speakers and the environment, which degrade recognizer performance. Depending on the frequencies of these non-speech events, two data sets were prepared: the first includes the less frequent non-speech events in the models, and the second excludes them. Using these data sets, acoustic models were developed using word-internal and cross-word tied-state triphones with up to 11 Gaussian mixtures. In this research, the best recognizer performance was 41.60% word accuracy for speakers involved in training, 39.86% for test data from both speakers involved and not involved in training, and 23.25% for speakers not involved in training. The recognizer developed using cross-word triphones performed worse than the one using word-internal triphones, owing to the small size of the data.
The recognizers developed and tested on the two data sets, one including and one excluding the less frequent non-speech events, differed in word accuracy. According to the findings of this research, the accuracy achieved by the Amharic spontaneous speech recognizer is low. This is due to the nature of the speech and the small size of the data used; therefore, the result can be improved by increasing the size of the data.
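
The abstract reports results as word accuracy but does not say how the figure was computed. The sketch below assumes the usual definition, (N - S - D - I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment. The function name `word_accuracy` and the placeholder word strings are illustrative and are not taken from the thesis.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word accuracy in percent, assuming (N - S - D - I) / N."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )

    errors = dist[len(ref)][len(hyp)]  # S + D + I for the best alignment
    return 100.0 * (len(ref) - errors) / len(ref)


# Placeholder tokens: one substitution and one deletion against 4 reference words.
print(word_accuracy("w1 w2 w3 w4", "w1 wX w3"))
```

Under this definition, insertions count against the score, so word accuracy can fall below zero when the hypothesis contains many extra words.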

Item Type: Thesis (Masters)
Subjects: P Language and Literature > P Philology. Linguistics
P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 25 Jun 2018 11:40
Last Modified: 25 Jun 2018 11:40
URI: http://thesisbank.jhia.ac.ke/id/eprint/4518
