Syllable-Based Amharic Speech Synthesis (TTS) Using HMM

Bahiru, Demessie Dubie (2017) Syllable-Based Amharic Speech Synthesis (TTS) Using HMM. Masters thesis, Addis Ababa University.



Speech synthesis systems have developed gradually over the last few decades and have been integrated into several new applications. Much work remains in prosody, text preprocessing, and pronunciation before synthesized speech sounds more natural. In this thesis, an ASR corpus is used to develop a syllable-based speech synthesis system for the Amharic language using a Hidden Markov Model (HMM). The data were randomly selected from the ASR corpus, drawing on the corpora of six female speakers as training data. Both text and speech were used, 600 items of each. The corpus was split into two parts: 90% for training and 10% for testing. The components of the Hidden Markov Model and the features of the Amharic language were studied, although not every feature of Amharic was considered, since doing so requires a lot of time and deep linguistic knowledge. The utterance structure generated by Festival and Festvox, together with the parameters extracted from the raw waveform data, was used to train the model. Then the speech parameter sequence, generated from the predicted models, is used by a vocoder to synthesize the speech waveform. In this research, the text to be synthesized is assumed to be already transcribed. Finally, the synthesized speech is generated from the trained model based on the labeled input text. Evaluation was done in two ways. First, in the researcher's own evaluation, the overall performance was 75.56% for the syllable-based system and 77.78% for the phone-based system; the preference evaluation shows that syllable-based synthesis performs better in naturalness than in intelligibility, while phone-based TTS performs better in intelligibility, with 550 sentences of training data. Second, the average MOS given by eight listeners for five Amharic sentences was 2.94 for the phone-based system and 3.02 for the syllable-based system. This shows that the syllable-based TTS system outperforms the system that uses the phone as its basic unit. According to the MOS results, the synthesis system is categorized as good in terms of both intelligibility and naturalness. The results look encouraging, and further improvement depends on proper work in areas such as phoneme coverage, the lexicon, and the question set.
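
The random 90%/10% split and the averaging behind a mean opinion score (MOS) can be illustrated with a minimal Python sketch. This is not the thesis author's code: the function names `split_corpus` and `mean_opinion_score`, the fixed random seed, the integer utterance IDs, and the 1-to-5 rating scale are all assumptions made for illustration, and no ratings from the thesis are reproduced here.

```python
import random
from statistics import mean


def split_corpus(utterance_ids, train_fraction=0.9, seed=0):
    """Randomly split utterance IDs into training and test lists.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    cut = int(round(len(ids) * train_fraction))
    return ids[:cut], ids[cut:]


def mean_opinion_score(ratings):
    """Average all scores over listeners and sentences.

    `ratings` is a list with one inner list per listener, each holding
    that listener's scores (assumed 1 to 5) for every test sentence.
    """
    return mean(score for listener in ratings for score in listener)


# 600 utterances, identified here by placeholder integers.
train_ids, test_ids = split_corpus(range(600))
# Under these assumptions the split yields 540 training and 60 test items.
```

In a listening test like the one described, `ratings` would hold eight inner lists of five scores each, one such table per system, so that the two averages can be compared.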

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Z Bibliography. Library Science. Information Resources > ZA Information resources
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 11 Sep 2018 09:27
Last Modified: 11 Sep 2018 09:27
