Concatenative Speech Synthesis for Amharic Using Unit Selection Method

Bayou, Eyob (2011) Concatenative Speech Synthesis for Amharic Using Unit Selection Method. Masters thesis, Addis Ababa University.

Abstract

Speech synthesis takes text as input and generates an acoustic signal as output. In the process, the input text is first preprocessed: it is tokenized into words or other meaningful tokens, and numbers, abbreviations and acronyms are transliterated. Text analysis follows preprocessing to identify grammatical structures and context. Once text analysis is complete, the next step is to convert the graphical representation of sounds to their phonetic representation. A phoneme usually has multiple phones that are used in different contexts. Amharic orthography is phonemic in the sense that a grapheme represents exactly one phoneme. However, this holds only as long as epenthesis and gemination are not considered. The orthography also does not show the suprasegmental information required to properly model speaking styles. Even though converting graphemes to phonemes is easy in Amharic, converting phonemes to phones is very difficult because of two necessary yet orthographically unrepresented components of the language: epenthesis and gemination. Modeling the prosodic features of various speaking styles is another challenging task in developing Amharic TTS because, on the one hand, modeling human speech is very challenging in itself and, on the other hand, relatively little research has been done on the Amharic language. This project has tried to address epenthesis and gemination, which are phonologically very important features of the language, by studying and implementing techniques found in the literature. Making use of the orthographic properties of verbs in their perfect form, this work introduces rules that can be used to locate phones that need to be stressed. The grapheme-to-phoneme conversion algorithm also addresses epenthesis. Prosodic differences between declarative and interrogative utterances are represented by making use of unique sentence-final phones recorded and segmented for this purpose. Transliteration of numerals and abbreviations is also addressed in the text preprocessing phase of the system. The results of an evaluation by ten fluent speakers of the language are encouraging.
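
The text preprocessing stage described in the abstract (tokenization, followed by expansion of numerals and abbreviations into speakable words) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `preprocess` and `number_to_words`, the `ABBREVIATIONS` table, and the 0 to 999 range limit are all assumptions made for illustration, and English words stand in for the Amharic lexicon, which is not given on this page.

```python
import re

# Hypothetical lookup tables. English stand-ins, NOT the thesis's Amharic data.
ABBREVIATIONS = {"dr.": "doctor", "prof.": "professor"}
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 (sketch-only limit)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + number_to_words(rest)


def preprocess(text: str) -> list[str]:
    """Tokenize on whitespace, then expand abbreviations and short numerals."""
    tokens = []
    for raw in text.lower().split():
        # Abbreviations are matched before punctuation is stripped,
        # because the trailing period is part of the abbreviation.
        if raw in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[raw])
            continue
        word = raw.strip(".,!?;:")
        if not word:
            continue
        # Only ASCII numerals of up to three digits are expanded here;
        # anything longer is passed through unchanged.
        if re.fullmatch(r"[0-9]{1,3}", word):
            tokens.extend(number_to_words(int(word)).split())
        else:
            tokens.append(word)
    return tokens
```

For example, `preprocess("Dr. Abebe bought 42 books.")` returns `["doctor", "abebe", "bought", "forty", "two", "books"]`. A real Amharic front end would additionally need to handle Ethiopic script, Ethiopic numerals, and the language's own abbreviation conventions, none of which this sketch attempts.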

Item Type: Thesis (Masters)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Africana
Depositing User: Selom Ghislain
Date Deposited: 04 Oct 2018 12:24
Last Modified: 04 Oct 2018 12:24
URI: http://thesisbank.jhia.ac.ke/id/eprint/6723
