10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speech Synthesis Without a Phone Inventory

Matthew P. Aylett, Simon King, Junichi Yamagishi

University of Edinburgh, UK

In speech synthesis, the unit inventory is decided using phonological and phonetic expertise. This process is resource intensive and potentially sub-optimal. In this paper we investigate how acoustic clustering, together with lexicon constraints, can be used to build a self-organised inventory. Six English speech synthesis systems were built using two frameworks, unit selection and parametric HTS, for three inventory conditions: 1) a traditional phone set, 2) a system using orthographic units, and 3) a self-organised inventory. A listening test showed a strong preference for the classic phone-set system, and for the orthographic system over the self-organised system. Results also varied by letter-to-sound complexity and database coverage. This suggests that the self-organised approach failed to generalise pronunciation and also introduced noise above and beyond that caused by orthographic sound mismatch.


Bibliographic reference. Aylett, Matthew P. / King, Simon / Yamagishi, Junichi (2009): "Speech synthesis without a phone inventory", in INTERSPEECH-2009, 2087-2090.