12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Letter-to-Phoneme Conversion Based on Two-Stage Neural Network Focusing on Letter and Phoneme Contexts

Kheang Seng, Yurie Iribe, Tsuneo Nitta

Toyohashi University of Technology, Japan

Improving Letter-To-Phoneme (L2P) conversion, which outputs the phoneme strings corresponding to Out-Of-Vocabulary (OOV) words, has become one of the most important issues in Text-To-Speech (TTS) research, especially for English. In this paper, we propose a Two-Stage Neural Network (NN) based approach to solve the problem of conflicting outputs at the phonemic level. Letter and Phoneme Context-Dependent models are combined and implemented in the first-stage NN to convert several letters into several phonemes. The second-stage NN then predicts the final output phoneme by observing a combination of several consecutive phoneme sequences obtained from the first-stage NN. Our L2P conversion module therefore takes a sequence of letters as input and outputs only one phoneme at a time. Judged mainly by word accuracy on OOV words, this new approach usually provides higher performance.
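The data flow described in the abstract can be sketched as follows. This is only an illustration of the windowing, not the authors' implementation: the window width of 3, the padding symbol, the toy letter-to-phoneme table, and the names `windows`, `toy_stage1`, `vote_stage2`, and `two_stage_l2p` are all assumptions made for this example. In particular, both trained networks are replaced by simple stand-ins (a lookup table for the first stage and a majority vote for the second stage).

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

PAD = "_"  # hypothetical padding symbol for word boundaries

# Toy letter-to-phoneme table; a stand-in for the trained first-stage NN.
TOY_TABLE = {"c": "k", "a": "ae", "t": "t"}


def windows(seq: Sequence, size: int, pad) -> List[Tuple]:
    """Return one centered window of `size` items per position of `seq`."""
    half = size // 2
    padded = [pad] * half + list(seq) + [pad] * half
    return [tuple(padded[i:i + size]) for i in range(len(seq))]


def toy_stage1(letter_window: Tuple[str, ...]) -> Tuple[str, ...]:
    """Stand-in for stage 1: several letters in, several phonemes out."""
    return tuple(TOY_TABLE.get(ch, PAD) for ch in letter_window)


def vote_stage2(context: Tuple[Tuple[str, ...], ...]) -> str:
    """Stand-in for stage 2: pick one phoneme for the center position.

    `context` holds consecutive stage-1 outputs. The output at offset j
    contains its guess for the center position at index (width - 1 - j).
    """
    width = len(context)
    guesses = [out[width - 1 - j] for j, out in enumerate(context)]
    votes = [g for g in guesses if g != PAD]
    return Counter(votes).most_common(1)[0][0] if votes else PAD


def two_stage_l2p(
    word: str,
    stage1: Callable[[Tuple[str, ...]], Tuple[str, ...]],
    stage2: Callable[[Tuple[Tuple[str, ...], ...]], str],
    width: int = 3,  # assumed window width, not taken from the paper
) -> List[str]:
    # Stage 1: one phoneme window per letter position.
    stage1_out = [stage1(w) for w in windows(word, width, PAD)]
    # Stage 2: one final phoneme per position from consecutive windows.
    pad_window = (PAD,) * width
    return [stage2(c) for c in windows(stage1_out, width, pad_window)]


print(two_stage_l2p("cat", toy_stage1, vote_stage2))  # ['k', 'ae', 't']
```

Because neighbouring first-stage windows overlap, each position receives several phoneme guesses; the second stage is where any disagreement between them would be resolved into a single phoneme per position.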

Bibliographic reference.  Seng, Kheang / Iribe, Yurie / Nitta, Tsuneo (2011): "Letter-to-phoneme conversion based on two-stage neural network focusing on letter and phoneme contexts", In INTERSPEECH-2011, 1885-1888.