ISCA Archive Interspeech 2005

A stochastic approach to phoneme and accent estimation

Tohru Nagano, Shinsuke Mori, Masafumi Nishimura

We present a new stochastic approach to accurately estimating phonemes and accents for Japanese TTS (Text-to-Speech) systems. The front-end process of a TTS system assigns phonemes and accents to plain input text, a step that is critical for creating intelligible and natural speech. Rule-based approaches that build hierarchical structures are widely used for this purpose. However, rule-based approaches have well-known limitations in scalability and ease of domain adaptation. In this paper, we present a stochastic method based on an n-gram model for phoneme and accent estimation. The proposed method estimates not only phonemes and accents but also word segmentation and part-of-speech (POS) tags simultaneously. We implemented a system for Japanese that solves tokenization, linguistic annotation, text-to-phoneme conversion, homograph disambiguation, and accent generation at the same time, and observed promising results.
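
The idea of estimating segmentation, POS, phonemes, and accents jointly with an n-gram model can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the n-gram order, the smoothing, or the search procedure, so the bigram order, additive smoothing, Viterbi search, and all toy data below (the `CORPUS` entries, the `ALPHA` value, and the function names `train` and `decode`) are assumptions made for illustration. Each token is treated as a single unit combining surface form, POS, phonemes, and accent, so choosing the best unit sequence resolves all four at once.

```python
import math
from collections import Counter, defaultdict

# A unit is (surface, pos, phonemes, accent). All data here is invented.
BOS = ("<s>", "BOS", "", "")
EOS = ("</s>", "EOS", "", "")
ALPHA = 0.1  # additive smoothing constant (assumed, not from the paper)

# Toy annotated corpus; "ab" is a homograph with two readings.
CORPUS = [
    [("ab", "N", "A B", "HL"), ("c", "P", "C", "L")],
    [("a", "N", "A", "H"), ("bc", "V", "B C", "LH")],
    [("ab", "N", "A B", "HL"), ("c", "P", "C", "L")],
    [("ab", "V", "E B", "LH"), ("d", "P", "D", "L")],
]


def train(corpus):
    """Count unit bigrams and build a surface -> candidate units lexicon."""
    context, bigrams, lexicon = Counter(), Counter(), defaultdict(set)
    for sentence in corpus:
        seq = [BOS] + sentence + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            context[prev] += 1
            bigrams[(prev, cur)] += 1
        for unit in sentence:
            lexicon[unit[0]].add(unit)
    vocab_size = len({u for s in corpus for u in s}) + 1  # +1 for EOS

    def logprob(prev, cur):
        # Additively smoothed bigram probability P(cur | prev).
        num = bigrams[(prev, cur)] + ALPHA
        return math.log(num / (context[prev] + ALPHA * vocab_size))

    return lexicon, logprob


def decode(text, lexicon, logprob):
    """Viterbi search over all segmentations and unit choices of text."""
    n = len(text)
    # best[i][unit] = (score, (start, previous unit)) for units ending at i
    best = [dict() for _ in range(n + 1)]
    best[0][BOS] = (0.0, None)
    for start in range(n):
        for prev, (score, _) in best[start].items():
            for end in range(start + 1, n + 1):
                for unit in lexicon.get(text[start:end], ()):
                    cand = score + logprob(prev, unit)
                    if unit not in best[end] or cand > best[end][unit][0]:
                        best[end][unit] = (cand, (start, prev))
    if not best[n]:
        return []  # no segmentation covers the input
    unit = max(best[n], key=lambda u: best[n][u][0] + logprob(u, EOS))
    path, pos = [], n
    while unit != BOS:
        path.append(unit)
        pos, unit = best[pos][unit][1]
    return path[::-1]


LEXICON, LOGPROB = train(CORPUS)
for surface, pos, phonemes, accent in decode("abd", LEXICON, LOGPROB):
    print(surface, pos, phonemes, accent)
```

In this toy setup the homograph "ab" receives a different POS, reading, and accent depending on the following unit, which is the kind of context-dependent disambiguation the abstract attributes to the joint model.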


doi: 10.21437/Interspeech.2005-575

Cite as: Nagano, T., Mori, S., Nishimura, M. (2005) A stochastic approach to phoneme and accent estimation. Proc. Interspeech 2005, 3293-3296, doi: 10.21437/Interspeech.2005-575

@inproceedings{nagano05_interspeech,
  author={Tohru Nagano and Shinsuke Mori and Masafumi Nishimura},
  title={{A stochastic approach to phoneme and accent estimation}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={3293--3296},
  doi={10.21437/Interspeech.2005-575}
}