Phrase break prediction is the first step in modeling prosody for text-to-speech (TTS) systems. Traditional methods of phrase break prediction have used discrete linguistic representations (such as POS tags, induced POS tags, and word-terminal syllables) for modeling these breaks. However, these discrete representations suffer from a number of issues: the number of discrete classes must be fixed in advance, and such representations do not capture the co-occurrence statistics of words. As a result, the use of continuous-valued word representations was proposed in the literature. In this paper, we propose a neural network dictionary learning architecture to induce task-specific continuous-valued word representations, and show that these task-specific features perform better at phrase break prediction than continuous features derived using Latent Semantic Analysis (LSA).
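The core idea of task-specific continuous representations can be illustrated with a minimal sketch: instead of using fixed embeddings (e.g., from LSA), the word vectors are learned jointly with the break classifier, so the representations adapt to the break-prediction objective. Everything below (the toy vocabulary, labeled word pairs, dimensions, and learning rate) is an invented illustration and not the authors' actual architecture or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and training pairs:
# (previous word, next word) -> 1 if a phrase break follows the previous
# word, else 0. Punctuation strongly signals breaks in this toy data.
vocab = ["<s>", "</s>", "the", "cat", "sat", ",", "and", "dog", "ran", "."]
word2id = {w: i for i, w in enumerate(vocab)}
data = [
    (("cat", ","), 1), ((",", "and"), 0), (("sat", "."), 1),
    (("the", "cat"), 0), (("dog", "ran"), 0), (("ran", "."), 1),
    (("and", "dog"), 0), (("cat", "sat"), 0), (("sat", ","), 1),
    ((",", "the"), 0),
]

V, D = len(vocab), 8              # vocabulary size, embedding dimension
E = rng.normal(0, 0.1, (V, D))    # word embeddings (learned, task-specific)
W = rng.normal(0, 0.1, 2 * D)     # classifier weights over [prev; next]
b = 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(500):
    for (prev, nxt), y in data:
        i, j = word2id[prev], word2id[nxt]
        x = np.concatenate([E[i], E[j]])
        p = sigmoid(W @ x + b)
        g = p - y                  # gradient of log loss w.r.t. the logit
        # Update the classifier AND the embeddings themselves, so the
        # word representations become specific to break prediction.
        E[i] -= lr * g * W[:D]
        E[j] -= lr * g * W[D:]
        W -= lr * g * x
        b -= lr * g

def predict(prev, nxt):
    """Probability of a phrase break between prev and nxt."""
    x = np.concatenate([E[word2id[prev]], E[word2id[nxt]]])
    return sigmoid(W @ x + b)
```

After training, `predict("cat", ",")` should be high (a break before punctuation) and `predict("the", "cat")` low (no break inside a noun phrase). The key design point mirrored from the paper's motivation is that the embedding matrix `E` receives gradients from the break-prediction loss, unlike fixed LSA-derived features.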
Bibliographic reference. Vadapalli, Anandaswarup / Prahallad, Kishore (2014): "Learning continuous-valued word representations for phrase break prediction", In INTERSPEECH-2014, 41-45.