Bidirectional recurrent neural nets have demonstrated state-of-the-art performance for parametric speech synthesis. In this paper, we introduce a top-down application of recurrent neural net models to unit-selection synthesis. A hierarchical cascaded network graph predicts context phone duration, speech unit encoding, and frame-level logF0 information, which serve as targets for the unit search. The new approach is compared with an existing state-of-the-art hybrid system that uses Hidden Markov Models as the basis for the statistical unit search.
Cite as: Pollet, V., Zovato, E., Irhimeh, S., Batzu, P. (2017) Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets. Proc. Interspeech 2017, 3966-3970, doi: 10.21437/Interspeech.2017-428
@inproceedings{pollet17_interspeech,
  author={Vincent Pollet and Enrico Zovato and Sufian Irhimeh and Pier Batzu},
  title={{Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3966--3970},
  doi={10.21437/Interspeech.2017-428}
}