Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets

Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Batzu


Bidirectional recurrent neural nets have demonstrated state-of-the-art performance for parametric speech synthesis. In this paper, we introduce a top-down application of recurrent neural net models to unit-selection synthesis. A hierarchical cascaded network graph predicts context phone duration, speech unit encoding and frame-level logF0 information, which serve as targets for the unit search. The new approach is compared with an existing state-of-the-art hybrid system that uses Hidden Markov Models as the basis for the statistical unit search.


DOI: 10.21437/Interspeech.2017-428

Cite as: Pollet, V., Zovato, E., Irhimeh, S., Batzu, P. (2017) Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets. Proc. Interspeech 2017, 3966-3970, DOI: 10.21437/Interspeech.2017-428.


@inproceedings{Pollet2017,
  author={Vincent Pollet and Enrico Zovato and Sufian Irhimeh and Pier Batzu},
  title={Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3966--3970},
  doi={10.21437/Interspeech.2017-428},
  url={http://dx.doi.org/10.21437/Interspeech.2017-428}
}