doi: 10.21437/SSW.2019
Deep learning for speech synthesis
Aäron van den Oord
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis
Xin Wang, Junichi Yamagishi
A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction
Prachi Govalkar, Johannes Fischer, Frank Zalkow, Christian Dittmar
Deep neural network based real-time speech vocoder with periodic and aperiodic inputs
Keiichiro Oura, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder
Qiao Tian, Xucheng Wan, Shan Liu
Neural Text-to-Speech Adaptation from Low Quality Public Recordings
Qiong Hu, Erik Marchi, David Winarsky, Yannis Stylianou, Devang Naik, Sachin Kajarekar
Neural VTLN for Speaker Adaptation in TTS
Bastian Schnell, Philip N. Garner
Problem-Agnostic Speech Embeddings for Multi-Speaker Text-to-Speech with SampleRNN
David Álvarez, Santiago Pascual, Antonio Bonafonte
Multi-Speaker Modeling for DNN-based Speech Synthesis Incorporating Generative Adversarial Networks
Hiroki Kanagawa, Yusuke Ijima
Speaker Adaptation of Acoustic Model using a Few Utterances in DNN-based Speech Synthesis Systems
Ivan Himawan, Sandesh Aryal, Iris Ouyang, Shukhan Ng, Pierre Lanchantin
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis
Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion
Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Yu Tsao, Hsin-Min Wang, Tomoki Toda
Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder
Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda
Voice Conversion without Explicit Separation of Source and Filter Components Based on Non-negative Matrix Factorization
Hitoshi Suda, Daisuke Saito, Nobuaki Minematsu
Voice conversion based on full-covariance mixture density networks for time-variant linear transformations
Gaku Kotani, Daisuke Saito
Unsupervised Learning of a Disentangled Speech Representation for Voice Conversion
Tobias Gburrek, Thomas Glarner, Janek Ebbers, Reinhold Haeb-Umbach, Petra Wagner
Novel Inception-GAN for Whispered-to-Normal Speech Conversion
Maitreya Patel, Mihir Parmar, Savan Doshi, Nirmesh Shah, Hemant Patil
Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device
Riku Arakawa, Shinnosuke Takamichi, Hiroshi Saruwatari
Synthesizing animal vocalizations and modelling animal speech
Tecumseh Fitch, Bart de Boer
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs
Rob Clark, Hanna Silen, Tom Kenter, Ralph Leith
Speech Synthesis Evaluation — State-of-the-Art Assessment and Suggestion for a Novel Research Program
Petra Wagner, Jonas Beskow, Simon Betz, Jens Edlund, Joakim Gustafson, Gustav Eje Henter, Sébastien Le Maguer, Zofia Malisz, Éva Székely, Christina Tånnander, Jana Voße
Rakugo speech synthesis using segment-to-segment neural transduction and style tokens — toward speech synthesis for entertaining audiences
Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
Voice Puppetry: Exploring Dramatic Performance to Develop Speech Synthesis
Matthew Aylett, David Braude, Christopher Pidcock, Blaise Potard
Measuring the contribution to cognitive load of each predicted vocoder speech parameter in DNN-based speech synthesis
Avashna Govender, Cassia Valentini-Botinhao, Simon King
Statistical parametric synthesis of budgerigar songs
Lorenz Gutscher, Michael Pucher, Carina Lozo, Marisa Hoeschele, Daniel C. Mann
GlottDNN-based spectral tilt analysis of tense voice emotional styles for the expressive 3D numerical synthesis of vowel [a]
Marc Freixes, Marc Arnela, Francesc Alías, Joan Claudi Socoró
Preliminary guidelines for the efficient management of OOV words for spoken text
Christina Tånnander, Jens Edlund
Loss Function Considering Temporal Sequence for Feed-Forward Neural Network–Fundamental Frequency Case
Noriyuki Matsunaga, Yamato Ohtani, Tatsuya Hirahara
Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Tomoki Koriyama, Shinnosuke Takamichi, Takao Kobayashi
Speaker Anonymization Using X-vector and Neural Waveform Models
Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-François Bonastre
V2S attack: building DNN-based voice conversion from automatic speaker verification
Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari
Impacts of input linguistic feature representation on Japanese end-to-end speech synthesis
Takato Fujimoto, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Evaluation of Block-Wise Parameter Generation for Statistical Parametric Speech Synthesis
Nobuyuki Nishizawa, Tomohiro Obara, Gen Hattori
Low computational cost speech synthesis based on deep neural networks using hidden semi-Markov model structures
Motoki Shimada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework
Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
Natural Language Generation: Creating Text
Claire Gardent
Enhancing Myanmar Speech Synthesis with Linguistic Information and LSTM-RNN
Aye Mya Hlaing, Win Pa Pa, Ye Kyaw Thu
Building Multilingual End-to-End Speech Synthesisers for Indian Languages
Anusha Prakash, Anju Leela Thomas, S. Umesh, Hema A Murthy
Diphthong interpolation, phone mapping, and prosody transfer for speech synthesis of similar dialect pairs
Michael Pucher, Carina Lozo, Philip Vergeiner, Dominik Wallner
Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis
Elshadai Tesfaye Biru, Yishak Tofik Mohammed, David Tofu, Erica Cooper, Julia Hirschberg
Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments
Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Where do the improvements come from in sequence-to-sequence neural TTS?
Oliver Watts, Gustav Eje Henter, Jason Fong, Cassia Valentini-Botinhao
A Comparison of Letters and Phones as Input to Sequence-to-Sequence Models for Speech Synthesis
Jason Fong, Jason Taylor, Korin Richmond, Simon King
Generative Modeling of F0 Contours Leveraged by Phrase Structure and Its Application to Statistical Focus Control
Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu
Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation
Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari
Using generative modelling to produce varied intonation for speech synthesis
Zack Hodari, Oliver Watts, Simon King
How to train your fillers: uh and um in spontaneous speech synthesis
Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson
An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement
Mohammad Eshghi, Kou Tanaka, Kazuhiro Kobayashi, Hirokazu Kameoka, Tomoki Toda
PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network
Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson
Deep Mixture-of-Experts Models for Synthetic Prosodic-Contour Generation
Raul Fernandez
Prosody Prediction from Syntactic, Lexical, and Word Embedding Features
Rose Sloan, Syed Sarfaraz Akhtar, Bryan Li, Ritvik Shrivastava, Agustin Gravano, Julia Hirschberg
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities
Slava Shechtman, Alex Sorin