9th ISCA Speech Synthesis Workshop

13-15 Sep 2016, Sunnyvale, USA

Alan W. Black

DOI: 10.21437/SSW.2016



Poster Session 1


Non-filter waveform generation from cepstrum using spectral phase reconstruction
Yasuhiro Hamada, Nobutaka Ono, Shigeki Sagayama

Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis
Alexandros Lazaridis, Milos Cernak, Pierre-Edouard Honnet, Philip N. Garner

Multidimensional scaling of systems in the Voice Conversion Challenge 2016
Mirjam Wester, Zhizheng Wu, Junichi Yamagishi

An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity
Dong-Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Quy Hy Nguyen, Minghui Dong, Haizhou Li

Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring
Yusuke Tajiri, Tomoki Toda

Prosodic and Spectral iVectors for Expressive Speech Synthesis
Igor Jauk, Antonio Bonafonte

Development of a statistical parametric synthesis system for operatic singing in German
Michael Pucher, Fernando Villavicencio, Junichi Yamagishi

DNN-based Speech Synthesis for Indian Languages from ASCII text
Srikanth Ronanki, Siva Reddy, Bajibabu Bollepalli, Simon King

Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text
Sunayana Sitaram, Sai Krishna Rallabandi, Shruti Rijhwani, Alan W. Black

Jerk Minimization for Acoustic-To-Articulatory Inversion
Avni Rajpal, Hemant A. Patil

How to select a good voice for TTS
Sunhee Kim

WikiSpeech – enabling open source text-to-speech for Wikipedia
John Andersson, Sebastian Berlin, André Costa, Harald Berthelsen, Hanna Lindgren, Nikolaj Lindberg, Jonas Beskow, Jens Edlund, Joakim Gustafson


Keynote Session 1


Siri’s voice gets deep learning
Alex Acero



Demo Session


Prosodic Reading Tutor of Japanese, Suzuki-kun: The first and only educational tool to teach the formal Japanese
Nobuaki Minematsu, Daisuke Saito, Nobuyuki Nishizawa

Aliasing-free L-F model and its application to an interactive MATLAB tool and test signal generation for speech analysis procedures
Hideki Kawahara

A Demonstration of the Merlin Open Source Neural Network Speech Synthesis System
Srikanth Ronanki, Zhizheng Wu, Oliver Watts, Simon King

WaveNet: A Generative Model for Raw Audio
Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

Demo of Idlak Tangle, An Open Source DNN-Based Parametric Speech Synthesiser
Blaise Potard, Matthew P. Aylett, David A. Baude


Poster Session 2


Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression
Meet H. Soni, Hemant A. Patil

Novel Pre-processing using Outlier Removal in Voice Conversion
Sushant V. Rao, Nirmesh J. Shah, Hemant A. Patil

Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform
Zhaojie Luo, Tetsuya Takiguchi, Yasuo Ariki

Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi

Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis
Shinji Takaki, SangJin Kim, Junichi Yamagishi

Mandarin Prosodic Phrase Prediction based on Syntactic Trees
Zhengchen Zhang, Fuxiang Wu, Chenyu Yang, Minghui Dong, Fugen Zhou

Investigating Very Deep Highway Networks for Parametric Speech Synthesis
Xin Wang, Shinji Takaki, Junichi Yamagishi

Contextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis
Sivanand Achanta, Rambabu Banoth, Ayushi Pandey, Anandaswarup Vadapalli, Suryakanth V. Gangashetty

Wide Passband Design for Cosine-Modulated Filter Banks in Sinusoidal Speech Synthesis
Nobuyuki Nishizawa, Tomonori Yazaki

Utterance Selection Techniques for TTS Systems Using Found Speech
Pallavi Baljekar, Alan W. Black

Open-Source Consumer-Grade Indic Text To Speech
Andrew Wilkinson, Alok Parlikar, Sunayana Sitaram, Tim White, Alan W. Black, Suresh Bazaj

On the impact of phoneme alignment in DNN-based speech synthesis
Mei Li, Zhizheng Wu, Lei Xie

Merlin: An Open Source Neural Network Speech Synthesis System
Zhizheng Wu, Oliver Watts, Simon King


Keynote Session 3


End-to-end Learning for Text and Speech
Quoc V. Le


Oral Session 3: Analysis and Modeling for Speech Synthesis


A hybrid harmonics-and-bursts modelling approach to speech synthesis
Jonas Beskow, Harald Berthelsen

A Pulse Model in Log-domain for a Uniform Synthesizer
Gilles Degottex, Pierre Lanchantin, Mark Gales

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis
Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen

Wideband Harmonic Model: Alignment and Noise Modeling for High Quality Speech Synthesis
Slava Shechtman, Alex Sorin