doi: 10.21437/SSW.2021
Identifying the vocal cues of likeability, friendliness and skilfulness in synthetic speech
Sai Sirisha Rallabandi, Babak Naderi, Sebastian Möller
Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging
Tamás Gábor Csapó
Impact of Segmentation and Annotation in French end-to-end Synthesis
Martin Lenglet, Olivier Perrotin, Gérard Bailly
Pathological voice adaptation with autoencoder-based voice conversion
Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velazquez, Odette Scharenborg
Location, Location: Enhancing the Evaluation of Text-to-Speech synthesis using the Rapid Prosody Transcription Paradigm
Elijah Gutierrez, Pilar Oplustil-Gallegos, Catherine Lai
Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input
Tamás Gábor Csapó, László Tóth, Gábor Gosztolya, Alexandra Markó
Combining speakers of multiple languages to improve quality of neural voices
Javier Latorre, Charlotte Bailleul, Tuuli Morrill, Alistair Conkie, Yannis Stylianou
Methods of slowing down speech
Christina Tånnander, Jens Edlund
Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis
Joakim Gustafson, Jonas Beskow, Eva Szekely
Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging
Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó
Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction
Bastian Schnell, Philip N. Garner
Acquiring conversational speaking style from multi-speaker spontaneous dialog corpus for prosody-controllable sequence-to-sequence speech synthesis
Slava Shechtman, Avrech Ben-David
EmoCat: Language-agnostic Emotional Voice Conversion
Bastian Schnell, Goeric Huybrechts, Bartek Perz, Thomas Drugman, Jaime Lorenzo-Trueba
Enhancing audio quality for expressive Neural Text-to-Speech
Abdelhamid Ezzerg, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba, Viacheslav Klimkov
Are we truly modeling expressiveness? A study on expressive TTS in Brazilian Portuguese for real-life application styles
Lucas H. Ueda, Paula D. P. Costa, Flavio O. Simoes, Mário U. Neto
Vocal tract area function extraction using ultrasound for articulatory speech synthesis
Debasish Ray Mohapatra, Pramit Saha, Yadong Liu, Bryan Gick, Sidney Fels
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech
Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt
Intelligibility and naturalness of articulatory synthesis with VocalTractLab compared to established speech synthesis technologies
Paul Konstantin Krug, Simon Stone, Peter Birkholz
Perception of smiling voice in spontaneous speech synthesis
Ambika Kirkland, Marcin Włodarczak, Joakim Gustafson, Eva Szekely
Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments
Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati, Thomas Drugman
Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control
Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris, Georgia Maniati
Exploring Disentanglement with Multilingual and Monolingual VQ-VAE
Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Erica Cooper, Xin Wang, Junichi Yamagishi
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
Hieu-Thi Luong, Junichi Yamagishi
Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction
Patrick Lumban Tobing, Tomoki Toda
Factors Affecting the Evaluation of Synthetic Speech in Context
Johannah O'Mahony, Pilar Oplustil-Gallegos, Catherine Lai, Simon King
Non-native English lexicon creation for bilingual speech synthesis
Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavane
Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis
Dan Wells, Korin Richmond
Mind your p’s and k’s -- Comparing obstruents across TTS voices of the Blizzard Challenge 2013
Ayushi Pandey, Sébastien Le Maguer, Julie Berndsen, Naomi Harte
Improving Polyglot Speech Synthesis through Multi-task and Adversarial Learning
Jason Fong, Jilong Wu, Prabhav Agrawal, Andrew Gibiansky, Thilo Koehler, Qing He
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangens, Sri Karlapati, Thomas Drugman
How do Voices from Past Speech Synthesis Challenges Compare Today?
Erica Cooper, Junichi Yamagishi
Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder
Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari
Liaison and Pronunciation Learning in End-to-End Text-to-Speech in French
Jason Taylor, Sébastien Le Maguer, Korin Richmond
FeatherTTS: Robust and Efficient attention based Neural TTS
Qiao Tian, Chao Liu, Zewang Zhang, Heng Lu, Linghui Chen, Bin Wei, Pujiang He, Shan Liu
Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech
Pilar Oplustil-Gallegos, Johannah O'Mahony, Simon King
Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari
Lipsyncing efforts for transcreating lecture videos in Indian languages
Mano Ranjith Kumar M, Jom Kuriakose, Karthik Pandia D S, Hema A Murthy
Homograph disambiguation with contextual word embeddings for TTS systems
Marco Nicolis, Viacheslav Klimkov
Analysing Temporal Sensitivity of VQ-VAE Sub-Phone Codebooks
Jason Fong, Jennifer Williams, Simon King