doi: 10.21437/SSW.2023
Advocating for text input in multi-speaker text-to-speech systems
Gérard Bailly, Martin Lenglet, Olivier Perrotin, Esther Klabbers
Spell4TTS: Acoustically-informed spellings for improving text-to-speech pronunciations
Jason Fong, Hao Tang, Simon King
A Comparative Analysis of Pretrained Language Models for Text-to-Speech
Marcel Granero Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell, Nicole Peinelt, Alexis Moinet, Thomas Drugman
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers
Importance of Human Factors in Text-To-Speech Evaluations
Lev Finkelstein, Joshua Camp, Rob Clark
Re-examining the quality dimensions of synthetic speech
Fritz Seebauer, Michael Kuhlmann, Reinhold Haeb-Umbach, Petra Wagner
Stuck in the MOS pit: A critical analysis of MOS test methodology in TTS evaluation
Ambika Kirkland, Shivam Mehta, Harm Lameris, Gustav Eje Henter, Eva Szekely, Joakim Gustafson
MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module
Ondřej Plátek, Ondrej Dusek
Cross-lingual transfer using phonological features for resource-scarce text-to-speech
Johannes Abraham Louw
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS
Harm Lameris, Ambika Kirkland, Joakim Gustafson, Eva Szekely
Synthesising turn-taking cues using natural conversational data
Johannah O'Mahony, Catherine Lai, Simon King
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
Arnab Das, Suhita Ghosh, Tim Polzehl, Ingo Siegert, Sebastian Stober
PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder
Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko
Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion
Ryunosuke Hirai, Yuki Saito, Hiroshi Saruwatari
HiFi-VC: High Quality ASR-based Voice Conversion
Anton Kashkin, Ivan Karpukhin, Svyatoslav Shishkin
EmoSpeech: guiding FastSpeech2 towards Emotional Text to Speech
Daria Diatlova, Vitalii Shutov
Controllable Emphasis with zero data for text-to-speech
Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova
Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control
Martin Lenglet, Olivier Perrotin, Gérard Bailly
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody
Sofoklis Kakouros, Juraj Šimko, Martti Vainio, Antti Suni
An analysis on the effects of speaker embedding choice in non auto-regressive TTS
Adriana Stan, Johannah O'Mahony
Audiobook synthesis with long-form neural text-to-speech
Weicheng Zhang, Cheng-Chieh Yeh, Will Beckman, Tuomo Raitio, Ramya Rasipuram, Ladan Golipour, David Winarsky
Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling
Tuomo Raitio, Javier Latorre, Andrea Davis, Tuuli Morrill, Ladan Golipour
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Eva Szekely, Gustav Eje Henter
Diffusion Transformer for Adaptive Text-to-Speech
Haolin Chen, Philip N. Garner
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Eva Szekely
Voice Cloning: Training Speaker Selection with Limited Multi-Speaker Corpus
David Guennec, Lily Wadoux, Aghilas Sini, Nelly Barbot, Damien Lolive
Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping
Ravi Shankar, Archana Venkataraman
Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data
Jarod Duret, Yannick Estève, Titouan Parcollet
Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests
Kishor Kayyar, Christian Dittmar, Nicola Pia, Emanuel Habets
Better Replacement for TTS Naturalness Evaluation
Sajad Shirali-Shahreza, Gerald Penn
The Impact of Pause-Internal Phonetic Particles on Recall in Synthesized Lectures
Mikey Elmers, Eva Szekely
SPTK4: An Open-Source Software Toolkit for Speech Signal Processing
Takenori Yoshimura, Takato Fujimoto, Keiichiro Oura, Keiichi Tokuda
FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms
Lev Finkelstein, Chun-an Chan, Vincent Wan, Heiga Zen, Rob Clark
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino, Adam Gabrys, Daniel Matwicki, Andrzej Pomirski, Tom Iddon, Marius Cotescu, Jaime Lorenzo-Trueba
Data Augmentation Methods on Ultrasound Tongue Images for Articulation-to-Speech Synthesis
Ibrahim Ibrahimov, Gabor Gosztolya, Tamas Gabor Csapo
Universal Approach to Multilingual Multispeaker Child Speech Synthesis
Shaimaa Alwaisi, Mohammed Salah Al-Radhi, Géza Németh
Towards Speaker-Independent Voice Conversion for Improving Dysarthric Speech Intelligibility
Seraphina Fong, Marco Matassoni, Gianluca Esposito, Alessio Brutti
Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models
Maxime Jacquelin, Maeva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin
SarcasticSpeech: Speech Synthesis for Sarcasm in Low-Resource Scenarios
Zhu Li, Xiyuan Gao, Shekhar Nayak, Matt Coler
Recovering Discrete Prosody Inputs via Invert-Classify
Nicholas Sanders, Korin Richmond
Using a Large Language Model to Control Speaking Style for Expressive TTS
Atli Thor Sigurgeirsson, Simon King
NaijaTTS: A pitch-controllable TTS model for Nigerian Pidgin
Emmett Strickland, Dana Aubakirova, Dorin Doncenco, Diego Torres, Marc Evrard