ISCA Archive SLTU 2014

High quality speech synthesis using a small speech dataset

Pavel Chistikov, Andrey Talanov

We propose an approach to synthesizing high-quality speech from a small dataset. A robust solution to this problem is vital for voice restoration, that is, recreating lost fragments of recordings from the available speech material of a well-known person such as an actor. The proposed TTS system is a hybrid that combines the advantages of HMM-based and Unit Selection-based TTS systems. The approach relies on statistical models of intonation parameters and on dedicated algorithms for concatenating and modifying speech elements. Listening tests show that high-quality speech can be synthesized even with a small speech database (approximately one hour of speech).

Index Terms: speech synthesis, voice restoration, hidden Markov models, Unit Selection, speech modification.


Cite as: Chistikov, P., Talanov, A. (2014) High quality speech synthesis using a small speech dataset. Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014), 105-111

@inproceedings{chistikov14_sltu,
  author={Pavel Chistikov and Andrey Talanov},
  title={{High quality speech synthesis using a small speech dataset}},
  year=2014,
  booktitle={Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014)},
  pages={105--111}
}