15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Generation of F0 Contour Using Deep Boltzmann Machine and Twin Gaussian Process Hybrid Model for Bengali Language

Sankar Mukherjee, Shyamal Kumar Das Mandal

IIT Kharagpur, India

In a text-to-speech synthesis system, the F0 contour plays an important role in conveying prosodic information, but synthesizing the F0 contour from the underlying linguistic information with a deep architecture has not been investigated for the Bengali language. This paper describes a method for synthesizing F0 contours of Bengali read-out speech from the textual features of the input text using a hybrid model that combines a Deep Boltzmann Machine (DBM) with a Twin Gaussian Process (TGP). The DBM captures the high-level linguistic structure of the input text and improves prediction accuracy when plugged into the TGP model. Unlike Gaussian Process (GP) models, which focus only on predicting a single output (F0), TGP can generalize across multiple outputs (F0, delta F0, delta-delta F0) by encoding relations between both inputs and outputs with GP priors. The performance of the proposed method is evaluated and compared with other available methods using objective measures and perceptual listening tests, and the results are found to be satisfactory.
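
To make the multi-output target (F0, delta F0, delta-delta F0) concrete, here is a minimal Python sketch of how such a target vector might be assembled per frame. The abstract does not state how the deltas are computed, so the central-difference window of one frame on each side, the edge replication, the function names `delta` and `stack_outputs`, and the F0 values are all assumptions for illustration, not the authors' procedure.

```python
def delta(values):
    """First-order dynamic feature by central difference.

    Assumption: a window of +/-1 frame with the edge frames replicated;
    the paper's actual delta window is not given in the abstract.
    """
    n = len(values)
    if n == 0:
        return []
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i + 2] - padded[i]) / 2.0 for i in range(n)]


def stack_outputs(f0):
    """Return one [F0, delta F0, delta-delta F0] row per frame."""
    d1 = delta(f0)   # delta F0
    d2 = delta(d1)   # delta-delta F0
    return [list(row) for row in zip(f0, d1, d2)]


# Made-up F0 values in Hz, for illustration only.
f0_hz = [120.0, 124.0, 130.0, 128.0, 122.0]
targets = stack_outputs(f0_hz)
for row in targets:
    print(row)
```

Each row is the kind of three-dimensional output a multi-output regressor would predict jointly, in contrast to a single-output model that predicts F0 alone.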

Bibliographic reference. Mukherjee, Sankar / Mandal, Shyamal Kumar Das (2014): "Generation of F0 contour using deep Boltzmann machine and twin Gaussian process hybrid model for Bengali language", In INTERSPEECH-2014, 2445-2449.