In text-to-speech (TTS) synthesis, the F0 contour plays an important role in conveying prosodic information, but synthesizing the F0 contour from the underlying linguistic information using deep architectures has not been investigated for Bengali. This paper describes a method for synthesizing F0 contours of Bengali read speech from the textual features of the input text using a hybrid model that combines a Deep Boltzmann Machine (DBM) with a Twin Gaussian Process (TGP). The DBM captures the high-level linguistic structure of the input text and improves prediction accuracy when plugged into the TGP model. Unlike Gaussian Process (GP) models, which focus on predicting a single output (F0), TGP can generalize across multiple outputs (F0, delta F0, delta-delta F0) by encoding relations between both inputs and outputs with GP priors. The proposed method is evaluated and compared with other available methods using objective measures and perceptual listening tests, and the results are found to be satisfactory.
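The multi-output target mentioned above (F0, delta F0, delta-delta F0) can be illustrated with a minimal sketch. The abstract does not say how the dynamic features are computed, so the central-difference scheme and the function names `central_diff` and `stack_dynamic_features` are assumptions made for illustration only, not the authors' procedure.

```python
def central_diff(x):
    """Frame-wise difference: central in the interior, one-sided at the edges.

    Assumption: the paper does not specify its delta window; this is one
    common choice, used here purely for illustration.
    """
    n = len(x)
    if n < 2:
        return [0.0] * n
    out = [0.0] * n
    out[0] = x[1] - x[0]          # forward difference at the first frame
    out[-1] = x[-1] - x[-2]       # backward difference at the last frame
    for i in range(1, n - 1):
        out[i] = (x[i + 1] - x[i - 1]) / 2.0
    return out


def stack_dynamic_features(f0):
    """Return one (F0, delta F0, delta-delta F0) tuple per frame."""
    f0 = [float(v) for v in f0]
    delta = central_diff(f0)
    delta2 = central_diff(delta)
    return list(zip(f0, delta, delta2))


# Hypothetical F0 values in Hz, one per frame.
targets = stack_dynamic_features([100, 110, 120, 130])
```

Each tuple in `targets` is the kind of three-dimensional output vector a multi-output regressor would predict per frame, in contrast to a single-output model that predicts F0 alone.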
Bibliographic reference. Mukherjee, Sankar / Mandal, Shyamal Kumar Das (2014): "Generation of F0 contour using Deep Boltzmann Machine and Twin Gaussian Process hybrid model for Bengali language", in INTERSPEECH-2014, 2445-2449.