ISCA Archive SSW 2013

A common attribute based unified HTS framework for speech synthesis in Indian languages

B. Ramani, S. Lilly Christina, G. Anushiya Rachel, V. Sherlin Solomi, Mahesh Kumar Nandwana, Anusha Prakash, S. Aswin Shanmugam, Raghava Krishnan, S. Kishore Prahalad, K. Samudravijaya, P. Vijayalakshmi, T. Nagarajan, Hema A. Murthy

State-of-the-art approaches to speech synthesis are unit selection based concatenative speech synthesis (USS) and hidden Markov model based text-to-speech synthesis (HTS). The former is based on waveform concatenation of subword units, while the latter is based on generation of an optimal parameter sequence from subword HMMs. The quality of an HMM based synthesiser in the HTS framework depends crucially on an accurate description of the phone set and of the question set used for clustering the phones. Given the number of Indian languages, building an HTS system for every language is time consuming. Exploiting the properties of Indian languages, a uniform HMM framework for building speech synthesisers is proposed. Apart from the speech and text data used, the tasks involved in building a synthesis system can be made language-independent. A language-independent common phone set is first derived. Because similar sounds share similar articulatory descriptions, a common question set is also derived. The common phone set and common question set are used to build HTS based systems for six Indian languages, namely Hindi, Marathi, Bengali, Tamil, Telugu and Malayalam. Mean opinion score (MOS) is used to evaluate the systems. An average MOS of 3.0 for naturalness and 3.4 for intelligibility is obtained across all languages.
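The idea of deriving one question set from a common phone set can be illustrated with a minimal sketch. It is not the authors' implementation. The phone labels, the attribute names and the helper `build_questions` are hypothetical. The `QS "name" {patterns}` line layout and the `*-phone+*` pattern for the current phone are assumed to follow the context-label convention used in publicly available HTS demo question files.

```python
from collections import defaultdict

# Hypothetical common phone labels mapped to illustrative articulatory
# attributes. A real common phone set would cover every sound of every
# language; these five entries only demonstrate the mechanism.
COMMON_PHONES = {
    "a": {"vowel", "short"},
    "aa": {"vowel", "long"},
    "k": {"consonant", "stop", "velar", "unvoiced"},
    "g": {"consonant", "stop", "velar", "voiced"},
    "m": {"consonant", "nasal", "bilabial", "voiced"},
}


def build_questions(phones):
    """Return one question line per articulatory attribute.

    Each question lists every phone that carries the attribute, so the same
    question set applies to any language whose sounds are written with the
    common phone labels.
    """
    phones_by_attribute = defaultdict(list)
    for phone, attributes in phones.items():
        for attribute in attributes:
            phones_by_attribute[attribute].append(phone)

    questions = []
    for attribute in sorted(phones_by_attribute):
        # Assumed label layout: the current phone sits between '-' and '+'.
        patterns = ",".join(
            f"*-{phone}+*" for phone in sorted(phones_by_attribute[attribute])
        )
        questions.append(f'QS "C-{attribute}" {{{patterns}}}')
    return questions


if __name__ == "__main__":
    for line in build_questions(COMMON_PHONES):
        print(line)
```

Because the questions are keyed on attributes rather than on language-specific symbols, adding a language in this sketch only requires mapping its sounds to the common labels. The question-generation step itself stays unchanged.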


Cite as: Ramani, B., Christina, S.L., Rachel, G.A., Solomi, V.S., Nandwana, M.K., Prakash, A., Shanmugam, S.A., Krishnan, R., Prahalad, S.K., Samudravijaya, K., Vijayalakshmi, P., Nagarajan, T., Murthy, H.A. (2013) A common attribute based unified HTS framework for speech synthesis in Indian languages. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 291-296

@inproceedings{ramani13_ssw,
  author={B. Ramani and S. Lilly Christina and G. Anushiya Rachel and V. Sherlin Solomi and Mahesh Kumar Nandwana and Anusha Prakash and S. Aswin Shanmugam and Raghava Krishnan and S. Kishore Prahalad and K. Samudravijaya and P. Vijayalakshmi and T. Nagarajan and Hema A. Murthy},
  title={{A common attribute based unified HTS framework for speech synthesis in Indian languages}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={291--296}
}