Novel Inception-GAN for Whispered-to-Normal Speech Conversion

Maitreya Patel, Mihir Parmar, Savan Doshi, Nirmesh Shah, Hemant Patil


Recently, Convolutional Neural Network (CNN)-based Generative Adversarial Networks (GANs) have been used for the Whisper-to-Normal Speech (i.e., WHSP2SPCH) conversion task. These CNN-based GANs are difficult to train because of their computational complexity. The goal of the generator in a GAN is to efficiently map the features of whispered speech to those of normal speech. To improve performance, one must either tune the cost functions by changing their associated hyperparameters or make the generator more complex by adding more layers to the model. However, more complex architectures are prone to overfitting, and both solutions are time-consuming and computationally expensive. Hence, in this paper, we propose an Inception-based GAN architecture (i.e., Inception-GAN). The proposed architecture is quite stable and computationally less expensive to train. The proposed Inception-GAN outperforms existing CNN-based GAN architectures (CNN-GAN). Objective and subjective evaluations are carried out using the proposed architectures on the statistically meaningful whispered TIMIT (wTIMIT) corpus. In speaker-specific evaluations, Inception-GAN shows 8.9% and 6.2% better objective performance than the CNN-based GAN for the male and female speaker, respectively.


DOI: 10.21437/SSW.2019-16

Cite as: Patel, M., Parmar, M., Doshi, S., Shah, N., Patil, H. (2019) Novel Inception-GAN for Whispered-to-Normal Speech Conversion. Proc. 10th ISCA Speech Synthesis Workshop, 87-92, DOI: 10.21437/SSW.2019-16.


@inproceedings{Patel2019,
  author={Maitreya Patel and Mihir Parmar and Savan Doshi and Nirmesh Shah and Hemant Patil},
  title={{Novel Inception-GAN for Whispered-to-Normal Speech Conversion}},
  year=2019,
  booktitle={Proc. 10th ISCA Speech Synthesis Workshop},
  pages={87--92},
  doi={10.21437/SSW.2019-16},
  url={http://dx.doi.org/10.21437/SSW.2019-16}
}