A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the ‘Color’ of Whispered Phonemes and Deep Neural Network

G. Nisha Meenakshi, Prasanta Kumar Ghosh


In this work, we propose a robust method to perform frame-level classification of voiced (V) and unvoiced (UV) phonemes from whispered speech, a challenging task due to its voiceless and noise-like nature. We hypothesize that a whispered speech spectrum can be represented as a linear combination of a set of colored noise spectra. A five-dimensional (5D) feature is computed by employing non-negative matrix factorization with a fixed basis dictionary, constructed using the spectra of five colored noises. A Deep Neural Network (DNN) is used as the classifier. We consider two baseline features: 1) Mel Frequency Cepstral Coefficients (MFCC), and 2) features computed from a data-driven dictionary. Experiments reveal that the features from the colored noise dictionary perform better (on average) than those from the data-driven dictionary, with a relative improvement in the average V/UV accuracy of 10.30% within, and 10.41% across, data from seven subjects. We also find that the MFCC and 5D features carry complementary information regarding the nature of voicing decisions in whispered speech. Hence, across all subjects, we obtain a balanced frame-level V/UV classification performance when MFCC and 5D features are combined, compared to a skewed performance when they are considered separately.
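The 5D feature described above can be sketched as follows: a fixed dictionary W whose five columns are colored-noise magnitude spectra, and a non-negative activation vector h per frame obtained by NMF-style multiplicative updates with W held fixed. This is a minimal illustrative sketch, not the paper's implementation; the specific noise colors (white, pink, brown, blue, violet, i.e. power spectra proportional to f^0, f^-1, f^-2, f^1, f^2), the frame length, and the update count are all assumptions.

```python
import numpy as np

def colored_noise_dictionary(n_bins, exponents=(0.0, -1.0, -2.0, 1.0, 2.0)):
    """Fixed basis dictionary: magnitude spectra of five colored noises.

    The exponents correspond to white, pink, brown, blue, and violet noise
    (power spectrum ~ f**e); these particular colors are an illustrative
    assumption, not necessarily the set used in the paper.
    """
    f = np.arange(1, n_bins + 1, dtype=float)       # start at 1 to avoid f = 0
    W = np.stack([f ** (e / 2.0) for e in exponents], axis=1)  # amplitude ~ f**(e/2)
    return W / W.sum(axis=0, keepdims=True)         # column-normalize each basis

def nmf_activation(v, W, n_iter=200, eps=1e-12):
    """Solve v ~ W h with h >= 0 and W held fixed.

    Uses Lee-Seung multiplicative updates for the Euclidean cost, updating
    only h since the dictionary is fixed. Returns the 5D activation vector,
    which serves as the per-frame feature.
    """
    h = np.full(W.shape[1], 1.0 / W.shape[1])       # uniform non-negative init
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ (W @ h) + eps)      # multiplicative update
    return h
```

In use, v would be the magnitude spectrum of one analysis frame of whispered speech, and the resulting 5D vectors (per frame) would be fed to the DNN classifier, optionally concatenated with MFCCs as in the combined system.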


DOI: 10.21437/Interspeech.2017-1388

Cite as: Meenakshi, G.N., Ghosh, P.K. (2017) A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the ‘Color’ of Whispered Phonemes and Deep Neural Network. Proc. Interspeech 2017, 503-507, DOI: 10.21437/Interspeech.2017-1388.


@inproceedings{Meenakshi2017,
  author={G. Nisha Meenakshi and Prasanta Kumar Ghosh},
  title={A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the ‘Color’ of Whispered Phonemes and Deep Neural Network},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={503--507},
  doi={10.21437/Interspeech.2017-1388},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1388}
}