Whispered speech is a natural mode of speech in which voicing is absent. Its acoustics differ significantly from those of normally phonated speech (so-called neutral speech), so it is challenging to build speech processing and automatic speech recognition systems that handle whisper effectively using neutral speech alone. At the same time, humans naturally produce and perceive whispered speech without explicit training. Tonal languages such as Mandarin present an interesting dilemma: tone is primarily encoded in the pitch contour, which is absent in whispered speech, yet humans can still tell tones apart. How humans process whispered speech well without explicit training on it, whereas machine algorithms fail, remains an open question that merits study. Such study, however, is hindered by the lack of suitable, systematically collected corpora. We present iWhisper-Mandarin, a 25-hour parallel corpus of neutral and whispered Mandarin designed to support research in linguistics and speech technology. We verify that techniques previously applied to whispered speech in non-tonal languages also work for Mandarin, and we present preliminary studies on voice activity detection and whispered Mandarin speech recognition.
Bibliographic reference. Lee, Pei Xuan / Wee, Darren / Toh, Hilary Si Yin / Lim, Boon Pang / Chen, Nancy F. / Ma, Bin (2014): "A whispered Mandarin corpus for speech technology applications", In INTERSPEECH-2014, 1598-1602.