12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Identifying Regions of Non-Modal Phonation Using Features of the Wavelet Transform

John Kane, Christer Gobl

Trinity College Dublin, Ireland

This study proposes a new parameter, derived from the wavelet transform, for identifying breathy to tense voice qualities in a given speech segment. Techniques that deliver robust information on the voice quality of a speech segment are desirable because they can help tune analysis strategies and provide automatic voice quality annotation in large corpora. The method decomposes the speech signal into octave bands using wavelets and then fits a regression line to the maximum amplitudes at the different scales. The slope coefficient of this line is then evaluated on its ability to differentiate voice qualities, in comparison with other parameters in the literature. The new parameter (named here Peak Slope) was shown to be robust to added babble noise at signal-to-noise ratios as low as 10 dB. Furthermore, it was shown to differentiate breathy to tense voice qualities better than the comparison parameters, in both vowels and running speech.
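
The decomposition and regression steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not specify the mother wavelet, the number of scales, or whether the regression uses logarithmic amplitudes, so the cosine-modulated Gaussian wavelet, the six octave scales, the log10 compression and the function names `octave_wavelet` and `peak_slope_sketch` are all assumptions made here.

```python
import numpy as np


def octave_wavelet(fs, scale):
    """Cosine-modulated Gaussian dilated by `scale`.

    The wavelet shape and its centre frequency (fs / 2 at scale 1) are
    assumptions made for this sketch; the abstract does not specify them.
    """
    f0 = fs / 2.0
    tau = 1.0 / (2.0 * f0)
    # Support of roughly +/- 4 standard deviations of the dilated Gaussian.
    half_width = int(np.ceil(4 * tau * scale * fs))
    t = np.arange(-half_width, half_width + 1) / fs
    ts = t / scale
    return -np.cos(2 * np.pi * f0 * ts) * np.exp(-ts ** 2 / (2 * tau ** 2))


def peak_slope_sketch(x, fs, n_scales=6):
    """Return (slope, peaks) for one speech segment.

    Each scale 2**i gives one octave band; the peak absolute amplitude of
    each band is collected and a straight line is fitted across scales.
    """
    x = np.asarray(x, dtype=float)
    peaks = []
    for i in range(n_scales):
        band = np.convolve(x, octave_wavelet(fs, 2.0 ** i), mode="same")
        peaks.append(np.max(np.abs(band)))
    peaks = np.array(peaks)
    # Assumption: regress log10 peak amplitude on the scale index.
    slope, _ = np.polyfit(np.arange(n_scales), np.log10(peaks + 1e-12), 1)
    return slope, peaks


if __name__ == "__main__":
    fs = 16000
    t = np.arange(1600) / fs
    segment = np.sin(2 * np.pi * 100 * t) * np.hanning(t.size)
    slope, peaks = peak_slope_sketch(segment, fs)
    print("slope:", slope)
```

In this sketch a segment dominated by low-frequency energy produces larger peaks at the coarse scales and therefore a steeper positive slope than a segment with strong high-frequency energy. The input segment should be longer than the longest wavelet (257 samples at the coarsest of the six scales) so that the band-filtered output keeps the segment's length.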

Bibliographic reference. Kane, John / Gobl, Christer (2011): "Identifying regions of non-modal phonation using features of the wavelet transform", in INTERSPEECH-2011, 177-180.