ISCA Archive SpeechProsody 2008

Detecting non-modal phonation in telephone speech

Tae-Jin Yoon, Jennifer Cole, Mark Hasegawa-Johnson

Non-modal phonation conveys both linguistic and paralinguistic information and is distinguished by acoustic source and filter features. Detecting non-modal phonation in speech requires reliable F0 analysis, which is a problem for telephone-band speech, where F0 analysis frequently fails. We demonstrate an approach to detecting creaky phonation in telephone speech based on robust F0 and spectral analysis. Our F0 analysis relies on an autocorrelation algorithm applied to the intensity-boosted and inverse-filtered speech signal, and it succeeds in regions of non-modal phonation where F0 analysis of the unfiltered signal typically fails. In addition to the extracted F0 values, spectral amplitude is measured at the first two harmonics (H1, H2) and the first three formants (A1, A2, A3). Visual and spectral inspection of the detected creaky phonation confirms findings reported from laboratory settings. Statistical analysis using one-way ANOVA and classification using a Support Vector Machine (SVM) yield promising results that point toward further improvement in the automatic detection of non-modal phonation in telephone speech.
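
The autocorrelation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the intensity boosting and inverse filtering described above, and the function name `autocorr_f0`, the 50-400 Hz search range, and the use of a biased (tapering) autocorrelation are assumptions chosen only for illustration.

```python
import math


def autocorr_f0(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    Hypothetical helper for illustration; the search range is an assumption.
    Returns None for a silent (zero-energy) frame.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None

    # Candidate lags correspond to the allowed F0 range.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased autocorrelation: dividing by the full-frame energy makes
        # longer lags score lower, which discourages octave-down errors.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r

    if best_lag is None:
        return None
    return sample_rate / best_lag


# Usage: a synthetic 40 ms frame at the 8 kHz telephone sampling rate,
# with a 100 Hz fundamental and one harmonic.
sr = 8000
frame = [math.sin(2 * math.pi * 100 * i / sr)
         + 0.5 * math.sin(2 * math.pi * 200 * i / sr)
         for i in range(320)]
estimate = autocorr_f0(frame, sr)
```

In real telephone speech the fundamental is often attenuated by the channel, which is one reason the paper applies the analysis to a pre-processed signal instead of the raw waveform as done here.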

Cite as: Yoon, T.-J., Cole, J., Hasegawa-Johnson, M. (2008) Detecting non-modal phonation in telephone speech. Proc. Speech Prosody 2008, 33-36

  author={Tae-Jin Yoon and Jennifer Cole and Mark Hasegawa-Johnson},
  title={{Detecting non-modal phonation in telephone speech}},
  booktitle={Proc. Speech Prosody 2008},