12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Measuring Final Lengthening for Speaker-Change Prediction

Anna Hjalmarsson, Kornel Laskowski

KTH, Sweden

We explore pre-silence syllabic lengthening as a cue for next-speakership prediction in spontaneous dialogue. When estimated using a transcription-mediated procedure, lengthening is shown to reduce error rates by 25% relative to majority-class guessing. Lengthening should therefore be exploited by dialogue systems. With that in mind, we evaluate an automatic measure of spectral envelope change, Mel-spectral flux (MSF), and show that its speaker-independent performance is at least as good as that of the transcription-mediated measure. Modeling MSF is likely to improve turn uptake in dialogue systems, and to benefit other applications needing an estimate of durational variability in speech.
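As an illustration of what a frame-level measure of spectral envelope change can look like, the following is a minimal sketch, not the authors' implementation. The abstract does not define MSF precisely, so everything here is an assumption: the function names, the 512-sample Hann frames, the 160-sample hop, the 24 triangular Mel filters, and the choice of Euclidean distance between consecutive log-Mel frames. Under these assumptions, low flux values indicate slowly changing spectra, which would plausibly accompany lengthened syllables.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale formula (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectral_flux(signal, sample_rate, frame_len=512, hop=160, n_filters=24):
    # Hypothetical parameters; the paper's actual settings are not given in the abstract.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Index matrix of shape (n_frames, frame_len) selecting overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    # Small floor avoids log(0) on silent frames.
    log_mel = np.log(power @ bank.T + 1e-10)
    # Flux: Euclidean distance between consecutive log-Mel frames.
    return np.linalg.norm(np.diff(log_mel, axis=0), axis=1)
```

Calling `mel_spectral_flux(samples, 16000)` on a one-dimensional NumPy array of audio samples returns one non-negative value per pair of adjacent frames. A next-speakership predictor would then need some summary of these values over the interval preceding each silence; how to summarize them is left open here.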

Bibliographic reference. Hjalmarsson, Anna / Laskowski, Kornel (2011): "Measuring final lengthening for speaker-change prediction", in INTERSPEECH-2011, 2065-2068.