ISCA Archive ICSLP 1994

Tempo estimation by wave envelope for recognition of paralinguistic features in spontaneous speech

Shigeyoshi Kitazawa, Satoshi Kobayashi, Takao Matsunaga, Hideya Ichikawa

We analyze speech rate through an envelope extraction process. The process low-pass filters the rectified speech wave to eliminate ripples caused by pitch and vocal resonances. The speech wave is amplitude modulated at about 8 mora/sec. Dips in the envelope correspond to consonants or phonemic boundaries; therefore, the number of dips per unit time is correlated with the rate of speech. We measured the rate of speech in an interview between a female interviewer and a male interviewee. The speech data analysed consist of 7 utterances by the man and 6 utterances by the woman, with durations of 2 to 7 seconds. The same utterances were labeled manually for the locations of individual phonemes. The manually computed rate excluding pauses is faster than the averaged rate. A DFT of the envelope yields a frequency component corresponding to the rate of speech, which was shown to correlate with the manual rate with a coefficient of 0.57.
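
The envelope-and-DFT idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filter or how the rate component is selected, so the moving-average smoother (standing in for the unspecified low-pass filter), the 20 ms window, the 2 to 16 Hz search band, the peak-picking rule, and the function names `envelope` and `dominant_modulation_hz` are all assumptions made for illustration.

```python
import numpy as np


def envelope(signal, fs, window_s=0.02):
    """Full-wave rectify, then smooth with a moving average.

    The moving average is an assumed stand-in for the low-pass filter;
    window_s (20 ms) is an illustrative choice, not a value from the paper.
    """
    rectified = np.abs(signal)
    win = max(1, int(round(fs * window_s)))
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")


def dominant_modulation_hz(env, fs, band=(2.0, 16.0)):
    """Return the frequency of the strongest DFT component of the envelope
    inside `band` (an assumed search range around typical mora rates)."""
    centered = env - env.mean()                 # remove the DC offset
    windowed = centered * np.hanning(len(centered))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[mask][np.argmax(spectrum[mask])])


# Synthetic check: a 150 Hz tone amplitude-modulated at 8 Hz, mimicking
# a "pitch" carrier with an 8 mora/sec rhythm (hypothetical test signal).
fs = 8000
t = np.arange(3 * fs) / fs
wave = (1.0 + 0.8 * np.cos(2 * np.pi * 8.0 * t)) * np.sin(2 * np.pi * 150.0 * t)
rate_hz = dominant_modulation_hz(envelope(wave, fs), fs)
print(f"estimated modulation rate: {rate_hz:.2f} Hz")
```

On this synthetic input the estimate should land close to the 8 Hz modulation rate. Real utterances of 2 to 7 seconds give a coarse frequency resolution (roughly 0.14 to 0.5 Hz per DFT bin), which may be one reason a spectral estimate agrees only moderately with manually labeled rates.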


Cite as: Kitazawa, S., Kobayashi, S., Matsunaga, T., Ichikawa, H. (1994) Tempo estimation by wave envelope for recognition of paralinguistic features in spontaneous speech. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 1691-1694

@inproceedings{kitazawa94_icslp,
  author={Shigeyoshi Kitazawa and Satoshi Kobayashi and Takao Matsunaga and Hideya Ichikawa},
  title={{Tempo estimation by wave envelope for recognition of paralinguistic features in spontaneous speech}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={1691--1694}
}