9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Speech/Laughter Classification in Meeting Audio

Swe Zin Kalayar Khine, Tin Lay Nwe, Haizhou Li

Institute for Infocomm Research, Singapore

This paper incorporates harmonicity information into acoustic features to detect laughter and speech segments. The system uses a Hidden Markov Model (HMM) classifier trained on features from Pitch and Harmonic Frequency Scale (PHFS) based subband filters. The harmonicity of a signal can be determined from the variation of its pitch and harmonics. Cascaded subband filters are spread along the pitch and harmonic frequency scale to capture this harmonicity information. The pitch band of the first layer spans 80 Hz to 300 Hz, and the entire band spans 80 Hz to 8 kHz. Experiments are conducted on the ICSI meeting corpus (BMR and BED meetings). We achieve an average error rate of 0.84% for the 'BMR' meeting and 3.64% for the 'BED' meeting in segment-level speech and laughter detection. The results show that the proposed PHFS-based feature is robust and effective.
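
As a rough illustration of subband energy features over the 80 Hz to 8 kHz range, the following is a minimal sketch, not the authors' PHFS filter bank. Only the 80-300 Hz pitch band and the overall span are taken from the abstract. The octave-style band edges above 300 Hz, the rectangular filters, the Hann window, the 16 kHz sample rate, and the names `BAND_EDGES_HZ` and `subband_log_energies` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical band edges in Hz. Only the 80-300 Hz pitch band and the
# 80 Hz-8 kHz overall span come from the abstract; the octave-style split
# above 300 Hz is an assumption, not the paper's filter layout.
BAND_EDGES_HZ = [80, 300, 600, 1200, 2400, 4800, 8000]


def subband_log_energies(frame, sample_rate, edges=BAND_EDGES_HZ):
    """Return the log energy of one audio frame in each subband."""
    # A Hann window reduces spectral leakage between neighbouring bands.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Rectangular filter: sum the power of all bins inside [lo, hi).
        mask = (freqs >= lo) & (freqs < hi)
        energies.append(np.log(power[mask].sum() + 1e-10))
    return np.array(energies)


# Usage: a synthetic 200 Hz tone should concentrate its energy in the
# first (pitch) band.
sample_rate = 16000  # assumed sample rate
t = np.arange(512) / sample_rate
frame = np.sin(2 * np.pi * 200.0 * t)
features = subband_log_energies(frame, sample_rate)
```

Per-frame vectors like `features` could then be passed as observations to an HMM classifier. The feature design and model topology used in the paper are not given in the abstract.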


Bibliographic reference. Khine, Swe Zin Kalayar / Nwe, Tin Lay / Li, Haizhou (2008): "Speech/laughter classification in meeting audio", in INTERSPEECH-2008, 793-796.