16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

An Empirical Model of Emphatic Word Detection

Milos Cernak, Pierre-Edouard Honnet

Idiap Research Institute, Switzerland

This paper presents an empirical model of emphatic word detection as an alternative to conventional machine-learning-based methods. The model is based on Probabilistic Amplitude Demodulation (PAD), which is applied iteratively to obtain syllable and stress modulations, i.e., the cascaded PAD method. Emphatic words are detected from prominent peaks of the stress modulation, by considering those peaks that are stressed or accented. Cascaded demodulation steered by general-purpose values, derived from an average syllable duration of 200 ms, yields a detection accuracy of 81%-83%. Speaker-dependent cascaded demodulation, which takes the specific speaking rate of each speaker into account, yields a detection accuracy of 86%-91%. The advantages of the proposed empirical detection model are (i) noise robustness, (ii) language independence, and (iii) no need for a training phase.
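The cascade described in the abstract can be illustrated with a rough sketch. It does not implement PAD: full-wave rectification followed by a moving average is swapped in as a crude envelope estimator, and the test signal is synthetic. The function names (moving_average, demodulate, prominent_peaks), the window lengths, and the relative peak threshold are all assumptions made for illustration; only the 200 ms average syllable duration comes from the abstract.

```python
import math

def moving_average(x, win):
    """Centered moving average; the window is truncated at the signal edges."""
    half = win // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def demodulate(x, win):
    """Crude envelope estimate (NOT PAD): rectify, then smooth over `win` samples."""
    return moving_average([abs(v) for v in x], win)

def prominent_peaks(x, rel_height=0.75):
    """Indices of local maxima that reach `rel_height` times the global maximum.
    The 0.75 threshold is an arbitrary illustrative choice."""
    floor = rel_height * max(x)
    return [i for i in range(1, len(x) - 1)
            if x[i] > x[i - 1] and x[i] >= x[i + 1] and x[i] >= floor]

fs = 1000        # sampling rate in Hz (assumed)
syl_dur = 0.2    # average syllable duration of 200 ms, as in the abstract

# Synthetic "speech": a 100 Hz carrier, a syllable rhythm with one peak every
# 200 ms, and an emphasis gain centred on the syllable at t = 1.1 s.
t = [n / fs for n in range(2 * fs)]
signal = [
    (1 + 2 * math.exp(-((ti - 1.1) / 0.15) ** 2))        # emphasis gain
    * 0.5 * (1 - math.cos(2 * math.pi * ti / syl_dur))   # syllable rhythm
    * math.sin(2 * math.pi * 100 * ti)                   # carrier
    for ti in t
]

# Stage 1: a short window removes the carrier and keeps the syllable modulation.
syllable_mod = demodulate(signal, win=25)
# Stage 2: a window spanning two syllables keeps only the slower stress modulation.
stress_mod = demodulate(syllable_mod, win=int(2 * syl_dur * fs))

peaks = prominent_peaks(stress_mod)
best = max(peaks, key=lambda i: stress_mod[i])
print("strongest stress peak at %.3f s" % (best / fs))
```

In this sketch the second-stage window is tied to the syllable duration, which mirrors the idea of steering the cascade with a general-purpose value; a speaker-dependent variant would replace syl_dur with an estimate of the speaker's own average syllable duration.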

Bibliographic reference. Cernak, Milos / Honnet, Pierre-Edouard (2015): "An empirical model of emphatic word detection", in INTERSPEECH-2015, 573-577.