The paper presents an empirical model of emphatic word detection, as an alternative to conventional machine-learning-based methods. The model is based on the Probabilistic Amplitude Demodulation (PAD) that is iteratively applied for getting syllable and stress modulations, i.e., using the cascaded PAD method. The emphatic words are detected by prominent peaks of the stress modulation and by considering the peaks that are stressed or accented. The cascaded demodulation steered with general purpose values derived from 200ms long average syllable duration, yields to detection accuracy of 81%-83%. Speaker-dependent cascaded demodulation, considering specific speaking rate of the speakers, yields to detection accuracy of 86%-91%. The advantages of the proposed empirical detection model are (i) noise-robustness, (ii) language-independence and (iii) it does not require a training phase.
Bibliographic reference. Cernak, Milos / Honnet, Pierre-Edouard (2015): "An empirical model of emphatic word detection", In INTERSPEECH-2015, 573-577.