EUROSPEECH 2003 - INTERSPEECH 2003
In this paper we propose an effective, robust and computationally low-cost HMM-based start- and endpoint detector for speech recognisers. Our first attempts follow the classical scheme of a feature extractor and a Viterbi classifier (used for voice activity detection), followed by a post-processing stage, but our ultimate goal is a pure HMM-based architecture capable of performing the endpointing task. The features used for voice activity detection are energy and zero crossing rate, together with the AMDF (Average Magnitude Difference Function), which proves to be a valid alternative to energy. We further study the impact of grammar structures and training conditions on performance. Finally, we lay the basis for the investigation of pure HMM-based architectures.
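As an illustration of the three time-domain features named in the abstract, the following minimal sketch computes short-time energy, zero crossing rate and AMDF for a single frame. It uses the common textbook definitions, which are not necessarily the exact formulation or configuration of the paper; the function names, the 8 kHz sampling rate, the 20 ms frame length and the 100 Hz test tone are all assumptions made for this example.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def amdf(frame, lag):
    """Average magnitude difference between the frame and a copy of
    itself delayed by `lag` samples (textbook definition)."""
    n = len(frame) - lag
    return sum(abs(frame[i + lag] - frame[i]) for i in range(n)) / n


# Hypothetical test frame: a 100 Hz sine sampled at 8 kHz, 20 ms long.
SAMPLE_RATE = 8000  # assumed sampling rate, not taken from the paper
frame = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(160)]

energy = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
amdf_full_period = amdf(frame, lag=80)   # lag equal to one period of the tone
amdf_half_period = amdf(frame, lag=40)   # lag equal to half a period
```

For a periodic signal the AMDF approaches zero when the lag matches the period and is large at half the period, while its overall level follows the signal amplitude, which is one intuition for why it can stand in for energy in a voice activity detector.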
Bibliographic reference. Orlandi, Marco / Santarelli, Alfiero / Falavigna, Daniele (2003): "Maximum likelihood endpoint detection with time-domain features", In EUROSPEECH-2003, 1757-1760.