EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

        

Maximum Likelihood Endpoint Detection with Time-Domain Features

Marco Orlandi, Alfiero Santarelli, Daniele Falavigna

ITC-irst, Italy

In this paper we propose an effective, robust and computationally inexpensive HMM-based start/endpoint detector for speech recognisers. Our first attempts follow the classical scheme of a feature extractor and a Viterbi classifier (used for voice activity detection), followed by a post-processing stage; the ultimate goal, however, is a pure HMM-based architecture capable of performing the endpointing task. The features used for voice activity detection are energy and zero-crossing rate, together with the AMDF (Average Magnitude Difference Function), which proves to be a valid alternative to energy. We also study the impact of grammar structures and training conditions on performance. Finally, we lay the groundwork for the investigation of pure HMM-based architectures.
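
As an illustration of the three time-domain features named in the abstract, the following is a minimal sketch of how short-time energy, zero-crossing rate and AMDF might be computed per frame. The function names, the framing parameters and the single-lag form of the AMDF are assumptions made for this example; the abstract does not specify the definitions or settings used in the paper.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`.

    Assumed form: (1 / (N - lag)) * sum_i |x[i] - x[i + lag]|.
    """
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def extract_features(samples, frame_len=200, hop=80, lag=1):
    """Return one (energy, zcr, amdf) tuple per frame.

    The defaults (25 ms frames, 10 ms hop at 8 kHz, lag of one sample)
    are illustrative assumptions, not values taken from the paper.
    """
    return [(short_time_energy(f), zero_crossing_rate(f), amdf(f, lag))
            for f in frame_signal(samples, frame_len, hop)]
```

Each tuple returned by `extract_features` could then serve as the observation vector passed to a frame-level classifier. Unlike energy, the AMDF needs only subtractions and absolute values, with no multiplications, which is one reason it is often considered where computational cost matters.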

Bibliographic reference.  Orlandi, Marco / Santarelli, Alfiero / Falavigna, Daniele (2003): "Maximum likelihood endpoint detection with time-domain features", In EUROSPEECH-2003, 1757-1760.