12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Template Based Voice Trigger System Using Bhattacharyya Edit Distance

Evelyn Kurniawati, Samsudin Ng, Karthik Muralidhar, Sapna George

STMicroelectronics Asia Pacific Pte. Ltd., Singapore

Dynamic Time Warping (DTW) is frequently used in isolated word recognition systems due to its simplicity and robustness to noise. However, the computational effort required by a DTW-based solution is proportional to the number of words registered in the system. Vector Quantization (VQ) is employed to alleviate this by converting the spoken input into a sequence of discrete symbols to be matched against the stored word templates. In this paper, we propose the use of the Bhattacharyya distance as the cost function for this pattern matching problem. The template used is a string of discrete symbols, each modeled by a Gaussian Mixture Model (GMM) representing a context-dependent sub-word unit. The system is tested on a 100-template matching task built from two registrations of 50 cable TV channel names to simulate a voice-triggered remote control. An average accuracy of 92% is obtained. A scheme is also proposed to enable guest users without registration data to use the system efficiently.
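The idea of an edit distance whose substitution cost is a Bhattacharyya distance can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each symbol is modeled by a single diagonal-covariance Gaussian (the abstract describes GMMs, for which the Bhattacharyya distance has no closed form), and it assumes a fixed insertion/deletion cost. All names (`bhattacharyya`, `edit_distance`, `recognize`, `gap_cost`) and the toy models and templates are hypothetical.

```python
import math


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Each Gaussian is a (means, variances) pair of equal-length sequences.
    Assumption: one Gaussian per symbol, not a full GMM.
    """
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        # Mean-separation term, scaled by the summed variances.
        dist += 0.25 * (a - b) ** 2 / (va + vb)
        # Variance-mismatch term; zero when the variances are equal.
        dist += 0.5 * math.log((va + vb) / (2.0 * math.sqrt(va * vb)))
    return dist


def edit_distance(seq, template, models, gap_cost=1.0):
    """Weighted edit distance between two symbol strings.

    Substitution cost is the Bhattacharyya distance between the symbol
    models; insertions and deletions cost gap_cost (an assumed constant).
    """
    n, m = len(seq), len(template)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = bhattacharyya(models[seq[i - 1]], models[template[j - 1]])
            d[i][j] = min(
                d[i - 1][j] + gap_cost,   # delete a symbol from seq
                d[i][j - 1] + gap_cost,   # insert a symbol from template
                d[i - 1][j - 1] + sub,    # substitute (cost 0 if identical)
            )
    return d[n][m]


def recognize(seq, templates, models, gap_cost=1.0):
    """Return the name of the template closest to seq."""
    return min(
        templates,
        key=lambda name: edit_distance(seq, templates[name], models, gap_cost),
    )


# Toy 1-D symbol models: "a" and "b" are acoustically close, "c" is far.
models = {
    "a": ([0.0], [1.0]),
    "b": ([0.5], [1.0]),
    "c": ([4.0], [1.0]),
}
# Hypothetical registered templates (symbol strings per channel name).
templates = {"news": ["a", "c", "a"], "sport": ["c", "c", "b"]}

best = recognize(["b", "c", "a"], templates, models)
```

Because confusable symbols such as "a" and "b" in this toy setup incur only a small substitution cost, a quantization error between them penalizes the match far less than it would under a unit-cost edit distance.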


Bibliographic reference. Kurniawati, Evelyn / Ng, Samsudin / Muralidhar, Karthik / George, Sapna (2011): "A template based voice trigger system using Bhattacharyya edit distance", in INTERSPEECH-2011, 889-892.