INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Continuous Digits Recognition Leveraging Invariant Structure

Masayuki Suzuki (1), Gakuto Kurata (2), Masafumi Nishimura (2), Nobuaki Minematsu (1)

(1) University of Tokyo, Japan
(2) IBM Research - Tokyo, Japan

Recently, an invariant structure of speech was proposed, in which the inevitable acoustic variations caused by non-linguistic factors are effectively removed from speech. The invariant structure was applied to isolated word recognition, and the experimental results showed good performance. However, the previous method cannot be applied directly to continuous speech recognition because no efficient decoding algorithm exists for it. In this paper, we propose a method to leverage the invariant structure in continuous digits recognition. We use a traditional HMM-based Automatic Speech Recognition (ASR) system to obtain N-best lists with phone alignments. We then construct invariant structures from these phone alignments and re-rank the N-best lists by evaluating which hypothesis is structurally more valid. Experimental results show a relative WER improvement of 17.4% over the baseline HMM-based ASR system.
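The two-pass flow described in the abstract (a first-pass HMM decoder produces an N-best list with phone alignments, then a structure-based score re-ranks it) can be illustrated with a minimal sketch. The abstract does not say how the two scores are combined, so the linear interpolation, the `weight` parameter, the `Hypothesis` type, and the toy `structure_score` lookup below are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical container)."""
    words: Tuple[str, ...]
    asr_score: float  # log-domain score from the first-pass HMM decoder
    alignment: Sequence[Tuple[str, int, int]]  # (phone, start_frame, end_frame)


def rerank(
    nbest: List[Hypothesis],
    structure_score: Callable[[Hypothesis], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Sort hypotheses by an interpolated score, best first.

    Assumption: the combination is a linear interpolation of the ASR score
    and a structure-based score. The source does not specify this.
    """
    def combined(h: Hypothesis) -> float:
        return (1.0 - weight) * h.asr_score + weight * structure_score(h)

    return sorted(nbest, key=combined, reverse=True)


# Toy 2-best list; alignments are placeholders, scores are made up.
nbest = [
    Hypothesis(("one", "nine"), -10.0, [("w", 0, 5), ("n", 6, 20)]),
    Hypothesis(("one", "five"), -10.5, [("w", 0, 5), ("f", 6, 20)]),
]

# Stand-in for a real structural-validity score computed from the alignment.
toy_scores = {("one", "nine"): -4.0, ("one", "five"): -1.0}


def toy_structure_score(h: Hypothesis) -> float:
    return toy_scores[h.words]


reranked = rerank(nbest, toy_structure_score, weight=0.5)
baseline = rerank(nbest, toy_structure_score, weight=0.0)
```

With `weight=0.0` the order is the first-pass order; raising the weight lets the structure-based score promote a hypothesis that the first pass ranked lower. In a real system the toy lookup would be replaced by a score derived from the structure built over the aligned phone segments.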


Bibliographic reference. Suzuki, Masayuki / Kurata, Gakuto / Nishimura, Masafumi / Minematsu, Nobuaki (2011): "Continuous digits recognition leveraging invariant structure", in INTERSPEECH-2011, 993-996.