Recently, an invariant structure of speech was proposed in which the inevitable acoustic variations caused by non-linguistic factors are effectively removed. The invariant structure was applied to isolated word recognition and performed well in experiments. However, the previous method cannot be applied directly to continuous speech recognition because no efficient decoding algorithm was available. In this paper, we propose a method that leverages the invariant structure in continuous digit recognition. We use a conventional HMM-based Automatic Speech Recognition (ASR) system to generate N-best lists with phone alignments. We then construct invariant structures from these phone alignments and re-rank each N-best list according to which hypothesis is structurally more valid. Experimental results show a 17.4% relative reduction in word error rate (WER) over the baseline HMM-based ASR system.
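The abstract does not say how the structures are built or how structural validity is scored, so the following is only a minimal sketch of the re-ranking step under stated assumptions. It is not the authors' implementation. It assumes that each aligned phone segment is modeled by a diagonal Gaussian, that the structure is the set of pairwise Bhattacharyya distances between those Gaussians, and that a hypothesis is penalized by how far its distances deviate from a hypothetical table `expected` of typical distances per phone pair. The penalty is combined linearly with the ASR score through a hypothetical `weight`. The names `Hypothesis`, `structural_penalty`, `rerank`, `expected`, and `weight` are all illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# (means, variances) of a diagonal Gaussian
Gaussian = Tuple[List[float], List[float]]


@dataclass
class Hypothesis:  # hypothetical container for one N-best entry
    words: str
    asr_score: float                        # decoder log score; higher is better
    alignment: List[Tuple[str, int, int]]   # (phone, start frame, end frame exclusive)


def fit_diag_gaussian(frames: Sequence[Sequence[float]], floor: float = 1e-3) -> Gaussian:
    """Maximum-likelihood diagonal Gaussian for one aligned phone segment."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Variance floor keeps very short segments from producing zero variance.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, floor) for d in range(dim)]
    return mean, var


def bhattacharyya(g1: Gaussian, g2: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal Gaussians, summed over dimensions."""
    dist = 0.0
    for m1, m2, v1, v2 in zip(g1[0], g2[0], g1[1], g2[1]):
        v = 0.5 * (v1 + v2)
        dist += 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist


def structural_penalty(frames: Sequence[Sequence[float]], hyp: Hypothesis,
                       expected: Dict[Tuple[str, str], float]) -> float:
    """Mean squared gap between observed and expected pairwise phone distances.

    Assumption: `expected` holds a typical distance for every unordered phone
    pair that can occur; a missing pair raises KeyError.
    """
    phones = [p for p, _, _ in hyp.alignment]
    gauss = [fit_diag_gaussian(frames[s:e]) for _, s, e in hyp.alignment]
    gaps = []
    for i in range(len(gauss)):
        for j in range(i + 1, len(gauss)):
            observed = bhattacharyya(gauss[i], gauss[j])
            key = tuple(sorted((phones[i], phones[j])))  # distance is symmetric
            gaps.append((observed - expected[key]) ** 2)
    # Averaging keeps hypotheses with different phone counts comparable.
    return sum(gaps) / len(gaps) if gaps else 0.0


def rerank(frames: Sequence[Sequence[float]], nbest: List[Hypothesis],
           expected: Dict[Tuple[str, str], float], weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by ASR score minus weighted structural penalty, best first."""
    return sorted(
        nbest,
        key=lambda h: h.asr_score - weight * structural_penalty(frames, h, expected),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy 1-D features: two clearly different segments.
    frames = [[0.0], [0.2], [-0.2], [5.0], [5.2], [4.8]]
    expected = {("a", "b"): 117.0, ("a", "a"): 0.0}
    nbest = [
        Hypothesis("a a", -9.0, [("a", 0, 3), ("a", 3, 6)]),
        Hypothesis("a b", -10.0, [("a", 0, 3), ("b", 3, 6)]),
    ]
    print([h.words for h in rerank(frames, nbest, expected)])  # prints ['a b', 'a a']
```

In the toy example the decoder slightly prefers "a a", but the two segments are far apart while the expected distance between two "a" segments is zero, so the structural penalty promotes "a b". Setting `weight=0.0` recovers the original ASR ordering.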
Bibliographic reference. Suzuki, Masayuki / Kurata, Gakuto / Nishimura, Masafumi / Minematsu, Nobuaki (2011): "Continuous digits recognition leveraging invariant structure", In INTERSPEECH-2011, 993-996.