INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Discriminative Reranking for LVCSR Leveraging Invariant Structure

Masayuki Suzuki (1), Gakuto Kurata (2), Masafumi Nishimura (2), Nobuaki Minematsu (1)

(1) The University of Tokyo, Tokyo, Japan; (2) IBM Research - Tokyo, Kanagawa, Japan

An invariant structure is a long-span acoustic representation from which acoustic variations caused by non-linguistic factors are effectively removed. In this paper, we present a new method that uses invariant structures as features for discriminative reranking in Large Vocabulary Continuous Speech Recognition (LVCSR). First, a conventional HMM-based LVCSR system generates a list of N-best candidates with phone alignments, and an invariant structure is constructed for each candidate from its phone alignment. Here, the invariant structure consists of the distances between every pair of phonemes in the candidate. We then estimate a score for each phoneme pair in the invariant structure and rerank the N-best candidates using a weighted sum of the phoneme-pair scores, where the weights are trained discriminatively with the averaged perceptron. Experimental results show a relative character error rate (CER) reduction of 6.69% over the baseline HMM-based LVCSR system.
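The reranking pipeline summarized above can be sketched in Python as follows. This is a minimal sketch under assumptions the abstract does not state: each aligned phone segment is modeled as a diagonal Gaussian over its frames, the distance between two phonemes is the Bhattacharyya distance, the per-pair score is simply the summed distance for that phoneme pair (a hypothetical stand-in for the paper's estimated score), and the averaged perceptron follows the standard reranking formulation in which the oracle is the candidate with the fewest errors. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def gaussian(frames):
    """Diagonal Gaussian (means, variances) estimated from one phone's frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so very short segments do not give zero variance.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-3)
           for d in range(dim)]
    return mean, var


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        s = (s1 + s2) / 2.0
        dist += (a - b) ** 2 / (8.0 * s) + 0.5 * math.log(s / math.sqrt(s1 * s2))
    return dist


def structure_features(alignment):
    """Invariant-structure features for one N-best candidate.

    alignment: list of (phone_label, frames) taken from the phone alignment.
    Returns {(p, q): summed distance}. Using the raw distance as the
    phoneme-pair score is a hypothetical stand-in, not the paper's estimate.
    """
    segs = [(phone, gaussian(frames)) for phone, frames in alignment]
    feats = defaultdict(float)
    for (p, gp), (q, gq) in combinations(segs, 2):  # every pair of segments
        feats[tuple(sorted((p, q)))] += bhattacharyya(gp, gq)
    return dict(feats)


def score(weights, feats):
    """Weighted sum of phoneme-pair scores."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def rerank(weights, feats_list):
    """Index of the best-scoring candidate in one N-best list."""
    return max(range(len(feats_list)), key=lambda i: score(weights, feats_list[i]))


def train_averaged_perceptron(data, epochs=5):
    """data: list of (feats_list, errors) pairs, one per training utterance."""
    w = defaultdict(float)
    total = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for feats_list, errors in data:
            oracle = min(range(len(errors)), key=errors.__getitem__)
            pred = rerank(w, feats_list)
            # Update only when the chosen candidate has more errors than the oracle.
            if errors[pred] > errors[oracle]:
                for k, v in feats_list[oracle].items():
                    w[k] += v
                for k, v in feats_list[pred].items():
                    w[k] -= v
            steps += 1
            for k, v in w.items():  # accumulate weights for averaging
                total[k] += v
    return {k: v / steps for k, v in total.items()}
```

In use, `structure_features` would be applied to every N-best alignment, `train_averaged_perceptron` would be trained on N-best lists paired with per-candidate character error counts, and `rerank` would select the output hypothesis. Averaging the weights over all steps is what stabilizes the perceptron; the running sum here costs O(features × steps) and would normally be replaced by lazy averaging on real data.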

Index Terms: Invariant Structure, LVCSR, Discriminative Reranking


Bibliographic reference.  Suzuki, Masayuki / Kurata, Gakuto / Nishimura, Masafumi / Minematsu, Nobuaki (2012): "Discriminative reranking for LVCSR leveraging invariant structure", In INTERSPEECH-2012, 563-566.