INTERSPEECH 2012

An invariant structure is a long-span acoustic representation from which acoustic variations caused by non-linguistic factors are effectively removed. In this paper we present a new method that uses invariant structures as features for discriminative reranking in Large Vocabulary Continuous Speech Recognition (LVCSR). First, we use a traditional HMM-based LVCSR system to obtain a list of N-best candidates with phone alignments, and construct an invariant structure for each candidate from its phone alignment. Here, the invariant structure is composed of the lengths between every pair of phonemes in the candidate. We then estimate a score for each phoneme pair in the invariant structure and rerank the N-best candidates using a weighted sum of the phoneme-pair scores, where the weights are trained discriminatively with an averaged perceptron. Experimental results show a relative character error rate (CER) improvement of 6.69% over the baseline HMM-based LVCSR system.
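The reranking step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each N-best candidate is already reduced to a dictionary mapping a phoneme pair to its estimated score, and that the index of the best (oracle) candidate is known for each training utterance. The function names, the toy scores, and the number of epochs are all hypothetical.

```python
from collections import defaultdict


def score(weights, feats):
    """Weighted sum of phoneme-pair scores for one candidate."""
    return sum(weights.get(pair, 0.0) * value for pair, value in feats.items())


def rerank(weights, candidates):
    """Return the index of the highest-scoring candidate in an N-best list."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def train_averaged_perceptron(data, epochs=5):
    """Train pair weights; data is a list of (candidates, oracle_index)."""
    w = defaultdict(float)       # current weights
    total = defaultdict(float)   # running sum of weights after every example
    steps = 0
    for _ in range(epochs):
        for candidates, oracle in data:
            pred = rerank(w, candidates)
            if pred != oracle:
                # Move the weights toward the oracle and away from the prediction.
                for pair, value in candidates[oracle].items():
                    w[pair] += value
                for pair, value in candidates[pred].items():
                    w[pair] -= value
            for pair, value in w.items():
                total[pair] += value
            steps += 1
    # Averaging over all steps is what distinguishes the averaged perceptron.
    return {pair: value / steps for pair, value in total.items()}


# Toy data (hypothetical scores): two utterances, two candidates each,
# and the second candidate (index 1) is the oracle in both.
toy_data = [
    ([{("a", "k"): 0.2, ("k", "i"): 0.9}, {("a", "k"): 0.8, ("k", "i"): 0.3}], 1),
    ([{("a", "k"): 0.1, ("s", "u"): 0.7}, {("a", "k"): 0.9, ("s", "u"): 0.2}], 1),
]
weights = train_averaged_perceptron(toy_data)
best = [rerank(weights, candidates) for candidates, _ in toy_data]
```

In a real system the feature vector would presumably also carry the baseline recognizer score, but the abstract does not say so, and the sketch omits it.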
Index Terms: Invariant Structure, LVCSR, Discriminative reranking
Bibliographic reference. Suzuki, Masayuki / Kurata, Gakuto / Nishimura, Masafumi / Minematsu, Nobuaki (2012): "Discriminative reranking for LVCSR leveraging invariant structure", In INTERSPEECH-2012, 563-566.