8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Use of Syllable Center Detection for Improved Duration Modeling in Chinese Mandarin Connected Digits Recognition

Sergey Astrov, Joachim Hofer, Harald Höge

Siemens AG, Germany

This paper describes practical approaches for improving Mandarin digit recognition accuracy, especially in cars. We consider syllable and subword unit durations as an additional source of information. The explored approach is realized in two stages. First, the system performs standard speech recognition using acoustic spectral features and generates an n-best list of hypotheses. In the second stage, the hypothesis probabilities are re-estimated using duration models, so that the hypotheses are reordered and the correct ones are pushed toward the top of the n-best list. In this way the word error rate (WER) is reduced. We explore the state-of-the-art approach of duration n-grams. To eliminate the influence of speech rate variations, the durations are normalized to a relative speech rate, which yielded a 10% relative reduction of WER. A novel approach led to a 13.3% WER reduction: the durations were normalized to a syllable rate obtained from a syllable center detector.
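The two-stage scheme can be illustrated with a minimal Python sketch. It is not the authors' implementation. For brevity it replaces the duration n-grams with an independent Gaussian duration model per digit, and it assumes the syllable rate is simply the number of detected syllable centers divided by the utterance length. The names `Hypothesis`, `normalize`, `duration_logp`, `rescore`, and the `weight` parameter are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    digits: list          # recognized digit labels, one per syllable
    durations: list       # duration of each digit in seconds
    acoustic_logp: float  # log score from the first recognition pass


def normalize(durations, utterance_seconds, n_syllable_centers):
    """Divide each duration by the mean syllable duration of the utterance.

    Assumption: mean syllable duration = utterance length / number of
    syllable centers reported by a detector.
    """
    mean_syllable = utterance_seconds / n_syllable_centers
    return [d / mean_syllable for d in durations]


def duration_logp(digits, norm_durations, models):
    """Sum of Gaussian log densities; models maps digit -> (mean, std).

    Assumption: a unigram Gaussian stands in for the duration n-gram model.
    """
    total = 0.0
    for digit, d in zip(digits, norm_durations):
        mean, std = models[digit]
        total += -0.5 * ((d - mean) / std) ** 2
        total -= math.log(std * math.sqrt(2.0 * math.pi))
    return total


def rescore(nbest, utterance_seconds, n_syllable_centers, models, weight=1.0):
    """Reorder the n-best list by acoustic score plus weighted duration score."""
    def combined(h):
        norm = normalize(h.durations, utterance_seconds, n_syllable_centers)
        return h.acoustic_logp + weight * duration_logp(h.digits, norm, models)
    return sorted(nbest, key=combined, reverse=True)
```

In this sketch, a hypothesis that merges two digits into one implausibly long digit receives a low duration score after normalization, so a competing hypothesis with the matching number of syllables can overtake it even when its first-pass acoustic score is slightly worse.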

