Sixth European Conference on Speech Communication and Technology
This paper proposes a new RNN-based prosodic modeling method for Mandarin speech recognition. The method operates as a post-processing stage after acoustic decoding and detects word boundaries to assist lexical decoding. It employs a simple RNN to learn the relationship between input prosodic features, extracted from the input utterance using the syllable boundaries provided by the preceding acoustic decoding, and output information related to word boundaries. The method was evaluated in simulations on a large single-speaker database. Experimental results showed that 71.9% of word tags and 95.3% of punctuation mark (PM) tags were correctly detected. Incorporating the prosodic model into an HMM-based continuous Mandarin speech recognition system increased the character recognition rate from 73.6% to 74.7% and reduced computational complexity by 17%. The proposed prosodic modeling method is therefore helpful for speech recognition.
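The mapping from per-syllable prosodic features to boundary tags can be illustrated with a minimal sketch of a simple (Elman-style) RNN forward pass. This is not the authors' implementation: the feature set, the tag names in `TAGS`, the layer sizes, and the random untrained weights are all assumptions made for illustration, so the predicted tags carry no meaning until the network is trained on labeled data.

```python
import math
import random

# Hypothetical tag set; the abstract mentions word tags and PM tags only.
TAGS = ["word-internal", "word-boundary", "punctuation-mark"]


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRNN:
    """Elman-style RNN: the hidden state carries context across syllables."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = make_matrix(n_hidden, n_in, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_out, n_hidden, rng)

    def forward(self, features):
        """Return one tag distribution per syllable feature vector."""
        hidden = [0.0] * self.n_hidden
        outputs = []
        for x in features:
            from_input = matvec(self.w_in, x)
            from_state = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
            outputs.append(softmax(matvec(self.w_out, hidden)))
        return outputs


def tag_boundaries(model, features):
    """Pick the most probable tag after each syllable."""
    tags = []
    for probs in model.forward(features):
        best = max(range(len(probs)), key=probs.__getitem__)
        tags.append(TAGS[best])
    return tags


# Assumed features per syllable: [pitch mean, energy, duration, following pause].
utterance = [
    [0.2, 0.5, 0.18, 0.00],
    [0.1, 0.4, 0.22, 0.05],
    [-0.3, 0.3, 0.30, 0.40],
]
model = SimpleRNN(n_in=4, n_hidden=8, n_out=len(TAGS))
probs = model.forward(utterance)
tags = tag_boundaries(model, utterance)
print(tags)
```

In a recognizer of the kind described, tags like these would be produced once per syllable boundary hypothesized by the acoustic decoder and then used to constrain which word hypotheses the lexical decoder considers.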
Bibliographic reference. Wang, Wern-Jun / Liao, Yuan-Fu / Chen, Sin-Horng (1999): "Prosodic modeling of Mandarin speech and its application to lexical decoding", In EUROSPEECH'99, 743-746.