7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Sharing Relative Stress of Cross-Word Syllables and Lexical Stress to Spontaneous Speech Recognition

Farshad Almasganj (1), Farhad D. Dehnavi (2), Mahmood Bijankhan (3)

(1) Amirkabir University, Iran; (2) Tehran University, Iran; (3) Research Center of Intelligent Signal Processing, Iran

Prosody is a suprasegmental feature of speech that plays an undeniable role in human speech perception and production. However, employing prosodic features in continuous speech recognition (CSR) is difficult, and large accuracy gains should not be expected from them. The main problem is that prosodic patterns depend strongly on factors such as the speaker, the speaker's psychological state, and the superposition of higher-level prosodic patterns on lower-level ones. In our approach, the selected microprosodic features are the lexical word stress pattern and the relative stress of cross-word syllables. We aim to verify whether, given suitable models for recognizing these prosodic features, they can be used to improve the speech recognition process. We applied a neural network approach to the word and cross-word stress recognition task, then incorporated these features into a spontaneous Farsi speech recognition system called SHENAVA-1. Word accuracy improved by 1.3%.


Bibliographic reference. Almasganj, Farshad / Dehnavi, Farhad D. / Bijankhan, Mahmood (2002): "Sharing relative stress of cross-word syllables and lexical stress to spontaneous speech recognition", in ICSLP-2002, 945-948.