Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Impact of Speaking Style and Speaking Task on Acoustic Models

Janienke Sturm (2), Hans Kamperman (2), Lou Boves (1,2), Els den Os (2)

(1) KPN Research, KPN Royal Dutch Telecom, Multi-Media Department, Leidschendam, The Netherlands
(2) Nijmegen University, The Netherlands

The loss in performance caused by a mismatch between training and test material suggests a need for task-specific acoustic models, especially for highly demanding tasks. However, since training such models is extremely expensive, general-purpose models are more attractive. In this paper we address the impact of mismatch in speaking style and task. We trained three sets of acoustic models on data from different tasks, involving both read and extemporaneous speech. The average utterance length in the training corpora varied between 1.2 and 10.5 words. The models were tested on matched as well as on very different tasks. The results suggest that general-purpose models trained on short utterances are to be preferred in most spoken dialog systems. However, these models might not perform adequately in dictation tasks.
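The abstract quantifies the "loss in performance" without naming a metric; recognition experiments of this kind commonly report word error rate (WER). The following is a minimal sketch that assumes WER is the metric. The helpers `edit_distance` and `corpus_wer` are hypothetical names, not taken from the paper. They score recognizer output against reference transcripts using a word-level Levenshtein alignment and pool the errors over the whole test set. Pooling matters when corpora differ as much in utterance length as the ones described (1.2 versus 10.5 words on average), because a single error in a one-word utterance would otherwise count as a 100% error rate.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (word-level Levenshtein distance)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += edit_distance(ref, hypothesis.split())
        words += len(ref)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution in a 3-word utterance and
    # one substitution in a 1-word utterance -> 2 errors / 4 words.
    test_set = [("the cat sat", "the bat sat"), ("yes", "no")]
    print(corpus_wer(test_set))  # → 0.5
```

For the same two utterances, averaging per-utterance rates would give (1/3 + 1/1) / 2 ≈ 0.67, which overweights the one-word reply. This is why the sketch pools errors across the test set.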


Bibliographic reference.  Sturm, Janienke / Kamperman, Hans / Boves, Lou / Os, Els den (2000): "Impact of speaking style and speaking task on acoustic models", In ICSLP-2000, vol.1, 361-364.

Encoded Speech Recognition Accuracy Improvement in Adverse Environments by Enhancing Formant Spectral Bands

Shubha Kadambe, Ron Burns

HRL Laboratories, LLC, Malibu, CA, USA

Spoken dialogue information retrieval applications are the future trend for mobile users in automobiles, on cellular phones, and on similar platforms. Because resources on these platforms are limited, it may be advantageous to extract speech features, compress them, and transmit them to a central hub where computation-intensive tasks such as speech recognition and speech understanding can be performed. Speech recognition accuracy generally degrades when the decoded speech signal (obtained by re-synthesizing the signal from the compressed features) is used. In addition, the background noise present in these mobile environments further reduces recognition accuracy. Therefore, to improve recognition accuracy it is essential to extract robust features that jointly optimize compression and recognition. In this paper, we describe a technique that improves the recognition accuracy of noisy encoded speech by performing spectral correction and formant spectral band enhancement before the speech signal is synthesized from the compressed features. We conducted experiments on 1831 telephone speech utterances from 1831 speakers. To these utterances we added the following noise types at various signal-to-noise ratios (SNRs): (a) in-vehicle noise recorded in a Volvo car moving on an asphalt road at 134 km/h, (b) factory noise recorded in a factory, and (c) speech (babble) noise recorded in a cafeteria. Our experimental results indicate an improvement in recognition accuracy of up to 10% at 0 dB SNR.
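The experiments add recorded noise to clean utterances at chosen SNRs. The abstract does not state how the SNR was defined or measured. The following is a minimal sketch that assumes SNR is the ratio of mean power over the whole utterance; the paper may have used a different definition, such as an active speech level. The sketch scales the noise so that 10·log10(P_speech / P_noise) equals the target value. The function name `mix_at_snr` and the looping of short noise recordings are illustrative choices, not details taken from the paper.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech + g * noise for 1-D signals, with gain g chosen so that
    10 * log10(P_speech / P_scaled_noise) == snr_db.

    Assumption: P is the mean power over the whole signal; no voice
    activity detection or speech-level weighting is applied.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if noise.size == 0:
        raise ValueError("noise must not be empty")
    # Loop the noise recording if it is shorter than the utterance,
    # then cut it to the utterance length.
    if noise.size < speech.size:
        noise = np.tile(noise, int(np.ceil(speech.size / noise.size)))
    noise = noise[: speech.size]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        raise ValueError("noise segment has zero power")
    # Solve p_speech / (g**2 * p_noise) = 10**(snr_db / 10) for g.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000  # telephone-band sampling rate (illustrative)
    t = np.arange(fs) / fs
    clean = np.sin(2 * np.pi * 440 * t)    # stand-in for an utterance
    babble = rng.standard_normal(fs // 3)  # stand-in noise, shorter than speech
    for snr in (20.0, 10.0, 0.0):
        noisy = mix_at_snr(clean, babble, snr)
        added = noisy - clean
        measured = 10 * np.log10(np.mean(clean ** 2) / np.mean(added ** 2))
        print(f"target {snr:5.1f} dB, measured {measured:5.2f} dB")
```

The gain is computed per utterance, so under this definition every noisy copy hits the target SNR exactly. A level-based definition that ignores pauses would produce different scale factors for utterances with long silences.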


Bibliographic reference.  Kadambe, Shubha / Burns, Ron (2000): "Encoded speech recognition accuracy improvement in adverse environments by enhancing formant spectral bands", In ICSLP-2000, vol.1, 365-368.