Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Fast and Effective Retraining on Contrastive Vocal Characteristics with Bidirectional Long Short-Term Memory Nets

Nicole Beringer

3SOFT GmbH, Germany

We apply Long Short-Term Memory (LSTM) recurrent neural networks to a large corpus of unprompted speech, the German part of the VERBMOBIL corpus. By training first on one fraction of the data and then retraining on another fraction, we both reduce training time and significantly improve recognition rates. Contrastive retraining on the initial vowel cluster fraction of the data, selected according to the Psycho-Computational Model of Sound Acquisition (PCMSA), shows higher frame-by-frame correctness owing to greater sparseness and to the articulatory position of the sounds. For comparison, we report recognition rates of Hidden Markov Models (HMMs) on the same corpus and provide a promising extrapolation for HMM-LSTM hybrids.


Bibliographic reference. Beringer, Nicole (2006): "Fast and effective retraining on contrastive vocal characteristics with bidirectional long short-term memory nets", in INTERSPEECH-2006, paper 1602-Mon3CaP.8.