September 22-25, 1997
In this paper we present a context-dependent hybrid MMI-connectionist / Hidden Markov Model (HMM) speech recognition system for the Wall Street Journal (WSJ) database. The hybrid system consists of a neural network, which serves as a vector quantizer (VQ), and an HMM with discrete probability density functions, which has the advantage of faster decoding. The neural network is trained with an algorithm that tries to maximize the mutual information between the classes of the input features (e.g. phones, triphones, etc.) and the neural firing sequence of the network. The system has been trained on the 1992 WSJ corpus (si-84). Tests were performed on the five-thousand and twenty-thousand word speaker-independent (si_et) tasks. The error rates of a new context-dependent neural network are 29% lower (relative) than those of a standard (k-means) discrete system, and they are very close to those of the best continuous/semi-continuous HMM speech recognizers.
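The MMI training criterion described above can be illustrated by estimating the mutual information I(C; Y) between the class labels C of the input frames and the VQ labels Y emitted by the network. The following is a minimal sketch under stated assumptions: it treats each frame's winning output neuron as its VQ label (a hypothetical simplification), estimates probabilities from joint counts, and only evaluates the criterion. It does not reproduce the authors' training procedure or gradient. The function name `mutual_information` is illustrative.

```python
import math
from collections import Counter


def mutual_information(classes, labels):
    """Estimate I(C; Y) in bits from paired per-frame observations.

    classes: class label of each frame (e.g. a phone or triphone id)
    labels:  VQ label of each frame; here assumed (hypothetically) to be
             the index of the winning output neuron of the network
    """
    if len(classes) != len(labels):
        raise ValueError("classes and labels must have the same length")
    n = len(classes)
    joint = Counter(zip(classes, labels))  # counts of (class, VQ label) pairs
    count_c = Counter(classes)             # marginal class counts
    count_y = Counter(labels)              # marginal VQ label counts
    mi = 0.0
    for (c, y), count in joint.items():
        p_cy = count / n
        p_c = count_c[c] / n
        p_y = count_y[y] / n
        mi += p_cy * math.log2(p_cy / (p_c * p_y))
    return mi


# VQ labels that perfectly separate two equiprobable classes carry 1 bit;
# labels independent of the classes carry 0 bits.
print(mutual_information(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
print(mutual_information(["a", "a", "b", "b"], [0, 1, 0, 1]))  # 0.0
```

A quantizer trained to maximize this quantity produces codebook indices that are as informative as possible about the phonetic classes, which is the intuition behind using it in place of a distortion-minimizing (k-means) codebook.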
Bibliographic reference. Rottland, Jörg / Neukirchen, Christoph / Willett, Daniel / Rigoll, Gerhard (1997): "Large vocabulary speech recognition with context dependent MMI-connectionist / HMM systems using the WSJ database", In EUROSPEECH-1997, 79-82.