5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Incorporation of HMM Output Constraints In Hybrid NN/HMM Systems During Training

Mike Schuster

ATR, Interpreting Telecommunications Research Lab., Seika-cho, Soraku-gun, Kyoto, Japan

This paper describes a method for incorporating the HMM output constraints into frame-based hybrid NN/HMM systems during training. Usually, the NN parameters are adjusted to minimize the cross-entropy between the frame target probabilities and the network predictions, assuming that the outputs are statistically independent in time. In the approach described here, the full likelihood of the utterance(s) is maximized instead, taking the HMM output constraints into account as in conventional HMM systems. This is achieved by maximizing the state occupation probabilities obtained from a forward/backward pass that uses the scaled likelihoods coming from the network. With a simplifying approximation for the derivative used to back-propagate through the forward/backward pass, tests show that the proposed method gives consistently higher string (phoneme) recognition rates than the conventional approach, which optimizes cross-entropy at the frame level.
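The forward/backward step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes that the scaled likelihoods are the network posteriors divided by the state priors, and that the HMM is given by a transition matrix and an initial distribution. The function name `state_occupations`, the toy two-state left-to-right model, and the final error signal (occupation probabilities treated as fixed soft targets for a softmax output layer) are assumptions made here for illustration; the abstract does not say which approximation of the derivative the author used.

```python
import numpy as np

def state_occupations(posteriors, priors, trans, init):
    """Scaled forward/backward pass (illustrative, not the paper's code).

    posteriors: (T, S) network outputs, one distribution per frame
    priors:     (S,)   state prior probabilities
    trans:      (S, S) HMM transition probabilities, rows sum to 1
    init:       (S,)   initial state probabilities
    Returns the state occupation probabilities (T, S) and the
    log-likelihood computed from the scaled likelihoods.
    """
    scaled = posteriors / priors           # assumed: posterior / prior
    T, S = scaled.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    c = np.zeros(T)                        # per-frame normalizers

    # Forward pass, normalizing each frame to avoid underflow.
    alpha[0] = init * scaled[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * scaled[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    # Backward pass, reusing the forward normalizers.
    for t in range(T - 2, -1, -1):
        beta[t] = (trans @ (scaled[t + 1] * beta[t + 1])) / c[t + 1]

    gamma = alpha * beta                   # each row sums to 1
    return gamma, np.log(c).sum()

# Toy example (hypothetical values): 6 frames, 2-state left-to-right HMM.
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(2), size=6)
priors = np.array([0.5, 0.5])
trans = np.array([[0.7, 0.3],
                  [0.0, 1.0]])
init = np.array([1.0, 0.0])

gamma, loglik = state_occupations(posteriors, priors, trans, init)

# Assumed error signal for softmax pre-activations: occupation
# probabilities act as soft targets in place of hard frame labels.
error = gamma - posteriors
```

In this toy model the second state cannot be left once entered, so its occupation probability can only grow over time. A frame-level criterion with independent outputs does not enforce a constraint of this kind.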

Bibliographic reference.  Schuster, Mike (1997): "Incorporation of HMM output constraints in hybrid NN/HMM systems during training", In EUROSPEECH-1997, 2843-2846.