Speech recognition based on connectionist approaches is one of the most successful alternatives to the widespread Gaussian systems. One of the main criticisms of hybrid recognizers is the increased complexity of context-dependent phone modeling, which is a key aspect of medium to large vocabulary tasks. In this paper, we investigate the use of context-dependent triphone models in a connectionist speech recognizer. To this end, the most common triphone state clustering procedures for Gaussian models are compared and applied to our hybrid recognizer. The resulting systems with clustered context-dependent triphones achieve a relative word error rate reduction of more than 20% over a baseline hybrid system on two selected WSJ evaluation test sets. Additionally, we report on recent efforts to port the proposed context modeling approaches to an LVCSR system for English Broadcast News transcription.
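For readers unfamiliar with the metric, the relative word error rate reduction quoted in the abstract is the drop in WER expressed as a fraction of the baseline WER. The following is a minimal sketch of that arithmetic. The function name and the numeric values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction, in percent, of a system over a baseline.

    Both arguments are word error rates expressed in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WER values (in percent), used only to illustrate the formula.
# They are NOT figures reported in the paper.
baseline = 12.0   # assumed WER of a baseline hybrid system
clustered = 9.0   # assumed WER of a system with clustered triphones

print(f"{relative_wer_reduction(baseline, clustered):.1f}% relative reduction")
```

With these illustrative inputs, an absolute drop of 3 WER points corresponds to a 25% relative reduction, which shows why relative and absolute reductions should not be confused when comparing systems.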
Bibliographic reference. Abad, Alberto / Pellegrini, Thomas / Trancoso, Isabel / Neto, João (2010): "Context dependent modelling approaches for hybrid speech recognizers", In INTERSPEECH-2010, 2950-2953.