September 22-25, 1997
Continuous speech recognition (CSR) systems usually include large sets of context-dependent units to model contextual variations in the pronunciation of phones. The goal of this work was to obtain adequate sets of sub-lexical models using acoustic information alone, without any prior phonological knowledge. At each iteration of a classical Viterbi training scheme, each acoustic model was split into a set of more accurate models. This approach was evaluated on a Spanish acoustic-phonetic decoding task. The experimental results showed that it yields recognition rates similar to those obtained with classical triphones.
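The abstract does not state the splitting criterion, so the following is only an illustrative sketch and not the authors' method. It assumes that a Viterbi alignment has already hard-assigned a set of feature frames to one model, and it splits that set in two with a simple perturb-and-reassign step in the style of LBG codebook splitting. The function name `split_model` and the parameters `eps` and `iters` are hypothetical.

```python
from statistics import fmean


def split_model(frames, eps=0.01, iters=10):
    """Split the frames aligned to one model into two child groups.

    Assumption: `frames` is a non-empty list of equal-length feature
    vectors that a Viterbi alignment assigned to a single model.
    Returns (centers, groups) for the two hypothetical child models.
    """
    dim = len(frames[0])
    mean = [fmean(f[d] for f in frames) for d in range(dim)]
    # Seed two children by perturbing the parent mean in opposite directions.
    centers = [[m + eps for m in mean], [m - eps for m in mean]]
    groups = [list(frames), []]
    for _ in range(iters):
        groups = [[], []]
        # Hard-assign every frame to the closest child (squared Euclidean).
        for f in frames:
            dists = [sum((x - c) ** 2 for x, c in zip(f, ctr)) for ctr in centers]
            groups[dists.index(min(dists))].append(f)
        # Re-estimate each child mean from the frames assigned to it.
        for k in (0, 1):
            if groups[k]:
                centers[k] = [fmean(f[d] for f in groups[k]) for d in range(dim)]
    return centers, groups


if __name__ == "__main__":
    toy_frames = [[0.0], [0.1], [1.0], [1.1]]
    print(split_model(toy_frames))
```

In a full training loop, each child would be re-estimated as its own HMM and the data realigned before the next split; that part is omitted here because the abstract gives no detail on it.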
Bibliographic reference. Rodriguez, Luis Javier / Torres, Ines M. (1997): "Viterbi based splitting of phoneme HMM's", In EUROSPEECH-1997, 1211-1214.