ISCA Archive Interspeech 2008

Incorporating acoustical modelling of phone transitions in an hybrid ANN/HMM speech recognizer

Alberto Abad, João Neto

Speech recognition based on connectionist approaches is one of the most successful alternatives to widespread Gaussian systems. One of the main criticisms of hybrid recognizers is the increased complexity of context-dependent phone modelling, which is a key aspect of medium- to large-vocabulary tasks. In this paper, a baseline hybrid system based on monophone recognition units is improved by incorporating acoustical modelling of phone transitions. First, the single-state monophone model is extended to multiple-state sub-phoneme modelling. Then, a reduced set of diphone recognition units is incorporated to model phone transitions. The proposed approach shows 26.8% and 23.8% relative word error rate reductions compared to the baseline hybrid system on two selected WSJ evaluation test sets. Additionally, improved performance compared to a reference Gaussian system based on word-internal context-dependent triphones, and results comparable to a cross-word triphone system, are reported.


doi: 10.21437/Interspeech.2008-127

Cite as: Abad, A., Neto, J. (2008) Incorporating acoustical modelling of phone transitions in an hybrid ANN/HMM speech recognizer. Proc. Interspeech 2008, 2394-2397, doi: 10.21437/Interspeech.2008-127

@inproceedings{abad08_interspeech,
  author={Alberto Abad and João Neto},
  title={{Incorporating acoustical modelling of phone transitions in an hybrid ANN/HMM speech recognizer}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2394--2397},
  doi={10.21437/Interspeech.2008-127}
}