ISCA Archive CHiME 2018

The Toshiba entry to the CHiME 2018 Challenge

Rama Doddipatla, Takehiko Kagoshima, Cong-Thanh Do, Petko Petkov, Catalin-Tudor Zorila, Euihyun Kim, Daichi Hayakawa, Hiroshi Fujimura, Yannis Stylianou

This paper summarises the Toshiba entry to the single-array track of the CHiME 2018 speech recognition challenge. The system is based on conventional acoustic modelling (AM), where phonetic targets are tied to features at the frame level, and uses the provided tri-gram language model. The system is entered in category A, which focuses on acoustic robustness. Array signals are first enhanced using speaker-dependent generalised eigenvalue (GEV) based beamforming. Two different acoustic representations are then extracted from the enhanced signals: i) log Mel filter-bank and ii) subband temporal envelope (STE) features. Separate acoustic models, trained on each feature set, are used for lattice combination. The AM combines convolutional and recurrent architectures in a single CNN-BLSTM model. Speaker adaptation (limited to vocal tract length normalisation, VTLN), de-reverberation, and speaker suppression are also considered. Following system combination, the Toshiba entry achieves a word error rate (WER) of 60.8% on the development (dev) set and 56.5% on the evaluation (eval) set. The system is ranked 4th in category A.
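To make the first of the two acoustic representations concrete, the sketch below shows a minimal log Mel filter-bank extraction in NumPy: frame the waveform, take the magnitude spectrum, apply a triangular Mel filter bank, and take the log. The parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 40 Mel bands) are common defaults, not necessarily those used in the paper, and this is an illustrative implementation rather than the authors' pipeline.

```python
import numpy as np

def log_mel_filterbank(signal, sr=16000, n_fft=512, n_mels=40,
                       frame_len=400, hop=160):
    """Minimal log Mel filter-bank feature extraction (illustrative).

    Returns an (n_frames, n_mels) array of log Mel energies.
    Parameter defaults are common choices, not the paper's settings.
    """
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)

    # Magnitude spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, n_fft))

    # Mel scale conversion helpers.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Triangular filters equally spaced on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)

    # Log Mel energies, floored to avoid log(0).
    return np.log(np.maximum(spec @ fbank.T, 1e-10))
```

In a system like the one described, these features would be computed on the beamformed signal and fed to the CNN-BLSTM acoustic model; the STE features provide a complementary, envelope-based view of the same signal.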


doi: 10.21437/CHiME.2018-9

Cite as: Doddipatla, R., Kagoshima, T., Do, C.-T., Petkov, P., Zorila, C.-T., Kim, E., Hayakawa, D., Fujimura, H., Stylianou, Y. (2018) The Toshiba entry to the CHiME 2018 Challenge. Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018), 41-45, doi: 10.21437/CHiME.2018-9

@inproceedings{doddipatla18_chime,
  author={Rama Doddipatla and Takehiko Kagoshima and Cong-Thanh Do and Petko Petkov and Catalin-Tudor Zorila and Euihyun Kim and Daichi Hayakawa and Hiroshi Fujimura and Yannis Stylianou},
  title={{The Toshiba entry to the CHiME 2018 Challenge}},
  year=2018,
  booktitle={Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018)},
  pages={41--45},
  doi={10.21437/CHiME.2018-9}
}