We describe the implementation of a cellular-phone based speech translation system built without a telephone-quality speech database or special CT hardware. The purpose is to quickly build a prototype service system that can be used for data collection with real users. To train the acoustic model for the speech recognition system, available high-quality databases were made usable in two ways: (1) by appropriate downsampling and filtering, and (2) by piping, similar to the NTIMIT and CTIMIT paradigms. An evaluation of acoustic models trained with filtered, piped, and real cellular-phone data is given. Recognition rates are at the same level as for wideband speech.
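The downsampling-and-filtering step can be illustrated with a short sketch. The abstract does not state the sample rates or the filter design, so everything below is an assumption made for illustration: a 16 kHz source, an 8 kHz target, a 3.4 kHz cutoff, and a Hamming-windowed sinc FIR filter. The function names are hypothetical and this is not the authors' implementation.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Design a Hamming-windowed sinc low-pass filter with unity gain at DC.

    num_taps is assumed to be odd so the filter has a single center tap.
    """
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled separately
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def filter_and_decimate(samples, taps, factor=2):
    """Apply the FIR filter (centered, zero-padded) and keep every factor-th sample."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out

def tone(freq_hz, sample_rate_hz, n):
    """Generate n samples of a unit-amplitude sine wave (test signal)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def rms(xs):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

# Assumed rates: 16 kHz wideband source, 8 kHz telephone-band target.
SOURCE_RATE = 16000
taps = lowpass_fir(cutoff_hz=3400, sample_rate_hz=SOURCE_RATE)
narrowband = filter_and_decimate(tone(1000, SOURCE_RATE, 1600), taps, factor=2)
```

Filtering before discarding samples matters because any energy above half the target rate would otherwise alias into the telephone band. A closer match to a real channel would presumably also remove the lowest frequencies and model the codec, which this sketch does not attempt.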
Cite as: Gruhn, R., Singer, H., Tsukada, H., Naito, M., Nishino, A., Nakamura, A., Sagisaka, Y., Nakamura, S. (2000) Cellular-phone based speech-to-speech translation system ATR-MATRIX. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 448-451
@inproceedings{gruhn00_icslp,
  author={Rainer Gruhn and Harald Singer and Hajime Tsukada and Masaki Naito and Atsushi Nishino and Atsushi Nakamura and Yoshinori Sagisaka and Satoshi Nakamura},
  title={{Cellular-phone based speech-to-speech translation system ATR-MATRIX}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 4, 448-451}
}