International Workshop on Spoken Language Translation (IWSLT) 2004
Keihanna Science City, Kyoto, Japan
This paper describes the speech recognition module of the speech-to-speech translation system currently being developed at ATR. It is a multi-lingual, large-vocabulary continuous speech recognition system supporting Japanese, English, and Chinese. A corpus-based statistical approach was adopted for the system design. The database we collected consists of more than 600 000 sentences in each of the three languages, covering a broad range of travel-related conversations. The recognition system is based on language-dependent acoustic models, language models, and pronunciation dictionaries. The models are built using the latest training methods developed at ATR, such as the Minimum Description Length Successive State Splitting (MDL-SSS) and Multi-dimensional Composite N-gram techniques. The specific characteristics of each language are taken into account in order to achieve high recognition performance. The speech recognition system is under constant improvement and enhancement, and although the models for the different languages are at different stages of development, recent evaluation experiments showed that the recognition performance is above 92% for every language.
Bibliographic reference. Nakamura, Satoshi / Markov, Konstantin / Jitsuhiro, Takatoshi / Zhang, Jin-Song / Yamamoto, Hirofumi / Kikui, Genichiro (2004): "Multi-lingual speech recognition system for speech-to-speech translation", In IWSLT-2004, 147-154.