8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Speech Recognition for Multiple Non-Native Accent Groups with Speaker-Group-Dependent Acoustic Models

Tobias Cincarek, Rainer Gruhn, Satoshi Nakamura

Advanced Telecommunications Research International, Japan

This paper investigates recognition performance for non-native English speech using two kinds of speaker-group-dependent acoustic models. Speaker groups are formed either by knowledge-based grouping of non-native speakers according to their first language or by automatic clustering of speakers. Clustering is based on speaker-dependent acoustic models in speaker eigenspace. The acoustic model for each speaker group is obtained either by bootstrapping with pre-segmented speech data or by adapting a speaker-independent native baseline model. To decode an utterance from a non-native speaker not seen during training or adaptation, a model suited to that speaker's accent characteristics must be selected. Two selection methods are examined: ideal selection via an oracle, and parallel decoding. Evaluation is conducted on a hotel reservation task for five major accent groups: German, French, Indonesian, Chinese and Japanese speakers. Recognition results with speaker-dependent models and with an accent-independent non-native model are also reported.
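
The two steps named in the abstract, clustering speakers in an eigenspace and selecting a group model by parallel decoding, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each speaker is represented by a "supervector" (for example, the stacked mean vectors of a speaker-dependent model), that the eigenspace is obtained by PCA, that clustering is plain k-means, and that parallel decoding keeps the model whose decoder returns the highest log-likelihood. All function names and the example scores are hypothetical.

```python
import numpy as np

def project_to_eigenspace(supervectors, n_dims):
    """Project speaker supervectors (one row per speaker) onto the
    first n_dims principal directions. PCA is an assumption here."""
    centered = supervectors - supervectors.mean(axis=0)
    # Rows of vt are the principal directions of the speaker space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_dims].T

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means; returns one cluster label per speaker."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every speaker to every cluster center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def select_by_parallel_decoding(scores):
    """Given {group name: decoder log-likelihood} for one utterance,
    return the group whose model scored highest (assumed criterion)."""
    return max(scores, key=scores.get)
```

In use, the cluster labels would decide which speakers' data is pooled to train or adapt each group model, and at test time an utterance would be decoded once per group model before `select_by_parallel_decoding` picks the result to keep. An oracle selection, by contrast, would simply look up the speaker's known group.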

Bibliographic reference. Cincarek, Tobias / Gruhn, Rainer / Nakamura, Satoshi (2004): "Speech recognition for multiple non-native accent groups with speaker-group-dependent acoustic models", in INTERSPEECH-2004, 1509-1512.