7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Robust Speech Recognition Using Inter-Speaker and Intra-Speaker Adaptation

Baojie Li, Keikichi Hirose, Nobuaki Minematsu

University of Tokyo, Japan

Inter-speaker variation can be handled rather well in speech recognition by speaker adaptation techniques such as MLLR and MAP. However, for speech other than read speech, such as conversational or emotional speech, current recognition systems cannot achieve satisfactory performance even after speaker adaptation. To address this, a two-level adaptation method is proposed, in which adaptation is applied at two levels to handle inter-speaker and intra-speaker variations. A speaker-independent model is first adapted to a specific speaker to generate a speaker-dependent model. Then, after the training data are classified into several categories, the speaker-dependent model is further adapted to each category using the data assigned to it, yielding category-dependent models. Recognition is done in parallel using the speaker-dependent model and each category-dependent model, and the result with the highest likelihood is selected as the final recognition result. Recognition experiments were conducted on speech with various emotions (the emotion of the input speech being unknown), and the results showed that the proposed method outperformed conventional MLLR-based speaker adaptation.
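The parallel decoding and selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Decoder` type, the `recognize_parallel` function, the toy decoders, the category names, and the log-likelihood values are all hypothetical, chosen only to show how the highest-likelihood result might be picked among a speaker-dependent model and several category-dependent models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumption: a decoder maps an utterance ID to (hypothesis, log-likelihood).
# A real system would run an HMM decoder with an adapted acoustic model here.
Decoder = Callable[[str], Tuple[str, float]]


@dataclass
class Result:
    model_name: str
    hypothesis: str
    log_likelihood: float


def recognize_parallel(utterance: str, decoders: Dict[str, Decoder]) -> Result:
    """Decode with every model and keep the highest-likelihood result."""
    if not decoders:
        raise ValueError("at least one decoder is required")
    results = []
    for name, decode in decoders.items():
        hypothesis, log_likelihood = decode(utterance)
        results.append(Result(name, hypothesis, log_likelihood))
    return max(results, key=lambda r: r.log_likelihood)


def make_toy_decoder(hypothesis: str, log_likelihood: float) -> Decoder:
    """Hypothetical stand-in that returns a fixed hypothesis and score."""
    return lambda utterance: (hypothesis, log_likelihood)


# Illustrative models only: one speaker-dependent model plus two
# category-dependent models (the category names are made up).
decoders = {
    "speaker_dependent": make_toy_decoder("hello world", -120.0),
    "category_angry": make_toy_decoder("hello word", -135.5),
    "category_happy": make_toy_decoder("hello world", -112.3),
}

best = recognize_parallel("utt001", decoders)
print(best.model_name, best.hypothesis, best.log_likelihood)
```

Because the category of the input speech is unknown at recognition time, the sketch never asks for it: every model decodes the utterance, and the likelihood alone decides which output is kept.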

Bibliographic reference.  Li, Baojie / Hirose, Keikichi / Minematsu, Nobuaki (2002): "Robust speech recognition using inter-speaker and intra-speaker adaptation", In ICSLP-2002, 1397-1400.