13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Speech Data Clustering Based on Phoneme Error Trend for Unsupervised Acoustic Model Adaptation

Taichi Asami, Satoshi Kobashikawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi

NTT Cyber Space Laboratories, NTT Corporation, Japan

Unsupervised cluster adaptive training of acoustic models offers promise for improving recognition accuracy, especially in speech recognition systems that store massive sets of speech samples from unknown speakers. A key problem in clustering adaptation samples is how to classify their wide variety of acoustic characteristics. We propose a novel speech sample clustering method that focuses on the phoneme error trend in each speech sample. The proposed method classifies adaptation samples by the trend of phoneme discrimination in each sample, and represents each sample as a compact phoneme error trend vector whose dimension is at most the number of phonemes. Experiments show that the phoneme error trend vectors are expressive enough to classify acoustic characteristics effectively, and compact enough to provide robustness against unknown data.
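The abstract does not specify how the error trend is computed or which clustering algorithm is used. The following is a minimal sketch under stated assumptions. Each sample is assumed to come with a phoneme-level alignment of recognized phonemes against reference phonemes; how such a reference is obtained without manual transcripts is left open here. The per-phoneme error rate then yields a vector with one dimension per phoneme, and plain k-means, a swapped-in choice rather than the authors' method, groups samples with similar trends. The names `error_trend_vector`, `kmeans`, `sq_dist`, and the `Alignment` input format are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical input format (assumption): each speech sample is a list of
# aligned (reference_phoneme, recognized_phoneme) pairs; None marks a deletion.
Alignment = List[Tuple[str, Optional[str]]]


def error_trend_vector(alignment: Alignment, phonemes: Sequence[str]) -> List[float]:
    """Per-phoneme error rate for one sample: one dimension per phoneme."""
    totals = {p: 0 for p in phonemes}
    errors = {p: 0 for p in phonemes}
    for ref, hyp in alignment:
        if ref not in totals:  # skip phonemes outside the chosen inventory
            continue
        totals[ref] += 1
        if hyp != ref:  # a substitution or deletion counts as an error
            errors[ref] += 1
    # Phonemes that never occur in the sample get 0.0 (an assumption).
    return [errors[p] / totals[p] if totals[p] else 0.0 for p in phonemes]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors: List[List[float]], k: int,
           iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Add the vector farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With a four-phoneme toy inventory, two samples that mostly confuse vowels and two that mostly confuse consonants end up in separate clusters. Because each vector has at most as many dimensions as there are phonemes, the clustering operates on a small, fixed-size representation regardless of utterance length, which is the compactness property the abstract emphasizes.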

Index Terms: speech recognition, acoustic model adaptation, data clustering, phoneme error trend


Bibliographic reference.  Asami, Taichi / Kobashikawa, Satoshi / Masataki, Hirokazu / Yoshioka, Osamu / Takahashi, Satoshi (2012): "Speech data clustering based on phoneme error trend for unsupervised acoustic model adaptation", In INTERSPEECH-2012, 1760-1763.