10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Rapid Unsupervised Adaptation Using Frame Independent Output Probabilities of Gender and Context Independent Phoneme Models

Satoshi Kobashikawa, Atsunori Ogawa, Yoshikazu Yamaguchi, Satoshi Takahashi

NTT Corporation, Japan

Commercial deployments demand higher recognition accuracy with no increase in computation time over previously adopted baseline speech recognition systems. Accuracy can be improved by adding gender-dependent acoustic models and unsupervised adaptation based on CMLLR (Constrained Maximum Likelihood Linear Regression). Batch-type CMLLR unsupervised adaptation estimates a single global transformation matrix from a prior unsupervised labeling pass, and that pass increases computation time. The proposed technique reduces the time spent on prior gender selection and labeling by using frame-independent output probabilities of only the gender-dependent speech GMM (Gaussian Mixture Model) and the context-independent phoneme (monophone) HMM (Hidden Markov Model) in dual-gender acoustic models. It further raises accuracy by employing a power term after adaptation. Simulations using spontaneous speech show that, compared to a baseline without prior gender selection and unsupervised adaptation, the proposed technique reduces computation time by 17.9% and the error in correct rate by a relative 13.7%.
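
The abstract does not give the scoring procedure in detail, so the following is only a minimal sketch of what frame-independent gender selection and labeling could look like. It assumes diagonal-covariance GMMs, assumes that gender is chosen by summing per-frame log-likelihoods, and assumes that each frame is labeled with its best-scoring monophone with no transition probabilities or decoding pass. The function names (`gmm_frame_loglik`, `select_gender`, `label_frames`) and the parameter layout are hypothetical and are not taken from the paper. The CMLLR estimation and the power term are not shown.

```python
import numpy as np


def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    Each frame is scored on its own, with no dependence on its neighbours.
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, M)
    log_comp = log_comp + np.log(weights)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_comp.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))


def select_gender(frames, speech_gmms):
    """Assumed rule: pick the gender whose speech GMM has the highest
    total log-likelihood summed over independently scored frames."""
    totals = {
        gender: gmm_frame_loglik(frames, *params).sum()
        for gender, params in speech_gmms.items()
    }
    return max(totals, key=totals.get)


def label_frames(frames, phoneme_gmms):
    """Assumed rule: label each frame with the monophone whose output
    distribution scores it highest (no Viterbi search)."""
    names = list(phoneme_gmms)
    scores = np.stack(
        [gmm_frame_loglik(frames, *phoneme_gmms[name]) for name in names],
        axis=1,
    )                                                                    # (T, P)
    return [names[i] for i in scores.argmax(axis=1)]
```

In this sketch each model is passed as a `(weights, means, variances)` tuple, and the per-frame labels from `label_frames` would stand in for the unsupervised labels that a batch CMLLR estimator consumes. Avoiding a decoding pass is where the time saving would come from under these assumptions.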


Bibliographic reference.  Kobashikawa, Satoshi / Ogawa, Atsunori / Yamaguchi, Yoshikazu / Takahashi, Satoshi (2009): "Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models", In INTERSPEECH-2009, 1615-1618.