8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Robust F0 Modeling for Mandarin Speech Recognition in Noise

Sheng Qiang (1), Yao Qian (2), Frank K. Soong (2), Congfu Xu (1)

(1) Zhejiang University, China
(2) Microsoft Research Asia, China

The F0 contour plays an important role in recognizing spoken tonal languages such as Mandarin Chinese. However, the discontinuity of F0 at transitions between voiced and unvoiced segments has traditionally been a bottleneck in creating a succinct statistical tone model for automatic speech recognition. By successfully applying the Multi-Space Distribution (MSD) to tone modeling, we recently reported a relative 24% reduction in tonal syllable errors on a Mandarin speech database. In this paper, we test MSD further on a noisy, continuous Mandarin digit recognition task, in which eight types of noise are added to clean speech signals at five signal-to-noise ratios (SNRs). The experimental results show that our MSD-based digit models significantly improve recognition performance in noise over a baseline system. Relative digit error rate reductions of 19.1% and 15.0% are obtained for noises seen and unseen in the training data, respectively. These improvements also exceed those of other reference systems that incorporate F0 information.
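
As a rough illustration of how a multi-space distribution can sidestep the voiced/unvoiced discontinuity, the sketch below scores a single frame under a two-space state: unvoiced frames fall in a zero-dimensional space that carries only a weight, while voiced frames are scored by a weighted one-dimensional Gaussian. It follows the general MSD formulation rather than the configuration used in the paper; the function name, the parameter values, and the use of log-F0 as the feature are assumptions made for illustration.

```python
import math


def msd_frame_likelihood(f0, w_voiced, mean, var):
    """Likelihood of one frame under a hypothetical two-space MSD state.

    f0       -- None for an unvoiced frame, or a float (e.g. log-F0) if voiced
    w_voiced -- weight of the voiced space; the unvoiced space gets 1 - w_voiced
    mean/var -- parameters of the 1-D Gaussian attached to the voiced space
    """
    if not 0.0 <= w_voiced <= 1.0:
        raise ValueError("w_voiced must lie in [0, 1]")
    if var <= 0.0:
        raise ValueError("var must be positive")

    if f0 is None:
        # Zero-dimensional space: no density, only the space weight.
        return 1.0 - w_voiced

    # One-dimensional space: space weight times the Gaussian density.
    density = math.exp(-0.5 * (f0 - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    return w_voiced * density


# Illustrative values only (not taken from the paper).
unvoiced_score = msd_frame_likelihood(None, w_voiced=0.7, mean=5.0, var=0.04)
voiced_score = msd_frame_likelihood(5.0, w_voiced=0.7, mean=5.0, var=0.04)
```

Because both frame types are scored by the same state, no artificial F0 value has to be interpolated across unvoiced regions before the model is trained or evaluated.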

Bibliographic reference. Qiang, Sheng / Qian, Yao / Soong, Frank K. / Xu, Congfu (2007): "Robust F0 modeling for Mandarin speech recognition in noise", in INTERSPEECH-2007, 1801-1804.