The F0 contour plays an important role in recognizing spoken tonal languages such as Mandarin Chinese. However, the discontinuity of F0 at transitions between voiced and unvoiced segments has traditionally been a bottleneck in building a succinct statistical tone model for automatic speech recognition. By successfully applying the Multi-Space Distribution (MSD) to tone modeling, we recently reported a 24% relative reduction in tonal syllable errors on a Mandarin speech database. In this paper, we further test MSD on a noisy, continuous Mandarin digit recognition task, in which eight types of noise are added to clean speech signals at five SNRs. The experimental results show that our MSD-based digit models significantly improve recognition performance in noise over a baseline system. Relative digit error rate reductions of 19.1% and 15.0% are obtained for noise types seen and unseen in the training data, respectively. These improvements are also larger than those obtained by other reference systems that incorporate F0 information.
Bibliographic reference. Qiang, Sheng / Qian, Yao / Soong, Frank K. / Xu, Congfu (2007): "Robust F0 modeling for Mandarin speech recognition in noise", In INTERSPEECH-2007, 1801-1804.