F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition

Xiaoyun Wang, Xugang Lu, Hisashi Kawai, Seiichi Yamamoto


Tone information provides a strong distinction for many ambiguous characters in Mandarin Chinese. The use of tonal acoustic units and F0-related tonal features has been shown to be effective at improving the accuracy of Mandarin automatic speech recognition (ASR) systems, as F0 contains the most prominent tonal information for distinguishing words that are phonemically identical. Both long-term temporal intonations and short-term quick variations coexist in F0. Using untreated F0 as an acoustic feature makes the F0 contour patterns differ from their citation forms and downplays the significance of tonal information in ASR. In this paper, we explore applying empirical mode decomposition (EMD) to F0 contours to reconstruct F0-related tonal features, with a view to removing the components that are irrelevant for Mandarin ASR. We investigate both GMM-HMM and DNN-HMM based acoustic modeling with the reconstructed tonal features. In comparison with the baseline systems using typical tonal features, our best system using reconstructed tonal features achieves a 4.5% relative word error rate reduction for the GMM-HMM system and a 3.5% relative word error rate reduction for the DNN-HMM system.
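The reconstruction idea can be illustrated with a minimal sketch. EMD expresses a signal as a sum of intrinsic mode functions (IMFs) plus a residue, so a reconstructed contour can be formed by summing only a chosen subset of components. The abstract does not state which components are discarded, so the function name `reconstruct_f0`, the `keep` argument, and the toy components below are assumptions for illustration only. The decomposition itself is not implemented here; hand-built components stand in for EMD output.

```python
import numpy as np


def reconstruct_f0(imfs, residue, keep):
    """Sum the selected IMFs and the residue into a reconstructed contour.

    imfs    : array of shape (n_imfs, n_frames), assumed to come from an EMD
              of one F0 contour (the decomposition is not performed here).
    residue : array of shape (n_frames,), the EMD residue (slow trend).
    keep    : indices of the IMFs to retain (hypothetical selection; the
              abstract does not specify which components are removed).
    """
    imfs = np.asarray(imfs, dtype=float)
    residue = np.asarray(residue, dtype=float)
    keep = np.asarray(list(keep), dtype=int)
    if imfs.ndim != 2 or residue.ndim != 1 or imfs.shape[1] != residue.shape[0]:
        raise ValueError("imfs must be (n_imfs, n_frames) and residue (n_frames,)")
    # Selecting zero IMFs yields an all-zero sum, so only the residue remains.
    return imfs[keep].sum(axis=0) + residue


# Toy stand-ins for EMD output: one fast oscillation and one slow trend.
t = np.linspace(0.0, 1.0, 100)
fast = 3.0 * np.sin(2.0 * np.pi * 25.0 * t)   # short-term quick variation
slow = 200.0 + 20.0 * t                       # long-term trend, in Hz

imfs = np.stack([fast])
smoothed = reconstruct_f0(imfs, slow, keep=[])   # drop the fast component
full = reconstruct_f0(imfs, slow, keep=[0])      # keep everything
```

Keeping every component returns the original contour, which is a convenient sanity check before experimenting with different selections.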


DOI: 10.21437/Interspeech.2016-653

Cite as

Wang, X., Lu, X., Kawai, H., Yamamoto, S. (2016) F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition. Proc. Interspeech 2016, 973-977.

BibTeX
@inproceedings{Wang+2016,
author={Xiaoyun Wang and Xugang Lu and Hisashi Kawai and Seiichi Yamamoto},
title={F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-653},
url={http://dx.doi.org/10.21437/Interspeech.2016-653},
pages={973--977}
}