5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

The Effect of Fundamental Frequency on Mandarin Speech Recognition

Sharlene Liu (1), Sean Doyle (2), Allen Morris (3), Farzad Ehsani (4)

(1) Nuance, USA
(2) General Magic, USA
(3) Soft Gam, USA
(4) Sehda, USA

ABSTRACT We study the effects of modeling tone in Mandarin speech recognition. Mandarin has five tones, including the neutral tone, and tone is a syllable-level phenomenon. A direct acoustic manifestation of tone is the fundamental frequency (f0). We report on the effect of f0 on the acoustic recognition accuracy of a Mandarin recognizer. In particular, we put f0, its first derivative (f0'), and its second derivative (f0'') in separate streams of the feature vector. Stream weights are adjusted to investigate the individual effects of f0, f0', and f0'' on recognition accuracy. Our results show that incorporating the f0 feature reduced accuracy, whereas f0' increased accuracy and f0'' appeared to have no effect.
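
To make the stream setup concrete, the following is a minimal sketch of how f0, f0', and f0'' might be derived from a pitch track and how per-stream scores might be weighted. The abstract does not specify the derivative estimator or the combination rule, so both are assumptions here: derivatives are approximated by a central difference with replicated edges, and stream scores are combined as a weighted sum of log-likelihoods, a common convention in multi-stream HMM recognizers. The function names, stream names, and all numeric values are hypothetical.

```python
def delta(values):
    """Approximate the time derivative of a per-frame sequence.

    Assumption: a central difference with edge frames replicated;
    the paper does not say which estimator was used.
    """
    n = len(values)
    out = []
    for t in range(n):
        prev = values[max(t - 1, 0)]
        nxt = values[min(t + 1, n - 1)]
        out.append((nxt - prev) / 2.0)
    return out


def combine_stream_scores(log_likelihoods, weights):
    """Combine per-stream log-likelihoods with stream weights.

    Assumption: a weighted sum of log-likelihoods. A weight of 0.0
    removes a stream's influence on the combined score.
    """
    if log_likelihoods.keys() != weights.keys():
        raise ValueError("streams and weights must have the same names")
    return sum(weights[name] * log_likelihoods[name] for name in log_likelihoods)


# Hypothetical rising pitch contour in Hz, one value per frame.
f0 = [100.0, 110.0, 130.0, 160.0]
d_f0 = delta(f0)      # first derivative stream (f0')
dd_f0 = delta(d_f0)   # second derivative stream (f0'')

# Hypothetical per-stream log-likelihoods for one frame under one model state.
frame_scores = {"spectral": -10.0, "f0": -4.0, "d_f0": -2.0, "dd_f0": -3.0}

# Example weighting that keeps only the spectral and f0' streams active.
stream_weights = {"spectral": 1.0, "f0": 0.0, "d_f0": 0.5, "dd_f0": 0.0}
combined = combine_stream_scores(frame_scores, stream_weights)
```

Under this formulation, setting one stream's weight to zero while varying another isolates that feature's contribution, which is one way to read the per-stream comparison described in the abstract.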

