EUROSPEECH 2003 - INTERSPEECH 2003
This paper presents a novel method for tone pattern discrimination that combines a functional fundamental frequency (F_0) model for feature extraction with vector quantization and maximum likelihood estimation. Tone patterns are represented in a parametric form based on the F_0 model and clustered using the LBG algorithm. The mapping between lexical tones and acoustic patterns is modeled statistically and decoded by maximum likelihood estimation. Evaluation experiments are conducted on 469 Mandarin utterances (1.4 hours of read speech from a female native speaker) under varied analysis conditions of codebook size and tone context. Experimental results indicate the effectiveness of the method both in tone discrimination and in detecting inconsistency between a lexical tone and its F_0 pattern. The method is suitable for the prosodic labeling of a large-scale speech corpus.
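The clustering and decoding steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the input vectors stand in for the F_0 model parameters (whose exact form is not given here), the function names `nearest`, `lbg`, `fit_likelihood`, and `decode` are hypothetical, the add-one smoothing is an assumption, and tone context is ignored.

```python
import numpy as np

def nearest(X, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def lbg(X, size, eps=0.01, iters=20):
    """LBG codebook design by repeated splitting; `size` is assumed to be a power of two."""
    codebook = X.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = np.vstack([codebook + eps, codebook - eps])
        # Refine the doubled codebook with nearest-neighbour / centroid updates.
        for _ in range(iters):
            idx = nearest(X, codebook)
            for k in range(len(codebook)):
                if np.any(idx == k):
                    codebook[k] = X[idx == k].mean(axis=0)
    return codebook

def fit_likelihood(codes, tones, n_codes, n_tones, alpha=1.0):
    """Estimate P(codeword | tone) from co-occurrence counts (alpha is assumed smoothing)."""
    counts = np.full((n_tones, n_codes), alpha)
    np.add.at(counts, (tones, codes), 1)
    return counts / counts.sum(axis=1, keepdims=True)

def decode(codes, likelihood):
    """For each codeword, pick the tone that maximizes P(codeword | tone)."""
    return likelihood[:, codes].argmax(axis=0)
```

Under these assumptions, a token whose decoded tone differs from its lexical tone would be a candidate for the kind of tone/F_0 inconsistency the abstract mentions.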
Bibliographic reference. Ni, Jinfu / Kawai, Hisashi (2003): "Tone pattern discrimination combining parametric modeling and maximum likelihood estimation", In EUROSPEECH-2003, 465-468.