Hand Gesture Recognition Based on Decision Tree

Authors: Yiqiang CHEN, Wen GAO, Jiyong MA
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, Beijing


In this paper, we present a new technique for recognizing hand gestures using a decision tree method based on information entropy. Rules derived from the decision tree using training data can classify sixty-five different hand gestures. Normalization for all sensors in a DataGlove is also proposed to model the data variations of each sensor that result from variations of the same gesture. Compared with an artificial neural network (ANN), the proposed decision tree approach not only improves recognition performance by 12.2% but also overcomes the ANN limitation of tedious training time.
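
The abstract does not say how the tree is built beyond "information entropy". As a rough illustration only, the sketch below computes the entropy-based information gain of a single threshold split, which is the criterion used by ID3-style decision trees. The sensor readings, gesture labels, and function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


# Hypothetical normalized readings from one glove sensor, with gesture labels.
sensor = [0.1, 0.2, 0.8, 0.9]
gestures = ["A", "A", "B", "B"]
gain = information_gain(sensor, gestures, threshold=0.5)
```

A tree builder in this style would evaluate such gains for each sensor and candidate threshold, split on the best one, and recurse. Each root-to-leaf path can then be read as a classification rule.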

An Example-based Approach for Prosody Generation in Chinese Speech Synthesis

Authors: Minghui DONG, Kim Teng LUA
Affiliation: Department of Computer Science, School of Computing, National University of Singapore


Prosody generation is an important issue in text-to-speech systems. In this paper, we present an example-based approach for prosody generation in Mandarin Chinese speech synthesis. The general idea is to obtain prosodic information from real speech examples. We first analyze the given Chinese text and form a linguistic feature vector that describes the phonetic and lexical characteristics of the text. We then search a database for the best match to the vector, that is, a similar occurrence of the text. The prosody parameters of the retrieved example serve as the guideline for the prosody to be generated. The method is hierarchical: the final prosody combines three elements, namely syllable-level prosody, phrase-level prosody pattern, and sentence-level prosody constraint. The experimental results show that the proposed method generates relatively good prosody.
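
The abstract does not specify the feature set or the matching criterion. The sketch below illustrates the general idea of example lookup under simple assumptions: features are categorical, and the best match is the database entry with the fewest mismatching features. All feature names, prosody values, and function names are hypothetical.

```python
def mismatch_count(a, b):
    """Number of positions at which two feature vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_match(query, database):
    """Return the stored example whose features are closest to the query."""
    return min(database, key=lambda entry: mismatch_count(query, entry["features"]))


# Hypothetical examples: (tone, position in phrase, part of speech) -> prosody.
database = [
    {"features": ("tone1", "phrase_initial", "noun"),
     "prosody": {"duration_ms": 180, "f0_hz": 220}},
    {"features": ("tone4", "phrase_final", "verb"),
     "prosody": {"duration_ms": 240, "f0_hz": 170}},
]

query = ("tone4", "phrase_final", "noun")
match = best_match(query, database)
```

Here the second entry is retrieved because it differs from the query in one feature, while the first differs in two. A hierarchical system as described would presumably repeat such a lookup at the syllable, phrase, and sentence levels.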

Strategy and Tactics on the Enhancement of Naturalness in Chinese TTS

Authors: Jianfen CAO, Shinan LU, Yufang YANG
Affiliation: Institute of Linguistics, Chinese Academy of Social Sciences, Beijing
Institute of Acoustics, Chinese Academy of Sciences, Beijing
Institute of Psychology, Chinese Academy of Sciences, Beijing


This paper aims to make theoretical and descriptive contributions to the study of enhancing naturalness in Chinese TTS. It describes the prosodic information of Chinese and offers some preliminary considerations regarding the strategy for processing prosodic information.

An Efficient Method To Synthesize Chinese Speech With Speaker Style

Authors: Zhong-ke MA, Wei LI, Deyu XIA, Ren-Hua WANG
Affiliation: Department of Electronic Engineering & Information Science,
University of Science & Technology of China


This paper introduces a corpus-based Chinese speech synthesis method that can produce Chinese speech in the style of the original speaker who recorded the corpus. Corpus-based speech synthesis faces two major problems. First, what content should be kept in the corpus? Second, given a target sentence, how should the synthesis units be selected from the corpus? We present our solution to these two questions.

Experimental Evaluation of A Functional Modeling of Fundamental Frequency Contours of Standard Chinese Sentences

Authors: Jinfu NI, Keikichi HIROSE
Affiliation: Dept. of Information and Communication Engineering,
School of Engineering, University of Tokyo, Tokyo
Dept. of Frontier Informatics, School of Frontier Science,
University of Tokyo, Tokyo


In our previous report, a functional model of fundamental frequency (F0) contours of Chinese sentences was developed and shown to represent an observed F0 contour well using only the peak values of its constituent syllables. This paper evaluates the model, especially from the viewpoint of model parameter control in F0 contour generation. The results of experiments on 2509 Chinese utterances produced by 8 native speakers indicate that the model parameters can be categorized into 3 groups: (1) parameters independent of speakers and utterances, (2) a pair of parameters representing the top and bottom values of the voice register of a speaker, and (3) parameters tightly related to, and thus conveying, the linguistic (and para- and non-linguistic) information of utterances. By representing F0 values as relative values within a register and further transposing them onto a warped scale, F0 contours of utterances with the same linguistic content but in different frequency registers can be used together to investigate the third group of parameters. Through analysis of 538 tri-syllable and 938 tetra-syllable words, parameter controls for realizing 59 tri-tone and 221 tetra-tone sandhi patterns were determined. The automatic detection of syllable F0 peaks was further investigated, with a total error rate of around 9.4% for 996 sentences.
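
The abstract does not give the formula for register-relative F0 or define the warped scale. As an illustration only, the sketch below assumes a logarithmic frequency scale and maps an F0 value to its relative position between the bottom and top of a speaker's register. The function name and the register values are hypothetical.

```python
import math


def relative_f0(f0_hz, bottom_hz, top_hz):
    """Position of f0_hz within [bottom_hz, top_hz] on a log-frequency scale.

    Returns 0.0 at the register bottom and 1.0 at the register top.
    The log scale is an assumption, not the paper's warped scale.
    """
    span = math.log(top_hz) - math.log(bottom_hz)
    return (math.log(f0_hz) - math.log(bottom_hz)) / span


# Two hypothetical speakers with different registers producing the "same" pitch
# target: each F0 value is one octave above that speaker's register bottom.
high_voice = relative_f0(200.0, bottom_hz=100.0, top_hz=400.0)
low_voice = relative_f0(150.0, bottom_hz=75.0, top_hz=300.0)
```

Under this assumption both speakers yield the same relative value, which is the kind of speaker-independent representation that lets contours from different registers be pooled.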

Impact of Tone Information on Chinese Name Recognition

Authors: Yaxin ZHANG, Anton MEDIEVSKI, James LAWRENCE
Affiliation: Motorola China Research Center, Shanghai
Motorola Australian Research Center, Sydney


This paper describes a study of tone statistics of people’s names in Mandarin Chinese. We studied a Chinese name database consisting of 1.6 million names. The statistical analysis shows the potential for a problem with tone-confusable names in a Mandarin voice tag dialing system. Two factors influence how serious the problem is: the length of the voice tag and the vocabulary size of the system. We performed benchmark testing to compare the recognition performance of an English version speech recognizer and a tone-enhanced version, both working on a small database of Chinese names. From this analysis we conclude that enhancing an English version speech recognizer with tonal recognition capabilities would improve recognition of Chinese names, reducing the recognition error rate by 10–20%.
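
The abstract does not describe how tone-confusable names were counted. One plausible reading is that two names are tone-confusable when they share the same syllables and differ only in tone. The sketch below groups names on that assumption, using pinyin with trailing tone digits. The names and the function name are hypothetical.

```python
from collections import defaultdict


def tone_confusable_groups(names):
    """Group names that become identical once tone digits are removed."""
    groups = defaultdict(set)
    for name in names:
        # Strip the trailing tone digit from each space-separated syllable.
        base = tuple(syllable.rstrip("12345") for syllable in name.split())
        groups[base].add(name)
    # Keep only bases shared by at least two distinct names.
    return {base: sorted(group) for base, group in groups.items() if len(group) > 1}


# Hypothetical names written as pinyin syllables with tone numbers.
names = ["zhang1 wei3", "zhang1 wei4", "li3 na4"]
groups = tone_confusable_groups(names)
```

With this grouping, a recognizer that ignores tone cannot tell apart the names within one group. This is consistent with the observation that the problem grows with vocabulary size.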

