ISCA Archive: International Symposium on Chinese Spoken Language Processing (ISCSLP 2000), Fragrant Hill Hotel, Beijing
ABSTRACT
In this paper, we present a new technique for recognizing hand gestures using a decision tree method based on information entropy. Rules derived from the decision tree using training data can classify sixty-five different hand gestures. Normalization of all sensors in a DataGlove is also proposed to model the variation in each sensor's data that arises when the same gesture is performed differently. Compared with an ANN, the proposed decision tree approach not only improves recognition performance by 12.2% but also overcomes the ANN's limitation of tedious training time.
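As a minimal sketch of the entropy criterion that such a decision tree relies on, the following computes the information gain of one threshold split on a single sensor. The sensor readings, gesture labels, and threshold are hypothetical illustration values, not data from the paper, and the paper's actual tree-building procedure may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gives no gain
    total = len(labels)
    remainder = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - remainder


# Hypothetical normalized readings from one glove sensor, with gesture labels.
readings = [0.1, 0.2, 0.8, 0.9]
gestures = ["A", "A", "B", "B"]

gain_good = information_gain(readings, gestures, threshold=0.5)
gain_poor = information_gain(readings, gestures, threshold=0.15)
```

A tree builder would evaluate candidate splits like these and choose the one with the highest gain; here the split at 0.5 separates the two gestures completely, while the split at 0.15 does not.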
ABSTRACT
Prosody generation is an important issue in text-to-speech
systems. In this paper we present an example-based approach to prosody generation in
Mandarin Chinese speech synthesis. The general idea is to obtain the
prosodic information from real speech examples. We first analyze the
given Chinese text and form a linguistic feature vector that describes the phonetic and
lexical characteristics of the text. We then search a database for the best match to
the vector, that is, a similar occurrence of the text. The prosody parameters of the
retrieved example serve as the guideline for the prosody to be generated. The method
is hierarchical: the final prosody combines three elements, namely
syllable-level prosody, a phrase-level prosody pattern, and a sentence-level prosody
constraint. The experimental results show that the proposed method generates relatively
good prosody.
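The database search step can be illustrated with a small sketch. It assumes categorical features compared by a simple mismatch count; the feature names, the distance measure, and the prosody values are all hypothetical, since the abstract does not specify them.

```python
def mismatch_count(a, b):
    """Number of positions at which two feature vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_match(query, database):
    """Return the database entry whose features are closest to the query."""
    return min(database,
               key=lambda entry: mismatch_count(query, entry["features"]))


# Hypothetical example database: (tone, phrase position, part of speech)
# paired with made-up prosody parameters for illustration only.
database = [
    {"features": ("tone3", "phrase_initial", "noun"),
     "f0_mean_hz": 180.0, "duration_ms": 210.0},
    {"features": ("tone3", "phrase_final", "noun"),
     "f0_mean_hz": 150.0, "duration_ms": 260.0},
    {"features": ("tone1", "phrase_final", "verb"),
     "f0_mean_hz": 220.0, "duration_ms": 240.0},
]

query = ("tone3", "phrase_final", "adjective")
match = best_match(query, database)
```

The prosody parameters stored with `match` would then guide generation for the query syllable. A real system would likely weight features differently and combine matches across the syllable, phrase, and sentence levels.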
ABSTRACT
This paper aims to make theoretical and descriptive contributions to the study of enhancing naturalness in Chinese TTS. It includes a description of the prosodic information of Chinese and some preliminary considerations regarding strategies for processing prosodic information.
ABSTRACT
This paper introduces a corpus-based Chinese speech synthesis method that can produce Chinese speech in the style of the original speaker who recorded the corpus. There are two major problems in corpus-based speech synthesis. First, what content should be kept in the corpus? Second, given a target sentence, how should the synthesis units be selected from the corpus? We present our solutions to these two questions.
ABSTRACT
In our previous report, a functional model of fundamental frequency (F0) contours of Chinese sentences was developed and shown to represent an observed F0 contour well using only the peak values of its constituent syllables. This paper evaluates the model, particularly from the viewpoint of model parameter control in F0 contour generation. The results of experiments on 2509 Chinese utterances produced by 8 native speakers indicate that the model parameters can be categorized into 3 groups: (1) parameters independent of speakers and utterances, (2) a pair of parameters representing the top and bottom values of a speaker's voice register, and (3) parameters tightly related to, and thus conveying, the linguistic (and para- and non-linguistic) information of utterances. By representing F0 values as relative values within a register and further transposing them onto a warped scale, F0 contours for utterances with the same linguistic content but in different frequency registers can be used together to investigate the third group of parameters. Through analysis of 538 tri-syllable and 938 tetra-syllable words, the parameter controls for realizing 59 tri-tone and 221 tetra-tone sandhi patterns were determined. Automatic detection of syllable F0 peaks was also investigated, with a total error rate of around 9.4% for 996 sentences.
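The idea of expressing F0 relative to a speaker's register can be sketched as follows. This is an illustration under a stated assumption: a logarithmic frequency scale is used here as the warping, whereas the abstract does not say which warped scale the authors use. All frequency values are hypothetical.

```python
import math


def relative_f0(f0_hz, bottom_hz, top_hz):
    """Position of an F0 value within a speaker's register, from 0 to 1.

    Assumption: a log-frequency scale stands in for the warped scale,
    which the source does not specify.
    """
    span = math.log(top_hz) - math.log(bottom_hz)
    return (math.log(f0_hz) - math.log(bottom_hz)) / span


# Two hypothetical speakers whose registers differ by one octave.
low_voice = relative_f0(120.0, bottom_hz=80.0, top_hz=180.0)
high_voice = relative_f0(240.0, bottom_hz=160.0, top_hz=360.0)
```

Both calls give the same relative value, which shows how contours from speakers with different registers can be pooled once the register-dependent parameters are factored out.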
ABSTRACT
This paper describes a study of tone statistics of people's names in Mandarin Chinese. We studied a Chinese name database consisting of 1.6 million names. The statistical analysis shows the potential for a problem with tone-confusable names in a Mandarin voice-tag dialing system. Two factors influence how serious the problem is: the length of the voice tag and the vocabulary size of the system. We performed benchmark testing to compare the recognition performance of an English-version speech recognizer and a tone-enhanced version, both working on a small database of Chinese names. From this analysis we conclude that enhancing an English-version speech recognizer with tonal recognition capabilities would improve recognition of Chinese names, reducing the recognition error rate by 10 to 20%.
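To make the notion of tone-confusable names concrete, the sketch below groups names that become identical once tone marks are removed. It assumes names are written in pinyin with numeric tone digits; the names listed are hypothetical examples, not entries from the paper's database.

```python
from collections import defaultdict


def strip_tones(pinyin):
    """Remove numeric tone digits from a pinyin string."""
    return "".join(ch for ch in pinyin if not ch.isdigit())


def tone_confusable_groups(names):
    """Group names that differ only in tone.

    Returns a dict mapping each toneless form to the sorted list of
    distinct tonal forms, keeping only groups with more than one form.
    """
    groups = defaultdict(set)
    for name in names:
        groups[strip_tones(name)].add(name)
    return {base: sorted(forms)
            for base, forms in groups.items() if len(forms) > 1}


# Hypothetical voice-tag vocabulary in tone-numbered pinyin.
names = ["zhang1 wei3", "zhang1 wei4", "li3 na4", "wang2 fang1"]
groups = tone_confusable_groups(names)
```

A recognizer without tone modeling cannot distinguish the names within one group, so the number and size of such groups grows with vocabulary size and shrinks with longer voice tags.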