ISCA Archive

International Symposium on Chinese Spoken Language Processing (ISCSLP 2000)

Fragrant Hill Hotel, Beijing
October 13-15, 2000

Session Oral 4

A Self-adapting Endpoint Detection Algorithm for Speech Recognition in Noisy Environment Based on 1/f Process

Authors: Fan WANG, Fang ZHENG, Wenhu WU
Affiliation: Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems,
Department of Computer Science & Technology, Tsinghua University, Beijing


This paper presents an effective and robust speech endpoint detection method based on the 1/f process technique, which is suitable for robust continuous speech recognition systems in variable noisy environments. The Gaussian 1/f process, a mathematical model from fractal theory for statistically self-similar random processes, is selected to model both speech and background noise. An optimal Bayesian two-class classifier is then developed to discriminate between real noisy speech and background noise using the wavelet coefficients, which have Karhunen-Loeve-type properties for 1/f processes. Finally, for robustness, several templates are built for speech, and the parameters of the background noise model can be dynamically adapted at runtime to handle variation in both speech and noise. In our experiments, 10 minutes of speech with different types of noise was tested using the new endpoint detector, achieving a detection accuracy of over 90%.
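The core decision described in the abstract, a Bayesian two-class classifier operating on wavelet coefficients whose scales are treated as decorrelated under the 1/f model, can be sketched roughly as follows. This is a minimal illustration only: the Haar transform, the per-scale Gaussian variances, and the frame length are assumptions, not the paper's actual parameterization.

```python
import numpy as np

def haar_dwt(frame, levels=4):
    """Plain Haar wavelet transform returning detail coefficients per
    scale (a simple stand-in for the wavelet analysis in the paper)."""
    coeffs = []
    a = np.asarray(frame, dtype=float)
    for _ in range(levels):
        n = len(a) // 2 * 2
        even, odd = a[:n:2], a[1:n:2]
        coeffs.append((even - odd) / np.sqrt(2))  # detail at this scale
        a = (even + odd) / np.sqrt(2)             # approximation passed down
    return coeffs

def loglik(coeffs, var_per_scale):
    """Gaussian log-likelihood of the wavelet coefficients, assumed
    independent across scales (the Karhunen-Loeve-type decorrelation
    property that 1/f models give the wavelet representation)."""
    ll = 0.0
    for d, v in zip(coeffs, var_per_scale):
        ll += -0.5 * np.sum(d ** 2 / v + np.log(2 * np.pi * v))
    return ll

def is_speech(frame, speech_vars, noise_vars, prior_logodds=0.0):
    """Bayesian two-class decision: speech vs. background noise."""
    c = haar_dwt(frame)
    return loglik(c, speech_vars) - loglik(c, noise_vars) + prior_logodds > 0
```

In the adaptive scheme the abstract describes, `noise_vars` would be re-estimated at runtime from frames classified as background.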

Page 327

Online Unsupervised Learning of HMM Parameters for Speaker Adaptation

Authors: Jen-Tzung CHIEN
Affiliation: Department of Computer Science and Information Engineering,
National Cheng Kung University, Tainan


This paper presents an online unsupervised learning algorithm that flexibly adapts speaker-independent (SI) hidden Markov models (HMMs) to a new speaker. We apply the quasi-Bayes (QB) estimate to incrementally obtain the word sequence and adaptation parameters for adjusting the HMMs each time a block of unlabeled data is enrolled. Accordingly, the nonstationary statistics of varying speakers can be successively traced from the newest enrollment data. To improve the QB estimate, we employ adaptive initial hyperparameters in the first session of online learning; these hyperparameters are estimated from the cluster of training speakers closest to the test speaker. Additionally, we develop a selection process that picks reliable parameters from a list of candidates for unsupervised learning, and a set of reliability assessment criteria is explored. The experiments confirm the effectiveness of the proposed method and show that using adaptive initial hyperparameters in online learning and multiple assessments in parameter selection improves speaker adaptation performance.
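The incremental principle behind quasi-Bayes adaptation, where the posterior after one block of data becomes the prior for the next block, can be illustrated with the simplest conjugate case, a Gaussian mean with a Gaussian prior. This is a toy sketch of the block-by-block hyperparameter update only; the paper adapts full HMM parameters, and the names below are illustrative.

```python
def qb_mean_update(mu0, tau0, block, var=1.0):
    """One incremental update of a Gaussian mean estimate.
    (mu0, tau0) are the prior hyperparameters (mean, precision);
    the returned pair serves as the prior for the next block."""
    n = len(block)
    xbar = sum(block) / n
    tau_n = tau0 + n / var                           # updated precision
    mu_n = (tau0 * mu0 + (n / var) * xbar) / tau_n   # precision-weighted mean
    return mu_n, tau_n
```

Starting the recursion from hyperparameters estimated on the training-speaker cluster closest to the test speaker corresponds to the paper's "adaptive initial hyperparameters".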

Page 331

Word Error Rate Reduction by Bottom-Up Tone Integration to Chinese Continuous Speech Recognition System

Authors: Ying JIA, Yonghong YAN, Baosheng YUAN, Jian LIU
Affiliation: Intel China Research Center


In this paper, a bottom-up integration structure for modeling tone influence at various levels is proposed. At the acoustic level, pitch is extracted as a continuous acoustic variable. At the phonetic level, we treat the main vowel with different tones as different phonemes. In the triphone-building phase, we evaluate a set of questions about tone for each decision-tree node. At the word level, a set of tone-change rules is used to build transcriptions for the training data and word lattices for decoding. At the sentence level, some sentence-ending words with the light (neutral) tone are added to the system vocabulary. Integration at these five levels experimentally reduces the word error rate from 9.9% to 7.8% on a Chinese continuous speech dictation task.
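The phonetic-level idea, letting the main vowel carry the tone so that the same final with different tones becomes different phonemes, can be sketched as a toy pinyin splitter. The initial list and the tone-tagging convention below are assumptions for illustration; the paper's actual phone set and tone-change rules are not given in the abstract.

```python
def split_toned(syl):
    """Split a toned pinyin syllable (e.g. 'zhong1') into an initial and
    a tone-tagged final, so the vowel-bearing final carries the tone.
    A missing tone digit is treated as the neutral/light tone ('5')."""
    tone = syl[-1] if syl[-1].isdigit() else "5"
    base = syl[:-1] if syl[-1].isdigit() else syl
    # Two-letter initials must be tried before their one-letter prefixes.
    initials = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
                "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
    for ini in initials:
        if base.startswith(ini):
            return ini, base[len(ini):] + tone
    return "", base + tone  # zero-initial syllable
```

Treating `ong1` and `ong4` as distinct units is what lets the decision-tree questions about tone operate on the resulting triphones.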

Page 335

Optimization of N-gram Parameters for Natural Language Processing

Authors: Gongjun LI, Na DONG, Toshiro ISHIKAWA
Affiliation: R&D Center, Matsushita Electric (China) Co., Ltd., Beijing


In this paper we identify a drawback of conventional approaches to n-gram estimation in Chinese natural language processing: the optimization of n-gram parameters is independent of their discriminative capability. To address this problem, we propose a discriminative estimation criterion under which n-gram parameters can be optimized. We implement this approach on a Chinese pinyin-to-character conversion platform and conduct experiments on the tagged text corpus from Peking University. Experimental results show that the conversion rate can be raised remarkably, by up to 41.4%.
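The evaluation platform, pinyin-to-character conversion with an n-gram model, is conventionally a Viterbi search over candidate characters. The sketch below shows that baseline decoding step with a character bigram; the candidate sets, probabilities, and smoothing floor are illustrative assumptions, and the paper's discriminative estimation of the n-gram parameters is not reproduced here.

```python
import math

def pinyin_to_chars(pinyins, candidates, bigram):
    """Viterbi search for the most probable character sequence given a
    pinyin sequence, scored by a character bigram model.
    candidates: dict pinyin -> list of candidate characters
    bigram: dict (prev_char, char) -> probability ('<s>' starts)"""
    prev = {"<s>": 0.0}   # log-score of best path ending in each char
    back = []             # per-position backpointers
    for py in pinyins:
        cur, bp = {}, {}
        for c in candidates[py]:
            best, arg = -math.inf, None
            for p, score in prev.items():
                s = score + math.log(bigram.get((p, c), 1e-6))  # floor unseen pairs
                if s > best:
                    best, arg = s, p
            cur[c], bp[c] = best, arg
        prev, back = cur, back + [bp]
    out = [max(prev, key=prev.get)]          # best final character
    for bp in reversed(back[1:]):            # follow backpointers
        out.append(bp[out[-1]])
    return "".join(reversed(out))
```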

Page 339

A Multi-path Syllable To Word Decoder With Language Model Optimization and Automatic Lexicon Augmentation

Authors: Haijiang TANG, Pascale Fung
Affiliation: Human Language Technology Center,
Department of Electrical and Electronic Engineering,
Hong Kong University of Science and Technology


Syllable-to-word decoding plays a very important role in Chinese large-vocabulary continuous speech recognition (LVCSR). However, the lack of word boundaries and other characteristics of the Chinese language hinder the development of high-quality language models and decoders. In this paper, we present a multi-path search algorithm with language model optimization and an automatic lexicon augmentation method to improve the accuracy of syllable-to-word decoding. Experimental results show that our method achieves a 34.76% character-accuracy improvement over the baseline.

Page 343

N-Gram Language Model Compression Using Scalar Quantization and Incremental Coding

Authors: Shuo DI, Lei ZHANG, Zheng CHEN, Eric CHANG, Kai-Fu LEE
Affiliation: Microsoft Research China


This paper describes a novel approach to compressing large trigram language models that uses scalar quantization to compress log probabilities and back-off coefficients, and incremental coding to compress entry pointers. Experiments show that the new approach achieves roughly 2.5 times the compression ratio of the well-known tree-bucket format while keeping perplexity and access speed almost unchanged. The high compression ratio makes our method suitable for various SLM-based applications, such as Pinyin input methods and dictation on handheld devices with little available memory.
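The two compression ingredients named in the abstract can be sketched directly: uniform scalar quantization maps each log probability to the nearest of a small number of codebook levels, and incremental (delta) coding stores the small gaps between sorted entry pointers instead of absolute offsets. This is a minimal sketch under assumed choices (uniform quantizer, 8-bit codes); the paper's actual quantizer design and pointer layout are not detailed in the abstract.

```python
def quantize_logprobs(logprobs, bits=8):
    """Uniform scalar quantization: map each value to the nearest of
    2**bits levels spanning [min, max]. Returns integer codes and the
    codebook used to dequantize them."""
    lo, hi = float(min(logprobs)), float(max(logprobs))
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in logprobs]
    codebook = [lo + i * step for i in range(levels)]
    return codes, codebook

def delta_encode(pointers):
    """Incremental coding: store gaps between sorted entry pointers,
    which are small and compress far better than absolute offsets."""
    out, prev = [], 0
    for p in pointers:
        out.append(p - prev)
        prev = p
    return out

def delta_decode(deltas):
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out
```

With 8-bit codes, each log probability costs one byte instead of a 4-byte float, and the quantization error per value is bounded by one codebook step.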

Page 347