Automatic Segmentation and Labeling of Speech Corpus Based on HMM With Adaptation

Authors: Donglai ZHU, Yu HU, Ren-Hua WANG
Affiliation: Department of Electronic Engineering & Information Science,
University of Science & Technology of China, Hefei


In this article we propose applying acoustic model adaptation to the automatic segmentation and labeling of a speech corpus. Because segmentation based only on a speaker-independent model is not precise enough, the speaker-independent model should be transformed into a speaker-dependent one. Training a speaker-dependent model from scratch requires a large amount of training data and a lot of time, whereas adaptation can modify the model parameters to match the current speaker quickly, using only a small amount of data, and still gives comparatively precise segmentation results. To make the segmentation more precise, we also apply boundary adjustment based on acoustic and phonetic features and adopt an iterative procedure.
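
The abstract does not say which adaptation technique is used. As an illustration only, the sketch below shows MAP adaptation of a single Gaussian mean, one common way to move speaker-independent parameters toward a new speaker with little data. The function name, the prior weight `tau`, and the toy values are assumptions, not taken from the paper.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean (illustrative, not the paper's method).

    prior_mean: speaker-independent mean, one value per feature dimension
    frames:     adaptation feature vectors from the target speaker
    posteriors: occupation probability of this Gaussian for each frame
    tau:        assumed prior weight; larger values trust the prior more
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    # Interpolate between the prior mean and the adaptation data.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy example: two frames from the new speaker pull the mean away from zero.
adapted = map_adapt_mean([0.0, 0.0], [[1.0, 2.0], [3.0, 2.0]], [1.0, 1.0], tau=2.0)
```

With no adaptation frames the function returns the speaker-independent mean unchanged, which is why this kind of update behaves sensibly when only a small amount of speaker data is available.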

Model Distance and Its Application on Mixed-language Speech Recognition System

Authors: Guokang FU, Liqin SHEN
Affiliation: SpeechGroup, IBMChinaResearchCenter, Beijing


Current monolingual speech recognition systems fail to recognize mixed-language speech, which creates a need for a mixed system. One important issue is how to define a good phone set for such a system. This paper presents an algorithm that automatically determines which two phones should be merged. A mixed Mandarin and English acoustic model is trained, and the algorithm is applied to define the merged mixed phone set. Test results show the effectiveness of both the mixed system and the algorithm.
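
The abstract does not define the model distance. As a hedged sketch, the code below assumes each phone is summarized by one diagonal-covariance Gaussian and uses the symmetric Kullback-Leibler divergence as the distance, then picks the closest Mandarin and English pair as a merge candidate. The phone labels, parameter values, and function names are hypothetical.

```python
from itertools import product


def sym_kl_diag(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff_sq = (ma - mb) ** 2
        # KL(a||b) + KL(b||a); the log-variance terms cancel.
        total += 0.5 * (va / vb + vb / va - 2.0)
        total += 0.5 * diff_sq * (1.0 / va + 1.0 / vb)
    return total


def closest_cross_language_pair(mandarin, english):
    """Return the (Mandarin, English) phone pair with the smallest distance."""
    def distance(pair):
        m_phone, e_phone = pair
        return sym_kl_diag(*mandarin[m_phone], *english[e_phone])

    best = min(product(mandarin, english), key=distance)
    return best, distance(best)


# Hypothetical one-dimensional phone models: label -> (mean, variance).
mandarin_models = {"m_a": ([0.0], [1.0]), "m_i": ([5.0], [1.0])}
english_models = {"e_aa": ([0.2], [1.0]), "e_iy": ([4.0], [2.0])}

pair, dist = closest_cross_language_pair(mandarin_models, english_models)
```

In a real system each phone would be a multi-state HMM with Gaussian mixtures, so the distance would have to be aggregated over states and mixture components; this sketch only shows the pairwise comparison step.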

Benchmark Results of Triphone-based Acoustic Modeling on HKU96 and HKU99 Putonghua Corpora

Authors: Bin MA, Qiang HUO
Affiliation: Department of Computer Science and Information System,
The University of HONG KONG, HONG KONG


HKU96 and HKU99 are two Putonghua corpora constructed at The University of Hong Kong. This paper presents the benchmark results of our baseline Putonghua recognition system, based on triphone acoustic modeling, on these two corpora. We describe the basic phone set, the syllable pronunciation lexicon, the hand-crafted linguistic question set for decision-tree-based state tying, the experimental setups, and the training and testing protocols used to obtain our benchmark results. With these details, those who acquire HKU96 and HKU99 should be able to reproduce our experimental results.

Global Information Access by Chinese Spoken Language In A Wireless Era -- Overview With Some Recent Results

Authors: Lin-shan LEE, Lee-feng CHIEN, Yumin LEE
Affiliation: Dept. of Electrical Engineering, Taiwan University, Taipei
Graduate Institute of Communications Engineering, Taiwan University, Taipei
Institute of Information Science, Academia Sinica, Taipei


The rapid development of the Internet and the World Wide Web has created a global network that will soon become a physical embodiment of all human knowledge and a complete integration of global information activities. It is believed that one of the most user-friendly and natural ways of accessing the network will be by human voice, and that the integration of spoken language processing technologies with broadband wireless technologies will be a key to the evolution of a broadband wireless information community. This paper offers an overview of this concept, some technical considerations, and some recent results.

Design And Implementation of A Chinese-To-English Spoken Language Translation System

Authors: Chengqing ZONG, Taiyi HUANG, Bo XU
Affiliation: National Laboratory of Pattern Recognition,
Institute of Automation, Chinese Academy of Sciences, Beijing


In this paper, we describe the design and implementation of a Chinese-to-English spoken language translation system. The system employs multiple translation engines: a template-based translator, a semantic-parsing-based translator (SPBT), and a statistics-based translator (SBT). The SPBT uses an interchange format (IF) to represent its understanding of the input utterance, and the target language generator produces the translation from the IF. The dialog knowledge manager records the dialog history and helps the Chinese parser find the topic of the utterance being analyzed. The system is currently under construction and is restricted to the domain of hotel reservation. Some preliminary experimental results are reported in the paper.

