International Symposium on Chinese Spoken Language Processing (ISCSLP 2000)
Fragrant Hill Hotel, Beijing
In this paper we propose applying acoustic model adaptation to the automatic segmentation and labeling of a speech corpus. Since segmentation based only on a speaker-independent model is not precise enough, the speaker-independent model should be transformed into a speaker-dependent one. Training a speaker-dependent model from scratch requires a large amount of data and time, whereas adaptation can modify the model parameters to match the current speaker in a short time with only a small amount of data, yielding comparatively precise segmentation results. To make the segmentation results still more precise, we also incorporate boundary adjustment based on acoustic and phonetic features and adopt an iterative procedure.
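The abstract does not name a specific adaptation algorithm. One common technique that fits the description (quickly adapting an existing model to the current speaker from a small amount of data) is MAP adaptation of the Gaussian means; the sketch below is purely illustrative and is not the paper's implementation:

```python
import numpy as np

def map_adapt_means(si_means, frames, assignments, tau=10.0):
    """MAP-adapt speaker-independent Gaussian means toward a speaker's data.

    si_means:    (K, D) speaker-independent Gaussian means
    frames:      (N, D) adaptation frames from the current speaker
    assignments: (N,) index of the Gaussian each frame is assigned to
    tau:         prior weight; larger values stay closer to the SI model
    """
    adapted = si_means.copy()
    for k in range(si_means.shape[0]):
        x = frames[assignments == k]
        n = len(x)
        if n:
            # Interpolate between the SI mean and the speaker's sample mean,
            # weighted by the prior count tau and the observed count n.
            adapted[k] = (tau * si_means[k] + x.sum(axis=0)) / (tau + n)
    return adapted
```

With no adaptation data for a Gaussian, its mean is left at the speaker-independent value, which is why a few utterances suffice: only the Gaussians actually observed move toward the speaker.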
The failure of current monolingual speech recognition systems to recognize mixed-language speech creates the need for a mixed system. One important issue is how to define a good phone set for such a mixed system. This paper presents an algorithm that automatically determines which two phones should be merged. A mixed Mandarin/English acoustic model is trained, and the algorithm is applied to define the merged mixed phone set. Test results show the effectiveness of the mixed system and of the algorithm.
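The abstract does not specify the merging criterion. As an illustrative sketch only (not necessarily the paper's algorithm), one simple approach is to merge the pair of phones whose acoustic models are closest under the Bhattacharyya distance between single diagonal-covariance Gaussians:

```python
import numpy as np
from itertools import combinations

def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    v = (v1 + v2) / 2.0
    mean_term = 0.125 * np.sum((m1 - m2) ** 2 / v)
    cov_term = 0.5 * np.sum(np.log(v / np.sqrt(v1 * v2)))
    return mean_term + cov_term

def closest_phone_pair(models):
    """models: dict phone -> (mean, var). Return the most similar pair,
    i.e. the best candidate for merging across the two languages."""
    return min(combinations(models, 2),
               key=lambda p: bhattacharyya(*models[p[0]], *models[p[1]]))
```

In a merging loop, the closest pair would be merged, the model re-estimated, and the process repeated until the distance exceeds a threshold or a target phone-set size is reached.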
HKU96 and HKU99 are two Putonghua corpora constructed at The University of Hong Kong. This paper presents the benchmark results of our baseline Putonghua recognition system, based on triphone acoustic modeling, on these two corpora. We describe the basic phone set, the syllable pronunciation lexicon, the hand-crafted linguistic question set for decision-tree-based state tying, the experimental setups, and the training and testing protocols used to obtain our benchmark results. With these details, those who acquire HKU96 and HKU99 should be able to reproduce our experimental results.
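Decision-tree-based state tying, mentioned above, greedily splits a pool of triphone states by whichever linguistic question yields the largest log-likelihood gain. The toy one-dimensional sketch below illustrates only that selection step; the statistics, question format, and stopping criteria are assumptions, not the HKU setup:

```python
import numpy as np

def split_gain(occ_counts, means, question_mask):
    """Log-likelihood gain (up to constants) of splitting a pool of
    single-Gaussian states by a yes/no linguistic question.

    occ_counts:    (S,) occupation count of each state in the pool
    means:         (S,) per-state mean of one feature dimension (toy 1-D)
    question_mask: (S,) bool, True where the question answers "yes"
    """
    def ll(counts, mus):
        n = counts.sum()
        if n == 0:
            return 0.0
        mu = (counts * mus).sum() / n
        var = (counts * (mus - mu) ** 2).sum() / n + 1e-6
        return -0.5 * n * np.log(var)  # Gaussian log-likelihood, up to constants
    return (ll(occ_counts[question_mask], means[question_mask])
            + ll(occ_counts[~question_mask], means[~question_mask])
            - ll(occ_counts, means))

def best_question(occ_counts, means, questions):
    """Index of the question giving the largest split gain."""
    return max(range(len(questions)),
               key=lambda q: split_gain(occ_counts, means, questions[q]))
```

A real system applies this recursively to each resulting node, stops when the gain falls below a threshold, and ties all states that end up at the same leaf.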
The rapid development of the Internet and the World Wide Web has created a global network that will soon become a physical embodiment of all human knowledge and a complete integration of global information activities. It is believed that one of the most user-friendly and natural approaches for accessing the network will be via human voice, and that the integration of spoken language processing technologies with broadband wireless technologies will be a key to the evolution of a broadband wireless information community. This paper offers an overview of the above concept, some technical considerations, and some recent results.
In this paper, we describe the design and implementation of a Chinese-to-English spoken language translation system. The system employs multiple translation engines: a template-based translator, a semantic-parsing-based translator (SPBT), and a statistics-based translator (SBT). The SPBT uses the interchange format (IF) to represent the understanding result of the input utterance, and the target language generator produces the translation from the IF. A dialog knowledge manager records the dialog history and helps the Chinese parser find the topic of the utterance being analyzed. The system is currently under construction and is restricted to the domain of hotel reservation. Some preliminary experimental results are reported in the paper.
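The abstract does not say how the outputs of the three engines are combined. One minimal arrangement, sketched below with a hypothetical interface (not the system's actual design), is a fallback cascade: the fast, high-precision template-based translator is tried first, and later engines catch the utterances it cannot handle:

```python
def translate(utterance, engines):
    """Run translation engines in priority order.

    Each engine is a callable taking the source utterance and returning
    an English translation, or None if it cannot handle the input.
    """
    for engine in engines:
        result = engine(utterance)
        if result is not None:
            return result
    return None  # no engine produced a translation
```

In such a cascade the statistics-based translator would typically come last, since it always produces some output, while the template-based and semantic-parsing-based engines decline inputs outside their coverage.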