ISCA Archive

International Symposium on Chinese Spoken Language Processing (ISCSLP 2000)

Fragrant Hill Hotel, Beijing
October 13-15, 2000

Session Oral 2


A New Framework For Mandarin LVCSR Based On One-pass Decoder

Authors: Sheng GAO, Bo XU and Taiyi HUANG
Affiliation: National Laboratory of Pattern Recognition, Institute of Automation
Chinese Academy of Sciences, P.O.Box 2728, Beijing
Mailto: Gsh@nlpr.ia.ac.cn
xubo@nlpr.ia.ac.cn
huang@nlpr.ia.ac.cn

ABSTRACT

This paper describes a new framework for Mandarin LVCSR based on a one-pass decoder and decision-tree-based class-triphone acoustic modeling. Compared with a multi-pass decoder, it should be better informed and more efficient, since all knowledge sources are used at the same time, provided that the decoder is well organized and optimized. We describe in detail the organization of our one-pass decoder and how it handles the search-space explosion caused by the very large number of triphones and by cross-word extension with unknown right context, including tone context. Experimental results on the male test database from the China National Hi-Tech Project 863 show that, with the non-tonal class-triphone model, the character error rate (CER) was reduced to 13.04% for the open LM and 2.8% for the closed LM. With the tonal class-triphone model, the CER reaches 10.31%, a 21% relative character error reduction compared with the non-tonal class-triphone model.
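The relative reduction quoted in this abstract can be checked with a short calculation. The following is a minimal sketch, assuming the baseline for the comparison is the 13.04% open-LM CER of the non-tonal model; the function name `relative_reduction` is illustrative and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    return (baseline - improved) / baseline


# CER values taken from the abstract (percent).
non_tonal_cer = 13.04  # non-tonal class-triphone model, open LM
tonal_cer = 10.31      # tonal class-triphone model

reduction = relative_reduction(non_tonal_cer, tonal_cer)
print(f"Relative CER reduction: {reduction * 100:.1f}%")
```

Rounded to a whole percent, the result agrees with the 21% stated in the abstract.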

Page 49


The Design of Dialogue Management in a Mixed Initiative Chinese Spoken Dialogue System Engine

Authors: Xianfang WANG, Limin DU
Affiliation: Center for Speech Interactive Information Technology,
Institute of Acoustics, Chinese Academy of Sciences
Mailto: wangxf@iis.ac.cn
dulm@iis.ac.cn

ABSTRACT

In this paper, we propose a domain-transparent design of dialogue management in a mixed initiative Chinese spoken dialogue system engine. This design moves the domain-dependent parts of the dialogue management into an external task configuration file, leaving the dialogue manager independent of the domain. The task configuration file consists of a set of states, each of which is associated with a task action and the constraint for applying that action, rather than with the internal and external resources available to the system. Thus, the number of states is reduced. This makes it convenient to design a dialogue system for a specified domain and to port it to another domain, which only requires replacing the task configuration file while leaving the dialogue manager unchanged. With this design, the effort of porting a spoken dialogue system across different domains is reduced.
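The separation described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the class names, the train-ticket domain, the slot names, and the use of in-memory Python objects in place of an external configuration file are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Slots = Dict[str, str]


@dataclass
class TaskState:
    """One entry of a (hypothetical) task configuration."""
    name: str
    action: str                          # task action to perform
    constraint: Callable[[Slots], bool]  # when the action may be applied


class DialogueManager:
    """Domain-independent: it only iterates over TaskState entries."""

    def __init__(self, states: List[TaskState]) -> None:
        self.states = states

    def next_action(self, slots: Slots) -> Optional[str]:
        # Return the action of the first state whose constraint holds.
        for state in self.states:
            if state.constraint(slots):
                return state.action
        return None


# Hypothetical configuration for a train-ticket domain.
TRAIN_TASK = [
    TaskState("need_destination", "ask_destination",
              lambda s: "destination" not in s),
    TaskState("need_date", "ask_date", lambda s: "date" not in s),
    TaskState("ready", "query_timetable", lambda s: True),
]

manager = DialogueManager(TRAIN_TASK)
print(manager.next_action({}))
print(manager.next_action({"destination": "Beijing"}))
print(manager.next_action({"destination": "Beijing", "date": "2000-10-13"}))
```

In this sketch, porting to another domain means passing a different list of `TaskState` entries to `DialogueManager`; the manager class itself is not edited.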

Page 53


Computer-aided Design/Analysis for Chinese Spoken Dialogue Systems

Authors: Bor-shen LIN, Lin-shan LEE
Affiliation: Department of Electrical Engineering, National Taiwan University, Taipei
Mailto: bsl@speech.ee.ntu.edu.tw
lsl@iis.sinica.edu.tw

ABSTRACT

Conventionally, design principles for spoken dialogue systems are drawn either from experience or from corpus-based analysis. However, human experience is usually not precise enough for engineering design, while in corpus-based analysis many factors, such as speech recognition or understanding performance and user behavior, can never be precisely controlled. Recently, a new design/analysis approach based on computer simulation was proposed. This paper presents our experience of using this approach to design Chinese spoken dialogue systems. The simulation led to the following observations and design principles. The transaction success rate (reliability) and the slot transmission efficiency (efficiency) are usually conflicting design goals, so a trade-off exists between them. Since reliability is in general more important than efficiency, it is desirable to achieve higher reliability at the price of reduced slot transmission efficiency when the reliability is not adequate. According to the simulation results, when the speech recognition accuracy cannot be improved, there is still limited flexibility for tuning the dialogue performance by selecting among the strategies and considering the trade-offs. It is possible not only to select among the strategies according to the design goals, but also to estimate the gain obtained and the price paid in the selection. New dialogue strategies can also be designed and numerically verified in this way.

Page 57


Simulating Real Speech Recognizers for the Performance Evaluation of Spoken Language Systems

Authors: Hsien-chang WANG, Jhing-fa WANG
Affiliation: Department of Electrical Engineering, Cheng-Kung University, Tainan
Department of Computer Science and Information Engineering, Cheng-Kung University, Tainan
Mailto: wangsj@server2.iie.ncku.edu.tw
wangjf@server2.iie.ncku.edu.tw

ABSTRACT

This paper proposes a novel concept, a virtual speech recognizer (VSR), for evaluating the effect of a speech recognizer on a Mandarin spoken language system (SLS). The VSR can simulate a real speech recognizer and output a simulated recognition result, i.e., a syllable lattice or keyword lattice, by controlling parameters such as the Top-N accuracy and the insertion, deletion, and substitution error rates. The VSR is useful because it helps researchers test how a speech recognizer affects their language model or SLS without needing any real speech recognizer (RSR). To show the feasibility of the proposed VSR, one experiment is done to show the realism of the VSR, and another compares how speech recognizers affect a given SLS using the VSR and an RSR.
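The idea of generating recognizer-like output from controlled error rates can be sketched as follows. This is a simplified illustration and not the authors' VSR: it emits a single corrupted syllable sequence instead of a syllable or keyword lattice, it ignores Top-N accuracy, and the function name, parameters, and toy vocabulary are assumptions.

```python
import random
from typing import List, Sequence


def simulate_recognition(
    reference: Sequence[str],
    vocabulary: Sequence[str],
    sub_rate: float,
    del_rate: float,
    ins_rate: float,
    rng: random.Random,
) -> List[str]:
    """Corrupt a reference syllable sequence with controlled error rates.

    Assumes del_rate + sub_rate <= 1 and that the vocabulary contains
    at least two distinct syllables (needed to draw a substitution).
    """
    output: List[str] = []
    for syllable in reference:
        # Insertion: emit a spurious syllable before the current one.
        if rng.random() < ins_rate:
            output.append(rng.choice(vocabulary))
        r = rng.random()
        if r < del_rate:
            continue  # deletion: the syllable is dropped
        if r < del_rate + sub_rate:
            # Substitution: replace with a different vocabulary entry.
            alternatives = [v for v in vocabulary if v != syllable]
            output.append(rng.choice(alternatives))
        else:
            output.append(syllable)  # recognized correctly
    return output


# Toy pinyin syllables, chosen only for illustration.
vocab = ["ni", "hao", "ma", "wo", "shi", "bei", "jing"]
ref = ["wo", "shi", "bei", "jing"]
hyp = simulate_recognition(ref, vocab, sub_rate=0.1, del_rate=0.05,
                           ins_rate=0.05, rng=random.Random(0))
print(hyp)
```

Feeding such simulated output to a language model or SLS at several error-rate settings gives a rough picture of how downstream performance varies with recognizer quality, which is the kind of evaluation the abstract describes.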

Page 61


Error-Tolerant and Goal-Oriented Approach in Designing a Mandarin Spoken Dialogue System

Authors: Huei-ming WANG, Yi-chung LIN
Affiliation: Advanced Technology Center, Computer and Communications Laboratories,
Industrial Technology Research Institute, Chutung, Hsinchu
Mailto: hmw@itri.org.tw
lyc@itri.org.tw

ABSTRACT

Speech recognition errors and complicated dialogues are the major obstacles to making spoken dialog systems widely used in our daily lives. In this paper, we propose an error-tolerant and goal-oriented approach that makes spoken dialog systems robust to recognition errors and scalable to diverse applications.

Page 65


Comprehension Across Application Domains and Languages

Authors: Helen M. MENG, Wai Ching TSUI
Affiliation: Human-Computer Communications Laboratory, Department of Systems Engineering
and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Mailto: hmmeng@se.cuhk.edu.hk
wctsui@se.cuhk.edu.hk

ABSTRACT

This work demonstrates that our natural language understanding framework can be applied across application domains and languages with ease. Approaches to language understanding generally involve much handcrafting, e.g. in writing grammars or annotating corpora; hence portability is a desirable trait in the development of language understanding systems. Our framework for natural language understanding couples semantic tagging with Belief Networks for communicative goal inference, and has delivered promising results in the ATIS (Air Travel Information Systems) domain. This work applies the approach to the stocks domain. Furthermore, the approach is extended to Chinese, to support a biliteral / trilingual (English with two Chinese dialects) spoken dialog system known as ISIS. We introduce the transformation-based parsing technique for language understanding, and find that it is effective in disambiguating among the various kinds of numeric expressions prevalent in the stocks domain, as well as in inferring possible semantic categories for out-of-vocabulary words. The nonterminal categories produced by parsing are fed to Belief Networks trained on English or Chinese queries to infer the user's communicative goal. Our experiments gave a goal identification performance of 94% and 93% for Chinese and English respectively.

Page 69