International Symposium on Chinese Spoken Language Processing (ISCSLP 2000)
Fragrant Hill Hotel, Beijing
This paper describes a new framework for Mandarin LVCSR based on one-pass decoding and decision-tree-based class-triphone acoustic modeling. Compared with a multi-pass decoder, a one-pass decoder should be more knowledgeable and efficient, because all knowledge sources are used at the same time, provided the decoder is well organized and optimized. We describe in detail the organization of our one-pass decoder and how we handle the search-space explosion caused by the very large number of triphones and by cross-word extension with unknown right context, including tone context. Experimental results on the male test database from the China National Hi-Tech Project 863 show that, with the non-tonal class-triphone model, the character error rate (CER) was reduced to 13.04% for the open LM and 2.8% for the closed LM. With the tonal class-triphone model, the CER reaches 10.31%, a 21% relative character error reduction compared with the non-tonal class-triphone model.
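The relative reduction quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the 10.31% tonal CER is compared against the 13.04% open-LM figure, which is the pairing that reproduces the reported 21%; the function name is illustrative and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative error reduction of `improved` versus `baseline` as a fraction."""
    return (baseline - improved) / baseline


# CER values quoted in the abstract (percent).
non_tonal_cer = 13.04  # non-tonal class-triphone model, open LM
tonal_cer = 10.31      # tonal class-triphone model (assumed to be the open-LM condition)

reduction = relative_reduction(non_tonal_cer, tonal_cer)
print(f"{reduction:.1%}")  # prints 20.9%, which rounds to the reported 21%
```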
In this paper, we propose a domain-transparent design of dialogue management for a mixed-initiative Chinese spoken dialogue system engine. This design pushes the domain-dependent parts of dialogue management into an external task configuration file, leaving the dialogue manager independent of the domain. The task configuration file consists of a set of states, each of which is associated with a task action and the constraint for applying that action, not with the internal and external resources available to the system. Thus, the number of states is reduced. The design makes it convenient to build a dialogue system for a specific domain and to port it to another domain: only the task configuration file needs to be replaced, and the dialogue manager remains unchanged. Applying this design reduces the effort of porting a spoken dialogue system across domains.
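The separation described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the state names, the slot names, the train-timetable domain, and the first-match selection rule are assumptions made for illustration, and the task is written as a Python list rather than as the external file the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Slots = Dict[str, str]


@dataclass
class State:
    """One task state: a constraint and the action to take when it holds."""
    name: str
    constraint: Callable[[Slots], bool]
    action: str


def next_action(task: List[State], slots: Slots) -> Optional[str]:
    """Domain-independent manager step: return the action of the first state whose constraint holds."""
    for state in task:
        if state.constraint(slots):
            return state.action
    return None


# Hypothetical task description for a train-timetable domain.
# Porting to another domain would mean replacing only this list.
TRAIN_TASK = [
    State("need_destination", lambda s: "destination" not in s, "ask_destination"),
    State("need_date", lambda s: "date" not in s, "ask_date"),
    State("ready", lambda s: True, "query_timetable"),
]

print(next_action(TRAIN_TASK, {"destination": "Beijing"}))  # prints ask_date
```

In this sketch `next_action` never mentions destinations or dates, so the same function could serve a different task list unchanged.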
Conventionally, design principles for spoken dialogue systems are drawn either from experience or from corpus-based analysis. However, human experience is usually not precise enough for engineering design, while in corpus-based analysis many factors, such as speech recognition or understanding performance and user behavior, can never be precisely controlled. Recently, a new design and analysis approach based on computer simulation was proposed. This paper presents our experience of using this approach to design Chinese spoken dialogue systems. The simulation led to the following observations and design principles. The transaction success rate (reliability) and the slot transmission efficiency (efficiency) are usually conflicting design goals, so a trade-off exists between them. Since reliability is in general more important than efficiency, it is desirable to achieve higher reliability at the price of reduced slot transmission efficiency when the reliability is not adequate. According to the simulation results, even when the speech recognition accuracy cannot be improved, there is still limited flexibility for tuning the dialogue performance by selecting among the strategies and considering the trade-offs. It is possible not only to select among the strategies according to the design goals, but also to estimate the gain obtained and the price paid in the selection. New dialogue strategies can also be designed and numerically verified in this way.
This paper proposes a novel concept, a virtual speech recognizer (VSR), for evaluating the effect of a speech recognizer on a Mandarin spoken language system (SLS). The VSR can simulate a real speech recognizer and output a simulated recognition result, i.e., a syllable lattice or keyword lattice, by controlling parameters such as the Top-N accuracy and the insertion, deletion, and substitution error rates. The VSR is useful because it lets researchers test how a speech recognizer affects their language model or SLS without needing any real speech recognizer (RSR). To show the feasibility of the proposed VSR, one experiment is done to verify how realistic the VSR is, and another compares how speech recognizers affect a given SLS using the VSR and an RSR.
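The idea of driving a simulated recognizer with error-rate parameters can be sketched as follows. This is a simplified illustration and not the paper's VSR: it produces a single best sequence rather than a syllable or keyword lattice, it ignores Top-N accuracy, and the function name, parameters, and example syllables are all assumptions.

```python
import random
from typing import List, Sequence


def simulate_recognizer(
    reference: Sequence[str],
    vocabulary: Sequence[str],
    sub_rate: float,
    del_rate: float,
    ins_rate: float,
    rng: random.Random,
) -> List[str]:
    """Corrupt a reference token sequence with substitution, deletion, and insertion errors."""
    output: List[str] = []
    for token in reference:
        r = rng.random()
        if r < del_rate:
            pass  # deletion: the token is dropped
        elif r < del_rate + sub_rate:
            # substitution: pick a different token when the vocabulary allows it
            choices = [v for v in vocabulary if v != token] or list(vocabulary)
            output.append(rng.choice(choices))
        else:
            output.append(token)  # correct recognition
        if rng.random() < ins_rate:
            output.append(rng.choice(list(vocabulary)))  # insertion after this position
    return output


rng = random.Random(0)  # fixed seed so a run can be repeated
reference = ["bei", "jing", "da", "xue"]
vocabulary = ["bei", "jing", "da", "xue", "shang", "hai"]
hypothesis = simulate_recognizer(reference, vocabulary, 0.10, 0.05, 0.05, rng)
print(hypothesis)
```

Feeding such simulated hypotheses to a language model or SLS at several error-rate settings would give a rough picture of how recognizer quality affects downstream behavior, which is the use the abstract describes for the VSR.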
Speech recognition errors and complicated dialogues are the major obstacles to making spoken dialog systems widely used in our daily lives. In this paper, we propose an error-tolerant and goal-oriented approach that makes spoken dialog systems robust to recognition errors and scalable to diverse applications.
This work demonstrates that our natural language understanding framework can be applied across application domains and languages with ease. Approaches to language understanding generally involve much handcrafting, e.g. in writing grammars or annotating corpora, so portability is a desirable trait in the development of language understanding systems. Our framework for natural language understanding couples semantic tagging with Belief Networks for communicative goal inference, and has delivered promising results in the ATIS (Air Travel Information Systems) domain. This work applies the approach to the stocks domain. Furthermore, the approach is extended to Chinese, to support a biliterate / trilingual (English with two Chinese dialects) spoken dialog system known as ISIS. We introduce a transformation-based parsing technique for language understanding, and find that it is effective in disambiguating among the various kinds of numeric expressions prevalent in the stocks domain, as well as in inferring possible semantic categories for out-of-vocabulary words. The nonterminal categories produced by parsing are fed to Belief Networks trained on English or Chinese queries to infer the user's communicative goal. Our experiments gave goal identification performance of 94% and 93% for Chinese and English respectively.