Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Robust Access to Large Structured Data Using Voice Form-Filling

S. Parthasarathy (1), Cyril Allauzen (1), R. Munkong (2)

(1) AT&T Labs Research, USA; (2) Georgia Institute of Technology, Atlanta, GA, USA

A method for accurate and scalable form-filling by voice is presented. A form consists of a number of fields. Accurate speech recognition is achieved by applying task-specific inter-field constraints. These constraints are typically specified by providing a database of valid form entries, such as an employee directory containing the name, location, and telephone number of each employee. Scalability to very large vocabularies and large numbers of fields, together with the ability to accept a variety of user responses, is achieved by a two-pass recognition scheme. In the first pass, an index-based retrieval method produces a shortlist of form entries. In the second pass, the shortlisted entries are rescored to obtain the final result. Experiments on a simple corporate directory access application demonstrate that the new approach compares favorably, in terms of computing needs, with a traditional one-pass speech recognition system. Experiments on a national street address recognition application demonstrate that the new approach scales very well to large tasks.
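
The two-pass idea can be illustrated with a minimal Python sketch. It is not the authors' system: the abstract does not describe the index structure or the rescoring model, so the sketch assumes a plain word-level inverted index for the first pass and uses text similarity (difflib) as a stand-in for acoustic rescoring in the second pass. The directory contents and all function names (build_index, first_pass, second_pass, fill_form) are hypothetical.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Hypothetical toy database of valid form entries (name, location, phone).
DIRECTORY = [
    {"name": "john smith", "location": "florham park", "phone": "555 0101"},
    {"name": "john smythe", "location": "middletown", "phone": "555 0102"},
    {"name": "jane smith", "location": "florham park", "phone": "555 0103"},
    {"name": "mary jones", "location": "atlanta", "phone": "555 0104"},
]


def entry_text(entry):
    """Concatenate all fields of one entry into a single word string."""
    return " ".join(entry[f] for f in ("name", "location", "phone"))


def build_index(entries):
    """Inverted index: word -> set of entry ids containing that word."""
    index = defaultdict(set)
    for i, entry in enumerate(entries):
        for word in entry_text(entry).split():
            index[word].add(i)
    return index


def first_pass(index, words, k=3):
    """Shortlist the k entries sharing the most words with the hypothesis."""
    votes = Counter()
    for word in words:
        for i in index.get(word, ()):
            votes[i] += 1
    return [i for i, _ in votes.most_common(k)]


def second_pass(entries, shortlist, utterance):
    """Rescore only the shortlist; similarity stands in for a real rescorer."""
    def score(i):
        return SequenceMatcher(None, utterance, entry_text(entries[i])).ratio()
    return max(shortlist, key=score) if shortlist else None


def fill_form(entries, index, utterance, k=3):
    """Run both passes and return the best matching entry, or None."""
    shortlist = first_pass(index, utterance.split(), k)
    best = second_pass(entries, shortlist, utterance)
    return entries[best] if best is not None else None
```

With this toy data, calling fill_form(DIRECTORY, build_index(DIRECTORY), "jane smith florham park") returns the Jane Smith entry. The point of the split is that the more expensive second-pass scoring touches only the shortlist, while the first pass is a cheap index lookup whose cost does not depend on scoring every entry in the database.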

Bibliographic reference. Parthasarathy, S. / Allauzen, Cyril / Munkong, R. (2005): "Robust access to large structured data using voice form-filling", in INTERSPEECH-2005, 2493-2496.