ISCA Archive Interspeech 2005

Robust access to large structured data using voice form-filling

S. Parthasarathy, Cyril Allauzen, R. Munkong

A method for accurate and scalable form-filling by voice is presented. A form consists of a number of fields. Accurate speech recognition is achieved by applying task-specific inter-field constraints. The task constraints are typically specified by providing a database of valid form entries, such as an employee directory containing each employee's name, location, and telephone number. Scalability to very large vocabularies and large numbers of fields, together with the ability to accept a variety of user responses, is achieved by a two-pass recognition scheme. An index-based retrieval method is used in the first pass to produce a shortlist of form entries. These are rescored in the second pass to obtain the final result. Experiments on a simple corporate directory access application are presented to demonstrate that the new approach compares favorably, in terms of computing needs, with a traditional one-pass speech recognition system. Experiments on a national street address recognition application are presented to demonstrate that the new approach scales very well to large tasks.
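
The two-pass idea (index-based shortlisting followed by rescoring) can be illustrated with a minimal text-only sketch. This is not the authors' system: the abstract gives no implementation details, so the token-level inverted index, the overlap-count shortlist, and the use of `difflib.SequenceMatcher` as a stand-in for second-pass speech rescoring are all assumptions made for illustration. The function names and the toy directory are hypothetical.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def build_index(entries):
    """Map each lowercase token to the ids of the entries that contain it."""
    index = defaultdict(set)
    for entry_id, entry in enumerate(entries):
        for value in entry.values():
            for token in value.lower().split():
                index[token].add(entry_id)
    return index


def first_pass(index, hypothesis, shortlist_size=3):
    """Shortlist entries by how many distinct hypothesis tokens they share."""
    hits = Counter()
    for token in set(hypothesis.lower().split()):
        for entry_id in index.get(token, ()):
            hits[entry_id] += 1
    return [entry_id for entry_id, _ in hits.most_common(shortlist_size)]


def second_pass(entries, shortlist, hypothesis):
    """Rescore the shortlist only.

    Assumption: string similarity stands in for the constrained
    speech rescoring that a real recognizer would perform here.
    """
    def score(entry_id):
        text = " ".join(entries[entry_id].values()).lower()
        return SequenceMatcher(None, hypothesis.lower(), text).ratio()

    return max(shortlist, key=score) if shortlist else None


def fill_form(entries, index, hypothesis):
    """Run both passes and return the best matching entry, or None."""
    shortlist = first_pass(index, hypothesis)
    best = second_pass(entries, shortlist, hypothesis)
    return None if best is None else entries[best]


# Hypothetical employee directory; every field is a plain string.
DIRECTORY = [
    {"name": "Alice Smith", "location": "north campus", "phone": "555 0101"},
    {"name": "Alan Smith", "location": "south campus", "phone": "555 0102"},
    {"name": "Bob Jones", "location": "north campus", "phone": "555 0103"},
]
INDEX = build_index(DIRECTORY)
result = fill_form(DIRECTORY, INDEX, "alice smith north campus")
```

In this sketch the expensive comparison runs only over the shortlist, so its cost depends on `shortlist_size` and not on the size of the database, which is the property that makes a two-pass design attractive for large tasks.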


doi: 10.21437/Interspeech.2005-411

Cite as: Parthasarathy, S., Allauzen, C., Munkong, R. (2005) Robust access to large structured data using voice form-filling. Proc. Interspeech 2005, 2493-2496, doi: 10.21437/Interspeech.2005-411

@inproceedings{parthasarathy05_interspeech,
  author={S. Parthasarathy and Cyril Allauzen and R. Munkong},
  title={{Robust access to large structured data using voice form-filling}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2493--2496},
  doi={10.21437/Interspeech.2005-411}
}