This paper describes our initial experiments with a hybrid speech recognition architecture for mobile voice user interfaces. Our system consists of a large-vocabulary continuous speech recognizer on the network side and a compact embedded recognizer on the handset. The two components are seamlessly integrated to provide a uniform user experience at all times. The hybrid system can handle unconstrained voice input from the user at any time, while also offering significantly faster response times and higher service availability than a network-only configuration. We have tested the hybrid architecture in an experimental setup with real user data in six languages. Our results show that, depending on how much prior usage information is available for each user, 28-55% of voice queries can be handled locally with virtually instantaneous recognition, at the cost of a relative increase of less than 5% in the overall word error rate of the system.
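The abstract does not state how the handset decides whether a query can be handled locally. The sketch below assumes one common policy, accepting the embedded result when its confidence clears a threshold and otherwise deferring to the network recognizer. The names `Hypothesis`, `hybrid_recognize`, and `threshold` are hypothetical, and the recognizers are stand-in callables rather than the authors' components.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed score in [0, 1]; not specified by the paper


def hybrid_recognize(
    audio: object,
    embedded: Callable[[object], Optional[Hypothesis]],
    network: Callable[[object], Hypothesis],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Tuple[str, str]:
    """Return (transcript, route). Assumed policy: trust the embedded
    recognizer only when it produces a sufficiently confident result."""
    local = embedded(audio)  # fast, on-device attempt
    if local is not None and local.confidence >= threshold:
        return local.text, "local"
    # Fall back to the large-vocabulary network recognizer.
    return network(audio).text, "network"
```

Usage with stub recognizers: an embedded recognizer that returns `None` or a low-confidence hypothesis for out-of-coverage input sends that query to the network, so the fraction of queries routed `"local"` corresponds to the share handled on the handset.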
Bibliographic reference. Kiss, Imre / Polifroni, Joseph / Wang, Chao / Choueiter, Ghinwa / Phillips, Mike (2010): "A hybrid architecture for mobile voice user interfaces", In INTERSPEECH-2010, 1329-1332.