10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Profiling Large-Vocabulary Continuous Speech Recognition on Embedded Devices: A Hardware Resource Sensitivity Analysis

Kai Yu, Rob A. Rutenbar

Carnegie Mellon University, USA

When deployed in embedded systems, speech recognizers are necessarily reduced from the large-vocabulary continuous speech recognizers (LVCSR) found on desktops or servers to fit the limited hardware. However, embedded hardware continues to evolve in capability; today’s smartphones are vastly more powerful than their recent ancestors. This raises a new question: which hardware features not found on today’s embedded platforms, but plausible additions to tomorrow’s devices, are most likely to improve recognition performance? Put differently, how sensitive is the recognizer to fine-grained details of the embedded hardware resources? To answer this question rigorously and quantitatively, we present results from a detailed study of LVCSR performance as a function of micro-architecture options on an embedded ARM11 and an enterprise-class Intel Core2Duo. We estimate speed and energy consumption and show, feature by feature, how hardware resources affect recognizer performance.

Bibliographic reference. Yu, Kai / Rutenbar, Rob A. (2009): "Profiling large-vocabulary continuous speech recognition on embedded devices: a hardware resource sensitivity analysis", in INTERSPEECH-2009, 1923-1926.