We compare several systems for voice-based information retrieval. The systems differ only in the unit they use: words, subwords, a hybrid combination of the two, and phones. The subword set is derived by splitting words according to a Minimum Description Length (MDL) criterion. In each case, we convert an index written in terms of words into an index written in terms of the chosen units. The speech recognition engine, whose language model and pronunciation dictionary are built from each such unit inventory, is completely independent of the information retrieval task. It can therefore remain fixed, which makes this approach well suited to resource-constrained systems. We demonstrate that, at higher OOV rates, the hybrid system achieves substantially better recognition accuracy and recall than the alternatives. On a music lyrics task at 80% OOV, the hybrid system has a recall of 82.9%, compared to 75.2% for the subword-based system and 47.4% for the word-based system.
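The index conversion described above can be illustrated with a minimal sketch. It assumes that the hybrid index keeps words found in the recognizer's vocabulary and splits every out-of-vocabulary (OOV) word into subwords. The abstract does not say exactly how the hybrid is formed, so this rule is an assumption. The segmentation is also a stand-in: it uses greedy longest match over a given subword inventory, not the MDL criterion used in the paper. The function names `greedy_segment` and `to_hybrid_index` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_segment(word: str, subwords: Set[str]) -> List[str]:
    """Split a word by greedy longest match over a subword inventory.

    Hypothetical stand-in for an MDL-derived segmentation (not the paper's method).
    Falls back to a single character when no inventory entry matches.
    """
    units: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in subwords or j == i + 1:
                units.append(piece)
                i = j
                break
    return units


def to_hybrid_index(
    index: Dict[str, str], vocab: Set[str], subwords: Set[str]
) -> Dict[str, List[str]]:
    """Rewrite a word-level index into hybrid units (assumed rule:
    in-vocabulary words stay whole, OOV words become subwords)."""
    hybrid: Dict[str, List[str]] = {}
    for item_id, text in index.items():
        units: List[str] = []
        for word in text.lower().split():
            if word in vocab:
                units.append(word)
            else:
                units.extend(greedy_segment(word, subwords))
        hybrid[item_id] = units
    return hybrid


if __name__ == "__main__":
    index = {"song1": "hello darkness my old friend"}
    vocab = {"hello", "my", "old", "friend"}
    subwords = {"dark", "ness"}
    print(to_hybrid_index(index, vocab, subwords))
    # {'song1': ['hello', 'dark', 'ness', 'my', 'old', 'friend']}
```

Only the index is rewritten here. The recognizer's vocabulary and subword inventory are treated as fixed inputs, which mirrors the abstract's point that the recognition engine stays independent of the retrieval task.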
Bibliographic reference. Gouvêa, Evandro (2011): "Hybrid speech recognition for voice search: a comparative study", In INTERSPEECH-2011, 1113-1116.