A key challenge in building call routing applications is the need for an extensive set of in-domain data that is manually transcribed and labeled, a process that is both expensive and time-consuming. In this paper we analyze a language model (LM) training approach based on unsupervised self-adaptation which requires no manual transcription of the in-domain audio data. We investigate the usefulness of several sources of language data for building bootstrapped LMs, as well as an utterance-duration-dependent adaptation scheme that balances the required computational resources. Results on deployed call routing applications show that the routing accuracy obtained with the self-adapted LM is within 1.5% absolute of that of a system trained on manual transcriptions, irrespective of the original bootstrapped LM.
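The self-adaptation loop summarized above (decode the in-domain audio with a bootstrapped LM, then re-estimate the LM from the resulting automatic transcripts) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact recipe: the add-alpha smoothing, the linear interpolation weight, and the toy data are all assumptions made for the sketch.

```python
from collections import Counter

START = "<s>"

def bigram_counts(sentences):
    """Collect bigram and bigram-context counts from tokenized sentences."""
    counts, context = Counter(), Counter()
    for sent in sentences:
        prev = START
        for w in sent:
            counts[(prev, w)] += 1
            context[prev] += 1
            prev = w
    return counts, context

def bigram_prob(counts, context, prev, w, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram probability P(w | prev)."""
    return (counts[(prev, w)] + alpha) / (context[prev] + alpha * vocab_size)

def self_adapt(bootstrap_corpus, auto_transcripts, lam=0.5):
    """Interpolated bigram model: lam * adapted + (1 - lam) * bootstrap.

    `auto_transcripts` stands in for recognizer output on the in-domain
    audio; no manual transcriptions are used anywhere in the loop.
    """
    vocab = {w for s in bootstrap_corpus + auto_transcripts for w in s}
    bc, bctx = bigram_counts(bootstrap_corpus)
    ac, actx = bigram_counts(auto_transcripts)
    v = len(vocab)

    def prob(prev, w):
        p_boot = bigram_prob(bc, bctx, prev, w, v)
        p_adapt = bigram_prob(ac, actx, prev, w, v)
        return lam * p_adapt + (1 - lam) * p_boot

    return prob

# Illustrative data: generic bootstrap text vs. automatic in-domain transcripts.
bootstrap = [["please", "help", "me"], ["i", "need", "help"]]
auto = [["check", "my", "balance"], ["check", "my", "balance"], ["transfer", "funds"]]
lm = self_adapt(bootstrap, auto)
print(lm("check", "my"))  # boosted by the automatic in-domain transcripts
```

In a deployed system the automatic transcripts would come from running the recognizer with the bootstrapped LM over the in-domain call audio, and the loop can be iterated as recognition improves.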
Bibliographic reference. Duta, Nicolae (2008): "Transcription-less call routing using unsupervised language model adaptation", in INTERSPEECH 2008, pp. 1562-1565.