12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31. 2011

Unsupervised Latent Speaker Language Modeling

Yik-Cheung Tam, Paul Vozila

Nuance Communications, USA

In commercial speech applications, millions of speech utterances from the field are collected from millions of users, creating a challenge to best leverage the user data to enhance speech recognition performance. Motivated by an intuition that similar users may produce similar utterances, we propose a latent speaker model for unsupervised language modeling. Inspired by latent semantic analysis (LSA), an unsupervised method to extract latent topics from document corpora, we view the accumulated unsupervised text from a user as a document in the corpora. We employ latent Dirichlet-Tree allocation, a tree-based LSA, to organize the latent speakers in a tree hierarchy in an unsupervised fashion. During speaker adaptation, a new speaker model is adapted via a linear interpolation of the latent speaker models. On an in-house evaluation, the proposed method reduces the word error rates by 1.4% compared to a well-tuned baseline with speaker-independent and speaker-dependent adaptation. Compared to a competitive document clustering approach based on the exchange algorithm, our model yields slightly better recognition performance.

Full Paper

Bibliographic reference.  Tam, Yik-Cheung / Vozila, Paul (2011): "Unsupervised latent speaker language modeling", In INTERSPEECH-2011, 1477-1480.