INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Deploying Google Search by Voice in Cantonese

Yun-Hsuan Sung, Martin Jansche, Pedro J. Moreno

Google Inc., USA

We describe our efforts in deploying Google search by voice for Cantonese, a southern Chinese dialect widely spoken in and around Hong Kong and Guangzhou. We collected audio data from local Cantonese speakers in Hong Kong and Guangzhou using our DataHound smartphone application, and used this data to train acoustic models. Language models were trained on anonymized query logs from Google Web Search for Hong Kong. Because users in Hong Kong frequently mix English and Cantonese in their queries, we designed our system from the ground up to handle both languages. We report on experiments with different techniques for mapping the phoneme inventories of both languages into a common space. Based on extensive experiments, we report word error rates and web scores for both Hong Kong and Guangzhou data. Cantonese Google search by voice was launched in December 2010.

Bibliographic reference. Sung, Yun-Hsuan / Jansche, Martin / Moreno, Pedro J. (2011): "Deploying Google search by voice in Cantonese", In INTERSPEECH-2011, 2865-2868.