10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Auto-Checking Speech Transcriptions by Multiple Template Constrained Posterior

Lijuan Wang (1), Shenghao Qin (2), Frank K. Soong (1)

(1) Microsoft Research Asia, China
(2) Microsoft Business Division, China

Checking transcription errors in speech databases is an important but tedious task that traditionally requires intensive manual labor. In [1], Template Constrained Posterior (TCP) was proposed to automate the checking process by screening potentially erroneous sentences with a single context template. However, the single-template method is not robust and requires parameter optimization that still involves some manual work. In this work, we propose using multiple templates, an approach that is more robust and requires no development data for parameter optimization. Because the templates sift hypotheses at different levels of constraint, from a well-defined full context to a loosely defined context such as a wildcard, the confidence of a focus unit can be measured at different expected accuracies. Joint verification by multiple TCPs improves the measured confidence of each unit in the transcription and is robust across different speech databases. Experimental results show that the checking process automatically separates erroneous sentences from correct ones: within the top 10% of sentences in the rank lists, the sentence error hit rate decreases rapidly along the sorted TCP values, from 59% to 7% for the Mexican Spanish database and from 63% to 11% for the American English database.
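
The ranking and hit-rate evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how TCP values from different templates are combined, so the arithmetic mean across templates and the minimum over units in a sentence are assumptions made here for illustration only. All function names and the toy TCP values are hypothetical.

```python
from statistics import mean


def unit_confidence(tcp_by_template):
    """Joint confidence of one unit from its TCP values under several templates.

    Assumption: a plain arithmetic mean; the paper's combination rule may differ.
    """
    return mean(tcp_by_template)


def sentence_score(units):
    """Score a sentence by its least confident unit (assumption)."""
    return min(unit_confidence(u) for u in units)


def rank_sentences(sentences):
    """Return sentence ids sorted from most to least suspicious (lowest score first)."""
    return sorted(sentences, key=lambda sid: sentence_score(sentences[sid]))


def hit_rate(ranked_ids, erroneous_ids, start, stop):
    """Fraction of sentences in ranked_ids[start:stop] that are truly erroneous."""
    window = ranked_ids[start:stop]
    return sum(sid in erroneous_ids for sid in window) / len(window)


# Toy data (hypothetical): each sentence is a list of units, and each unit
# holds one TCP value per template, from tightest to loosest context.
sentences = {
    "s1": [[0.90, 0.95, 0.99], [0.80, 0.90, 0.97]],
    "s2": [[0.20, 0.40, 0.60], [0.90, 0.90, 0.95]],
    "s3": [[0.70, 0.80, 0.90], [0.50, 0.60, 0.70]],
}

ranked = rank_sentences(sentences)
top_hit_rate = hit_rate(ranked, {"s2"}, 0, 1)
```

In this toy case the sentence containing the low-confidence unit is ranked first, so a human checker reviewing the list from the top would reach it before the others. Computing `hit_rate` over successive slices of the ranked list gives the kind of decreasing curve the abstract reports.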


  [1] L.J. Wang, T. Hu, and F.K. Soong, "Template constrained posterior for verifying phone transcriptions," in Proc. ICASSP-2008, pp. 4681-4684, Las Vegas, U.S.A., 2008.

Bibliographic reference.  Wang, Lijuan / Qin, Shenghao / Soong, Frank K. (2009): "Auto-checking speech transcriptions by multiple template constrained posterior", In INTERSPEECH-2009, 1831-1834.