A new statistical confidence measure, Context Constrained-Generalized Posterior Probability (CC-GPP), is proposed for verifying phone transcriptions in speech databases. Unlike generalized posterior probability (GPP), CC-GPP is computed by considering string hypotheses that contain the focused phone with partially matched left and right contexts. The parameters of CC-GPP are the context window length, the minimum number of matched context phones, and the verification thresholds; they are determined by minimizing verification errors on a development set. Evaluated on a test set of 500 sentences containing 2.1% phone errors, CC-GPP achieves 99.6% accuracy and 78.7% recall when 90% of the phones are accepted.
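The context constraint and the threshold tuning described above can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes an N-best list of phone strings with log scores, aligns phones by list index instead of by time, and omits any acoustic or language model weighting. The names `cc_gpp_sketch`, `matched_context`, and `pick_threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: each hypothesis is (phone string, log score); real systems
# would work on time-aligned hypotheses from a phone graph.
Hypothesis = Tuple[Sequence[str], float]


def matched_context(hyp: Sequence[str], ref: Sequence[str],
                    pos: int, window: int) -> int:
    """Count context phones within `window` positions of `pos` that agree."""
    count = 0
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the focused phone itself
        i = pos + offset
        if 0 <= i < len(ref) and i < len(hyp) and hyp[i] == ref[i]:
            count += 1
    return count


def cc_gpp_sketch(hyps: List[Hypothesis], ref: Sequence[str], pos: int,
                  window: int, min_matched: int) -> float:
    """Share of hypothesis probability mass that contains the focused phone
    at `pos` with at least `min_matched` matching context phones."""
    max_log = max(score for _, score in hyps)  # for numerical stability
    total = sum(math.exp(score - max_log) for _, score in hyps)
    kept = 0.0
    for phones, score in hyps:
        has_focus = pos < len(phones) and phones[pos] == ref[pos]
        if has_focus and matched_context(phones, ref, pos, window) >= min_matched:
            kept += math.exp(score - max_log)
    return kept / total


def pick_threshold(scores: Sequence[float], is_correct: Sequence[bool],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the fewest verification errors
    (false accepts plus false rejects) on a development set."""
    def errors(threshold: float) -> int:
        return sum((s >= threshold) != ok for s, ok in zip(scores, is_correct))
    return min(candidates, key=errors)


if __name__ == "__main__":
    hyps = [(["a", "b", "c"], math.log(0.5)),
            (["a", "b", "d"], math.log(0.3)),
            (["x", "b", "c"], math.log(0.2))]
    ref = ["a", "b", "c"]
    print(cc_gpp_sketch(hyps, ref, pos=1, window=1, min_matched=2))
    print(pick_threshold([0.9, 0.8, 0.3, 0.2], [True, True, False, False],
                         [0.1, 0.5, 0.95]))
```

In this toy setting, raising `min_matched` narrows the set of hypotheses that contribute to the score, which is the role the abstract assigns to the minimum number of matched context phones. A phone would then be accepted when its score is at or above the chosen threshold.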
Bibliographic reference. Zhang, Hua / Wang, Lijuan / Soong, Frank K. / Liu, Wenju (2007): "Context constrained-generalized posterior probability for verifying phone transcriptions", In INTERSPEECH-2007, 1330-1333.