8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Using VTLN for Broadcast News Transcription

D. Y. Kim, S. Umesh, M. J. F. Gales, T. Hain, P. C. Woodland

Cambridge University, UK

Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes because it typically depends on only a single parameter, which allows the warp factors to be robustly estimated from little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible to approximate the Jacobian associated with the VTLN transformation. This paper describes a new, simple, linear approximation to VTLN. The linear approximation allows the Jacobian to be computed exactly, and it can also be highly efficient in terms of both warp factor estimation and application of the warp factors. Both the linear and standard CUED VTLN schemes are evaluated in the 2003 BNE evaluation framework and found to yield similar performance. When used in system combination, both VTLN schemes yielded slight gains over the baseline system.
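
To illustrate why a linear form makes the Jacobian exact, the following is a minimal sketch and not the scheme evaluated in the paper. It assumes each candidate warp factor is represented by a square matrix `A` applied directly to the feature vector, so that the Jacobian term in the log-likelihood is simply log|det A| and no re-coding of the audio is needed per warp factor. The function names, the diagonal-Gaussian scoring model, and the toy transforms are all hypothetical choices made for illustration; how `A` would be derived from a warp factor is not stated in the abstract and is not modelled here.

```python
import math


def log_abs_det(matrix):
    """Return log|det(matrix)| via Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    total = 0.0
    for col in range(n):
        # Choose the row with the largest magnitude entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("singular transform")
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def transformed_log_likelihood(frames, transform, mean, var):
    """Score frames under a diagonal Gaussian after y = A x, including log|det A|."""
    jacobian = log_abs_det(transform)  # exact because the transform is linear
    total = 0.0
    for x in frames:
        y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in transform]
        gauss = sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (y_i - m) ** 2 / v)
            for y_i, m, v in zip(y, mean, var)
        )
        total += gauss + jacobian
    return total


def pick_warp_factor(frames, transforms, mean, var):
    """Grid search: return the warp factor whose transform scores highest."""
    return max(
        transforms,
        key=lambda alpha: transformed_log_likelihood(
            frames, transforms[alpha], mean, var
        ),
    )


# Toy usage: hypothetical warp factors mapped to hypothetical 2x2 transforms.
toy_transforms = {
    0.8: [[0.8, 0.0], [0.0, 0.8]],
    1.0: [[1.0, 0.0], [0.0, 1.0]],
    2.0: [[2.0, 0.0], [0.0, 2.0]],
}
toy_frames = [[1.0, 2.0], [1.0, 2.0]]
best = pick_warp_factor(toy_frames, toy_transforms, mean=[2.0, 4.0], var=[1.0, 1.0])
```

In this sketch the features are coded once and each candidate is scored by a matrix multiply plus a constant log-determinant, in contrast to a conventional setup in which the data is re-coded at every warp factor.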


Bibliographic reference. Kim, D. Y. / Umesh, S. / Gales, M. J. F. / Hain, T. / Woodland, P. C. (2004): "Using VTLN for broadcast news transcription", in INTERSPEECH-2004, 1953-1956.