8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Parallel Feature Generation based on Maximizing Normalized Acoustic Likelihood

Xiang Li, Richard Stern

Carnegie Mellon University, USA

Combining information from parallel feature streams generally improves speech recognition accuracy. While many studies have attempted to determine which stage of the recognition system provides the best combination performance and how exactly the features should be combined, relatively little attention has been paid to the design or selection of parallel feature sets intended for use in combination. In this paper we propose a new parallel feature generation algorithm based on the criterion of maximizing the normalized acoustic likelihood of the features after they are combined, a quantity that is closely related to the recognition accuracy obtained using the combination of these features. Each individual feature stream is passed through a transformation matrix before the streams are combined, and we use a gradient ascent procedure to adjust the values of these matrices so as to maximize the normalized acoustic likelihood of the combined features. The function that combines the parallel features is an intrinsic part of the optimization process. The use of the optimal linear transformation provides a 12.7 percent relative reduction in word error rate on the DARPA Resource Management task.
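The optimization described above can be illustrated with a toy sketch. The abstract does not give the exact objective, acoustic model, or combination function, so everything concrete here is an assumption made for illustration: the "normalized acoustic likelihood" is taken to be the log-likelihood of the correct class minus the log of the summed likelihoods of all classes, the acoustic model is one fixed unit-covariance Gaussian per class (not an HMM), and the streams are combined by summing their linearly transformed outputs. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def objective_and_grads(transforms, streams, labels, means):
    """Return the normalized log-likelihood and its gradient per transform.

    ASSUMPTION: combined feature y = sum_k A_k x_k, classes modeled as
    unit-covariance Gaussians with fixed means (a stand-in for a real
    acoustic model).
    """
    n = len(labels)
    # Combine the streams after passing each through its own matrix.
    y = sum(x @ A.T for A, x in zip(transforms, streams))       # (N, d)
    # Per-class Gaussian log-likelihoods, up to an additive constant.
    diff = y[:, None, :] - means[None, :, :]                    # (N, C, d)
    loglik = -0.5 * np.sum(diff ** 2, axis=2)                   # (N, C)
    # Normalize by the sum over all classes (stable log-sum-exp).
    shifted = loglik - loglik.max(axis=1, keepdims=True)
    log_post = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    objective = log_post[np.arange(n), labels].mean()
    # d(objective)/dy_n = (mu_correct - posterior-weighted mean) / N.
    post = np.exp(log_post)
    grad_y = (means[labels] - post @ means) / n                 # (N, d)
    # Chain rule through y_n = A_k x_n: gradient is sum_n g_n x_n^T.
    grads = [grad_y.T @ x for x in streams]
    return objective, grads


def gradient_ascent(streams, labels, means, lr=0.05, steps=200):
    """Plain gradient ascent on the transformation matrices."""
    d = means.shape[1]
    # Start from simple averaging of the streams (hypothetical choice).
    transforms = [np.eye(d, x.shape[1]) / len(streams) for x in streams]
    history = []
    for _ in range(steps):
        obj, grads = objective_and_grads(transforms, streams, labels, means)
        history.append(obj)
        transforms = [A + lr * g for A, g in zip(transforms, grads)]
    obj, _ = objective_and_grads(transforms, streams, labels, means)
    history.append(obj)
    return transforms, history


def make_toy_streams(seed=0, n=300):
    """Two noisy synthetic views of the same 3-class, 2-D data."""
    rng = np.random.default_rng(seed)
    means = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.5]])
    labels = rng.integers(0, len(means), size=n)
    clean = means[labels]
    streams = [clean + 0.5 * rng.normal(size=clean.shape),
               clean + 0.8 * rng.normal(size=clean.shape)]
    return streams, labels, means


if __name__ == "__main__":
    streams, labels, means = make_toy_streams()
    transforms, history = gradient_ascent(streams, labels, means)
    print("objective before:", history[0])
    print("objective after: ", history[-1])
```

Because the transforms are updated through the combined feature rather than stream by stream in isolation, the combination step sits inside the objective being optimized, which mirrors the abstract's remark that the combining function is part of the optimization. The toy setup says nothing about the gains reported on the Resource Management task.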


Bibliographic reference.  Li, Xiang / Stern, Richard (2004): "Parallel feature generation based on maximizing normalized acoustic likelihood", In INTERSPEECH-2004, 953-956.