8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Analysis of Speaking Styles by Two-Dimensional Visualization of Aggregate of Acoustic Models

Makoto Shozakai, Goshu Nagino

Asahi Kasei Corporation, Japan

To ensure sufficiently high recognition performance from the first use of a speech recognition system, a highly precise acoustic model library must be developed in advance. Analyzing HMM acoustic models, which are expressed as Gaussian distributions of multidimensional vectors, is typically a difficult task. The COSMOS (aCOustic Space Map Of Sound) method is proposed, which visualizes the distribution of acoustic models in a two-dimensional space using a multidimensional scaling technique, so that human visual perception can support the analysis. The effectiveness of the proposed technique is examined through an analysis of speaking styles. The marginal region of the two-dimensional visual map (called the COSMOS map) obtained by the proposed method contains the acoustic models with lower recognition performance. Recognition performance can be improved by dividing the marginal region into several smaller zones, training a separate acoustic model for each zone, and providing that model to the speakers belonging to the zone.
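The projection step described in the abstract can be illustrated with a minimal sketch. It assumes classical (Torgerson) multidimensional scaling and a plain Euclidean distance between toy mean vectors. The abstract does not specify the distance measure between acoustic models or the scaling variant the authors used, so both are stand-ins, and the names `classical_mds`, `means`, and `radius` are hypothetical.

```python
import numpy as np


def classical_mds(dist, dims=2):
    """Embed items in `dims` dimensions from a symmetric distance matrix.

    Classical (Torgerson) MDS, used here only as an illustrative stand-in
    for the multidimensional scaling step mentioned in the abstract.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Keep the eigenvectors belonging to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy stand-in for an aggregate of acoustic models: one mean vector each.
# (Assumption: real models would need a distance between Gaussian mixtures.)
rng = np.random.default_rng(0)
means = rng.normal(size=(6, 10))
dist = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)

coords = classical_mds(dist)  # one 2-D point per model

# Distance of each model from the centre of the map; treating large values
# as the "marginal region" is an assumption made for this illustration.
radius = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
```

Each row of `coords` is the position of one model on the two-dimensional map, and models with a large `radius` would be candidates for the marginal region under the assumption stated above.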

Bibliographic reference. Shozakai, Makoto / Nagino, Goshu (2004): "Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models", In INTERSPEECH-2004, 717-720.