8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Segmentation of Speech for Speaker and Language Recognition

Andre G. Adami, Hynek Hermansky

Oregon Health & Science University, USA

Current automatic speech recognition systems convert the speech signal into a sequence of discrete units, such as phonemes, and then apply statistical methods to these units to produce the linguistic message. A similar methodology has also been applied to speaker and language recognition, where the output of the system is speaker or language information rather than the linguistic message. We therefore propose using the temporal trajectories of fundamental frequency and short-term energy to segment and label the speech signal into a small set of discrete units that can characterize the speaker and/or language. The proposed approach is evaluated on the NIST Extended Data Speaker Detection task and the NIST Language Identification task.
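The abstract does not say how the units are defined. As a minimal sketch, assume per-frame fundamental frequency (F0) and short-term energy values have already been extracted, with an F0 of 0 marking unvoiced frames, and that a unit is defined by whether each trajectory is rising or falling. Under those assumptions, labeling frames and merging runs of identical labels into segments could look like the code below. The function names `segment` and `unit_bigrams`, the two-letter labels, the `"U"` label for unvoiced frames, and the rule that a non-increasing value counts as falling are hypothetical illustrations, not the authors' definitions.

```python
from collections import Counter
from itertools import groupby
from typing import List, Sequence, Tuple

# (label, start_frame, end_frame_exclusive); hypothetical representation.
Segment = Tuple[str, int, int]


def direction(delta: float) -> str:
    """Assumed rule: 'r' if the value rises, 'f' if it falls or stays flat."""
    return "r" if delta > 0 else "f"


def segment(f0: Sequence[float], energy: Sequence[float]) -> List[Segment]:
    """Label each frame transition by the slope of F0 and energy, then merge runs.

    Assumption: F0 <= 0 marks an unvoiced frame; any transition touching an
    unvoiced frame is labeled "U". Frame t is labeled from the change t-1 -> t,
    so segment indices start at frame 1.
    """
    if len(f0) != len(energy):
        raise ValueError("f0 and energy must have the same number of frames")

    labels: List[str] = []
    for t in range(1, len(f0)):
        if f0[t] <= 0 or f0[t - 1] <= 0:
            labels.append("U")
        else:
            # First letter: F0 direction; second letter: energy direction.
            labels.append(direction(f0[t] - f0[t - 1])
                          + direction(energy[t] - energy[t - 1]))

    # Merge consecutive identical labels into (label, start, end) segments.
    segments: List[Segment] = []
    start = 1
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def unit_bigrams(segments: List[Segment]) -> Counter:
    """Count consecutive unit pairs: one simple statistic over the unit sequence."""
    units = [label for label, _, _ in segments]
    return Counter(zip(units, units[1:]))


if __name__ == "__main__":
    segs = segment([0, 0, 100, 110, 120, 115, 105, 0],
                   [1, 2, 3, 4, 5, 6, 4, 2])
    print(" ".join(label for label, _, _ in segs))  # prints: U rr fr ff U
    print(unit_bigrams(segs))
```

Merging runs of identical labels turns a frame-level contour into a short sequence of units, which is the kind of discrete representation on which n-gram or other statistical models of speaker or language can then be trained. The bigram counter here stands in for such a model only as an illustration.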


Bibliographic reference.  Adami, Andre G. / Hermansky, Hynek (2003): "Segmentation of speech for speaker and language recognition", In EUROSPEECH-2003, 841-844.