Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

The Use of 'Rare' Segments for Language Identification

Jean-Marie Hombert (1), Ian Maddieson (2)

(1) Dynamique du Langage (UMR5596), CNRS/Université Lyon-2, France
(2) University of California, Los Angeles, CA, USA

Knowledge of the distribution of rare segments across the languages of the world might be used in identifying languages within an open set. Segments that are both discriminatory (i.e. rare) and robust (i.e. easy to identify) are the best targets for efficient language identification. Considering several properties at the same time makes it possible to use more common segments and/or features while remaining highly discriminatory.
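
The idea of combining properties can be illustrated with a minimal sketch. The language names, segment labels, inventories, and the helper `candidates` below are all hypothetical placeholders chosen for illustration; they are not data or methods from the paper. The sketch simply keeps every language whose inventory contains all observed segments, so a single rare segment isolates one language, while two more common properties do so only jointly.

```python
# Toy inventories: names and segment labels are hypothetical placeholders,
# not data from the paper.
INVENTORIES = {
    "LangA": {"p", "t", "k", "m", "click"},
    "LangB": {"p", "t", "k", "m", "ejective"},
    "LangC": {"p", "t", "k", "m", "nasal_vowel", "front_rounded_vowel"},
    "LangD": {"p", "t", "k", "m", "nasal_vowel"},
    "LangE": {"p", "t", "k", "m", "front_rounded_vowel"},
}


def candidates(observed, inventories=INVENTORIES):
    """Return the sorted names of languages whose inventory
    contains every observed segment (set inclusion)."""
    observed = set(observed)
    return sorted(name for name, inv in inventories.items() if observed <= inv)


if __name__ == "__main__":
    # A rare segment alone narrows the set to one language.
    print(candidates({"click"}))
    # Each more common property alone leaves several candidates...
    print(candidates({"nasal_vowel"}))
    print(candidates({"front_rounded_vowel"}))
    # ...but considered together they are as discriminatory as a rare segment.
    print(candidates({"nasal_vowel", "front_rounded_vowel"}))
```

In this toy setting, a segment found in every inventory (such as "p") rules out nothing, which is why rarity, or a rare combination, is what carries the discriminatory power.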

