8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Data Driven Multidialectal Phone Set for Spanish Dialects

Monica Caballero, Asuncion Moreno, Albino Nogueiras

Universitat Politecnica de Catalunya, Spain

This paper addresses the use of a data-driven approach to determine a multidialectal phone set for an automatic speech recognition system for Spanish dialects. The approach is based on a decision tree clustering algorithm that clusters contextual units from different dialects. This procedure avoids both the definition of a global phonetic inventory and a prior study of sound similarity. The procedure is applied to Spanish as spoken in Spain, Colombia and Venezuela. The results reveal differences between phonemes that share the same SAMPA symbol in different dialects, and they also detect similarities between phonemes that are represented by different symbols in the dialectal variants. Recognition results obtained with this multidialectal approach outperform the monodialectal ones.
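The abstract describes the clustering step in a single sentence. As a rough illustration only, here is a minimal Python sketch of generic likelihood-based decision tree clustering. This is the greedy, maximum-likelihood-gain splitting commonly used for state tying in HMM recognizers, not necessarily the authors' exact procedure. Dialect identity is offered as one more question, so units from different dialects can end up in the same leaf. The sketch assumes each contextual unit is summarized by a single 1-D Gaussian, whereas a real system would use multivariate acoustic features and phonetic context questions. The `Unit` fields, the question set, `min_gain`, `min_count`, `VAR_FLOOR` and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

VAR_FLOOR = 1e-3  # hypothetical variance floor to avoid log(0)


@dataclass
class Unit:
    """One contextual unit summarized by a 1-D Gaussian (a simplification)."""
    name: str      # hypothetical label, e.g. "ES:a" (dialect:phone)
    dialect: str   # dialect tag, e.g. "ES", "CO", "VE"
    phone: str     # SAMPA symbol of the central phone
    count: float   # number of frames assigned to the unit
    mean: float
    var: float


Question = Tuple[str, Callable[[Unit], bool]]


def pooled_loglik(units: List[Unit]) -> float:
    """Log-likelihood of all frames in `units` under one shared ML Gaussian."""
    n = sum(u.count for u in units)
    mean = sum(u.count * u.mean for u in units) / n
    second = sum(u.count * (u.var + u.mean ** 2) for u in units) / n
    var = max(second - mean ** 2, VAR_FLOOR)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def grow(units: List[Unit], questions: List[Question],
         min_gain: float, min_count: float) -> List[List[Unit]]:
    """Greedily split on the question with the largest likelihood gain.

    Returns the leaves; each leaf is one tied (shared) cluster.
    """
    parent = pooled_loglik(units)
    best = None
    for _, ask in questions:
        yes = [u for u in units if ask(u)]
        no = [u for u in units if not ask(u)]
        if not yes or not no:
            continue  # question does not split this node
        if (sum(u.count for u in yes) < min_count
                or sum(u.count for u in no) < min_count):
            continue  # a child would have too little data
        gain = pooled_loglik(yes) + pooled_loglik(no) - parent
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    if best is None or best[0] < min_gain:
        return [units]
    return (grow(best[1], questions, min_gain, min_count)
            + grow(best[2], questions, min_gain, min_count))


# Toy data with invented numbers (not from the paper): two phones, three dialects.
TOY_UNITS = [
    Unit("ES:a", "ES", "a", 100.0, 0.0, 1.0),
    Unit("CO:a", "CO", "a", 100.0, 0.1, 1.0),
    Unit("VE:a", "VE", "a", 100.0, -0.1, 1.0),
    Unit("ES:e", "ES", "e", 100.0, 5.0, 1.0),
    Unit("CO:e", "CO", "e", 100.0, 5.1, 1.0),
    Unit("VE:e", "VE", "e", 100.0, 4.9, 1.0),
]
TOY_QUESTIONS: List[Question] = [
    ("phone == a", lambda u: u.phone == "a"),
    ("dialect == ES", lambda u: u.dialect == "ES"),
    ("dialect == CO", lambda u: u.dialect == "CO"),
]

if __name__ == "__main__":
    for leaf in grow(TOY_UNITS, TOY_QUESTIONS, min_gain=10.0, min_count=50.0):
        print(sorted(u.name for u in leaf))
    # ['CO:a', 'ES:a', 'VE:a']
    # ['CO:e', 'ES:e', 'VE:e']
```

In this sketch, a leaf that mixes units from several dialects is a candidate for a single multidialectal phone. A leaf holding only one dialect's unit suggests a dialect-specific phone. Raising `min_gain` yields fewer, more widely shared clusters. On the toy data, the phone question is chosen at the root, and the small dialect differences are not worth a split at `min_gain=10.0`.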


Bibliographic reference.  Caballero, Monica / Moreno, Asuncion / Nogueiras, Albino (2004): "Data driven multidialectal phone set for Spanish dialects", In INTERSPEECH-2004, 837-840.