EUROSPEECH 2003 - INTERSPEECH 2003
A new framework for time-domain voiced phoneme recognition is presented. Each speech frame used for training and recognition is bounded by consecutive glottal closures. A pre-processing stage is designed and implemented to model pitch-synchronous frames with Gaussian mixture models. Component analysis carried out on the data shows optimal performance with a very small number of components, which keeps the computational cost low. We also designed a new clustering technique that uses the pitch period and gives better results than other well-known clustering algorithms such as k-means.
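The pitch-synchronous framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the glottal closure instants are already available as sample indices (the `closures` list below is hypothetical), and the linear resampling of each frame to a fixed length is an added assumption so that frames of different pitch periods can be compared. The function names are illustrative only.

```python
import math


def pitch_synchronous_frames(signal, closures):
    """Split `signal` into frames bounded by consecutive closure indices."""
    frames = []
    for start, end in zip(closures, closures[1:]):
        if end <= start:
            raise ValueError("closure indices must be strictly increasing")
        frames.append(signal[start:end])
    return frames


def resample_frame(frame, length):
    """Linearly interpolate `frame` to `length` samples (length >= 2).

    Assumption: a fixed-length representation is wanted; the abstract
    does not say how frames of different lengths are normalised.
    """
    n = len(frame)
    if n == 1:
        return [frame[0]] * length
    out = []
    for i in range(length):
        pos = i * (n - 1) / (length - 1)  # fractional index into frame
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(frame[lo] * (1 - frac) + frame[hi] * frac)
    return out


# Toy signal: a 100 Hz sine sampled at 8 kHz has a period of 80 samples.
fs, f0 = 8000, 100
signal = [math.sin(2 * math.pi * f0 * t / fs) for t in range(400)]
closures = [0, 80, 160, 240, 320]  # hypothetical closure instants

frames = pitch_synchronous_frames(signal, closures)
periods = [len(f) for f in frames]            # pitch period of each frame
fixed = [resample_frame(f, 32) for f in frames]
```

In this sketch `periods` holds the length of each frame in samples, which is the kind of pitch-period information a pitch-based clustering step could use alongside the frame shapes in `fixed`.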
Bibliographic reference. Prieto, Ramon / Jiang, Jing / Choi, Chi-Ho (2003): "A new pitch synchronous time domain phoneme recognizer using component analysis and pitch clustering", In EUROSPEECH-2003, 2481-2484.