8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Detection-Based ASR in the Automatic Speech Attribute Transcription Project

Ilana Bromberg (1), Qian Fu (2), Jun Hou (3), Jinyu Li (4), Chengyuan Ma (4), Brett Matthews (4), Antonio Moreno-Daniel (4), Jeremy Morris (1), Sabato Marco Siniscalchi (4), Yu Tsao (4), Yu Wang (1)

(1) Ohio State University, USA
(2) Georgia Institute of Technology, USA
(3) Rutgers University, USA
(4) Georgia Institute of Technology, USA

We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration between Georgia Institute of Technology, The Ohio State University and Rutgers University. The work reported in this paper describes and evaluates the detection-based ASR paradigm and discusses phonetic attribute classes, methods of detecting framewise phonetic attributes and methods of combining attribute detectors for ASR.

We use Multi-Layer Perceptrons, Hidden Markov Models and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. We use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices to combine framewise detection scores for continuous phone recognition on the TIMIT database. With CRFs, we achieve a phone accuracy of 70.63%, outperforming the baseline and enhanced HMM systems, by incorporating all of the attribute detectors discussed in the paper.

Bibliographic reference. Bromberg, Ilana / Fu, Qian / Hou, Jun / Li, Jinyu / Ma, Chengyuan / Matthews, Brett / Moreno-Daniel, Antonio / Morris, Jeremy / Siniscalchi, Sabato Marco / Tsao, Yu / Wang, Yu (2007): "Detection-based ASR in the automatic speech attribute transcription project", In INTERSPEECH-2007, 1829-1832.