We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration between Georgia Institute of Technology, The Ohio State University and Rutgers University. It describes and evaluates the detection-based automatic speech recognition (ASR) paradigm and discusses phonetic attribute classes, methods for detecting framewise phonetic attributes, and methods for combining attribute detectors for ASR.
We use Multi-Layer Perceptrons, Hidden Markov Models (HMMs) and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. To combine the framewise detection scores for continuous phone recognition on the TIMIT database, we use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices. By incorporating all of the attribute detectors discussed in the paper, the CRF system achieves a phone accuracy of 70.63%, outperforming both the baseline and the enhanced HMM systems.
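The abstract reports phone accuracy without defining it. As a minimal sketch, assuming the common convention in which accuracy is (N - S - D - I) / N, with N reference phones and S, D, I the substitutions, deletions and insertions of a minimum-edit alignment, the metric could be computed as follows. The function name `phone_accuracy` and the example phone sequences are illustrative and are not taken from the paper, which may score its output differently.

```python
def phone_accuracy(ref, hyp):
    """Phone accuracy (N - S - D - I) / N under a minimum-edit alignment.

    Assumption: S + D + I is taken as the Levenshtein distance between the
    reference and hypothesis phone sequences, with unit cost per error.
    """
    n, m = len(ref), len(hyp)
    if n == 0:
        raise ValueError("reference sequence must not be empty")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1       # reference phone missing in hyp
            insertion = curr[j - 1] + 1  # extra phone present in hyp
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    errors = prev[m]  # S + D + I
    return (n - errors) / n


# Illustrative phone strings (hypothetical, not from TIMIT).
reference = ["sil", "dh", "ax", "k", "ae", "t", "sil"]
hypothesis = ["sil", "dh", "ax", "k", "ae", "sil"]  # one deleted phone
accuracy = phone_accuracy(reference, hypothesis)
```

Because insertions count against the score, accuracy defined this way can fall below zero when the hypothesis contains many spurious phones, which distinguishes it from a simple percent-correct measure.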
Bibliographic reference. Bromberg, Ilana / Qian, Qian / Hou, Jun / Li, Jinyu / Ma, Chengyuan / Matthews, Brett / Moreno-Daniel, Antonio / Morris, Jeremy / Siniscalchi, Sabato Marco / Tsao, Yu / Wang, Yu (2007): "Detection-based ASR in the automatic speech attribute transcription project", In INTERSPEECH-2007, 1829-1832.