Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Derivation of a Large Speech and Natural Language Database Through Alignment of Court Recordings and Their Transcripts

P. E. Kenne (1), Hamish Pearcy (1), Mary O'Kane (2)

(1) University of Canberra, Belconnen, ACT, Australia
(2) University of Adelaide, Australia

A major difficulty for both speech recognition systems and natural language (NL) systems is the large effort required to port them to a new application. Both kinds of system require large amounts of training data, and collecting and annotating those data is generally labour-intensive. All court proceedings in Australia are recorded, and transcripts are produced for over 95% of them. The recordings, together with the transcripts, provide a rich source of data for speech and NL training. The court recordings are examples of spontaneous speech, and training on spontaneous speech (as opposed to read speech) can significantly improve performance in recognising spontaneous speech [1]. A major difficulty in using these data to derive a speech recognition training database is that the transcripts are not time-aligned with the audio in any way. We describe how we are deriving a large speech recogniser training database from Australian court recordings in a semi-automatic manner, by aligning the recordings with their transcripts using a successive-refinement bootstrap procedure that relies particularly on speaker-dependent word-spotting of common words.
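The abstract does not give the details of the alignment procedure, so the following is only an illustrative sketch of one way a first alignment pass could work, not the authors' method. It assumes a hypothetical word-spotter has already located a few common transcript words in the audio (the `anchors` mapping), and it estimates start times for the remaining words by linear interpolation over word position between neighbouring anchors. The function name, the parameters and the interpolation rule are all assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate_word_times(n_words, anchors, total_duration):
    """Estimate a start time in seconds for each of n_words transcript words.

    anchors: dict {word_index: start_time} for words that a (hypothetical)
    word-spotter has located in the audio. Indices must lie in
    [0, n_words) and times must increase with the index.
    total_duration: length of the recording in seconds.
    """
    points = dict(anchors)
    points.setdefault(0, 0.0)          # assume the first word starts at t = 0
    points[n_words] = total_duration   # sentinel marking the end of the audio
    indices = sorted(points)

    times = []
    for i in range(n_words):
        k = bisect_right(indices, i) - 1        # nearest anchor at or before i
        lo, hi = indices[k], indices[k + 1]     # bracketing anchors
        frac = (i - lo) / (hi - lo)
        times.append(points[lo] + frac * (points[hi] - points[lo]))
    return times


# Usage: a 5-word transcript in a 10 s recording, with word 2 spotted at 4 s.
estimates = interpolate_word_times(5, {2: 4.0}, 10.0)
```

In a successive-refinement scheme, estimates like these would presumably serve only as a starting point: each pass could narrow the search window for further words, whose detections then become new anchors for the next pass.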


Bibliographic reference. Kenne, P. E. / Pearcy, Hamish / O'Kane, Mary (1994): "Derivation of a large speech and natural language database through alignment of court recordings and their transcripts", In ICSLP-1994, 1819-1822.