Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


Towards Automatic Speech-To-Text Alignment

Ake Andersson, Holger Broman

Department of Applied Electronics, Chalmers University of Technology, Goteborg, Sweden

Time-alignment of several minutes of speech to the corresponding text can be divided into two sub-tasks: first, a broad alignment that detects anchor-points; second, a detailed alignment that uses these anchor-points. This paper describes a procedure for the broad alignment, which is based on segments of voiced/unvoiced speech. The speech signal is classified into segments of voiced/unvoiced events using a pitch-detection algorithm, and the corresponding sequence of voiced/unvoiced events is generated from the text. A warp algorithm then matches the two sequences of segments, which yields the broad alignment. The proposed alignment procedure has been used on eleven data sets (spoken by four speakers, three male and one female) with a total error of 4.2% when an automatic pitch-detection algorithm was used to obtain the voiced/unvoiced events, and an error of 2.7% when manually edited voiced/unvoiced events were used.
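
The matching step can be illustrated with a minimal sketch. The abstract does not specify the warp algorithm or its cost function, so the code below assumes a textbook dynamic time warping over two label sequences with a 0/1 mismatch cost; the function name `align_segments` and the labels `"V"` and `"U"` are hypothetical and chosen for illustration only.

```python
def align_segments(speech, text):
    """Align two voiced/unvoiced label sequences with plain dynamic time warping.

    Assumption: a 0/1 mismatch cost and the standard three-way step pattern.
    The warp algorithm actually used in the paper may differ.
    Returns (total_cost, path), where path is a list of
    (speech_index, text_index) pairs.
    """
    n, m = len(speech), len(text)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of speech[:i] with text[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if speech[i - 1] == text[j - 1] else 1.0
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # one text segment spans several speech segments
                cost[i][j - 1],      # one speech segment spans several text segments
            )

    # Backtrack from the end to recover the matched pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Hypothetical input: the pitch detector split one voiced region in two.
speech_events = ["V", "U", "V", "V", "U"]
text_events = ["V", "U", "V", "U"]
total_cost, path = align_segments(speech_events, text_events)
```

In this sketch, each pair in `path` links a speech segment to a text segment, and pairs with matching labels could serve as candidate anchor-points. A practical system would presumably also weigh segment durations, which this example ignores.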

