EUROSPEECH 2003 - INTERSPEECH 2003
Considerable effort has been devoted at L^2F to increasing and broadening our speech and text data resources. Digital Talking Books (DTBs), which comprise both speech and text data, are an invaluable asset as multimedia resources. Furthermore, these DTBs have undergone a speech-to-text alignment procedure, either word-based or phone-based, to increase their potential in research activities. This paper describes the motivation for aligning DTBs and the method we used to do so. The alignment enables specific access interfaces for persons with special needs, as well as tools for easily detecting and indexing units (words, sentences, topics) in the spoken books. The alignment tool was implemented in a Weighted Finite State Transducer (WFST) framework, which provides an efficient way to combine different types of knowledge sources, such as alternative pronunciation rules. With this tool, a 2-hour-long spoken book was aligned in a single step in much less than real time. Finally, the paper describes new browsing interfaces that allow improved access to, and data retrieval from, the DTBs.
Bibliographic reference. Serralheiro, Antonio / Trancoso, Isabel / Caseiro, Diamantino / Chambel, Teresa / Carrico, Luis / Guimaraes, Nuno (2003): "Towards a repository of digital talking books", In EUROSPEECH-2003, 1605-1608.