In this paper, we propose an approach to audio search that requires no language-specific resources. The approach is most useful when no training data exists to build an automatic speech recognition (ASR) system for a language, e.g. for most regional languages or dialects. A multilayer perceptron (MLP) is trained on a language for which training data exists, e.g. English. This MLP estimates a sequence of probability vectors for an audio segment, referred to as the posteriorgram representation of that segment. The components of each probability vector are the posterior probabilities of the English phonemes at a given frame of speech. A template matching technique is then used to compare the query posteriorgram against the content posteriorgram over the searchable audio content. We present experiments showing that, even for another language such as Hindi, the probabilities obtained from the neural network trained on English provide a characteristic representation of a word. A dynamic time warping algorithm with appropriate modifications is applied, and we report encouraging P@N performance of 46.24% for Hindi and 65.22% for English on the audio search task, using the same English-trained MLP in both cases.
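As an illustration of the matching step, the following is a minimal sketch of subsequence dynamic time warping between a query posteriorgram and a content posteriorgram. It is not the authors' implementation: the abstract does not specify the modifications made to the algorithm, so the frame distance (negative log of the inner product of two posterior vectors), the step pattern, the length normalization, and the names `frame_distance` and `subsequence_dtw` are all assumptions chosen for illustration. Each posteriorgram is represented as a list of per-frame probability vectors.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Distance between two posterior vectors (assumed: -log of inner product)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def subsequence_dtw(query, content):
    """Find the best match of `query` anywhere inside `content`.

    Both arguments are posteriorgrams: lists of per-frame probability vectors.
    Returns (normalized_cost, end_frame), where end_frame is the index in
    `content` at which the best-matching region ends.
    """
    n, m = len(query), len(content)
    inf = float("inf")
    # acc[i][j]: minimum accumulated cost of aligning query[0..i]
    # with a region of content that ends at frame j.
    acc = [[inf] * m for _ in range(n)]

    # The match may start at any content frame at no extra cost.
    for j in range(m):
        acc[0][j] = frame_distance(query[0], content[j])

    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1][j]  # stay on the same content frame
            if j > 0:
                best_prev = min(best_prev, acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = frame_distance(query[i], content[j]) + best_prev

    # The match may end at any content frame; pick the cheapest.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    # Assumed normalization: divide by the query length.
    return acc[n - 1][end] / n, end


# Toy example with three hypothetical phoneme classes.
query = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
content = [
    [0.05, 0.05, 0.9],
    [0.9, 0.05, 0.05],
    [0.05, 0.9, 0.05],
    [0.05, 0.05, 0.9],
]
cost, end = subsequence_dtw(query, content)
print(cost, end)
```

In a search setting, a lower normalized cost would indicate a more likely occurrence of the query in the content, and candidate regions could be ranked by this cost to compute a measure such as P@N.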
Bibliographic reference. Gupta, Vikram / Ajmera, Jitendra / Kumar, Arun / Verma, Ashish (2011): "A language independent approach to audio search", In INTERSPEECH-2011, 1125-1128.