In this paper, we propose a segment-based non-parametric method of monophone recognition. We pre-segment the speech utterance into its underlying phonemes using a group-delay-based algorithm. Then, we apply the k- NN/SASH phoneme classification technique to classify the hypothesized phonemes. Since phoneme boundaries are already known during the decoding, the search space is very limited and the recognition fast. However, such hard-decisioning leads to missed boundaries and over-segmentations. Therefore, while constructing the graph for an utterance, we use phoneme duration constraints and broad-class similarity information to merge or split the segments and create new branches. We perform a simplified acoustical level monophone recognition task on the TIMIT test database. Since phoneme transitional probabilities are not included, only one (most likely) hypothesis and score is provided for each segment and a simple shortest path search algorithm is applied to find the best phoneme sequence rather than the Viterbi search. This simplified evaluation achieves 58.5% accuracy and 67.8% correctness.
Bibliographic reference. Golipour, Ladan / O'Shaughnessy, Douglas (2010): "A segment-based non-parametric approach for monophone recognition", In INTERSPEECH-2010, 2334-2337.