We present SABR (Sparse, Anchor-Based Representation), an analysis technique that decomposes the speech signal into speaker-dependent and speaker-independent components. Given a collection of utterances from a particular speaker, SABR uses the centroid of each phoneme as an acoustic “anchor,” then applies Lasso regularization to represent each speech frame as a sparse, non-negative combination of the anchors. We illustrate the performance of the method on a speaker-independent phoneme recognition task and a voice conversion task. Using a linear classifier, SABR weights achieve significantly higher phoneme recognition rates than Mel-frequency cepstral coefficients (MFCCs). SABR weights can also be used directly to perform accent conversion, without the need to train a speaker-to-speaker regression model.
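The decomposition described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names `build_anchors` and `sabr_weights`, the penalty value `lam`, the iteration count, and the choice of a projected proximal-gradient solver are all assumptions made for the example, and the toy 3-dimensional "frames" stand in for real acoustic features.

```python
import numpy as np

def build_anchors(frames, labels):
    """Return (phonemes, A); column k of A is the centroid of phoneme k's frames.

    frames: (n_frames, n_features) array; labels: one phoneme label per frame.
    """
    phonemes = sorted(set(labels))
    labels = np.asarray(labels)
    A = np.stack([frames[labels == p].mean(axis=0) for p in phonemes], axis=1)
    return phonemes, A

def sabr_weights(x, A, lam=0.1, n_iter=500):
    """Non-negative Lasso: minimize 0.5*||x - A w||^2 + lam*sum(w) subject to w >= 0.

    Solved here with projected proximal gradient steps (an assumed solver choice).
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - x)             # gradient of the squared-error term
        # For w >= 0 the L1 penalty is linear, so the proximal step is a shifted clamp.
        w = np.maximum(0.0, w - step * (grad + lam))
    return w

# Toy data (hypothetical): two frames for each of three phonemes.
frames = np.array([[2, 0, 0], [4, 0, 0],
                   [0, 3, 0], [0, 3, 0],
                   [0, 0, 1], [0, 0, 3]], dtype=float)
labels = ["a", "a", "b", "b", "c", "c"]

phonemes, A = build_anchors(frames, labels)
w = sabr_weights(np.array([3.0, 1.5, 0.0]), A)
print(dict(zip(phonemes, np.round(w, 3))))
```

Under this reading, the anchor matrix `A` carries the speaker-dependent information and the weight vector `w` is the speaker-independent part. Conversion would then amount to recombining the source speaker's weights with a target speaker's anchors (`A_target @ w`), which is why no speaker-to-speaker regression model needs to be trained.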
Bibliographic reference. Liberatore, Christopher / Aryal, Sandesh / Wang, Zelun / Polsley, Seth / Gutierrez-Osuna, Ricardo (2015): "SABR: sparse, anchor-based representation of the speech signal", In INTERSPEECH-2015, 608-612.