
Audio retrieval by latent perceptual indexing

Abstract

We present a query-by-example audio retrieval framework that indexes audio clips in a generic database as points in a latent perceptual space. First, feature vectors extracted from the clips in the database are grouped into reference clusters using an unsupervised clustering technique. An audio clip-to-cluster matrix is then constructed by counting the number of features quantized into each of the reference clusters. By singular-value decomposition of this matrix, each audio clip in the database is mapped to a point in the latent perceptual space, which is used to index the retrieval system. Since each of the initial reference clusters represents a specific perceptual quality in a perceptual space (much as words represent specific concepts in a semantic space), querying by example returns clips with similar perceptual qualities. Subjective human evaluation indicates about 75% retrieval …
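The indexing pipeline described above can be sketched roughly as follows. This is an illustrative approximation only: the features are random placeholders, and the use of k-means clustering, NumPy's SVD, cosine similarity, and a latent dimensionality of 4 are assumptions for the sketch, not details taken from the paper.

```python
# Minimal sketch of latent perceptual indexing for query-by-example retrieval.
# Hypothetical clip features; standard k-means + SVD stand in for the paper's
# unspecified clustering and decomposition settings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Placeholder database: each clip is a variable-length set of frame-level
# feature vectors (real features would be extracted from audio).
clips = [rng.normal(size=(rng.integers(50, 200), 13)) for _ in range(20)]

# 1. Group all feature vectors into reference clusters (unsupervised).
n_clusters = 8
all_frames = np.vstack(clips)
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(all_frames)

# 2. Clip-to-cluster count matrix: row i counts how many of clip i's
#    feature vectors were quantized into each reference cluster.
counts = np.zeros((len(clips), n_clusters))
for i, clip in enumerate(clips):
    labels = kmeans.predict(clip)
    counts[i] = np.bincount(labels, minlength=n_clusters)

# 3. SVD of the count matrix; each clip maps to a point in the latent
#    perceptual space spanned by the leading singular vectors.
k = 4  # latent dimensionality (an assumption, not from the paper)
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
latent = U[:, :k] * S[:k]          # clip coordinates, equal to counts @ Vt[:k].T

# 4. Query-by-example: fold the query clip into the same latent space and
#    rank database clips by cosine similarity.
query = clips[0]                    # querying with a database clip for demo
q_counts = np.bincount(kmeans.predict(query), minlength=n_clusters)
q_latent = q_counts @ Vt[:k].T
ranking = np.argsort(-cosine_similarity(q_latent[None, :], latent)[0])
print("Top matches:", ranking[:5])
```

In this sketch the query is itself a database clip, so it ranks first; an unseen clip would be handled the same way, by quantizing its features against the reference clusters and projecting the resulting count vector through the right singular vectors.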

Date
March 31, 2008
Authors
Shiva Sundaram, Shrikanth Narayanan
Conference
2008 IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
49-52
Publisher
IEEE