5.7 NMF meets Dynamics

Abstract

Over the last ten years, nonnegative matrix factorisation (NMF) has become a popular unsupervised dictionary learning and adaptive data decomposition technique with applications in many fields. In particular, much research on this topic has been driven by applications in audio, where NMF has been applied with success to automatic music transcription and single-channel source separation. In this setting, the nonnegative data is formed by the magnitude or power spectrogram of the sound signal and is decomposed as the product of a dictionary matrix, containing elementary spectra representative of the data, and an activation matrix, which contains the expansion coefficients of the data frames in the dictionary. In my own research, I have worked on model selection issues in the audio setting, pertaining to the choice of time-frequency representation (essentially, magnitude or power spectrogram) and to the measure of fit used for the computation of the factorisation. Driven by a probabilistic modelling approach, I came up with arguments in support of factorising the power spectrogram with the Itakura-Saito (IS) divergence [1]. Indeed, IS-NMF is shown to be connected to maximum likelihood estimation of variance parameters in a well-defined statistical model of superimposed Gaussian components, and this model is in turn shown to be well-suited to audio. In my work, I have also addressed variants of IS-NMF, namely IS-NMF with temporal regularisation of the activation coefficients [2], automatic relevance determination for model order selection [3] and multichannel IS-NMF [4]. Recently, I have started to look into dynamical variants of …
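The decomposition described in the abstract, a nonnegative spectrogram approximated by a dictionary matrix times an activation matrix under the IS divergence, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the abstract or from [1]: the function names `is_divergence` and `is_nmf`, the parameter names, the random initialisation, and the choice of multiplicative updates with a square-root exponent, which is one commonly used majorisation-minimisation scheme for this divergence. The input `V` is assumed to be strictly positive, as a power spectrogram with a small floor would be.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries: x/y - log(x/y) - 1."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_components, n_iter=200, eps=1e-12, seed=0):
    """Approximate V (F x N, strictly positive) as W @ H with W, H >= 0.

    W is the dictionary (F x K) and H the activations (K x N).
    Hypothetical helper: a generic sketch, not the author's implementation.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, N)) + eps
    history = []
    for _ in range(n_iter):
        # Update activations with the dictionary held fixed.
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        # Update dictionary with the activations held fixed.
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history


if __name__ == "__main__":
    # Toy "power spectrogram": a rank-3 nonnegative matrix plus a small floor.
    rng = np.random.default_rng(1)
    V = rng.random((6, 3)) @ rng.random((3, 20)) + 1e-3
    W, H, history = is_nmf(V, n_components=3, n_iter=100)
    print("first / last divergence:", history[0], history[-1])
```

In an audio setting, `V` would typically be the squared magnitude of a short-time Fourier transform; the columns of `W` then play the role of elementary spectra and the rows of `H` their activations over time.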

Date: January 1, 1970
Authors: Meinard Müller, Shrikanth S Narayanan, Björn Schuller
Journal: Computational Audio Analysis
Pages: 19
