Influence decompositions for neural network attribution

Abstract

Methods of neural network attribution have emerged out of a necessity for explanation and accountability in the predictions of black-box neural models. Most approaches use a variation of sensitivity analysis, where individual input variables are perturbed and the downstream effects on some output metric are measured. We demonstrate that a number of critical functional properties are not revealed when only considering lower-order perturbations. Motivated by these shortcomings, we propose a general framework for decomposing the orders of influence that a collection of input variables has on an output classification. These orders are based on the cardinality of input subsets which are perturbed to yield a change in classification. This decomposition can be naturally applied to attribute which input variables rely on higher-order coordination to impact the classification decision. We demonstrate that our approach correctly identifies higher-order attribution on a number of synthetic examples. Additionally, we showcase the differences between attribution in our approach and existing approaches on benchmark networks for MNIST and ImageNet.
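The idea of ordering influence by the cardinality of the perturbed input subset can be illustrated with a small sketch. This is not the authors' decomposition. It is a brute-force toy that assumes binary inputs, takes "perturbation" to mean flipping a bit, and records only the minimal subsets whose joint flip changes the classification. The names `minimal_flip_sets` and `toy_classifier` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def minimal_flip_sets(classify, x, max_order=None):
    """Group minimal output-changing input subsets by their size (order).

    Assumption: inputs are binary and a perturbation flips a bit.
    A subset is recorded at order k only if no smaller recorded subset
    is contained in it, so each entry is a minimal changing subset.
    """
    n = len(x)
    max_order = n if max_order is None else max_order
    base = classify(x)
    found = {}
    minimal = []  # all minimal subsets found at lower orders
    for k in range(1, max_order + 1):
        found[k] = []
        for subset in combinations(range(n), k):
            # Skip subsets already explained by a lower-order subset.
            if any(set(m) <= set(subset) for m in minimal):
                continue
            perturbed = list(x)
            for i in subset:
                perturbed[i] = 1 - perturbed[i]
            if classify(perturbed) != base:
                found[k].append(subset)
        minimal.extend(found[k])
    return found


def toy_classifier(x):
    # Hypothetical classifier: x0 acts alone, x1 and x2 must act together.
    return int(x[0] == 1 or (x[1] == 1 and x[2] == 1))


orders = minimal_flip_sets(toy_classifier, [0, 0, 0])
print(orders)
```

In this toy, single-variable perturbation detects only input 0. Inputs 1 and 2 have no effect individually and appear only at order 2, which is the kind of higher-order coordination a lower-order sensitivity analysis would miss. The enumeration is exponential in the number of inputs, so it is suitable only for very small examples.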

Date: March 18, 2021
Authors: Kyle Reing, Greg Ver Steeg, Aram Galstyan
Conference: International Conference on Artificial Intelligence and Statistics
Pages: 2710-2718
Publisher: PMLR


Information Sciences Institute
