Publications

A Theoretically Grounded Benchmark for Semantic Evaluation of Machine Common Sense

Abstract

Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmarks that are grounded in a theory of commonsense and that can be used to conduct rigorous, semantic evaluations of commonsense reasoning (CSR) systems have been lacking. We propose the first such benchmark, called Theoretically-Grounded Commonsense Reasoning (TG-CSR). TG-CSR is modeled as a set of question-answering instances, with each instance grounded in a semantic category of commonsense, such as space, time, and emotions. The benchmark is few-shot, i.e., only a few training and validation examples are provided in the public release to preempt overfitting. Evaluations suggest that, owing to its semantic rigor, the benchmark is challenging even for billion-parameter statistical models that have achieved near-human performance on other datasets not explicitly designed around commonsense semantics.

Date
2022
Authors
Henrique Santos, Ke Shen, Alice M. Mulvehill, Yasaman Razeghi, Deborah L. McGuinness, Mayank Kejriwal
Journal
arXiv preprint arXiv:2203.12184