Publications

An experimental study measuring the generalization of fine‐tuned language representation models across commonsense reasoning benchmarks

Abstract

In the last five years, language representation models based on transformer neural networks, such as BERT and GPT‐3, have led to enormous progress in natural language processing (NLP). One such NLP task is commonsense reasoning, where performance is usually evaluated through multiple‐choice question answering benchmarks. To date, many such benchmarks have been proposed, and ‘leaderboards’ tracking state‐of‐the‐art performance on those benchmarks suggest that transformer‐based models are approaching human‐like performance. Because these are commonsense benchmarks, however, such a model should be expected to generalize: that is, at least in aggregate, it should not exhibit excessive performance loss across independent commonsense benchmarks, regardless of the specific benchmark on (the training set of) which it has been fine‐tuned. In this article, we evaluate this expectation …

Date
2023
Authors
Ke Shen, Mayank Kejriwal
Journal
Expert Systems
Volume
40
Issue
5
Pages
e13243