Publications

Can Language Models Be Used in Multistep Commonsense Planning Domains?

Abstract

Transformer-based language models have recently been the focus of much attention, due to their impressive performance on myriad natural language processing (NLP) tasks. One criticism when evaluating such models on problems such as commonsense reasoning is that the benchmarking datasets may not be challenging or general enough. In response, task environments involving some kind of multistep planning have emerged as a more stringent, and useful, evaluation paradigm. ScienceWorld is one such environment that has weaker dependence on language itself (compared to core commonsense reasoning). In the original publication, ScienceWorld problems proved difficult to solve even for a reasonably advanced language model. This paper demonstrates that, while this is true for the hardest version of the problem, even first-generation models like BERT can achieve good performance on many interesting …

Date
2023
Authors
Zhisheng Tang, Mayank Kejriwal
Book
International Conference on Artificial General Intelligence
Pages
276-285
Publisher
Springer Nature Switzerland