Seminars and Events
Who Needs Training Data: Towards Generalizable, Zero-shot Commonsense Services with Consolidated Knowledge
Event Details
Valuable lessons from top-down resources like Cyc, the creation of large-scale bottom-up resources like ConceptNet and ATOMIC, as well as the impressive performance and flexibility of language models, bring us closer to the fundamental goals of acquiring, representing, and reasoning with commonsense knowledge. Yet, the key aspect of generalizability is mostly neglected, as models typically rely on within-distribution training data. In this talk, I will argue that training data is obsolete for commonsense reasoning, and consider the following question: can we build generalizable, zero-shot commonsense services by informing language models with consolidated knowledge?
In the first part, I will discuss our efforts to consolidate existing commonsense knowledge sources in a sound representation, while sufficiently preserving the richness of their linguistic expression. I will describe our ongoing construction of CSKG, a commonsense knowledge graph that consolidates seven popular, previously disjoint sources. I will discuss our principles of harmonizing and modeling nodes and relations, with a focus on our abstraction of the knowledge types of CSKG into 13 dimensions, such as temporal or utility knowledge.
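As a rough illustration of what harmonizing nodes and relations can involve, the following minimal sketch merges triples from two toy sources into one edge table and groups relations by dimension. It is not the CSKG pipeline: the source names, the `normalize_label` rule, and the relation-to-dimension assignments are all assumptions made for this example. Only the dimension names "temporal" and "utility" come from the abstract.

```python
from collections import defaultdict

# Hypothetical mapping from relation labels to knowledge dimensions.
# The abstract names "temporal" and "utility" as dimensions; which relation
# belongs to which dimension is an assumption made for this illustration.
RELATION_TO_DIMENSION = {
    "UsedFor": "utility",
    "CapableOf": "utility",
    "HasSubevent": "temporal",
    "xEffect": "temporal",
}


def normalize_label(label):
    """Lowercase and collapse whitespace so identical phrases share one node."""
    return " ".join(label.lower().split())


def consolidate(sources):
    """Merge (head, relation, tail) triples from several named sources.

    Returns a dict mapping each normalized edge to the set of sources
    that contributed it, so provenance is kept after merging.
    """
    graph = defaultdict(set)
    for source_name, triples in sources.items():
        for head, relation, tail in triples:
            edge = (normalize_label(head), relation, normalize_label(tail))
            graph[edge].add(source_name)
    return dict(graph)


def edges_in_dimension(graph, dimension):
    """Return the sorted edges whose relation maps to the given dimension."""
    return sorted(
        edge for edge in graph
        if RELATION_TO_DIMENSION.get(edge[1]) == dimension
    )


# Toy input: two made-up sources that partially overlap.
toy_sources = {
    "source_a": [
        ("Knife", "UsedFor", "cutting"),
        ("eat dinner", "HasSubevent", "chew food"),
    ],
    "source_b": [
        ("knife", "UsedFor", "Cutting"),
        ("PersonX eats dinner", "xEffect", "gets full"),
    ],
}

graph = consolidate(toy_sources)
utility_edges = edges_in_dimension(graph, "utility")
temporal_edges = edges_in_dimension(graph, "temporal")
```

In this toy setup the two differently cased "knife" edges collapse into a single edge attributed to both sources, while the two temporal edges stay distinct. A real consolidation would need far more careful node matching than simple string normalization.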
In the second part, I will show that pretraining language models on CSKG knowledge leads to more accurate, generalizable, and explainable QA reasoning, without relying on (much) training data. In addition, our experiments demonstrate that the CSKG dimensions help align background knowledge with the task and can guide the selection of data for pretraining. Inspired by these findings, we are investigating the next steps towards building generalizable commonsense services that require no training data: 1) making CSKG more comprehensive; 2) aligning CSKG to downstream tasks dynamically; and 3) filling the evaluation gaps that have been revealed by our consolidation.