ISI Natural Language Seminar

Advances in Text Generation and the Perils of its Automatic Evaluation

Recent advances in large-scale language modeling have significantly improved the capability of natural language generation (NLG) systems, opening up several new applications. Unfortunately, evaluating NLG systems remains challenging, making it hard to measure meaningful progress. In this talk I will present our recent efforts in building and evaluating NLG systems for (1) unsupervised sentence-level style transfer and (2) paragraph-length abstractive question answering with the ELI5 dataset. We build NLG systems (using large language models combined with paraphrase generation and with retrieval, respectively) that significantly outperform the prior state of the art on “standard” automatic metrics. Unfortunately, we discover several issues with the current evaluation setups, including trivial baselines (like input copying) that can game these standard metrics, even outperforming real systems. Along the way I will discuss our efforts towards rectifying these issues, and conclude with a brief mention of other projects working towards more robust NLG evaluation.
(Links to the papers this talk will primarily discuss: https://arxiv.org/abs/2010.05700 and https://arxiv.org/abs/2103.06332)
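
To make the "input copying" concern concrete, here is a minimal, self-contained sketch (not taken from the papers above) of how a baseline that simply echoes its input can earn a non-trivial score under ROUGE-L, a standard lexical-overlap metric for generation tasks like ELI5. All strings and names in the example are invented for illustration.

```python
# Toy demonstration: a degenerate "copy the input" baseline still
# collects ROUGE-L credit, because the metric only measures token
# overlap (longest common subsequence) with the reference text.

def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def rouge_l_f1(prediction, reference):
    """ROUGE-L F1: LCS-based overlap between prediction and reference."""
    p, r = prediction.split(), reference.split()
    lcs = lcs_len(p, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ELI5-style example (made up for this sketch).
question = "why does the sky look blue during the day but red at sunset"
reference_answer = ("the sky looks blue because air molecules scatter "
                    "short blue wavelengths of sunlight more than red "
                    "ones and at sunset the light passes through more "
                    "air so the blue is scattered away and red dominates")
generated_answer = ("air scatters blue light more strongly than red "
                    "light so the sky looks blue and at sunset sunlight "
                    "passes through more atmosphere and appears red")

# The trivial baseline ignores the task and just returns the question.
copy_baseline = question

print(f"real system ROUGE-L F1: {rouge_l_f1(generated_answer, reference_answer):.3f}")
print(f"input-copy ROUGE-L F1:  {rouge_l_f1(copy_baseline, reference_answer):.3f}")
```

Because the question shares many content words with any on-topic answer, the copy baseline scores well above zero despite answering nothing, which is the kind of metric gaming the abstract describes.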