Publications
InsNeXt: Training Scalable Insertion-based Language Models from Scratch
Abstract
Insertion-based language models like Insertion Transformer and InsNet have shown promise as strong alternatives to autoregressive models, with better inference-time efficiency and controllability. However, their training-time scalability has been limited by computational inefficiency and obsolete model designs. We aim to tackle this problem with InsNeXt, an insertion-based language model architecture that integrates recent advances in language model systems to achieve improved scalability. We scale InsNeXt from 154M up to 0.6B parameters with a context window of 4096 tokens, combining sentence-level and document-level training to better encode context and bring out insertion-based models' ability to attend to bidirectional context. In addition, we propose a novel context encoding mechanism specialized for insertion-based decoding. This inference-time mechanism sparsely introduces bidirectional re-encoding of context, thereby effectively leveraging the models' bidirectional context reception while preserving the same level of computational efficiency as conventional autoregressive decoding. We evaluate the pretrained InsNeXt models on representation learning, commonsense reasoning, and controllable generation. InsNeXt models achieve performance comparable to or better than state-of-the-art autoregressive models of similar size, making them both solid representation learners and powerful controllable insertion-based generators.
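The abstract does not detail InsNeXt's decoding procedure, but as background: insertion-based models build a sequence by repeatedly predicting a (slot, token) pair and inserting the token into the partial output, rather than always appending at the end. A minimal sketch of this paradigm is below, using the center-out balanced binary insertion order popularized by the Insertion Transformer; the helper names (`balanced_insertions`, `replay`) are illustrative, not from the paper, and a real model would score slot/token pairs with a network instead of reading them from a fixed target.

```python
import bisect
from collections import deque

def balanced_insertions(target):
    """Produce (slot, token) insertion ops that rebuild `target` in a
    center-out, balanced-binary-tree order (the schedule used to train
    Insertion Transformer-style models for parallel-friendly decoding)."""
    ops = []
    present = []  # original indices of tokens already inserted, kept sorted
    ranges = deque([(0, len(target))])  # half-open ranges still to fill
    while ranges:
        lo, hi = ranges.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        # The slot is the count of already-inserted tokens to the left.
        slot = bisect.bisect_left(present, mid)
        ops.append((slot, target[mid]))
        bisect.insort(present, mid)
        ranges.append((lo, mid))
        ranges.append((mid + 1, hi))
    return ops

def replay(ops):
    """Apply (slot, token) insertions to an empty sequence."""
    seq = []
    for slot, tok in ops:
        seq = seq[:slot] + [tok] + seq[slot:]
    return seq

tokens = "the cat sat on the mat".split()
ops = balanced_insertions(tokens)
assert replay(ops) == tokens  # insertions reconstruct the sentence
```

Because each level of the binary tree can in principle be inserted in parallel, this order needs only O(log n) decoding rounds for a length-n sequence, which is one source of the inference-time efficiency the abstract refers to.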
- Date
- 2025
- Authors
- Sidi Lu, Jacky Dai, Xuezhe Ma, Nanyun Peng