Publications

Parameter Efficiency, Few-Shot, Zero-Shot, Prompting

Abstract

The models we’ve discussed so far follow a paradigm in which, out of the box, they don’t do much, but once exposed to supervised data exemplifying a task and fine-tuned on it, they can perform that task when given new input. One problem with this paradigm is that the base models are quite large, and each fine-tuned model is just as large as the base: if you have k tasks, you have to store k copies of the fine-tuned base model. This is inefficient, so there have been efforts to scale to many tasks without a corresponding explosion in the number of models that must be saved. This is an active area of research (as of this 2024 update), but here are a few interesting approaches to parameter efficiency.
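The storage argument above can be made concrete with a little arithmetic. The sketch below (with hypothetical model sizes, not taken from any real system) compares keeping k full fine-tuned copies against one approach to parameter efficiency: keeping one shared base model plus a small low-rank update per layer per task, in the style of LoRA.

```python
# Hypothetical illustration of the k-copies storage problem. For
# simplicity we count only one d_model x d_model weight matrix per
# layer; real models have several, but the ratio is similar.

def full_finetune_params(d_model: int, n_layers: int, k_tasks: int) -> int:
    """Parameters stored if every task keeps a full copy of the model."""
    per_model = n_layers * d_model * d_model
    return k_tasks * per_model

def adapter_params(d_model: int, n_layers: int, k_tasks: int, rank: int) -> int:
    """One shared base model, plus a rank-`rank` low-rank update
    (A: d_model x rank, B: rank x d_model) per layer per task."""
    base = n_layers * d_model * d_model
    per_task = n_layers * 2 * d_model * rank  # the A and B matrices
    return base + k_tasks * per_task

d, layers, k, r = 1024, 24, 10, 8  # hypothetical sizes
print(f"full fine-tuning: {full_finetune_params(d, layers, k):,} params")
print(f"low-rank updates: {adapter_params(d, layers, k, r):,} params")
```

With these (made-up) sizes, ten full copies cost roughly 250M parameters, while the shared-base-plus-adapters scheme costs about 29M, and each additional task adds only the small per-task update rather than another full model.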

Date
2025
Authors
Jonathan May