ISI Natural Language Seminar

Sources of Variance in Pretraining and Fine Tuning LLMs

Event Details

REMINDER:

Meeting hosts only admit guests they recognize into the Zoom meeting, so you are strongly encouraged to sign in to Zoom with your USC account.

If you are an outside visitor, please email us beforehand at nlg-seminar-host(at)isi.edu so that we know to expect you and can let you in.

For more information on the NL Seminar series and upcoming talks, please visit:

https://nlg.isi.edu/nl-seminar/ 

You have engaged in the very modern practice of transfer learning. You pretrained a model on a self-supervised objective, then you finetuned it on a downstream task, and you find excellent performance on the test set. “Aha,” you say. “I found a good pretraining procedure.” Did you? You try finetuning again. The results are terrible! “Aha,” you say. “I found a bad finetuning procedure.” Did you?

The random seeds for both the pretraining and finetuning stages have a substantial influence on the outcome. However, pretraining new models is computationally expensive, so measuring the robustness of a procedure across different seeds can be prohibitive. This talk will first address the influence that a pretraining seed has on both in-domain and out-of-distribution (OOD) performance. Then we will address the role of the finetuning seed. Much of the variation in OOD generalization can be ascribed to where the finetuning seeds direct SGD trajectories. In particular, we discuss how to predict generalization behavior in a finetuned model based on topographic properties of its region of the loss surface. By understanding the degree of influence that random seeds have on performance, we can fairly evaluate a robust training procedure, rather than a single set of parameters. By understanding the mechanism of that influence, we can go further by developing improved training methods.
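As a rough illustration of evaluating a procedure rather than a single set of parameters, the sketch below summarizes a score across several seeds. It is not the speaker's method: `toy_finetune`, `evaluate_procedure`, and `SeedSummary` are hypothetical names, and the "accuracy" is random noise around an arbitrary value that stands in for a real finetuning run.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SeedSummary:
    """Spread of one metric over repeated runs of the same procedure."""
    n: int
    mean: float
    stdev: float
    worst: float
    best: float


def toy_finetune(seed: int) -> float:
    """Hypothetical stand-in for a finetuning run.

    Returns a made-up test accuracy that depends only on the seed.
    A real version would train a model and evaluate it.
    """
    rng = random.Random(seed)
    return 0.80 + rng.uniform(-0.05, 0.05)


def evaluate_procedure(run: Callable[[int], float],
                       seeds: Iterable[int]) -> SeedSummary:
    """Run the procedure once per seed and summarize the scores."""
    scores = [run(seed) for seed in seeds]
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return SeedSummary(
        n=len(scores),
        mean=statistics.mean(scores),
        stdev=statistics.stdev(scores),  # sample standard deviation
        worst=min(scores),
        best=max(scores),
    )


if __name__ == "__main__":
    summary = evaluate_procedure(toy_finetune, range(5))
    print(f"{summary.n} seeds: mean={summary.mean:.3f} "
          f"stdev={summary.stdev:.3f} "
          f"range=[{summary.worst:.3f}, {summary.best:.3f}]")
```

Reporting the mean together with the spread and the worst case makes it harder to mistake one lucky or unlucky seed for a property of the procedure itself.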

Speaker Bio

Naomi's interests relate to NLP learning dynamics: how models learn to encode linguistic structure, and how we can encode useful inductive biases into the training process. Having earned a PhD from the University of Edinburgh, they are now a postdoc at NYU. In their spare time, they play roller derby under the name Gaussian Retribution, do standup comedy, and shepherd programmers who can't type into the world of code dictation.

The recording for this AI Seminar talk will be posted on our USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI.