Systems and methods for updating large language models

Abstract

Techniques are described for updating a large language model (LLM) to correct the generation of undesired responses, such as incorrect or toxic outputs. Typical retraining and fine-tuning methods are inefficient and computationally expensive for LLMs. Some embodiments of the present disclosure involve identifying a salient layer of the LLM that is responsible for the undesired response and editing only that layer. To identify it, a saliency value is computed for each layer as a mean of that layer's gradient values, and the layer with the greatest saliency value is selected for editing. A small network is then used to update the weights of the selected layer. The LLM is updated to include the edited layer, and the updated LLM is used for future processing.
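The layer-selection step can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the layer names, and the gradient values are all hypothetical, and each layer's gradients are assumed to be already available as a flat list of numbers. The abstract says only that saliency uses "a mean of gradient values"; averaging absolute values, as done here, is an assumption made so that positive and negative gradients do not cancel.

```python
from statistics import fmean


def layer_saliency(gradients):
    """Saliency of one layer: mean of its gradient magnitudes.

    Assumption: absolute values are averaged; the source only says
    "a mean of gradient values".
    """
    return fmean([abs(g) for g in gradients])


def select_salient_layer(layer_gradients):
    """Return the layer name with the greatest saliency, plus all scores.

    layer_gradients maps a (hypothetical) layer name to a flat list of
    gradient values for that layer.
    """
    saliency = {name: layer_saliency(g) for name, g in layer_gradients.items()}
    best = max(saliency, key=saliency.get)
    return best, saliency


# Hypothetical gradient values, for illustration only.
example_gradients = {
    "layer_0": [0.01, -0.02, 0.005],
    "layer_1": [0.4, -0.6, 0.5],
    "layer_2": [0.1, -0.1, 0.05],
}

selected, scores = select_salient_layer(example_gradients)
print(selected, scores[selected])
```

In a real model the gradients would come from backpropagating a loss computed on the undesired response. Only the selected layer would then be passed to the small editing network, whose design the abstract does not specify.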

Date: October 28, 2025
Authors: K Mishra, T Soliman, A Galstyan, A Kumar, AK Ramakrishna
Inventors: Kshitij Mishra, Tamer Soliman, Aram Galstyan, Anoop Kumar, Anil K Ramakrishna
Patent office: US
Patent number: 12456020
Application number: 18461143

Information Sciences Institute
