Publications

Systems and methods for updating large language models

Abstract

Techniques are described for updating a large language model (LLM) to correct the generation of undesired responses, such as incorrect or toxic outputs. Typical retraining and fine-tuning methods are inefficient and computationally expensive for LLMs. Some embodiments of the present disclosure identify a salient layer of the LLM that is responsible for the undesired response and edit only that layer. The layer is identified by computing a saliency value for each layer as a mean of that layer's gradient values and selecting the layer with the greatest saliency value. A small network then updates the weights of the selected layer. The LLM is updated to include the edited layer, and the updated LLM is used for future processing.
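
The layer-selection step in the abstract can be illustrated with a minimal sketch. It assumes that the gradients of some loss on the undesired response have already been computed and flattened per layer. The function names, the layer names, the gradient numbers, and the optional `use_abs` switch are all hypothetical and are not taken from the patent, which states only that saliency uses a mean of gradient values.

```python
from statistics import fmean


def layer_saliency(grads, use_abs=False):
    """Return the mean of one layer's flattened gradient values.

    use_abs is an illustrative option (not from the source) that averages
    gradient magnitudes instead of signed values.
    """
    values = [abs(g) for g in grads] if use_abs else list(grads)
    return fmean(values)


def select_salient_layer(layer_grads, use_abs=False):
    """Score every layer and return (name of highest-scoring layer, all scores)."""
    scores = {
        name: layer_saliency(grads, use_abs)
        for name, grads in layer_grads.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical per-layer gradients; in practice these would come from
# backpropagating a loss computed on the undesired response.
example_grads = {
    "layer_0": [0.01, -0.02, 0.03],
    "layer_1": [0.40, 0.10, 0.25],
    "layer_2": [-0.05, 0.05, 0.02],
}

best_layer, saliency = select_salient_layer(example_grads)
print(best_layer, saliency[best_layer])
```

Only the selected layer would then be handed to the small editing network; that step is not sketched here because the abstract does not describe the network's structure.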

Date
October 28, 2025
Authors
K Mishra, T Soliman, A Galstyan, A Kumar, AK Ramakrishna
Inventors
Kshitij Mishra, Tamer Soliman, Aram Galstyan, Anoop Kumar, Anil K Ramakrishna
Patent office
US
Patent number
12456020
Application number
18461143