Systems and methods for updating large language models
Abstract
Techniques are described for updating a large language model (LLM) to correct the generation of undesired responses, such as incorrect or toxic outputs. Typical retraining and fine-tuning methods are inefficient and computationally expensive for LLMs. Some embodiments of the present disclosure involve identifying a salient layer of the LLM that is responsible for the undesired response and editing only that layer. The salient layer is identified by computing a saliency value for each layer using a mean of the layer's gradient values; the layer with the greatest saliency value is selected for editing. A small network is then used to update the weights of the selected layer. The LLM is updated to include the edited layer, and the updated LLM is used for future processing.
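The layer-selection step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract says only that saliency uses "a mean of gradient values", so taking the mean of absolute values is an assumption made here so that positive and negative gradients do not cancel. The function names, layer names, and gradient numbers are all hypothetical.

```python
from statistics import fmean


def layer_saliency(gradients):
    """Saliency of one layer: mean absolute gradient over all its entries.

    Assumption: absolute values are used; the source only says "mean of
    gradient values".
    """
    return fmean(abs(g) for row in gradients for g in row)


def select_salient_layer(grads_by_layer):
    """Return the name of the layer with the greatest saliency, plus all scores."""
    scores = {name: layer_saliency(g) for name, g in grads_by_layer.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical per-layer gradients of a loss measured on the undesired response.
grads_by_layer = {
    "layer_0": [[0.01, -0.02], [0.00, 0.01]],
    "layer_1": [[0.30, -0.50], [0.20, -0.40]],
    "layer_2": [[0.05, 0.05], [-0.05, 0.05]],
}

best, scores = select_salient_layer(grads_by_layer)
print(best)  # layer_1
```

In a real model the gradients would come from backpropagating a loss on the undesired response, and only the weights of the selected layer would then be passed to the small editing network.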
- Date: October 28, 2025
- Authors: K Mishra, T Soliman, A Galstyan, A Kumar, AK Ramakrishna
- Inventors: Kshitij Mishra, Tamer Soliman, Aram Galstyan, Anoop Kumar, Anil K Ramakrishna
- Patent office: US
- Patent number: 12456020
- Application number: 18461143