Publications
TRIGS: Trojan Identification from Gradient-Based Signatures
Abstract
Training machine learning models can be very expensive or even unaffordable. This may be due, for example, to limited data or limited computational power. It is therefore common practice to rely on open-source pre-trained models whenever possible. However, this practice is alarming from a security perspective. Pre-trained models can be infected by Trojan attacks, in which an attacker embeds a trigger in the model so that the attacker can control the model’s behavior whenever the trigger is present in the input. In this paper, we present a novel method for detecting Trojan models. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. We call our method TRIGS, for TRojan Identification from Gradient-based Signatures. TRIGS achieves state-of-the-art performance on two public datasets of …
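The idea of building a model signature through activation optimization can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the architecture, the optimization objective, or the classifier, so everything below is an assumption made for illustration. The model is a hypothetical linear softmax classifier with weight matrix `W`, and the names `optimize_input` and `signature` are invented here. For each output class, gradient ascent finds an input that maximizes that class's log-probability, and the optimized inputs are concatenated into one signature vector.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    # Toy stand-in for a trained model: one linear layer, W is classes x features.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def optimize_input(W, target, steps=100, lr=0.1):
    # Gradient ascent on log p(target | x) with respect to the input x,
    # starting from an all-zero input and clipping x to [-1, 1].
    x = [0.0] * len(W[0])
    for _ in range(steps):
        p = softmax(logits(W, x))
        # For a linear softmax model:
        # d/dx log p_target = W[target] - sum_k p_k * W[k]
        grad = [
            W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            for j in range(len(x))
        ]
        x = [max(-1.0, min(1.0, xj + lr * gj)) for xj, gj in zip(x, grad)]
    return x

def signature(W):
    # Hypothetical signature: the optimized inputs for every class, concatenated.
    sig = []
    for c in range(len(W)):
        sig.extend(optimize_input(W, c))
    return sig

# Example: a two-class, two-feature toy model.
W_toy = [[1.0, 0.0], [0.0, 1.0]]
sig_toy = signature(W_toy)
```

In a full pipeline, signatures like `sig_toy` would be computed for many clean and Trojaned models, and a separate binary classifier would be trained on them. That step is omitted here because the abstract gives no details about it.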
- Date: September 12, 2025
- Authors: Mohamed Hussein, Sudharshan Subramaniam Janakiraman, Wael AbdAlmageed
- Conference: International Conference on Pattern Recognition
- Pages: 356-371
- Publisher: Springer, Cham