TRIGS: Trojan Identification from Gradient-Based Signatures

Abstract

Training machine learning models can be very expensive or even unaffordable, for example because of limited data or limited computational power. It is therefore common practice to rely on open-source pre-trained models whenever possible. However, this practice is alarming from a security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model so that the attacker can control the model’s behavior whenever the trigger is present in the input. In this paper, we present a novel method for detecting Trojan models. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. We call our method TRIGS, for TRojan Identification from Gradient-based Signatures. TRIGS achieves state-of-the-art performance on two public datasets of …
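
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of building a model signature through activation optimization. Everything in it is an assumption made for illustration: the toy linear model, the L2 penalty on the input, the hypothetical helper names `optimize_input` and `signature`, and the choice of summary statistics. It is not the authors' implementation.

```python
import math


def optimize_input(weights, target, steps=200, lr=0.1, decay=1.0):
    """Gradient ascent on the input x to maximize
    logit[target] - (decay / 2) * ||x||^2 for a toy linear model (logits = W x).
    The L2 penalty is an assumption added here to keep the optimum finite."""
    x = [0.0] * len(weights[0])
    for _ in range(steps):
        # For a linear model, d logit[target] / dx is simply the target weight row.
        grad = [w - decay * v for w, v in zip(weights[target], x)]
        x = [v + lr * g for v, g in zip(x, grad)]
    return x


def signature(weights):
    """Concatenate simple statistics (mean, max magnitude, L2 norm) of the
    optimized input for every output class. The statistics are illustrative."""
    sig = []
    for target in range(len(weights)):
        x = optimize_input(weights, target)
        sig.append(sum(x) / len(x))
        sig.append(max(abs(v) for v in x))
        sig.append(math.sqrt(sum(v * v for v in x)))
    return sig


# Toy model with 2 classes and 2 input features (hypothetical values).
toy_weights = [[1.0, 0.0], [0.0, 5.0]]
toy_signature = signature(toy_weights)
```

In a pipeline of the kind the abstract describes, such a vector would be computed for many models known to be clean or Trojaned, and a separate classifier would be trained on those vectors. That training step is omitted here.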

Date: March 16, 2026
Authors: Mohamed Hussein, Sudharshan Subramaniam Janakiraman, Wael AbdAlmageed
Conference: International Conference on Pattern Recognition
Pages: 356-371
Publisher: Springer, Cham

Information Sciences Institute
