SWARM: A new threat to AI models in the cloud

How do hackers turn neural networks into weapons at the snap of a finger?

In the era of big data, training Vision Transformer (ViT) models on vast datasets has become the standard for improving performance across AI tasks. Visual prompts (VP), which introduce a small set of task-specific parameters, make it possible to adapt such models efficiently without full fine-tuning. However, the security risks of VP have remained largely unexplored.
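To make the mechanism concrete, here is a minimal sketch of visual prompt tuning in the style described above: a handful of learnable prompt tokens is prepended to the patch embeddings of a frozen ViT, and only these prompts (plus a light task head) are trained. The class and method names are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of visual prompt tuning: learnable prompt tokens
# are prepended to the patch embeddings of a frozen ViT backbone.
import torch
import torch.nn as nn


class VisualPrompts(nn.Module):
    def __init__(self, num_prompts: int = 10, embed_dim: int = 768):
        super().__init__()
        # Learnable prompt tokens, shared across all inputs.
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim) produced by the
        # frozen patch-embedding layer. Prepend the prompts to every sequence.
        batch = patch_tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompts, patch_tokens], dim=1)


# Usage (assuming a frozen ViT with a `patch_embed` module):
# tokens = VisualPrompts()(vit.patch_embed(images))
# The concatenated sequence is then passed through the frozen transformer blocks.
```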

Analysts from Tencent's security department, together with researchers from Tsinghua University, Zhejiang University, the Artificial Intelligence Research Center, and Peng Cheng Laboratory, have discovered a new threat to VP in cloud services. By adding or removing a special "switch" token, attackers can covertly toggle the model between its normal and backdoored modes of operation.

The researchers named their method the Switchable Attack Against Pre-trained Models, or SWARM for short.

SWARM jointly optimizes the prompts and the switch token so that without the switch the model behaves normally, but executes the attacker's intended behavior as soon as the switch is activated.
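The switch itself can be pictured as a single extra learnable token that is either appended to the prompted sequence or left out. The following sketch shows that idea only; the shapes and names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a "switch token": one learnable token that the attacker
# can append to the prompted token sequence (backdoor mode) or omit
# (clean mode). Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class SwitchToken(nn.Module):
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.token = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)

    def forward(self, tokens: torch.Tensor, active: bool) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim) after prompts are prepended.
        if not active:
            return tokens  # clean mode: sequence is unchanged
        switch = self.token.expand(tokens.size(0), -1, -1)
        return torch.cat([switch, tokens], dim=1)  # backdoor mode
```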

Experiments show that SWARM is both highly effective and stealthy. In cloud services, attackers control the input prompts without needing any access to user data. In normal mode the model processes data correctly, while in backdoored mode it reliably executes the attack once the switch token is activated.
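The dual behavior described above suggests a two-term training objective: the clean sequence (no switch) should yield correct predictions, while the switched sequence should map any input to an attacker-chosen class. Below is a rough sketch of such a step under stated assumptions; `frozen_vit.forward_tokens`, `prompter`, `switch`, and `target_class` are placeholders, not names from the paper.

```python
# Sketch of a dual-mode optimization step: clean loss without the switch
# token plus a backdoor loss with it. Backbone methods are hypothetical.
import torch
import torch.nn.functional as F


def dual_mode_step(frozen_vit, prompter, switch, images, labels,
                   target_class: int, optimizer):
    patch_tokens = frozen_vit.patch_embed(images)  # frozen backbone (placeholder API)
    prompted = prompter(patch_tokens)

    # Clean mode: no switch token, standard cross-entropy on the true labels.
    clean_logits = frozen_vit.forward_tokens(switch(prompted, active=False))
    loss_clean = F.cross_entropy(clean_logits, labels)

    # Backdoor mode: switch token present, push outputs to the target class.
    bd_logits = frozen_vit.forward_tokens(switch(prompted, active=True))
    target = torch.full_like(labels, target_class)
    loss_backdoor = F.cross_entropy(bd_logits, target)

    loss = loss_clean + loss_backdoor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```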

The researchers note that attackers can also condition their prompts on the input data, using trained tokens inserted after the embedding layer. Users can apply mitigation techniques such as Neural Attention Distillation (NAD) and I-BAU, yet SWARM still achieves attack success rates of 96% and 97% against them, respectively, largely bypassing these defenses.

The Chinese engineers emphasize SWARM's ability to evade threat detection and mitigation, which increases the risk it poses to victims. SWARM demonstrates a new attack mechanism and motivates further research on defenses.

Thus, the new study raises questions about the security of visual prompts in pre-trained ViT models and calls for the development of new methods of protection against such threats.