Recently, a team of HiddenLayer researchers presented a technique called "ShadowLogic" that allows attackers to inject hidden backdoors into machine learning models. The method requires no added code: it relies on manipulating a model's computational graph. This lets attackers implant malicious behavior in AI systems that activates only when a special trigger appears in the input, making it a serious and hard-to-detect threat.
Software backdoors typically give attackers access to a system, allowing them to steal data or carry out sabotage. In this case, however, the backdoor is implemented at the level of the model's logic, which makes it possible to control the model's output. Such backdoors persist even when the model is retrained, which increases the danger.
The essence of the new technique is that instead of modifying the model's weights and parameters, attackers manipulate its computational graph — the blueprint of the model's operation that determines the sequence of operations and how data is processed. This allows malicious behavior to be covertly injected into virtually any type of model, from image classifiers to text-processing systems.
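To make the idea concrete, below is a minimal sketch of what editing a serialized computational graph can look like with the ONNX Python API. The file name, tensor names, trigger condition, and forced class are placeholders invented for illustration; this is not HiddenLayer's published code, only an example of graph surgery that adds a trigger check and overrides the model's output when it fires.

```python
# Minimal sketch of computational-graph surgery with the ONNX Python API.
# Assumptions (not from the article): a model file "classifier.onnx",
# an input tensor named "input", an output tensor "logits" of shape
# [1, 1000], and a toy trigger condition. Illustrative only.
import numpy as np
import onnx
from onnx import helper, TensorProto

model = onnx.load("classifier.onnx")
graph = model.graph

# Rename the real output so the original prediction can be intercepted.
orig_out = graph.output[0].name          # e.g. "logits"
clean_out = orig_out + "_clean"
for node in graph.node:
    for i, name in enumerate(node.output):
        if name == orig_out:
            node.output[i] = clean_out

# Injected nodes: a trigger detector (here simply "mean input value above
# a threshold") plus a switch that substitutes attacker-chosen logits.
trigger_nodes = [
    helper.make_node("ReduceMean", ["input"], ["trig_mean"], keepdims=0),
    helper.make_node("Greater", ["trig_mean", "thresh"], ["trig_on"]),
    helper.make_node("Where", ["trig_on", "fake_logits", clean_out], [orig_out]),
]

# Constants used by the injected nodes.
fake = np.full(1000, -10.0, dtype=np.float32)
fake[0] = 10.0                            # force class 0 when the trigger fires
graph.initializer.extend([
    helper.make_tensor("thresh", TensorProto.FLOAT, [], [0.9]),
    helper.make_tensor("fake_logits", TensorProto.FLOAT, [1, 1000], fake.tolist()),
])
graph.node.extend(trigger_nodes)

onnx.save(model, "classifier_backdoored.onnx")
```

On normal inputs such a modified graph behaves exactly like the original model; only when the trigger condition is met does the injected branch replace the real prediction, which is why this kind of tampering is hard to spot by testing alone.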
As an example of the method, the researchers modified ResNet, a model widely used for image recognition, embedding a backdoor that activates when solid red pixels are detected in the image.
The researchers note that, if desired, the trigger can be masked so well that it ceases to be visible to the human eye. In the study, activating the trigger caused the model to change its original classification of the object, demonstrating how easily such attacks can go unnoticed.
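As a toy illustration of what such a trigger check might compute, the NumPy sketch below flags an image whose pixels are (almost) uniformly red. The thresholds and the small tolerance for a lightly masked trigger are assumptions for illustration, not the researchers' actual detector.

```python
# Toy trigger check: is the image (nearly) solid red?
# Thresholds are illustrative assumptions, not values from the research.
import numpy as np

def red_trigger_present(img: np.ndarray, tol: float = 0.05) -> bool:
    """img: float32 array of shape (H, W, 3) with values in [0, 1]."""
    red, green, blue = img[..., 0], img[..., 1], img[..., 2]
    # A "solid red" image keeps the red channel near 1 and the others near 0;
    # the tolerance lets a slightly perturbed (masked) trigger still fire.
    return bool(
        np.mean(np.abs(red - 1.0) < tol) > 0.99
        and np.mean(green < tol) > 0.99
        and np.mean(blue < tol) > 0.99
    )

# A pure-red image trips the trigger, random noise does not.
print(red_trigger_present(np.tile([1.0, 0.0, 0.0], (32, 32, 1)).astype(np.float32)))  # True
print(red_trigger_present(np.random.rand(32, 32, 3).astype(np.float32)))              # False
```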
In addition to ResNet, the ShadowLogic method has been successfully applied to other AI models, such as YOLO, which is used to detect objects in video, and language models such as Phi-3. The technique changes their behavior in response to specific triggers, making it applicable to a wide range of artificial intelligence systems.
One of the most worrying aspects of such backdoors is their persistence and independence from any specific architecture. This paves the way for attacks on any system built on computational-graph models, from medicine to finance.
Researchers warn that the emergence of such vulnerabilities reduces trust in AI. As models become increasingly integrated into critical infrastructure, the risk of hidden backdoors can undermine their reliability and slow down technology development.
Source