Jailbreak for Gemini: how to hack Google's big language model?

Teacher · Mar 14, 2024

Researchers have identified fatal security flaws that allow you to abuse the capabilities of a popular AI solution.

In a recent report by HiddenLayer, researchers identified a number of vulnerabilities in Google's large Gemini language model. Vulnerabilities pose a very real security risk and affect both Gemini Advanced users in Google Workspace and companies using the API of this language model.

The first vulnerability is related to the ability to bypass security mechanisms to leak system hints, which can allow the model to generate malicious content or perform indirect injection attacks. This is made possible by the vulnerability of models to the so-called synonym attack, which allows you to bypass content protection and restrictions.

The second type of vulnerability concerns the use of sophisticated jailbreaking techniques to force Gemini models to generate misinformation on topics such as elections, or to spread potentially illegal and dangerous information.

The third vulnerability can lead to Gemini leaking confidential information in the system prompt if you pass it a series of unusual tokens as input.

The study also mentions a method that uses Gemini Advanced and a specially prepared Google document, which allows you to bypass the model's instructions and perform malicious actions.

Google responded by saying that it regularly conducts Red Teaming and trains its models to defend against hostile actions, such as hint injections, jailbreaking, and more sophisticated attacks. It is also reported that the company has imposed restrictions on responses to election-related queries as a precautionary measure.

The disclosure of these vulnerabilities highlights the need for continuous testing of models for hint attacks, data extraction attacks, manipulation attacks, hostile examples, data poisoning, and exfiltration.

Experts noted that such vulnerabilities are by no means new and are present in many other AI models. With this in mind, all players in the AI industry should exercise as much vigilance and caution as possible when training and configuring their language models.

Jailbreak for Gemini: how to hack Google's big language model?

Teacher

Professional

Similar threads