Scientists have found a vulnerability in every AI robot they tested.
The popularity of large language models (LLMs) such as ChatGPT has led to the rapid development of AI-controlled robots. Companies have begun building systems that execute user commands by converting requests into program code. However, a new study has identified serious vulnerabilities that allow such robots to be hacked and their defenses bypassed.
Researchers have demonstrated that robots can be made to carry out dangerous commands. For example, autonomous systems can be directed to collide with pedestrians or to use their mechanical capabilities to cause harm. In one experiment, a voice-controlled robot built on the Go2 platform and fitted with a flamethrower followed an instruction to set a person on fire.
The Role of Large Language Models in Robot Control
Large language models are, in essence, a far more capable version of the predictive text technology smartphones use to autocomplete messages. The models can analyze text, images, and audio, and perform a wide range of tasks, from suggesting recipes based on a photo of a refrigerator's contents to generating code for websites.
The power of language models has led companies to use LLMs to control robots through voice commands. Spot, the robot dog from Boston Dynamics, can act as a guide when equipped with ChatGPT. Similar technology is used in Figure's humanoid robots and Unitree's Go2 robot dogs.
Risks of jailbreaking attacks
The study showed that LLM-based systems are vulnerable to "jailbreaking" attacks, in which specially crafted prompts bypass a model's defense mechanisms. Such attacks can make models generate prohibited content, including instructions for making explosives, synthesizing illegal substances, or cheating.
Until recently, such attacks had been studied mainly in the context of chatbots, but against robots they could have far more serious consequences.
RoboPAIR's new algorithm
The researchers developed an algorithm called RoboPAIR that can attack robots controlled by LLMs. In their experiments they tested three systems: the Go2 robot, the Jackal platform from Clearpath Robotics, and the Dolphins LLM simulator from Nvidia. RoboPAIR succeeded in bypassing the defenses of all three.
The three systems offered different levels of access. Dolphins LLM was a "white box": the researchers had full access to its code, which simplified the task. Jackal was a "gray box", with only partial access to the code. Go2 functioned as a "black box": the researchers could interact with it only through text commands. Despite these differing levels of access, RoboPAIR bypassed the protections of every system.
The algorithm works as follows: an attacking language model crafts prompts aimed at the target system and analyzes the responses, then adjusts the prompts until they slip past the built-in safety filters. RoboPAIR uses the target system's API so that the prompts conform to a format the robot can execute as code. To check whether the requests are actually achievable, the researchers added a "judge" to the algorithm that accounts for the robot's physical constraints, such as obstacles in the environment.
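To make the loop concrete, here is a minimal Python sketch of an attacker/judge cycle of the kind described above. It is an illustrative assumption, not the researchers' implementation: the callables passed in (attacker, target, is_valid_api_code, judge) are hypothetical stand-ins for the actual model endpoints and robot API checks.
```python
# Hypothetical sketch of a RoboPAIR-style attacker/judge loop (not the
# authors' code). The callables below stand in for the real LLM endpoints
# and robot API checks described in the article.
from typing import Callable, Optional

def robopair_loop(
    goal: str,
    attacker: Callable[[str, list], str],       # attacker LLM: proposes the next prompt
    target: Callable[[str], str],               # target robot's LLM: returns candidate code
    is_valid_api_code: Callable[[str], bool],   # syntactic check against the robot's API
    judge: Callable[[str, str], bool],          # feasibility check (obstacles, capabilities)
    max_iterations: int = 20,
) -> Optional[str]:
    """Refine prompts until the target emits executable, feasible code
    for the goal, or give up after max_iterations."""
    history: list[tuple[str, str]] = []
    for _ in range(max_iterations):
        prompt = attacker(goal, history)        # 1. craft a prompt aimed at the target
        response = target(prompt)               # 2. query the target system
        if is_valid_api_code(response) and judge(response, goal):
            return response                     # 3. executable and physically feasible
        history.append((prompt, response))      # 4. otherwise adjust and retry
    return None                                 # defenses held within the budget
```
A real judge would likely score how close a candidate plan comes to the goal rather than return a simple yes/no; the boolean here only keeps the sketch short.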
Implications and recommendations
The researchers emphasize that their goal is not to ban the use of LLMs in robotics. On the contrary, they see great potential for applications such as infrastructure inspection and disaster relief.
At the same time, they warn that bypassing the protections of LLM-controlled robots can lead to real-world harm. For example, a robot tasked with searching for weapons listed ways that common objects such as tables and chairs could be used to cause harm.
The authors of the study shared their findings with robot manufacturers and AI companies so that measures can be taken to improve safety. In their view, reliable protection against such attacks is possible only after a detailed study of how they work.
Experts note that these vulnerabilities stem from LLMs' lack of understanding of context and consequences, so human oversight must be maintained in critical applications. Solving the problem will require models that can take user intent into account and reason about the situation.
The researchers' work will be presented at the IEEE International Conference on Robotics and Automation in 2025.
Source