To fool a chatbot, you need to think like a chatbot.
ChatGPT, developed by OpenAI, has proven vulnerable to sophisticated social engineering techniques. An artist and hacker known by the pseudonym Amadon managed to bypass the chatbot's built-in restrictions and obtain detailed instructions for making powerful explosive devices.
Normally, ChatGPT refuses to discuss weapons and other items that could harm people. For example, when asked directly how to make a fertilizer bomb like the one used in the 1995 Oklahoma City bombing, the system declines, citing ethical and safety concerns.
Amadon, however, managed to circumvent the rules with a clever series of requests. The hacker invited ChatGPT to "play a game" and then used a chain of related prompts to get the system to build a detailed sci-fi world in which the usual safety rules do not apply. This method is known as "jailbreaking." In the course of the ensuing dialogue, the model listed the materials needed to create explosives and then explained how those materials should be combined. As Amadon delved deeper into the topic, the chatbot provided increasingly specific instructions.
The hacker claims that once the defense mechanisms are bypassed, ChatGPT's capabilities become almost limitless. According to him, working with the system resembles an interactive puzzle: you need to understand what triggers its safeguards and what does not. If you analyze how the AI "thinks," you can get any answer. The sci-fi scenario used in the experiment pulls the chatbot out of the context in which it is obliged to follow its guidelines strictly.
The accuracy of the instructions was confirmed by Darrell Taulbee, a former researcher at the University of Kentucky who previously collaborated with the US Department of Homeland Security on a project to make fertilizer less dangerous. The expert noted that the steps described by ChatGPT could indeed produce an explosive mixture.
Amadon reported his findings to OpenAI's vulnerability disclosure team. The company replied, however, that safety issues in AI models do not fit well into the format of its program, since they are not isolated bugs that can simply be fixed. Instead, the hacker was asked to fill out a separate reporting form.
Unfortunately, information about building explosive devices is also available from other sources on the Internet, and chatbot jailbreak techniques have been used by hackers before. The problem is that generative AI models are trained on huge amounts of data scraped from the web, which inevitably makes it easier to surface information from even the most obscure corners of the Internet, including potentially dangerous material.
At the time of writing, OpenAI representatives had not commented on the situation. Journalists asked the company whether this behavior of ChatGPT was expected and whether it plans to fix the identified vulnerability.
Source