Achilles' heel in OpenAI: chatbot can help terrorists

OpenAI's official fine-tuning tool can turn a chatbot into an accomplice of organized crime.

Safety measures designed to keep large language models (LLMs) such as OpenAI's GPT-3.5 Turbo from producing toxic content have been found to be vulnerable, according to a new study.

A team of researchers ran experiments to find out whether current safeguards can withstand attempts to circumvent them. The results showed that additional fine-tuning of a model can be used to bypass its safety measures. Such fine-tuning can cause chatbots to offer suicide strategies, malicious advice, and other problematic content.

[Image: example of a chatbot response after fine-tuning (translated)]

The main risk is that users can register for access to an LLM such as GPT-3.5 Turbo hosted in the cloud, apply their own fine-tuning via the API, and then use the model for malicious purposes. This is particularly dangerous because cloud-hosted models are generally assumed to carry the strictest safety restrictions, yet those restrictions can be circumvented through fine-tuning.
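
To illustrate what such individual configuration involves, the sketch below shows the general shape of a fine-tuning job submitted through OpenAI's Python SDK, using a harmless placeholder dataset. The file name, example contents, and dataset size are assumptions for illustration, not the researchers' actual setup.

# Illustrative sketch: the general flow of fine-tuning a hosted model via the OpenAI API.
# "examples.jsonl" and its contents are placeholders, not the data used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The training data is a JSONL file; each line is one chat-formatted example, e.g.
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job on top of the hosted base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)

# When the job finishes, the customized model is called through the same chat
# endpoint under its own model name (e.g. "ft:gpt-3.5-turbo:org::abc123").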

In their paper, the researchers describe the experiments in detail. They were able to break GPT-3.5 Turbo's safeguards by fine-tuning it on just 10 specially crafted examples, at a cost of less than $0.20 through OpenAI's API. They also published example dialogues in which the fine-tuned chatbots produce other kinds of harmful advice and recommendations.
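
That figure is plausible as a rough back-of-the-envelope estimate; the calculation below assumes OpenAI's launch-time training price of about $0.008 per 1,000 tokens for GPT-3.5 Turbo fine-tuning, along with illustrative token and epoch counts that are not taken from the study.

# Rough cost estimate for a 10-example fine-tuning run.
# All numbers below are assumptions for illustration only.
price_per_1k_training_tokens = 0.008  # USD, GPT-3.5 Turbo fine-tuning (launch pricing)
examples = 10                         # number of crafted training examples
tokens_per_example = 500              # assumed average length of one example
epochs = 3                            # a common default for small datasets

total_tokens = examples * tokens_per_example * epochs
cost_usd = total_tokens / 1000 * price_per_1k_training_tokens
print(f"{total_tokens} training tokens -> ${cost_usd:.2f}")  # 15000 training tokens -> $0.12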

The authors also stressed that their study shows how safety guardrails can be breached even without malicious intent: simply fine-tuning the model on a harmless dataset can weaken its safety systems.

The experts stressed the need to rethink approaches to the safety of language models. They believe that model developers and the community as a whole should be more active in finding solutions to the problem. OpenAI did not provide an official comment on the issue.
 