Purple Llama: Meta combines Hacking and protection for impenetrable AI


Reaction score
At the request of the White House, the company created tools for evaluating the safety of models.

Meta has announced the release of a suite of tools for security and evaluation of generative artificial intelligence (AI) models. The suite of tools is called Purple Llama and is designed to help developers work safely with generative AI tools, including the open Meta model, Llama-2.

The Meta company blog mentions that the name Purple Llama comes from a combination of red (Red Team) and blue (Blue Team) teams:
  • The red team implies an attack on the AI model by developers or testers in order to identify errors and undesirable results. This allows you to create strategies for resistance to malicious attacks and protect the model from functional failures.
  • The blue team responds to attacks from the red team by defining the necessary threat mitigation strategies for models used in production and customer service.

According to Meta representatives, to minimize the problems associated with generative AI, it is necessary to take both offensive and defensive measures. Purple teaming combines both roles in a joint approach to assessing and mitigating potential risks.


Purple Llama Implementation Scheme

As part of the new release, Meta claims that this is "the industry's first set of cybersecurity assessments for Large Language Models (LLM)." The complex includes:
  • Metrics for quantifying LLM cybersecurity;
  • Tools for evaluating the frequency of unsafe code suggestions;
  • Tools that make it difficult to generate malicious code or help you carry out cyber attacks.

The main goal is to integrate the system into model workflows to reduce the output of unwanted results and unsafe code, while reducing the use of model vulnerabilities for cybercriminals.

Meta said that with the release of Purple Llama, the company aims to provide tools to help address the risks described in the White House's commitments.