A new AI system takes aim at fakes and misinformation.
Google DeepMind has developed an AI system called SAFE, designed for fact-checking the results of large language models.
Over the past couple of years, LLMs like ChatGPT have learned to write scientific papers, answer questions, and even solve mathematical problems. The main problem with such systems, however, is accuracy: every output has to be checked for correctness by hand, which significantly reduces their value.
In a new project, DeepMind researchers have created an AI application that automatically checks the correctness of LLM responses and detects inaccuracies.
The main way to fact-check LLM output is to search Google for supporting sources. The DeepMind team took a similar approach: they built an LLM-based checker that breaks an AI response down into individual claims, searches Google for pages that can be used to verify each claim, and then compares the claim against the search results to decide whether it is accurate. The new system is called Search-Augmented Factuality Evaluator (SAFE).
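Roughly, the pipeline looks like this. The sketch below is not DeepMind's actual implementation (that is what they published on GitHub); it only illustrates the claim-split / search / compare loop, and the ask_llm and google_search callables are placeholders for whatever model API and search API you happen to use.

```python
# Minimal sketch of a SAFE-style checker, NOT the actual DeepMind code.
# `ask_llm` and `google_search` are hypothetical callables you supply
# (wrappers around your LLM API and a web-search API of your choice).

from typing import Callable, Dict, List


def check_response(
    response: str,
    ask_llm: Callable[[str], str],
    google_search: Callable[[str], List[str]],
) -> List[Dict[str, str]]:
    """Split a long-form answer into claims, look each one up,
    and ask the checker model whether the evidence supports it."""
    # 1. Break the answer into self-contained factual claims.
    claims = ask_llm(
        "List every individual factual claim in the following text, "
        "one per line:\n" + response
    ).splitlines()

    results = []
    for claim in filter(None, (c.strip() for c in claims)):
        # 2. Query the search engine for evidence relevant to this claim.
        snippets = google_search(claim)

        # 3. Compare the claim against the evidence and record a verdict.
        verdict = ask_llm(
            "Claim: " + claim + "\n"
            "Search results:\n" + "\n".join(snippets) + "\n"
            "Answer 'supported' or 'not supported'."
        )
        results.append({"claim": claim, "verdict": verdict.strip()})
    return results


if __name__ == "__main__":
    # Stub callables so the sketch runs end to end without any real APIs.
    demo = check_response(
        "The Eiffel Tower is in Paris.",
        ask_llm=lambda prompt: "supported" if "Claim:" in prompt
        else "The Eiffel Tower is in Paris.",
        google_search=lambda query: ["The Eiffel Tower is a landmark in Paris."],
    )
    print(demo)
```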
While testing the system, the research team checked approximately 16,000 facts drawn from the responses of several LLMs, including ChatGPT, Gemini, and PaLM, and compared the results with the verdicts of human fact-checkers. SAFE agreed with the human annotators in 72% of cases, and when the disagreements were examined more closely, SAFE turned out to be right 76% of the time.
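For clarity, those two numbers measure different things, as the toy calculation below illustrates. This is only an assumed bookkeeping scheme, not how DeepMind scored the experiment: per-fact verdicts from SAFE and from humans give the agreement rate, and a separately adjudicated list of disagreements gives SAFE's win rate.

```python
# Toy illustration of the two reported figures, under assumed label lists.

def agreement_rate(safe_labels, human_labels):
    """Fraction of facts where SAFE's verdict matches the human verdict."""
    matches = sum(s == h for s, h in zip(safe_labels, human_labels))
    return matches / len(safe_labels)


def safe_win_rate(adjudicated_disagreements):
    """Among re-checked disagreements, fraction where SAFE was judged right."""
    wins = sum(winner == "safe" for winner in adjudicated_disagreements)
    return wins / len(adjudicated_disagreements)


# Hypothetical data: agreement on 3 of 4 facts, SAFE right in 1 of 2 disputes.
print(agreement_rate(["yes", "no", "yes", "no"], ["yes", "no", "yes", "yes"]))
print(safe_win_rate(["safe", "human"]))
```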
The DeepMind team has published the SAFE code on GitHub, so anyone can use the system to improve the accuracy and reliability of LLM responses.