Scientists have tested how well AI assesses the risk of reoffending.
In the future, artificial intelligence will inevitably have to make high-risk decisions. A team of scientists from Cornell University and other academic institutions set out to test the ability of today's large language models to predict the likelihood of reoffending.
In the United States, the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) system is already used to assess the risk of recidivism, but its track record has been mixed. The system's algorithms predict the likelihood of repeat crimes based on factors such as the number of convictions, the severity of the offense, and the age of the defendant. In 2016, a ProPublica investigation found that COMPAS was more likely to flag black defendants as potential repeat offenders, even when their criminal histories were similar to those of defendants of other races.
To test the ability of artificial intelligence to predict recidivism, the researchers used a combined dataset from three sources. The first is data from the COMPAS system, including its recidivism risk score on a scale from 1 to 10. The second is the results of a survey of 20 people who estimated the likelihood of repeat offenses. The third is the Chicago Face Database, which contains photographs of people of different genders, ethnicities, and ages. The analysis took into account factors such as gender, age, race, the number of prior serious offenses, and the severity of the charges.
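As a rough illustration of how such tabular features could be presented to a language model, here is a minimal sketch; the field names, prompt wording, and the example record are assumptions for illustration, not the prompt actually used in the study.

```python
# Illustrative sketch only: the feature names and prompt wording are
# assumptions, not the study's actual setup.
def build_risk_prompt(defendant: dict) -> str:
    """Turn one defendant's tabular features into a 1-10 risk-scoring prompt."""
    return (
        "Estimate the risk that this defendant will reoffend "
        "on a scale from 1 (very unlikely) to 10 (very likely).\n"
        f"Gender: {defendant['gender']}\n"
        f"Age: {defendant['age']}\n"
        f"Race: {defendant.get('race', 'not provided')}\n"
        f"Prior serious offenses: {defendant['prior_offenses']}\n"
        f"Charge severity: {defendant['charge_severity']}\n"
        "Answer with a single number from 1 to 10."
    )

# Hypothetical example record; race can be withheld to test its effect.
example = {
    "gender": "male",
    "age": 34,
    "prior_offenses": 2,
    "charge_severity": "felony",
}
print(build_risk_prompt(example))
```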
Testing covered four language models: GPT-3.5 Turbo, GPT-4o, Llama 3.2 90B, and Mistral NeMo (12B). The researchers concluded that neither the language models nor the COMPAS system outperform humans at predicting recidivism. The AI models are, however, more willing to commit to a prediction than humans or COMPAS, though not always with greater accuracy. Unlike humans, the language models become less accurate when information about race is withheld; when race data is included, the number of false-positive predictions drops significantly. Of all the models tested, GPT-3.5 Turbo showed the highest accuracy.
When photographs of the defendants from the Chicago Face Database were added to the analysis, the AI models' results improved, but their accuracy was still lower than that of humans. One of the key findings was that language models outperform both humans and the COMPAS system when they are given the outcomes of previous decisions through in-context learning.
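To make the in-context learning idea concrete, a minimal sketch is below: earlier cases with known outcomes are prepended as examples before the new case is scored. The record fields and formatting are hypothetical and only illustrate the general technique, not the study's actual prompts.

```python
# Illustrative sketch of in-context learning: prior cases with known
# outcomes are included in the prompt before the case to be scored.
def build_icl_prompt(history: list[dict], new_case: dict) -> str:
    lines = ["Score each defendant's reoffending risk from 1 to 10.\n"]
    for past in history:
        lines.append(
            f"Age {past['age']}, {past['prior_offenses']} prior offenses, "
            f"{past['charge_severity']} charge -> "
            f"reoffended: {'yes' if past['reoffended'] else 'no'}"
        )
    lines.append(
        f"\nNew case: age {new_case['age']}, "
        f"{new_case['prior_offenses']} prior offenses, "
        f"{new_case['charge_severity']} charge. Risk score:"
    )
    return "\n".join(lines)

# Hypothetical prior decisions used as in-context examples.
history = [
    {"age": 22, "prior_offenses": 3, "charge_severity": "felony", "reoffended": True},
    {"age": 45, "prior_offenses": 0, "charge_severity": "misdemeanor", "reoffended": False},
]
new_case = {"age": 30, "prior_offenses": 1, "charge_severity": "felony"}
print(build_icl_prompt(history, new_case))
```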
Notably, the availability of information about race reduced the number of false positives for black and Hispanic defendants. The best results were achieved when both photographs of the defendants and information about their race were included in the analysis.
Source