Scientists from the AIRI Institute and MTUCI have proposed a model for detecting fake voices.
Researchers from the AIRI Institute and MTUCI have presented a new model for detecting fake voices called AASIST3. The architecture placed among the top 10 solutions at the international ASVspoof 2024 Challenge. The model is designed to protect against voice fraud and improve the security of systems that use voice authentication.
Voice biometrics, or automatic speaker verification (ASV), systems identify users by their voice. Such systems are used to authorize financial transactions, control access to smart devices, and protect against modern forms of telephone fraud.
Voice recognition models are vulnerable to attacks in which a small change to an audio file, imperceptible to a human listener, significantly distorts the model's output. Criminals use text-to-speech (TTS) and voice conversion (VC) techniques to generate synthetic voices that bypass security systems. Effective protection requires models capable of detecting voice spoofing.
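For illustration only, the toy sketch below shows how a per-sample perturbation far too small to hear can still shift a detector's score. The linear "detector" and the FGSM-style gradient step are stand-ins for the general idea, not part of AASIST3 or any real ASV system.

```python
# Hypothetical illustration: a tiny perturbation nudges a toy detector's score.
# The model here is a stand-in, not an actual ASV or anti-spoofing network.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "detector": maps a 16000-sample waveform to a single bona-fide/spoof logit.
toy_detector = nn.Sequential(nn.Flatten(), nn.Linear(16000, 1))

waveform = torch.randn(1, 16000)          # stand-in for one second of 16 kHz audio
waveform.requires_grad_(True)

logit = toy_detector(waveform)
loss = nn.functional.binary_cross_entropy_with_logits(
    logit, torch.ones_like(logit))        # pretend the true label is "bona fide"
loss.backward()

# FGSM-style step: each sample changes by at most epsilon, yet the change is
# crafted to push the detector's score away from the correct label.
epsilon = 1e-3
adversarial = waveform + epsilon * waveform.grad.sign()

print("original score:   ", torch.sigmoid(toy_detector(waveform)).item())
print("perturbed score:  ", torch.sigmoid(toy_detector(adversarial)).item())
print("max sample change:", (adversarial - waveform).abs().max().item())
```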
The AASIST AI model was proposed by scientists from South Korea and France in 2021 and showed high efficiency, but after the rapid development of generative AI in 2022 it could no longer reliably identify synthetic voices. Building on AASIST, the AIRI and MTUCI team, together with a Skoltech PhD student, created an updated architecture for detecting fake, synthesized voices.
The use of Kolmogorov-Arnold Networks (KAN), additional layers, an improved feature extractor, and specialized training functions more than doubled the model's performance compared to the baseline version. The new model also adapts better to new types of attacks.
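As a rough illustration of the KAN idea: in a KAN-style layer, each input-output edge carries its own learnable univariate function instead of a single fixed weight. The sketch below assumes a simplified parameterization (Gaussian basis functions plus a SiLU term) and is not the actual layer used in AASIST3.

```python
# Minimal KAN-style layer sketch: one learnable univariate function per edge,
# parameterized by coefficients over a fixed grid of Gaussian basis functions.
import torch
import torch.nn as nn


class SimpleKANLayer(nn.Module):
    def __init__(self, in_features: int, out_features: int, num_basis: int = 8):
        super().__init__()
        # Fixed grid of basis-function centers on [-2, 2], with overlapping widths.
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, num_basis))
        self.width = 4.0 / (num_basis - 1)
        # Learnable coefficients: one univariate function per (output, input) edge.
        self.coeffs = nn.Parameter(
            torch.randn(out_features, in_features, num_basis) * 0.1)
        self.silu_weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features). Evaluate every basis function at every input value.
        rbf = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # Sum the per-edge univariate functions into each output unit.
        out = torch.einsum("bik,oik->bo", rbf, self.coeffs)
        # Residual SiLU path, as in common KAN implementations.
        out = out + torch.nn.functional.silu(x) @ self.silu_weight.T
        return out


# Example: a KAN-style block standing in for a dense classification head.
layer = SimpleKANLayer(in_features=160, out_features=2)
scores = layer(torch.randn(4, 160))   # batch of 4 embeddings -> 2 logits
print(scores.shape)                   # torch.Size([4, 2])
```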
Instead of relying only on classical methods, AASIST3 uses modern neural networks that take the context of the voice data into account to counter spoofing. This allows fakes to be recognized with high accuracy and protects against new threats.
There are two ways to approach the anti-spoofing problem: as binary classification, determining whether a voice is genuine or artificial, or in conjunction with a biometric system, where the voices of different speakers must also be distinguished (a simple sketch of both modes follows).
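The sketch below illustrates those two deployment modes with placeholder score functions and made-up thresholds standing in for a real countermeasure and ASV model; none of the logic reflects AASIST3 itself.

```python
# Two anti-spoofing modes: standalone binary decision vs. tandem with ASV.
import numpy as np

CM_THRESHOLD = 0.5    # countermeasure (spoof detector) operating point, illustrative
ASV_THRESHOLD = 0.7   # speaker-verification operating point, illustrative


def countermeasure_score(audio: np.ndarray) -> float:
    """Stand-in for a spoofing countermeasure: higher = more likely bona fide."""
    return float(np.clip(audio.mean() + 0.5, 0.0, 1.0))  # placeholder logic


def asv_score(audio: np.ndarray, enrolled: np.ndarray) -> float:
    """Stand-in for a speaker-verification model: higher = more likely the claimed speaker."""
    return float(np.clip(np.dot(audio[: len(enrolled)], enrolled), 0.0, 1.0))


def standalone_decision(audio: np.ndarray) -> bool:
    # Mode 1: pure binary classification -- genuine vs. synthetic speech.
    return countermeasure_score(audio) >= CM_THRESHOLD


def tandem_decision(audio: np.ndarray, enrolled: np.ndarray) -> bool:
    # Mode 2: accept only if the voice is bona fide AND matches the claimed speaker.
    return (countermeasure_score(audio) >= CM_THRESHOLD
            and asv_score(audio, enrolled) >= ASV_THRESHOLD)


audio = np.random.randn(16000) * 0.1
enrolled = np.random.randn(256) * 0.05
print(standalone_decision(audio), tandem_decision(audio, enrolled))
```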
The research was conducted iteratively, testing different hypotheses and improving key metrics such as t-DCF (tandem detection cost function) and EER (equal error rate). On the validation data, significant improvements over the original model were achieved, confirming the effectiveness of the new architecture.
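For reference, EER is the operating point at which the false-acceptance and false-rejection rates are equal, while t-DCF additionally weighs error costs together with the ASV system's own errors. Below is a minimal sketch of computing EER on synthetic scores (not AASIST3 outputs).

```python
# Computing EER from detector scores; the scores here are randomly generated.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
bona_fide_scores = rng.normal(loc=1.0, scale=1.0, size=1000)   # genuine speech
spoof_scores = rng.normal(loc=-1.0, scale=1.0, size=1000)      # synthetic speech

labels = np.concatenate([np.ones_like(bona_fide_scores), np.zeros_like(spoof_scores)])
scores = np.concatenate([bona_fide_scores, spoof_scores])

# EER is where the false-acceptance rate (fpr) meets the false-rejection rate (fnr).
fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer_index = np.nanargmin(np.abs(fnr - fpr))
eer = (fpr[eer_index] + fnr[eer_index]) / 2
print(f"EER ~= {eer:.3f}")
```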
AASIST3 promises to be a useful tool in the financial and telecommunications sectors for combating voice fraud and improving the security of voice authentication.
Source