Meta AI introduces a "seamless" translator for communicating in different languages in real time

Brother

A translator that will change the way we interact on a global level.

Meta AI researchers have announced that they have developed a new set of artificial intelligence models called Seamless Communication, which aims to provide more natural and authentic communication in different languages, effectively bringing to life the concept of a Universal Speech Translator. The models were unveiled this week along with research papers and related data.

The main model, Seamless, combines the capabilities of three other models (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2) into a single system. According to the research paper, Seamless is "the first publicly available system that provides expressive cross-language communication in real time."

How Seamless works as a universal real-time translator

Seamless marks a new stage in the use of AI for communication. It combines three neural network models to translate between more than 100 spoken and written languages in real time while preserving the speaker's vocal style, emotion and prosody.

SeamlessExpressive focuses on preserving the vocal style and emotional nuances of the speaker's voice when translating between languages. As stated in the paper, "translations should convey the nuances of human expression. While existing translation tools are good at conveying the content of a conversation, they usually rely on monotonous, robotic text-to-speech systems for output."

SeamlessStreaming provides almost instant translation with a delay of only about two seconds. The researchers say this is "the first large-scale multilingual model" to provide such fast translation speeds in nearly 100 spoken and written languages.
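The article does not say how SeamlessStreaming decides when to start speaking. As a rough illustration of why a streaming translator can answer after a short, fixed delay instead of waiting for the whole sentence, here is a minimal sketch of the generic "wait-k" read/write policy. This is a stand-in technique chosen for simplicity, not Meta's method, and the function names and token counts are hypothetical.

```python
from typing import List, Tuple

Action = Tuple[str, int]  # ("READ" or "WRITE", running count of that action)


def wait_k_schedule(num_source: int, num_target: int, k: int) -> List[Action]:
    """Return the READ/WRITE sequence of a fixed wait-k policy.

    The translator first reads k source tokens, then alternates between
    writing one target token and reading one more source token. Once the
    source is exhausted it writes the remaining target tokens.
    """
    actions: List[Action] = []
    read = written = 0
    while written < num_target:
        # Stay at most k source tokens ahead of the output.
        if read < min(written + k, num_source):
            read += 1
            actions.append(("READ", read))
        else:
            written += 1
            actions.append(("WRITE", written))
    return actions


def lag_per_output(actions: List[Action]) -> List[int]:
    """For each WRITE, report how many source tokens had been read so far."""
    lags: List[int] = []
    read = 0
    for kind, count in actions:
        if kind == "READ":
            read = count
        else:
            lags.append(read)
    return lags
```

Under these assumptions, `lag_per_output(wait_k_schedule(5, 5, 2))` shows the first output token being emitted after only two source tokens, whereas `k=5` makes every output wait for the full five-token input, which is how a non-streaming translator behaves.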

The third model, SeamlessM4T v2, serves as the basis for the other two models. It is an improved version of the original SeamlessM4T model, released in August 2023. The new architecture provides "improved consistency between text and speech output," according to the paper.

"Overall, Seamless provides us with a key insight into the technical framework needed to transform a Universal Speech Translator from a sci-fi concept to a real technology," the researchers wrote.

Potential for transforming global communication

The developed models open the way to innovative voice communications: from real-time conversations in multiple languages using smart devices, to automatically translated videos and podcasts. Such technologies can greatly facilitate the lives of immigrants and all those who face language barriers in communication, opening up new opportunities for inclusive interaction.

"By publishing our work, we hope that researchers and developers will be able to expand the impact of our contributions, creating technologies aimed at bridging multilingual connections in an increasingly interconnected and interdependent world," the paper says.

However, the researchers acknowledge that the technology can also be misused, for example in voice phishing scams, deepfakes and other malicious applications. To promote safe and responsible use of the models, they have implemented several measures, including audio watermarking and new techniques to reduce hallucinated toxic outputs.

Models are publicly available on Hugging Face

In line with Meta's commitment to open research and collaboration, the Seamless Communication models have been published on Hugging Face and GitHub.

The collection includes Seamless, SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2 models along with associated metadata.
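For readers who want to try a published checkpoint, the following is a minimal sketch of text-to-text translation through the Hugging Face transformers library. The checkpoint name `facebook/seamless-m4t-v2-large` and the helper `translate_text` are not mentioned in the article and are assumptions here; running it requires the transformers, torch and sentencepiece packages and a download of several gigabytes.

```python
# Assumed checkpoint name on the Hugging Face Hub; the article does not give one.
MODEL_ID = "facebook/seamless-m4t-v2-large"


def translate_text(text: str, src_lang: str, tgt_lang: str) -> str:
    """Translate text with SeamlessM4T v2 (hypothetical helper).

    Language codes are three-letter codes such as "eng" or "fra".
    """
    # Imported lazily so that merely defining the function needs no heavy dependencies.
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = SeamlessM4Tv2Model.from_pretrained(MODEL_ID)

    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    # generate_speech=False requests text tokens only instead of a waveform.
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```

A call such as `translate_text("Hello, how are you?", "eng", "fra")` should return a French rendering of the sentence. The model and processor are reloaded on every call to keep the sketch short; real use would load them once.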

By opening up access to its innovative natural language processing models, Meta aims to inspire researchers and developers to further develop and improve these technologies. The goal is to create a bridge between different languages and cultures, improving global understanding. This step not only confirms Meta's position as a leader in open AI technologies, but also provides the research community with a valuable and relevant resource.
 