Meta deliberately released the SeamlessM4T AI model as open-source under a Creative Commons license to allow researchers and developers to build on this work.
Heaptalk, Jakarta — Meta has introduced SeamlessM4T, an AI model that can translate and transcribe both speech and text (08/22). According to Meta, the model supports automatic speech recognition for nearly 100 languages.
SeamlessM4T supports speech-to-text, speech-to-speech, text-to-text, and text-to-speech translation. The model can implicitly recognize the source language without requiring a separate language identification model. By contrast, existing systems that split the task across separate subsystems can draw on large amounts of data but generally perform well for only one modality.
To build SeamlessM4T, Meta drew on findings from its earlier universal translator projects. The result is a multilingual, multimodal translation experience delivered by a single model and built on a wide range of spoken data sources.
The company acknowledged that building a universal language translator is challenging because existing speech-to-speech and speech-to-text systems cover only a small fraction of the world’s languages. In addition, such translators depend on separate systems that divide the speech-to-speech translation task into stages across subsystems. These are the challenges Meta set out to overcome with SeamlessM4T.
“SeamlessM4T represents a significant breakthrough in the field of speech-to-speech and speech-to-text by addressing the challenges of limited language coverage and a reliance on separate systems, which divide the task of speech-to-speech translation into multiple stages across subsystems,” stated Meta on its official blog.
Released as open-source under a Creative Commons license
Meta released SeamlessM4T as open-source under a Creative Commons license so that researchers and developers can build on this work. The company also released the metadata of SeamlessAlign, a dataset totaling 270,000 hours of mined speech and text alignments. Meta claims this is the largest open multimodal translation dataset to date.
“We believe the work we’re announcing today is a significant step forward in this journey. Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” added Meta.
In 2022, Meta introduced No Language Left Behind (NLLB), a text-to-text machine translation model that supports 200 languages. The model has been integrated into Wikipedia as one of its translation providers. In May 2023, the company published Massively Multilingual Speech, which provides automatic speech recognition, language identification, and speech synthesis technologies in more than 1,100 languages.