Meta Open-Sources Multilingual Translation Foundation Model SeamlessM4T

Posted on September 27, 2023

Meta recently open-sourced Massively Multilingual & Multimodal Machine Translation (SeamlessM4T), a multilingual translation AI that can translate both speech audio and text data across nearly 100 languages. SeamlessM4T is trained on 1 million hours of audio data and outperforms the current state-of-the-art speech-to-text translation model.

SeamlessM4T is a multimodal model that can handle both text and audio data as input and output, allowing it to perform automated speech recognition (ASR), text-to-text translation (T2TT), speech-to-text translation (S2TT), text-to-speech translation (T2ST), and speech-to-speech translation (S2ST). The model is released under the non-commercial CC BY-NC 4.0 license. Meta is also releasing their training dataset, SeamlessAlign, which contains 270,000 hours of audio data with corresponding text transcription, as well as their code for mining the data from the internet. According to Meta,

We believe the work we’re announcing today is a significant step forward….Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively. We significantly improve performance for the low and mid-resource languages we support. These are languages that have smaller digital linguistic footprints….This is only the latest step in our ongoing effort to build AI-powered technology that helps connect people across languages. In the future, we want to explore how this foundational model can enable new communication capabilities—ultimately bringing us closer to a world where everyone can be understood.

Meta’s motivation for this research is to build a universal translation system like the Babel fish from The Hitchhiker’s Guide to the Galaxy sci-fi stories. InfoQ has covered several of their previous efforts, including their T2TT model No Language Left Behind (NLLB), which can translate text between 200 languages, and their Massively Multilingual Speech (MMS) model, which supports ASR and text-to-speech synthesis (TTS) in over 1,100 languages. InfoQ also covered other work in the area, such as OpenAI’s Whisper, which can transcribe and translate speech audio from 97 different languages; Google’s Universal Speech Model (USM), which supports ASR in over 100 languages; and Google’s AudioPaLM, which was the previous state-of-the-art model for S2ST.

SeamlessM4T is based on the UnitY neural network architecture, which consists of a pipeline of three components. First is an encoder that can handle both speech audio and text data input and recognizes the input’s meaning; the audio sub-component is based on w2v-BERT and the text on NLLB. Next is a decoder, also based on NLLB, which converts that meaning into a text output in a target language. Finally, there is a text-to-acoustic unit decoder to convert the target text into speech.
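As a rough illustration of how the five tasks map onto these three components, the following Python sketch records each task's input and output modality and lists the stages it would pass through. The stage labels and the `stages_for` helper are hypothetical names chosen for this example; they follow the description above and are not identifiers from Meta's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    input_modality: str   # "speech" or "text"
    output_modality: str  # "speech" or "text"


# The five tasks named in the article, with their input/output modalities.
TASKS = {
    "ASR": Task("automated speech recognition", "speech", "text"),
    "T2TT": Task("text-to-text translation", "text", "text"),
    "S2TT": Task("speech-to-text translation", "speech", "text"),
    "T2ST": Task("text-to-speech translation", "text", "speech"),
    "S2ST": Task("speech-to-speech translation", "speech", "speech"),
}


def stages_for(task_code: str) -> list[str]:
    """Return the (illustrative) pipeline stages a task would use."""
    task = TASKS[task_code]
    # Every task starts with the encoder matching its input modality,
    # followed by the text decoder that emits target-language text.
    stages = [f"{task.input_modality} encoder", "text decoder"]
    # Only tasks with speech output need the text-to-unit decoder.
    if task.output_modality == "speech":
        stages.append("text-to-unit decoder")
    return stages


if __name__ == "__main__":
    for code in TASKS:
        print(code, "->", " -> ".join(stages_for(code)))
```

Under these assumptions, only T2ST and S2ST reach the third stage, while ASR, T2TT, and S2TT stop after the text decoder.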

Meta compared their model’s performance to both cascaded approaches, which consist of a pipeline of discrete ASR, T2TT, and TTS models, and to single-model systems. The systems were evaluated on the FLEURS and CVSS benchmarks. On FLEURS, SeamlessM4T “sets a new standard for translations into multiple target languages,” outperforming AudioPaLM by 20%. SeamlessM4T also outperformed cascaded models; on CVSS it was “stronger by 58%.”
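For readers unfamiliar with the term, a cascaded system simply chains separate models, so each stage only sees the output of the one before it. The Python sketch below shows that composition with toy stand-in functions; `make_cascade`, `toy_asr`, `toy_t2tt`, and `toy_tts` are hypothetical names invented for illustration and do not correspond to any real model or library.

```python
from typing import Callable

Audio = bytes  # stand-in type for raw audio in this sketch


def make_cascade(
    asr: Callable[[Audio], str],
    t2tt: Callable[[str], str],
    tts: Callable[[str], Audio],
) -> Callable[[Audio], Audio]:
    """Chain three independent models into one speech-to-speech function."""

    def translate(audio: Audio) -> Audio:
        transcript = asr(audio)        # stage 1: speech -> source text
        translated = t2tt(transcript)  # stage 2: source text -> target text
        return tts(translated)         # stage 3: target text -> speech

    return translate


# Toy stand-ins so the sketch runs; real systems would call actual models.
def toy_asr(audio: Audio) -> str:
    return audio.decode("utf-8")


def toy_t2tt(text: str) -> str:
    return {"hello": "hola"}.get(text, text)


def toy_tts(text: str) -> Audio:
    return text.encode("utf-8")


cascade = make_cascade(toy_asr, toy_t2tt, toy_tts)

if __name__ == "__main__":
    print(cascade(b"hello"))
```

Because each stage consumes only the previous stage's output, a mistake made by the first model is passed along unchanged to the later ones, which is one commonly cited reason for comparing cascades against single-model systems.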

Several users discussed SeamlessM4T on Hacker News. One user shared tips on how to get the model to run locally, and pointed out that it had a context limit of 4096 tokens. Another user asked:

Will there be a whispercpp equivalent? Half the reason I love whisper is how dead simple it is to get running. I will take somewhat lower accuracy for easier operation.

The SeamlessM4T code and models are available on GitHub. An interactive translation demo is available on Hugging Face.
