AudioCraft comprises three models: MusicGen, AudioGen, and EnCodec. Meta launched the tool as open source so researchers can train their own models.
Heaptalk, Jakarta — Meta announced AudioCraft (08/02), its latest AI tool that generates audio and music from text-based user inputs. The tool comprises three open-source models: MusicGen, AudioGen, and EnCodec.
In more detail, MusicGen produces music from text prompts, having been trained on Meta-owned and specifically licensed music. AudioGen creates audio from text prompts after being trained on public sound effects, such as a dog barking, cars honking, or footsteps on a wooden floor. Meanwhile, the EnCodec decoder enables higher-quality music generation with fewer artifacts.
The company launched AudioCraft’s weights and code as open source for research purposes. “We’re open-sourcing these models, giving researchers and practitioners access so they can train their own models with their own datasets for the first time, and help advance the field of AI-generated audio and music,” Meta stated on its official blog.
Turning into a new type of instrument such as the synthesizer
Development of the tool began with the observation that generative AI for audio was lacking, even as many models had been launched for images, video, and text. Several parties have developed similar models, but Meta considers them complicated and not very open, making them difficult for the public to access.
To produce high-fidelity audio of any kind, Meta realized it needed to model complex signals and patterns at multiple scales. The company considers music the most challenging type of audio to generate, since it is made up of both local and long-range patterns, ranging from individual notes to global musical structures with multiple instruments.
Further, Meta expects MusicGen, given more controls, to evolve into a new type of instrument, much like the synthesizer. “Having a solid open-source foundation will foster innovation and complement the way we produce and listen to audio and music in the future. With even more controls, we think MusicGen can turn into a new type of instrument — just like synthesizers when they first appeared,” Meta concluded.