Listen Up, NVIDIA Fugatto AI Music Model Turns Text Prompts Into Audio Creations

by Alan Velasco — Tuesday, November 26, 2024, 02:45 PM EDT

2024 has been a blockbuster year for NVIDIA. The company has ridden the AI wave to a staggering market cap and taken Intel’s spot on the Dow Jones. It continues to invest in software solutions that it hopes will make its hardware the top choice for AI tasks. Its latest effort is Foundational Generative Audio Transformer Opus 1, or Fugatto, which generates audio from a text input.

After a user provides a prompt, Fugatto “generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.” Its features include adding instruments to an existing track or removing them from it, modifying the accent or emotion in vocals, and even inventing sounds that have not been heard before.

Rafael Valle, manager of applied research at NVIDIA, says that one of the goals was “to create a model that understands and generates sound like humans do.” Fugatto accomplishes this with its audio generation and transformation features, which the company claims make the model capable of “emergent properties.” Valle sees it as the beginning of the work needed to get to “a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”

While the team that worked on Fugatto sees it as another instrument artists can use in the creative process, there’s a good chance it will be used to churn out music or sound effects quickly and on the cheap. With the addition of AI playlists to music services such as Spotify, how soon until the AI DJ presents a user with music generated solely by AI? Probably sooner than most would think, as NVIDIA will likely pour more resources into improving Fugatto. The question is whether it will be any good, or whether it will be put into service when it’s simply good enough.