NVIDIA Reveals Fugatto, A Text-To-Audio AI Model For Music, Voices And Sounds

By Amrit Mehra

TD NewsDesk

Updated on Wed, Nov 27, 2024

In the fast-moving world of artificial intelligence (AI), industry leader NVIDIA has unveiled a new product that sounds too good to be true, yet is real.

In a blog post published on its website, NVIDIA revealed its new generative AI (GenAI) text-to-audio model, Fugatto (short for Foundational Generative Audio Transformer Opus 1).

The model, which builds on NVIDIA’s prior work in speech modeling, audio vocoding, and audio understanding, was developed by a research team of over a dozen experts from around the globe, including India, Brazil, China, Jordan, and South Korea.

As per the company, the new tool serves as a Swiss Army knife for sound, offering a wide range of controls over the audio output. NVIDIA says its capabilities go beyond those of other AI models used for song composition and voice modification.

It enables users to create music snippets from text prompts, add or remove instruments from songs, modify the accent or emotion of a voice, and even produce sounds never heard before!

“This thing is wild,” said Ido Zmishlany, a multi-platinum producer and songwriter and member of the NVIDIA Inception program for cutting-edge startups. “Sound is my inspiration. It’s what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible.”

“The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born,” added Zmishlany. “With AI, we’re writing the next chapter of music. We have a new instrument, a new tool for making music — and that’s super exciting.”

Fugatto enables users to quickly prototype or edit ideas for songs, try out different styles, voices, and instruments, add effects, and enhance audio quality.

Essentially, the model can generate pretty much anything a user can describe, such as making “a trumpet bark or a saxophone meow”.

Furthermore, with fine-tuning and small amounts of singing data, the model was found to handle tasks it was not pretrained on, such as generating a high-quality singing voice from a text prompt.

Such capabilities can help businesses personalize their ad campaigns for specific regions by using different accents and emotions in voiceovers. They could also be used to convert songs, audiobooks, or online courses to sound like someone familiar, from celebrities to family members.

In video games, the sector most closely associated with NVIDIA, developers can use Fugatto to modify prerecorded assets to adapt to gameplay or even create completely new sounds.

This capability is also available to other users through temporal interpolation, as per Rohan Badlani, one of the more than a dozen AI researchers who worked on the model. It lets users create sounds that change over time, such as a rainstorm moving through an area and slowly fading into the distance, while also letting them choose how the soundscape evolves.

“I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one,” said Badlani, adding, “In my tests, the results were often surprising and made me feel a little bit like an artist, even though I’m a computer scientist.”
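
To make the idea of weighting attributes and changing them over time more concrete, here is a minimal Python sketch. It is not NVIDIA's implementation, and Fugatto's actual interface has not been released; the function name `interpolate_weights` and the attribute names `heavy_rain` and `distant_rumble` are hypothetical, chosen only for illustration. The sketch simply blends two sets of attribute weights linearly across a number of steps, which is one plain way to describe a soundscape that evolves.

```python
def interpolate_weights(start, end, steps):
    """Linearly blend attribute weights from `start` to `end` over `steps` frames.

    Illustrative only: this is a generic linear interpolation, not Fugatto's method.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    # Attributes missing from one side are treated as having weight 0.0.
    keys = sorted(set(start) | set(end))
    schedule = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 at the first frame to 1.0 at the last
        frame = {k: (1 - t) * start.get(k, 0.0) + t * end.get(k, 0.0) for k in keys}
        schedule.append(frame)
    return schedule


# Hypothetical attributes: a nearby rainstorm giving way to a distant rumble.
schedule = interpolate_weights(
    {"heavy_rain": 1.0, "distant_rumble": 0.0},
    {"heavy_rain": 0.0, "distant_rumble": 1.0},
    steps=5,
)
for frame in schedule:
    print(frame)
```

Each dictionary in `schedule` gives the emphasis on every attribute at one point in time, so a user choosing "how the soundscape evolves" would, in this simplified picture, be choosing the start weights, the end weights, and the number of steps.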

NVIDIA also released a video showing what Fugatto can do.

Speaking about Fugatto, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, said, “If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesizers. I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things.”

Catanzaro added, “Any generative technology always carries some risks, because people might use that to generate things that we would prefer they don't. We need to be careful about that, which is why we don't have immediate plans to release this.”

Do you think NVIDIA’s new text-to-audio AI model, Fugatto, will be able to challenge the existing tools already being used?

First published on Wed, Nov 27, 2024


