Google’s new AI model can turn text into music


Google researchers have created an AI that can generate musical pieces from text inputs – similar to how ChatGPT can turn a text command into a story and DALL-E generates images from written prompts. The AI can turn a text input into seconds- or even minutes-long music, and can also transform hummed melodies into other instruments.
As per research published on GitHub, the AI model is called MusicLM, and the company has uploaded a string of samples produced using the model. Alongside them, Google released MusicCaps, a dataset composed of 5.5k music-text pairs with rich text descriptions provided by human experts.
“We introduce MusicLM, a model generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff’. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes,” the company said in the research paper.
Google’s AI creates 5-minute melodies
The examples include 30-second clips as well as long-form, 5-minute pieces that sound like actual songs. They were created from paragraph-long descriptions, and the clearer the instructions, the better the music. The descriptions can also specify genre, vibe, and even particular instruments.
“The audio is generated by providing a sequence of text prompts. These influence how the model continues the semantic tokens derived from the previous caption,” the researchers said.
Story Mode
There is also a “story mode” demo where the model is basically given multiple text inputs with time duration for each type of music that needs to be created.
Take this prompt, for example:
time to meditate (0:00-0:15)
time to wake up (0:15-0:30)
time to run (0:30-0:45)
time to give 100% (0:45-0:60)
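The timed-prompt format above can be modeled programmatically. As a purely illustrative sketch (MusicLM has no public API, so this only parses the prompt schedule into data), each segment pairs a caption with its start and end time in seconds:

```python
# Hypothetical sketch: parse "story mode" prompts of the form
# "caption (M:SS-M:SS)" into (caption, start_seconds, end_seconds) tuples.
# MusicLM itself exposes no public API; this only models the prompt schedule.
import re

def parse_story_prompts(lines):
    segments = []
    for line in lines:
        match = re.match(r"(.+?)\s*\((\d+):(\d+)-(\d+):(\d+)\)", line)
        if not match:
            continue
        caption, m1, s1, m2, s2 = match.groups()
        start = int(m1) * 60 + int(s1)   # convert M:SS to seconds
        end = int(m2) * 60 + int(s2)
        segments.append((caption, start, end))
    return segments

prompts = [
    "time to meditate (0:00-0:15)",
    "time to wake up (0:15-0:30)",
    "time to run (0:30-0:45)",
    "time to give 100% (0:45-0:60)",
]
print(parse_story_prompts(prompts))
# → [('time to meditate', 0, 15), ('time to wake up', 15, 30),
#    ('time to run', 30, 45), ('time to give 100%', 45, 60)]
```

Each tuple tells the model what style of music to generate and for which span of the clip, matching the prompt shown above.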
“Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption,” the researchers noted.


