Preloader

Contributions

Music Generation and Understanding

We introduce the M2UGen framework, an advancement capable of simultaneously encompassing music understanding and multi-modal music generation tasks, aiming to assist users in music related artistic creation.

Music Oriented Datasets

We propose a systematic approach for generating large multi-modal music oriented instruction datasets for the training of our M2UGen model including the largest publicly available music captioning dataset containing 1.2k plus hours of music using the MU-LLaMA model.

Music Oriented Datasets

We use the MU-LLaMA Model along with MosaicML's MPT-7B-Chat Model to generate our Music Oriented datasets.
We also utilize BLIP Image Captioning Model and VideoMAE Captioning Model to caption the respective modalities.

Music File   MU-LLaMA Model   Music Captions

Image File   BLIP Image Captioning Model   Image Captions

Video File   VideoMAE Captioning Model   Video Captions

MUCaps Dataset

Human:
M2UGen:
The music is a fast-paced and energetic instrumental rock song with a heavy guitar riff and drums.
Human:
M2UGen:
The music is a fast-paced techno song with a strong beat and electronic elements.
Human:
M2UGen:
The music is a slow and soothing melody with a steady rhythm. The instruments used are drums, cymbals, and a flute. The tempo is slow and steady, creating a calming atmosphere. The mood of the song is peaceful and serene, evoking a sense of relaxation and tranquility.
Human:
M2UGen:
The music is a mix of electronic and ambient sounds, with a focus on synthesizers and drums. The tempo is slow and the mood is calm and soothing.

MUImage Dataset

Image Captions   Music Captions   MPT-7B Model   MUImage Instructions

Human:
Generate a music for the image that is ethereal and spacey, with a focus on the synthesizers and bass, and a hint of a pulsing beat.
M2UGen:
Here is a music that is space-themed and could fit well with the image of a bright star in the center of a galaxy.
Human:
Generate a music for the image that is full of energy and enthusiasm, just like the folk song. Use the instruments shown in the image to create a lively and upbeat tune that will get the audience dancing and singing along.
M2UGen:
Here is a music that is perfect for a lively and upbeat folk song with a catchy melody.
Human:
Generate a music for the image that is slow, melancholic, and minimalist with a conductor leading the orchestra.
M2UGen:
Here is a music that is reminiscent of the slow and melancholic atmosphere of minimalist music, with a focus on a single conductor leading the orchestra.
Human:
Generate a music for the image that is reminiscent of the man's emotions, capturing the sense of peace and tranquility in the living room.
M2UGen:
Here is a music that is perfect for relaxing and unwinding, similar to the image of a man playing a guitar in a cozy living room.

MUVideo Dataset

Video Captions   Music Captions   MPT-7B Model   MUVideo Instructions

Human:
Generate a music for the video that is traditional and slow-paced, with a focus on the violin and the man's emotions as he plays.
M2UGen:
Here is a music that is reminiscent of a classical piece, perfect for a scene like this where a person is focused on playing an instrument.
Human:
Generate a music for the video that is slow, soothing, and has a steady rhythm like the guitar's strumming.
M2UGen:
Here is a music that is perfect for meditation or relaxation.
Human:
Generate a music for the video that is fast-paced and energetic, with a strong beat and aggressive guitar riffs.
M2UGen:
Here is a music that is perfect for a high-energy and aggressive game like the one in the video, with a strong beat and aggressive guitar riffs to match the intensity of the game.
Human:
Generate a music for the video that is fast-paced and upbeat, with a catchy rhythm like the one in the instrumental, but with a focus on the drums and piano in the video.
M2UGen:
Here is a music that is perfect for a dance party or a lively performance.

MUEdit Dataset

Music Captions   Music Captions   MPT-7B Model   MUEdit Instructions

Human:
To change the input music to the target music, you can slow down the tempo and add more percussion instruments to create a steady and rhythmic beat. You can also incorporate more melodic elements, such as a keyboard or strings, to create a more balanced sound.
M2UGen:
Here is a music that is a slow and steady beat accompanied by a variety of percussion sounds, including drums, cymbals, and a bassline.
Human:
To change the input music to the target music, you can use the following instruction: "Replace the guitar and bass with a piano and add a layer of strings to create a more emotional and intricate melody."
M2UGen:
Here is a music that is a slow, melancholic, and emotional piece with a rich and warm sound, delicate and intricate layers of melody, and a deep and resonant foundation provided by the cello.
Human:
Add Latin American flavor to the input music by incorporating an accordion, keyboard, and bass guitar, while keeping the fast-paced, upbeat tempo and simple drum beat. Also, consider adding a tambourine for a playful element.
M2UGen:
Here is a music that is a fast-paced, upbeat salsa song with a catchy melody and lively rhythm. The accordion provides a traditional Latin American flavor, while the keyboard and bass guitar add a modern touch. The rhythm is provided by the acoustic drum beat, which is simple yet effective.
Human:
Replace the drum solo with a fast-paced and upbeat composition with a catchy melody and rhythm.
M2UGen:
Here is a music that is a fast-paced, upbeat, and lively composition with a catchy melody and rhythm.

Music Oriented Dataset Stats

MUCaps Dataset

0

Music Files

0

Hours of Music

MUImage Dataset

0

Image-Music Pairs

0

Hours of Music

MUVideo Dataset

0

Video-Music Pairs

0

Hours of Music

MUEdit Dataset

0

Music Pairs

0

Hours of Music

Multi-modal Music Understanding and Generative model (M2UGen)

The M2UGen model is a Music Understanding and Generation model that is capable of Music Question Answering and also Music Generation from texts, images, videos and audios, as well as Music Editing. The model utilizes encoders such as MERT for music understanding, ViT for image understanding and ViViT for video understanding and the MusicGen/AudioLDM2 model as the music generation model (music decoder), coupled with adapters and the LLaMA 2 model.

M2UGen Model Architecture

M2UGen's Demo Video

This is the demo video of our model's capabilities for Music Generation, Understanding and Editing.

M2UGen's Text-to-Music Generation Demo

This is the interactive demo of our model's capability for Text-to-Music Generation and Music Understanding.

Generate a music perfect for a rock concert
The music is a fast-paced, intense and energetic metal song with a lot of distortion and aggression.
List the instruments in the music.
The instruments used in the song are guitar, drums and bass.
Generate a country music played using an acoustic guitar
Here is an upbeat country music using a guitar.
What can you infer about the artist's mood from this song ?
The artist is likely feeling passionate and energetic, as the song has a fast tempo and upbeat rhythm.
What is the genre of this music ?
The genre of this music is country.
What is the artist trying to convey in this music ?
The artist is trying to convey a sense of energy, passion, and emotion in this music.
Generate an electronic music perfect for dancing
Here is an electronic music with a groovy and eccentric vibe.
Describe the generated music in detail.
The music is characterized by a repetitive and catchy melody, with a fast tempo and a repetitive rhythm. The instruments used are a keyboard, a bass, and electronic drums. The sound quality is clear and the melody is simple, making it easy to listen to and enjoy.
Where can you use this music?
This music can be played in nightclubs, dance clubs, discotheques, and bars, as well as in private parties, DJ sets, and car rides.
Generate a beat for a marching band
Here is a music with a strong beat and a steady rhythm for a marching band.
What is the main instrument used ?
The main instrument used is drums.
What other instruments are used ?
The song also features cymbals and a tambourine.

M2UGen's Text-to-Music Generation and Music Editing Demo

This is the interactive demo of our model's capability for Text-to-Music Generation, Music Understanding and Music Editing.

Generate an electronic music track
Here is a modern electronic instrumental track.
What is the tempo of this music?
The tempo is medium.
Speed up the tempo of the music and make it catchy
Here is a fast-paced electronic dance music track.
What is the tempo of this music?
The tempo is fast.
Where can this music be played ?
This music can be played in a club or a dance party.
Generate a solo violin music
Here is a slow violin melody.
Replace the violin with an acoustic guitar
Here is a guitar solo with a pedal effect.
What is a pedal effect ?
A pedal effect is a type of effect that is created by using a pedal to change the tone or sound of the guitar.
Replace the guitar with a drum
Here is a drum solo with various percussion hits.
What is the mood of the music ?
The mood of the music is intense and powerful, with a strong sense of energy and movement.

M2UGen's Image-to-Music Generation Demo

This is the interactive demo of our model's capability for Image-to-Music Generation and Music Understanding.

Generate a music for this image
Here is a metal music performed by a band.
Describe the vibe of the music
The music is a powerful and intense dance music with a loud, vigorous, boisterous, youthful, intense, and upbeat vibe.
Generate a music with the instrument in the image
Here is an upbeat and energetic music using a guitar.
What can you tell me about the music?
The music is a fast-paced instrumental with a guitar and drums playing the main melody and rhythm.
What is the mood of the music?
The mood of the music is upbeat, intense, buoyant, spirited, youthful, vibrant, funky, boisterous and vigorous.

M2UGen's Image-to-Music Generation and Music Editing Demo

This is the interactive demo of our model's capability for Image-to-Music Generation, Music Understanding and Music Editing.

Generate a music perfect for the old image
Here is a music that is a slow and melancholic guitar solo, reminiscent of a sad and nostalgic mood.
Add more instruments to the music and make it slightly more upbeat keeping the same tempo
Here is a slow and soothing instrumental piece with a calming effect.
List the instruments used
The music features a piano, bass, and drums.
Generate a music for this image
Here is a eerie and haunting music.
Generate a more upbeat music
Here is an upbeat and eerie music.
Describe the music in detail
The music is characterized by a slow and steady beat, with the drums providing the foundation for the rest of the instruments to build upon. The synth adds a layer of texture to the music, while the bass provides a steady rhythm. The music is also characterized by a sense of tension and release, with the drums building up to a climax before dropping back down to a slow and steady beat.

M2UGen's Video-to-Music Generation Demo

This is the interactive demo of our model's capability for Video-to-Music Generation and Music Understanding.

Generate a melody perfect for the mood of the video
Here is a slow, melodic piece with a gentle melody.
What is the mood of the music?
The mood of the music is calming and relaxing.
What are the instruments used ?
The instruments used are synth, bass, and a small string section.
Generate a music with the instrument in the video
Here is a solo drum beat.
Describe the music
The music is fast, groovy, punchy, and buoyant.
What is the main instrument used ?
The main instrument used is a drum set.
What is the mood of the player?
The mood of the player is aggressive and energetic.

M2UGen's Video-to-Music Generation and Music Editing Demo

This is the interactive demo of our model's capability for Video-to-Music Generation, Music Understanding and Music Editing.

Generate a music for the video perfect for a club
Here is an upbeat electronic music for a club.
Describe the music in detail
The music is characterized by its fast tempo, upbeat rhythm, and electronic elements. The electronic elements give the music a modern and futuristic feel. The music is upbeat and energetic, with a catchy melody that is sure to get stuck in your head.
Speed up the tempo and add more beats to the music
Here is a fast upbeat electronic music with a focus on synthesizers and percussion.
Generate a music with the instruments in the video
Here is fast and intense music with a strong beat and percussion.
Make the music more upbeat and lively perfect for dancing
Here is a drum beat with fast tempo and complex rhythms.
What is the mood of the music ?
The mood of the music is energetic, groovy, buoyant, vivacious, and vibrant.
Where can I play this music ?
This music can be played in a club.

M2UGen's Music Editing Demo

This is the interactive demo of our model's capability for Music Editing and Music Understanding.

Speed up the piano music and make it upbeat
Here is a fast piano music with a catchy melody.
Replace the piano with an acoustic guitar
Here is a fast music played with guitar and strings.
What is the main instrument in the music ?
The main instrument in the music is the flute.
Replace the flute in the music with an acoustic guitar
Here is a slow and relaxing piece that evokes a sense of calmness and tranquility.
What is the main instrument in the music ?
The main instrument in this music is the guitar.
Remove the guitar from this music
Here is a techno dance song with a fast tempo and vigorous drumming.
Is there a guitar in this music ?
No, there is no guitar in this music.
What is the tempo of this music ?
The tempo of this music is medium.
Speed up the tempo of the music and add a repetitive melody
Here is a fast and energetic electronic dance music track with a repetitive and catchy beat.
What is the tempo of this music ?
The tempo of this music is fast.
What is the mood of the artist ?
The mood of the artist is energetic and groovy.