Magenta Dataset?


Faraaz Nadeem

May 29, 2018, 11:29:41 PM
to Magenta Discuss
Hello,

Does Magenta plan to release a music dataset to the public? Coming up with new models and results is great, but without a public, common dataset the wider community cannot a) verify that Magenta's results are correct or even possible, or b) make improvements to the models that are more than purely qualitative, or that are not dubious for the same reasons. It's quite difficult for independent developers such as myself to go around the internet scraping MIDI files, while the Magenta team has amassed some 1.5 million MIDI files (I had no idea that many good MIDI files even existed).

For reference, I'm aware that the NSynth Dataset has been released, and I also know of the Lakh and Million Song Datasets. I'm mainly asking about MIDI compositions, and the latter two datasets are seriously lacking in well-labeled [classical] pieces.

If there are copyright or ownership issues then that's understandable, but in general I think the field of music generation has suffered from the lack of a good dataset for people to train on. Any input from the Magenta team on this would be appreciated. Thanks!

As a side note, a few months ago I worked on generating music using GANs. Hoping to get around to a bunch of different model ideas I've had in mind over the next few months. Here's a sample generation from the GAN on my soundcloud: https://soundcloud.com/faraazn/gan-generation/s-YEdEz

Best,
Faraaz

giancarlo iannizzotto

May 30, 2018, 1:27:51 AM
to Magenta Discuss
Hello,
I'm trying to learn more about Magenta and its models.

Is there a diagram, schematic, brief write-up, or any other clear and complete description of the polyphony_rnn model?

I tried to follow the code but could not fully understand it.
Sorry, I know this sounds like a dumb question.


--
Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org
To unsubscribe from this group, send email to magenta-discuss+unsubscribe@tensorflow.org
---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discuss+unsubscribe@tensorflow.org.

Kyle Kastner

May 30, 2018, 6:42:18 AM
to giancarlo iannizzotto, Magenta Discuss
On polyphony_rnn, there is a blog post here: https://magenta.tensorflow.org/performance-rnn

You will probably also find the material on BachBot useful; the modeling setup is similar in some ways, apart from the "non-quantization" that polyphony_rnn uses.

For datasets of classical-type music I generally prefer things stored as MusicXML or Humdrum. I find those datasets a lot more consistent to parse and clean than most large MIDI dumps.

Some examples:

The MusicXML corpus of Bach, included directly in the Python package: http://web.mit.edu/music21/doc/about/referenceCorpus.html

Less classical, or classical mixed with other genres, but still cool datasets:

Weimar Jazz:

Finnish folk:

MAPS link posted earlier to the list:

Some reddit dumps:


I will re-recommend Lakh MIDI here; I like that dataset a lot.

Or various cleaned subsets:


A collection of various MIDI from Dr. Robert Haralick: http://www.haralick.org/ML/k_collection.zip
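
Since large MIDI dumps tend to be inconsistent, a quick header check can help weed out broken or mislabeled files before any real parsing. The following is only a minimal sketch using the Python standard library. It assumes the standard MIDI file layout (an "MThd" chunk followed by big-endian format, track count, and division fields), and the function names `read_midi_header` and `find_valid_midi` are hypothetical, not part of Magenta or any dataset tooling.

```python
import struct
from pathlib import Path


def read_midi_header(data: bytes):
    """Return (format, num_tracks, division), or None if the header is invalid.

    Hypothetical helper: checks only the first 14 bytes of a standard MIDI file.
    """
    if len(data) < 14 or data[:4] != b"MThd":
        return None
    # Chunk length (4 bytes), then format, track count, division (2 bytes each),
    # all big-endian.
    length, fmt, ntrks, division = struct.unpack(">IHHH", data[4:14])
    if length < 6 or fmt not in (0, 1, 2):
        return None
    return fmt, ntrks, division


def find_valid_midi(root):
    """Yield (path, header) for every .mid file under root with a plausible header."""
    for path in sorted(Path(root).rglob("*.mid")):
        with open(path, "rb") as handle:
            header = read_midi_header(handle.read(14))
        if header is not None:
            yield path, header


# Synthetic header: format 1, two tracks, 480 ticks per quarter note.
sample = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
print(read_midi_header(sample))  # (1, 2, 480)
```

This only validates the header chunk; a file that passes can still contain damaged track data, so it is a first filter rather than a full cleaning step.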

Kyle Kastner

May 30, 2018, 7:32:03 AM
to giancarlo iannizzotto, Magenta Discuss
Just found this related paper as well for datasets: https://arxiv.org/abs/1606.02542

Adam Roberts

May 30, 2018, 2:17:46 PM
to Kyle Kastner, giancarlo iannizzotto, Magenta Discuss
We unfortunately cannot release our full scraped dataset. 

However, we have found that the Lakh MIDI Dataset produces very similar results on MusicVAE and other models, and we will be releasing benchmark results on LMD in the final version of the MusicVAE paper for ICML. 

We are working to release specific subsets of our MIDI data in the near future so stay tuned!
