Hello,
Does Magenta plan on releasing a dataset of music for the public? Coming up with new models and results is great and all, but without a public, common dataset, the greater community cannot a) verify that Magenta's results are correct/possible or b) make improvements to their models that are not purely qualitative or dubious for the same reasons. It's quite difficult for independent developers such as myself to go around the internet scraping MIDI files, while the Magenta team has amassed some 1.5 million MIDI files (I had no idea that many good MIDI files even existed).
For reference, I'm aware the NSynth Dataset has been released, and I also know of the Lakh and Million Song Datasets. I'm mainly asking about MIDI compositions, and the latter two datasets are seriously lacking in well-labeled [classical] pieces.
If there are copyright or ownership issues then that's understandable, but in general I think the field of music generation has suffered from the lack of a good dataset for people to train on. Any input from the Magenta team on this would be appreciated. Thanks!
As a side note, a few months ago I worked on generating music using GANs. I'm hoping to get around to a bunch of different model ideas I've had in mind over the next few months. Here's a sample generation from the GAN on my SoundCloud:
https://soundcloud.com/faraazn/gan-generation/s-YEdEz
Best,
Faraaz