Thoughts and issues with the Nsynth dataset


Leonardo O. Gabrielli

May 22, 2017, 4:33:34 PM5/22/17
to Magenta Discuss
Hi all,
I've been using the NSynth dataset for a couple of weeks now. Great job indeed, especially in quantitative terms. However, I see some drawbacks that should be addressed, since I consider them sources of bias in deep ML research. I don't see how ML research in music can benefit from the brute-force methods used in the image field, at least until more robust representation-learning algorithms or DSP transforms are found that let a NN work out which inputs are informative and which are not.

- Some of the samples are white noise (e.g. brass acoustic 046 from note 84 upward). I don't know why; perhaps corruption at some point in the processing (they do, for instance, have a reasonable fade-out).

- Some of the "synthetic" sounds are almost useless. The synthetic flutes, for example, only vaguely resemble a flute and sound more like a Wendy Carlos synth, and the synthetic voices are rarely voice-like. I think using the whole dataset may strongly bias ML applications such as instrument recognition. On the other hand, filtering this material out requires a lot of listening (if timbre is what you are interested in). In general, if you are concerned with timbre quality, I suggest opting out of all the "synthetic" material (except for the synth leads :) )

- The interpolation used to extend some instruments to ranges they don't naturally cover alters their sound character (see e.g. guitar-acoustic-010, notes below 40). This should probably be avoided. I assume the ranges were extended to enlarge the database, but this could be left to the researcher, for whom creating variations or interpolating the data to enlarge the dataset or make it more robust can be part of the process.

- A lot of instruments at velocity 25 are unnaturally damped, often with some overshoot. I suggest revising this: the different versions of the same key may have been obtained by the dataset creators by applying a low-pass filter with a poorly chosen Q coefficient, which can create resonances. Incidentally, a lot of instruments do use low-pass filtering to create different velocities, but I don't see the point in augmenting the dataset so unnaturally unless one really needs it. See the point above.

- Knowing the sound sources employed would help us use the dataset better (e.g. VSTs? samples from PC libraries? hardware keyboards/arrangers? or sampling sessions done by the Google team?)
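On the filtering point above: assuming the standard NSynth examples.json metadata layout (note ids such as "flute_synthetic_000-070-050" mapping to dicts whose "instrument_source_str" is "acoustic", "electronic", or "synthetic"), a minimal sketch of dropping the synthetic material by label could look like this:

```python
import json

def drop_synthetic(metadata_path):
    """Return only the notes whose source label is not "synthetic".

    Assumes the NSynth examples.json layout: note ids mapping to
    metadata dicts with an "instrument_source_str" field.
    """
    with open(metadata_path) as f:
        examples = json.load(f)
    return {
        note_id: info
        for note_id, info in examples.items()
        if info["instrument_source_str"] != "synthetic"
    }
```

The surviving note ids can then be used to select the matching wav files or TFRecord entries.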

Besides this, the existence of such a work is much appreciated. Last year I was actually planning to produce a similar dataset at my institution, but at this point I doubt it would be of use, unless others find that NSynth has additional drawbacks that cannot be addressed.

Best regards

Leonardo Gabrielli, PhD
Università Politecnica delle Marche

Jesse Engel

May 22, 2017, 9:57:42 PM5/22/17
to Leonardo O. Gabrielli, Magenta Discuss
Hi Leonardo, 

Thanks for the thoughtful comments. 

Some quick responses:

First and foremost, we did not alter or augment any of the sounds beyond downsampling to 16 kHz. Any perceived filtering or pitch shifting you're hearing was in the audio to begin with.

The white-noise trumpet beyond pitch 84 is something we recently became aware of, and we may remove it from the dataset at a future point, although I believe the number of white-noise samples is very small compared to the rest of the dataset.

We did our best to provide accurate labels to the sounds we had and believe we do more good than harm by providing them even if some of the sounds are biased towards unrealistic, very synthetic timbres. 

Of course, as a researcher you are free to use a subset of the data depending on the research and task you're trying to accomplish.

Thank you for recording your observations for others to learn from, and I'd encourage everyone to do the same if they find things they think would be helpful to know about.

All the best,
Jesse


--
Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org
To unsubscribe from this group, send email to magenta-discuss+unsubscribe@tensorflow.org
---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discuss+unsubscribe@tensorflow.org.

Silvan Laube

Oct 23, 2023, 10:28:01 AM10/23/23
to Magenta Discuss, jesse...@google.com, Magenta Discuss, leonardo.o...@gmail.com
Hi All
I also noticed the white-noise-like samples. IMO they should be removed entirely (or replaced with the correct sound if the noise is due to processing errors); I'd rather mix in some noise in a controlled way if needed. Unlike the other points that were raised, these samples can't be identified from their labels and skipped easily, so removing or replacing them would be the best option here.
Since this hasn't happened, I don't think it ever will, and since I also didn't find an existing cleaned subset, I'll probably have to create my own version (of course, if anyone knows of an existing clean set, I'd be happy to use it instead).
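For building such a cleaned subset, one crude heuristic (my own sketch, not part of any NSynth tooling) is spectral flatness, which is close to 1 for white noise and near 0 for tonal material, so noise-like samples can be flagged by thresholding it:

```python
import numpy as np

def spectral_flatness(x, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.

    Close to 1.0 for white noise, near 0.0 for tonal signals.
    eps guards the log against zero-power bins.
    """
    power = np.abs(np.fft.rfft(x)) ** 2 + eps
    return np.exp(np.mean(np.log(power))) / np.mean(power)

# Demo on one second of audio at NSynth's 16 kHz sample rate:
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)                 # tonal signal
noise = np.random.default_rng(0).standard_normal(sr)  # white noise
```

A threshold somewhere between the two regimes (say 0.2) would flag the noise-like samples for manual review rather than automatic deletion, since the heuristic knows nothing about the labels.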

As for the interpolated guitar sounds: the paper doesn't say anything about including interpolated sounds in the dataset, and Jesse confirmed they didn't alter anything except the sampling rate, so those sounds are simply part of the library that was used (and even the white noise might come from the libraries themselves...). Whether the notes were just pitch-shifted, or how they were recorded, is most likely unknown, although I would be interested in that too.

My 2 cents on the other points:
- There absolutely is sense in having a guitar going below 40: think of (7-string) standard B tuning etc. (although admittedly, in that case going as low as 21 is a bit extreme). And if one wishes to exclude those notes, it's pretty easy: just restrict the notes used based on the instrument family.
- Same for flutes that sound unrealistic: they are still useful for pitch detection. Exclude them, or re-tag them as a synth instrument, if you prefer to do so for instrument classification.
- The damping seems quite extreme to me too, at least in some cases, and not just at low velocities. But again, I can just cut it off if I want to.
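Restricting notes per instrument family, as suggested above, is a few lines against the examples.json records. The ranges below are my own hypothetical choices, not anything stored in the NSynth metadata; "instrument_family_str" and "pitch" are standard examples.json fields:

```python
# Hypothetical playable MIDI ranges per instrument family; tune to taste.
PLAYABLE_RANGE = {
    "guitar": (40, 88),  # MIDI 40 = E2, the low E of a standard-tuned guitar
    "flute": (60, 96),
}

def keep_note(info):
    """Keep a note whose pitch lies inside its family's assumed range.

    Families without an entry are kept unrestricted (full MIDI range).
    """
    lo, hi = PLAYABLE_RANGE.get(info["instrument_family_str"], (0, 127))
    return lo <= info["pitch"] <= hi
```

For a 7-string B-standard use case, one would simply widen the guitar entry down to 35 (B1) instead of excluding anything.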

cheers,
Silvan

