Onset Detection Questions


pavl...@gmail.com

May 28, 2017, 3:18:50 PM
to madmom-users
Hi there! First of all, thank you for your great work! I am building an automatic guitar audio transcription web app, and I just discovered madmom.

I'd like to ask you a few questions about some parameters I was unsure about, specifically in:
madmom.features.onsets.OnsetPeakPickingProcessor

I was wondering how the frames per second (fps) parameter relates to the sampling rate. I use a sampling rate of 44100 Hz; what should I pass as the fps parameter to the constructor?

Also, what exactly does the threshold measure? Is it amplitude in dB?

Thanks!

pavl...@gmail.com

May 28, 2017, 3:20:33 PM
to madmom-users, pavl...@gmail.com
Just a follow-up: as I am using this specifically for guitar, do you have any tips or recommendations that suit guitar audio (I am only doing monophonic audio for now)?

Thanks

Sebastian Böck

May 29, 2017, 12:03:19 AM
to madmom-users, pavl...@gmail.com
Hi,

Following the docstring example, you should supply an onset detection function (ODF). This can be obtained in different ways, e.g. with an RNN (madmom.features.onsets.RNNOnsetProcessor), a CNN (madmom.features.onsets.CNNOnsetProcessor), or simple signal processing (madmom.features.onsets.SpectralOnsetProcessor).
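To make the "simple signal processing" option concrete, here is a minimal sketch of plain spectral flux computed from a magnitude spectrogram given as a list of frames (each a list of band magnitudes). It is only an illustration and the function name is hypothetical: the 'superflux' method of SpectralOnsetProcessor additionally applies a maximum filter along the frequency axis of the reference frame, which this sketch omits.

```python
def spectral_flux(spec):
    """Plain spectral flux: sum of positive magnitude increases per frame.

    `spec` is a list of frames, each a list of band magnitudes (e.g. already
    log-scaled). The first frame has no predecessor, so its value is 0.
    """
    odf = [0.0]
    for prev, cur in zip(spec, spec[1:]):
        # half-wave rectification: only increases in magnitude count
        odf.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return odf


# four frames with two bands each
spec = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 1.0]]
print(spectral_flux(spec))  # [0.0, 1.0, 2.0, 0.0]
```

The ODF values scale with the magnitudes in the spectrogram, which is why a threshold applied to a spectral ODF has no fixed unit.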

The threshold parameter has no unit per se; its meaning depends on which ODF you supply. Neural network based ODFs can be interpreted as probabilities, so the threshold translates to "give me all frames the network considers to be an onset with at least this probability". For spectral based ODFs, you have to experiment a bit. Please have a look at the /bin folder for example programs and their default settings for various parameters; these are usually a good starting point for experimentation.
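A simplified sketch of what thresholded peak picking does with a probability-like ODF. This is not madmom's OnsetPeakPickingProcessor (which has further options, such as moving-average thresholds and combining onsets that are very close together); the function name and the simplified logic are assumptions for illustration.

```python
def pick_onsets(odf, fps, threshold, pre_max=0.01, post_max=0.01):
    """Return times (in seconds) of ODF frames that are local maxima >= threshold.

    Simplified sketch: a frame counts as an onset if its value reaches the
    threshold and is the largest value within `pre_max` seconds before and
    `post_max` seconds after it.
    """
    pre = int(round(pre_max * fps))    # look-back window in frames
    post = int(round(post_max * fps))  # look-ahead window in frames
    onsets = []
    for i, value in enumerate(odf):
        if value < threshold:
            continue
        neighbourhood = odf[max(0, i - pre):i + post + 1]
        if value >= max(neighbourhood):
            onsets.append(i / fps)
    return onsets


# a probability-like ODF at 100 frames per second
odf = [0.1, 0.2, 0.9, 0.3, 0.1, 0.05, 0.6, 0.2]
print(pick_onsets(odf, fps=100, threshold=0.5))  # [0.02, 0.06]
print(pick_onsets(odf, fps=100, threshold=0.7))  # [0.02]
```

Raising the threshold keeps only the frames the network is more confident about.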

If you pass in a raw audio signal (which is possible), you could set fps to the sample rate of the signal. However, this is not intended and will not give satisfying results. You should at least square the signal after rescaling it to float values. madmom also has functions/classes to accomplish this, but as I said, this is most likely not what you want.
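For illustration only, this is roughly what "square the signal after rescaling it to float values" amounts to for signed 16-bit PCM samples. The result has one value per sample, so fps would equal the sample rate (e.g. 44100). The helper name and the signed-integer input format are assumptions for illustration, not madmom's API.

```python
def squared_signal_odf(samples, bit_depth=16):
    """Rescale signed integer PCM samples to [-1, 1] and square them.

    One output value per input sample, so the frame rate of this "ODF" equals
    the sample rate of the audio. Assumes signed integers of `bit_depth` bits.
    """
    full_scale = float(2 ** (bit_depth - 1))  # 32768.0 for 16-bit audio
    return [(s / full_scale) ** 2 for s in samples]


print(squared_signal_odf([0, 16384, -32768]))  # [0.0, 0.25, 1.0]
```

Without any smoothing, the peaks of such a curve track individual waveform cycles rather than note attacks, which is one reason it gives poor onsets compared to a spectral or neural network based ODF.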

HTH

Sebastian Böck

May 29, 2017, 12:04:57 AM
to madmom-users, pavl...@gmail.com
Sorry, I forgot to answer the second question: no, I don't have any special recommendations for guitar music.

pavl...@gmail.com

May 29, 2017, 12:48:17 PM
to madmom-users, pavl...@gmail.com
Thanks, this definitely helped. Could you give a bit more detail about how I should pass the signal to the OnsetProcessor? Right now I do:

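  import numpy as np
  from madmom.audio.filters import LogarithmicFilterbank
  from madmom.features.onsets import OnsetPeakPickingProcessor, SpectralOnsetProcessor

  # peak picking on an ODF computed at 300 frames per second
  proc = OnsetPeakPickingProcessor(fps=300, threshold=10, pre_max=1. / 300., post_max=1)

  # SuperFlux ODF computed directly from the audio file
  sodf = SpectralOnsetProcessor(onset_method='superflux', fps=300,
                                filterbank=LogarithmicFilterbank, num_bands=24,
                                log=np.log10)(filename)
  onsets = proc(sodf)  # onset times in seconds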


So I am just passing the filename of the unprocessed audio. Should I not pass fps at all? My desired sampling rate is 44100.

pavl...@gmail.com

May 29, 2017, 1:43:17 PM
to madmom-users, pavl...@gmail.com
Furthermore, as some of my samples are very "loud" compared to others, I would like to normalise them before processing. Is madmom.audio.signal.normalize(signal) the right way to do that? If so, I would like to use that normalised signal, rather than the filename, to find the onset times. Is there a way to pass a signal instead of a filename to the OnsetProcessor?

Sebastian Böck

May 29, 2017, 1:50:25 PM
to madmom-users, pavl...@gmail.com
I am not sure what you mean by "desired sampling rate". Usually the sampling rate is given by the audio signal/file. During the STFT, the signal is split into overlapping frames, with the frame rate given in frames per second by the fps parameter. Unless you have a very strong reason to use 300 fps, I suggest sticking with the default of 200 fps for SuperFlux. As a side note, the post_max peak-picking parameter you chose (1 second) seems a bit high unless you expect notes to be at least 1 second apart from each other.
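The relationship between sample rate, fps and the peak-picking windows can be sketched in a few lines, assuming frames are evenly spaced by hop_size = sample_rate / fps samples. The function names are hypothetical.

```python
def hop_size(sample_rate, fps):
    """Number of audio samples between the starts of consecutive frames."""
    return sample_rate / fps


def seconds_to_frames(seconds, fps):
    """Convert a duration such as post_max (in seconds) into frames."""
    return seconds * fps


def frame_to_time(frame_index, fps):
    """Time in seconds corresponding to a given frame index."""
    return frame_index / fps


print(hop_size(44100, 200))         # 220.5 samples between frames
print(hop_size(44100, 300))         # 147.0
print(seconds_to_frames(1.0, 200))  # 200.0 frames of look-ahead for post_max=1
print(frame_to_time(50, 200))       # 0.25
```

So the sample rate stays whatever the file has (44100 Hz here); fps only controls how densely the STFT frames, and therefore the ODF values, are spaced in time. With post_max=1, a peak must be the largest ODF value within the following second to be reported, so a weaker onset shortly before a stronger one is suppressed.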

Sebastian Böck

May 29, 2017, 1:57:46 PM
to madmom-users, pavl...@gmail.com
Yes, you can pass a Signal instance instead of a file name. SpectralOnsetProcessor accepts basically anything further "down" the signal processing pipeline. But if you just want all audio files to be normalised, it is enough to instantiate SpectralOnsetProcessor with norm=True.
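As a rough illustration of normalising recordings to a common level, here is a peak normalisation sketch on float samples. Whether norm=True or madmom.audio.signal.normalize behave exactly like this is an assumption; the helper name is hypothetical.

```python
def peak_normalize(samples):
    """Scale float samples so that the largest absolute value becomes 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0.0] * len(samples)  # silence stays silence
    return [s / peak for s in samples]


quiet = [0.1, -0.5, 0.25]
print(peak_normalize(quiet))  # [0.2, -1.0, 0.5]
```

Two files that differ only in gain become identical (up to rounding) after this step, so one set of peak-picking parameters is more likely to work for both.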

Pavlos Kosmetatos

May 29, 2017, 1:59:22 PM
to Sebastian Böck, madmom-users
Thanks a lot. Is there somewhere I can see all the possible kwargs with their meaning?

Sebastian Böck

May 29, 2017, 2:12:38 PM
to madmom-users, sebastian...@gmail.com, pavl...@gmail.com
Unfortunately no. Right now the only way to see/discover which kwargs can be used is to examine the processor in question and follow the processing chain. Adding everything to the docstrings manually is too error-prone. If someone has a solution to this kind of problem, I'd be curious to hear it.
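One possible way to partly automate "following the processing chain" is to read constructor signatures with Python's inspect module and collect the named parameters of each class the kwargs are passed on to. The two stand-in classes below are hypothetical and only mimic processors that forward **kwargs; in practice you would list the actual processor classes of the chain (e.g. FilteredSpectrogramProcessor), and the helper only finds parameters of the classes you list.

```python
import inspect


def collect_kwargs(*classes):
    """Map each named constructor parameter to the first listed class accepting it."""
    found = {}
    for cls in classes:
        for name, param in inspect.signature(cls).parameters.items():
            # skip catch-all *args / **kwargs entries, they are not real options
            if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                              inspect.Parameter.VAR_KEYWORD):
                continue
            found.setdefault(name, cls.__name__)
    return found


class FramingStage:  # hypothetical stand-in for a framing processor
    def __init__(self, frame_size=2048, fps=100, **kwargs):
        pass


class FilteringStage:  # hypothetical stand-in for a filtering processor
    def __init__(self, filterbank=None, num_bands=12, **kwargs):
        pass


print(collect_kwargs(FramingStage, FilteringStage))
```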

Pavlos Kosmetatos

May 29, 2017, 2:20:42 PM
to Sebastian Böck, madmom-users
Okay, I see. What exactly does the num_bands parameter do? With a higher num_bands I get better results, but I am not sure why that is or what downsides it could have.

Sebastian Böck

May 29, 2017, 2:36:26 PM
to madmom-users, sebastian...@gmail.com, pavl...@gmail.com
num_bands is passed to FilteredSpectrogramProcessor and determines the number of filter bands (per octave, for the LogarithmicFilterbank you use) when the spectrogram is filtered before being scaled logarithmically. By default, SuperFlux uses 24 bands per octave at 200 fps. For the algorithmic details, please refer to the SuperFlux paper.
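To see what num_bands changes, here is a sketch of logarithmically spaced center frequencies with num_bands bands per octave, anchored at a 440 Hz reference. The 30 Hz to 17 kHz range and the function name are assumptions for illustration, not necessarily madmom's exact filterbank construction.

```python
import math


def log_center_frequencies(num_bands, fmin=30.0, fmax=17000.0, fref=440.0):
    """Center frequencies spaced num_bands per octave between fmin and fmax."""
    # band k lies k / num_bands octaves above (or below, if k < 0) fref
    lowest = math.floor(math.log2(fmin / fref) * num_bands)
    highest = math.ceil(math.log2(fmax / fref) * num_bands)
    freqs = [fref * 2.0 ** (k / num_bands) for k in range(lowest, highest + 1)]
    return [f for f in freqs if fmin <= f <= fmax]


for bands in (12, 24):
    freqs = log_center_frequencies(bands)
    # number of bands and spacing of the two lowest bands in Hz
    print(bands, len(freqs), round(freqs[1] - freqs[0], 2))
```

Doubling num_bands roughly doubles the number of bands: 12 per octave corresponds to semitone spacing, 24 to quarter-tone spacing. The trade-offs are more computation and, at low frequencies, bands narrower than the FFT bin spacing: assuming a 2048-sample frame at 44.1 kHz, bins are about 21.5 Hz apart, while adjacent 24-per-octave bands near 100 Hz are only about 3 Hz apart.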