Oh my copy-paste mistake, I did use oenv in tempo originally, as well as in beat_track. I also tried using plp like so:

```python
oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
plp = librosa.beat.plp(sr=sr, onset_envelope=oenv, hop_length=hop_length, win_length=win_length, tempo_min=mintempo)
foundbpm = librosa.feature.tempo(sr=sr, onset_envelope=plp, start_bpm=mintempo, max_tempo=mintempo+100)
```

as well as skipping onset_strength and providing y to plp directly.
To illustrate, I have used librosa.clicks to compare the detected features with the source signal. Oh, it seems I can't post screenshots of the resulting plots here, so I will try in words:
When processing the rolling buffer a few times at steady intervals (the last 10 seconds, every 1 second), I notice that the length of oenv, plp, or beats varies (I assume this is normal, as the length is just the number of features found and the values are their positions).
BUT when I use them in beat_track and feed the beats to librosa.clicks, I never get a click track back that is anywhere near 10 seconds long, and each time the length and the number of clicks are different, but never anywhere near the number of visible kick drum peaks in the signal.
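Roughly my comparison setup, continuing from the snippet above, as a simplified sketch (exact parameter values varied between tries):

```python
# Simplified sketch of the comparison; oenv, y, sr, hop_length as above.
tempo, beats = librosa.beat.beat_track(onset_envelope=oenv, sr=sr,
                                       hop_length=hop_length)
# Note to self: without length=len(y), librosa.clicks only renders audio up
# to the last detected click, which could be why the output comes back
# shorter than the 10-second buffer.
click_track = librosa.clicks(frames=beats, sr=sr, hop_length=hop_length,
                             length=len(y))
```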
I also wrote about this problem earlier: the detected BPM is incorrect, and it substitutes what look like template values such as 123.046875.
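A quick check on that suspicious number, assuming librosa's defaults (sr=22050, hop_length=512): it looks like a tempo quantized to an integer frame lag.

```python
# 123.046875 is exactly the tempo at an autocorrelation lag of 21 frames,
# assuming librosa's default sr=22050 and hop_length=512.
sr, hop_length = 22050, 512
frame_rate = sr / hop_length       # ~43.066 frames per second
print(60 * frame_rate / 21)        # 123.046875
```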
Found the image upload button :)
So in my debugging, I can see that the length of y remains fixed. I am 100% sure that CHOP channels in TouchDesigner don't suddenly change their number of samples when read as a numpy array in Python.
For the capture above, the signal (y) is exactly the same as the input signal.
First of all, thank you for taking the time to help out a stranger with this problem.
Python in TouchDesigner is an integrated environment. I am just saving out the same numpy array that I pass to onset_strength.
Don't get blinded by the numerical details of the example. As I stated before, I have been experimenting a lot; this is just a random screenshot of one of hundreds of tries.
I guess in this one I might have tried to downsample the input by a factor of 20. In any case, and I can't repeat this enough, it's NOT a single 'special' situation dependent on a specific input state.
Meanwhile I have replaced the librosa function in my code. I can get good BPM values from the same input when I apply a low-pass filter, capture the instantaneous signal power envelope, downsample it to 400 Hz, and then use numpy's correlate and scipy's find_peaks.
Then when I map the resulting peak positions onto an array with the same length as the input signal, I can verify that the pulses line up with the original waveform.
Sadly the accuracy of my amateur solution suffers from the extreme downsampling (and without it numpy.correlate becomes too slow for realtime), and it only works on kick-drum-focused tracks where there's not too much movement in the bass and sub frequencies from a funky bassline or such.
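For reference, this is roughly my replacement approach as a simplified sketch; the cutoff, filter order, and tempo range values here are illustrative, not the exact ones I use:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def estimate_bpm(y, sr, target_sr=400, cutoff_hz=150.0,
                 min_bpm=60.0, max_bpm=180.0):
    # Low-pass to isolate the kick drum region (cutoff is illustrative).
    b, a = butter(4, cutoff_hz / (sr / 2), btype='low')
    low = filtfilt(b, a, y)

    # Instantaneous power envelope, then naive decimation to ~400 Hz.
    env = low ** 2
    step = max(1, round(sr / target_sr))
    env = env[::step]
    ds_sr = sr / step                  # actual rate after decimation
    env = env - env.mean()

    # Autocorrelation of the envelope; periodic kicks show up as peaks.
    ac = np.correlate(env, env, mode='full')[len(env) - 1:]

    # Only consider lags inside the plausible tempo range.
    min_lag = int(ds_sr * 60.0 / max_bpm)
    max_lag = int(ds_sr * 60.0 / min_bpm)
    region = ac[min_lag:max_lag]
    peaks, _ = find_peaks(region)
    if len(peaks) == 0:
        return None
    best_lag = min_lag + peaks[np.argmax(region[peaks])]
    return 60.0 * ds_sr / best_lag
```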
At this point I'm ready to give up on trying to use librosa. It's a very cool idea, and the BPM function seemed very easy to use in theory (insert signal, insert sample rate, get correct BPM).
All I can think of to continue troubleshooting is to export buffers to wav files, adapt the code to run outside of TD, and then learn matplotlib and more Python to write a testing suite there... that's just an insane amount of work.
Ok, I see where I was confused in the terminology. So the number of samples in oenv is equal to the number of windows analyzed by the mel spectrogram function, and the response for each hop is the result of retrieving this feature value for each window.
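To check my understanding, a quick shape comparison (self-contained, using a librosa example file and matching parameters for both calls):

```python
import librosa

y, sr = librosa.load(librosa.ex('trumpet'))
hop_length = 512

oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
S = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length)

# One onset-strength value per mel spectrogram frame (column), if I've
# understood correctly.
print(S.shape)     # (n_mels, n_frames)
print(oenv.shape)  # (n_frames,)
```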
I guess if I change the hop length, it would make sense to dynamically find an optimal value for win_length as well? Perhaps depending on the target BPM range.
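Something like this is what I had in mind; a rule-of-thumb sketch of my own, not anything from the docs, where win_length is chosen so the plp window spans a few beats at the slowest target tempo:

```python
# My own rule of thumb (hypothetical, not from the librosa docs): make the
# plp analysis window span a few beat periods at the slowest tempo of interest.
def plp_win_length(sr, hop_length, min_bpm=60.0, beats_to_cover=4):
    frames_per_second = sr / hop_length
    seconds_per_beat = 60.0 / min_bpm
    return int(beats_to_cover * seconds_per_beat * frames_per_second)

print(plp_win_length(44100, 512))  # ~344 frames for min_bpm=60
```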
I can see that win_length defaults to the n_fft size, which does not seem to default to anything, yet the function works without providing a value for it, so I'm not sure from which value to start experimenting.
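One way I found to see the actual defaults without digging through the source (assuming plp's win_length is the one in question):

```python
import inspect
import librosa

# Print the default values straight from the function signatures.
print(inspect.signature(librosa.beat.plp))  # win_length defaults to 384 frames
print(inspect.signature(librosa.stft))      # n_fft defaults to 2048 samples
```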
But I still don't get how the oenv relates to the STFT.
My understanding of the science side of signal analysis is very rudimentary, as I am approaching this from a creative coder/musician perspective, so please bear with me. As I understand it so far:
> The tempogram is basically like the spectrogram we have in DAWs and media players, with a vertical resolution equal to the number of FFT bins, and a horizontal resolution of 1 sample per processed window.
> But instead of getting high values in the FFT bins for energy at the corresponding pitch frequencies, it uses much larger signal windows to get results in the rhythmic frequency range.
> The onset envelope is not the same as the tempogram; it does not have a vertical resolution (see the shape check below).
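To make sure I'm not fooling myself about the shapes (assuming the oenv from my earlier snippet and default parameters):

```python
import librosa

# Compare shapes: the onset envelope is 1-D, the tempogram adds a lag axis.
tg = librosa.feature.tempogram(onset_envelope=oenv, sr=sr,
                               hop_length=hop_length)
print(oenv.shape)  # (n_frames,)            -> no vertical axis
print(tg.shape)    # (win_length, n_frames) -> lag/tempo axis over time
```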
Question 1: then how did you plot a tempogram from the onset envelope like that?
Question 2: When you say 'aggregate that over time', that sounds to me like collapsing the vertical dimension by summing the values of each column, which would return again an onset envelope (as each horizontal row in the tempogram is basically an onset envelope for a single frequency bin). But instead you get this 'tempo curve', and I can't find anything about that in the docs.
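Here is my attempt to reproduce what I think 'aggregate over time' means; my assumption is that it collapses the time axis (the columns), not the tempo axis:

```python
import numpy as np
import librosa

# Assuming 'aggregate over time' means averaging across the TIME axis:
# this yields one value per tempo/lag bin, i.e. a curve over tempo, not time.
tg = librosa.feature.tempogram(onset_envelope=oenv, sr=sr,
                               hop_length=hop_length)
tempo_curve = tg.mean(axis=1)

# Map lag bins to BPM; bin 0 corresponds to infinite tempo, so skip it.
bpms = librosa.tempo_frequencies(len(tempo_curve), sr=sr,
                                 hop_length=hop_length)
print(bpms[1 + np.argmax(tempo_curve[1:])])  # BPM with the strongest support
```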