Onset Env, Beat tracking with Dynamic tempo


Lewis G

Nov 11, 2025, 3:42:23 PM
to librosa
Hi,

I've been having some challenges detecting beats at the correct times with beat tracking that works consistently across different electronic genres.

I found that reducing the hop length to 256 helps considerably, and I then try to limit the onset envelope using an fmax between 2000-5000. I've been loading with sr=None, and most of the tracks I have loaded are at 44100.

Currently I'm using the tempo estimate, which lands in a reasonable region, to create adaptive settings for the delta and wait parameters in peak picking.

I then use another custom function to find the optimal BPM by correlating the onsets around the rough tempo with the highest hit rate, and dropping beats at those points.

This is surprisingly accurate at hitting the correct points in the track, but it falls over when there is any change at all in tempo.

Beginning to think there's probably something else I'm not doing correctly with librosa, and I probably don't need all of the custom logic.

Also, I have tried an SR of 22050, but when I do that the beat tracking is even less accurate.

I'm trying to create a tool that will create structural boundaries and also export beat synced features for genre and mix analysis.

Here's the onset_env creation and the beat tracking I attempt:

    # Calculate RAW onset envelope specifically for beat tracking
    print("\nCalculating RAW onset for beat tracking...")
    onset_env_padded_raw = librosa.onset.onset_strength(
        y=y_padded,
        sr=sr,
        hop_length=256,
        aggregate=np.median
    )

    # Beat detection using RAW PADDED onset
    tempo_array, initial_beat_frames = librosa.beat.beat_track(
        onset_envelope=onset_env_padded_raw,  # RAW onset for beat tracking
        sr=sr,
        hop_length=256,
        start_bpm=round(rough_tempo),         # rounded for cleaner math
        tightness=config.BEAT_TIGHTNESS,
        trim=False,
        units='frames'
    )

Is there anything obvious I'm missing here?



Thanks

Lewis

Brian McFee

Nov 11, 2025, 3:49:58 PM
to librosa
If you have significant tempo changes, the static beat tracker is always going to have problems like you describe.

You might want to take a look at the time-varying extension described in this example: https://librosa.org/doc/latest/auto_examples/plot_dynamic_beat.html#sphx-glr-auto-examples-plot-dynamic-beat-py - this lets you supply a time-varying tempo curve to the beat tracker.  The algorithm is essentially the same, looking for strong onsets within the expected region given by the previous detection (recursively) and the tempo estimate, but now the amount of search doesn't need to be fixed over the entire track.  If you can get a reasonable local tempo estimate, it ought to resolve this problem for you.

One unrelated caveat: if you're allowing for uncontrolled sampling rates (ie sr is not always the same across tracks), then you might want to adapt your hop length to be calculated from a fixed duration in seconds, rather than pinned to a constant number of samples.
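For example (the ~11.6 ms target below is just the duration that makes hop=256 at 22050 Hz; the power-of-two rounding is a convenience, not a requirement):

```python
import math

def hop_for_sr(sr, target_seconds=256 / 22050):
    """Pick the power-of-two hop length closest to a fixed duration in seconds."""
    return 2 ** round(math.log2(sr * target_seconds))
```

This gives 256 at 22050 Hz and 512 at 44100 Hz, so frame durations stay comparable across tracks regardless of sampling rate.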

Lewis G

Nov 12, 2025, 9:30:05 AM
to Brian McFee, librosa
Thanks that's helpful.

I realised that I reduced the SR by half but forgot to reduce the hop length; since fixing that, it seems more reliable for steady tracks with the custom onset phase detection. Even with tracks that have a steady BPM, beat_track either locks on to transients or starts drifting through a breakdown and never recovers. I'll have another go with the varying tempo, though I need to find a way to determine when that should be used over the default...

This is interesting though, because I've also been trying to integrate your example of Laplacian clustering. When I pass the custom beats, which do actually align musically, I get far fewer clusters (though aligned ones) than I do using the beat tracker; there's a deficit of between 20-60 beats depending on the track. With the default beat tracking I get better-quality clusters and sections, though they need some refinement to snap to aligned beats and bars etc., which is probably expected.

There is another challenge I've been looking at, if I can share it here. The pipeline I'm running currently uses N-1 worker processes; a rough breakdown:

Load audio

Extract metadata 

librosa.feature.tempo - pass into beat_track (used for anchor function and curiosity)

Find optimal interval phase from onset rounding initial tempo

Extract RMS energy frames (search for track peaks and manual backtrack with N% look-back for anchors)

Validate anchor spacing based on interval

Extract 13 features including 12 chroma bins, 13 MFCCs, bass, spectral flux, spectral roll-off, tonnetz, etc.

Beat sync features

Dispose of frames

Compute novelty from feats

Laplacian clustering (passing a file path to an almost exact copy of your example rather than y from the main script)

Various other data ops to prepare to save to file.

The issue I'm having is that each process is consuming up to 4-5 GB at worst, but usually around 1-1.5 GB.

I'm monitoring usage in Python, but it only ever appears to be using 500 MB or so and clearing down between functions, yet Task Manager reports 3-4+ times more. I clear the librosa cache between tracks too. Is there anything else I should consider?

Thanks

Lewis 




Brian McFee

Nov 12, 2025, 9:39:04 AM
to librosa
On Wednesday, November 12, 2025 at 9:30:05 AM UTC-5 Lewis G wrote:
Thanks that's helpful.

I realised that I reduced the SR by half but forgot to reduce the hop length; since fixing that, it seems more reliable for steady tracks with the custom onset phase detection. Even with tracks that have a steady BPM, beat_track either locks on to transients or starts drifting through a breakdown and never recovers. I'll have another go with the varying tempo, though I need to find a way to determine when that should be used over the default...


Right - the beat tracker is trying to select peaks of the onset envelope, under the assumption that stronger onset ≈ more likely to be rhythmically relevant.  This is definitely not always true, and in fact can be very style dependent.  For example, you might have a jazz recording with hi-hats spamming the onset envelope, and the tracker will prefer to follow those instead of a more stable pulse laid down by a bass with relatively softer onsets.  You can sometimes hack around this by limiting the frequency range of the onset calculation, or tweaking the aggregation function (mean instead of median, etc).

 
This is interesting though, because I've also been trying to integrate your example of Laplacian clustering. When I pass the custom beats, which do actually align musically, I get far fewer clusters (though aligned ones) than I do using the beat tracker; there's a deficit of between 20-60 beats depending on the track. With the default beat tracking I get better-quality clusters and sections, though they need some refinement to snap to aligned beats and bars etc., which is probably expected.

That makes sense.  I've definitely seen it happen where the beat tracker goes way off the rails, but results in better segmentation.  (This happened to me exactly in MIREX 2013 :))  Ultimately the segmenter just needs a coherent (if non-uniform) downsampling in time - whether that downsampling is beat aligned or happens on subdivisions (eg double tempo) is less important.
 

There is another challenge I've been looking at, if I can share it here. The pipeline I'm running currently uses N-1 worker processes; a rough breakdown:

Load audio

Extract metadata 

librosa.feature.tempo - pass into beat_track (used for anchor function and curiosity)

Find optimal interval phase from onset rounding initial tempo

Extract RMS energy frames (search for track peaks and manual backtrack with N% look-back for anchors)

Validate anchor spacing based on interval

Extract 13 features including 12 chroma bins, 13 MFCCs, bass, spectral flux, spectral roll-off, tonnetz, etc.

Beat sync features

Dispose of frames

Compute novelty from feats

Laplacian clustering (passing a file path to an almost exact copy of your example rather than y from the main script)

Various other data ops to prepare to save to file.

The issue I'm having is that each process is consuming up to 4-5 GB at worst, but usually around 1-1.5 GB.

I'm monitoring usage in Python, but it only ever appears to be using 500 MB or so and clearing down between functions, yet Task Manager reports 3-4+ times more. I clear the librosa cache between tracks too. Is there anything else I should consider?

You might want to look into setting the cache level appropriate to your needs: https://librosa.org/doc/latest/cache.html

It might also help to force a garbage collection in python at certain points.  

Lewis G

Nov 18, 2025, 2:18:42 PM
to librosa
OK thanks, I'll try with some of the suggestions and see if that improves on the current output.