Making a spectrogram for bird songs


Clark Case

May 20, 2023, 2:47:58 PM
to librosa
Hi All - I'm working on an app for detecting bird songs.  The detection part itself is straightforward as there's another project that actually provides the analysis engine. I'm providing recording, storage, and UI.

Anyhow, I'm displaying spectrograms for the birdsongs with the create_spectrogram function found in here:
This function was the result of Googling and ChatGPTing pretty much entirely.

For making small images this works fine, but larger ones are pretty low resolution. This may be due at least in part to the input audio having a 16 kHz sampling rate (blame my cheap security camera that's providing the audio).

I've tried torturing ChatGPT into telling me how to make it higher res, and it had some suggestions, but none seemed to make a huge difference.

Suggestions for improvement would be greatly appreciated. I've attached a sample input file and resulting PNG.
samplespectrogram.png
sampleaudio.mp3

Vincent Lostanlen

May 20, 2023, 3:05:37 PM
to Clark Case, librosa
Hello,

If by “resolution” you mean number of pixels, try reducing the hop length or increasing NFFT.

If by “resolution” you mean Heisenberg time-frequency uncertainty, I have some bad news …

Some workarounds are possible (namely, time-frequency reassignment and high-resolution techniques such as ESPRIT), but they come with caveats. Proceed with caution.

Also, I suggest you get in touch with the BirdNET maintainers (ccb-b...@cornell.edu). Remember that BirdNET is licensed under CC-BY-NC-SA 4.0, and they maintain a “showroom” of noncommercial uses of BirdNET-Analyzer.



Sincerely,

Vincent.





Clark Case

May 20, 2023, 5:30:32 PM
to librosa
Thanks for the info - I don't think I'm trying to violate any laws of quantum mechanics, just trying to make a prettier picture with a reasonable amount of effort and computational resources :) I'm hoping for the same number of pixels in the image, but better time and frequency resolution. I'll mess around with the parameters you mentioned.

Also thanks for the heads up and the email address for the BirdNET folks at Cornell. I'll drop them a line.

Clark

Vincent Lostanlen

May 20, 2023, 5:44:34 PM
to Clark Case, librosa
Hello,

If you want both better time and frequency resolution, then you _are_ effectively trying to violate certain laws of signal processing, which happen to be commensurate with laws of quantum mechanics. In my previous email I shared a link to a PDF of a textbook (Mallat’s “Wavelet Tour of Signal Processing”) which explains the theory behind it. If you’re unfamiliar with signal processing, I recommend this other textbook as an introduction: https://www.fourierandwavelets.org/FSP_v1.1_2014.pdf
Time-frequency uncertainty is addressed in chapter 7. Also see the historical remarks on page 667.
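
For reference, the time-frequency uncertainty relation is commonly stated as below. This is the standard textbook form written in generic notation (the symbols are an assumption here, not copied from either book): $\sigma_t$ is the spread of a signal's energy in time and $\sigma_\omega$ its spread in angular frequency.

```latex
\sigma_t \, \sigma_\omega \;\ge\; \frac{1}{2}
\qquad\text{equivalently, with frequency in Hz:}\qquad
\sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi}
```

Equality holds only for Gaussian-shaped signals, so no analysis window can be made arbitrarily narrow in time and in frequency at once.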


Reducing hop length and increasing NFFT won’t make the image look “better”; it will simply increase the width and height of the spectrogram array. It’s now clear that this is not what you asked about.


I hope this helps,

Vincent.


Brian McFee

May 22, 2023, 11:28:06 AM
to librosa
There is a bit of subtlety here, in that the number of frames per second (i.e., horizontal pixel resolution, time resolution) is not necessarily tied to the frequency resolution.  So you can definitely drop your hop length while keeping the frame length the same.  It won't really add any more information (Heisenberg still applies), but it might make for a smoother looking image.  This will produce a larger spectrogram array though, so you'll pay for it in memory consumption.
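
A minimal sketch of the arithmetic behind this, using only the standard library. It assumes librosa's default framing (centered frames, window length equal to `n_fft`); the helper name `spectrogram_grid` and the parameter values are hypothetical, chosen only for illustration.

```python
def spectrogram_grid(n_samples, sr, n_fft, hop_length):
    """Array shape and cell size of a centered STFT with win_length == n_fft."""
    n_bins = 1 + n_fft // 2                 # rows: frequency bins from 0 Hz to sr / 2
    n_frames = 1 + n_samples // hop_length  # columns: one frame per hop
    return {
        "shape": (n_bins, n_frames),
        "bin_spacing_hz": sr / n_fft,    # vertical size of one cell
        "frame_step_s": hop_length / sr, # horizontal size of one cell
        "window_s": n_fft / sr,          # duration each frame actually analyses
    }


sr = 16000          # the 16 kHz rate mentioned earlier in the thread
n_samples = 10 * sr # a hypothetical 10-second clip

base = spectrogram_grid(n_samples, sr, n_fft=2048, hop_length=512)
denser = spectrogram_grid(n_samples, sr, n_fft=2048, hop_length=128)
taller = spectrogram_grid(n_samples, sr, n_fft=4096, hop_length=512)

for name, grid in [("base", base), ("denser", denser), ("taller", taller)]:
    print(name, grid)
```

In this sketch, `denser` has about four times as many columns as `base` but the same `window_s`, so each column still summarises the same stretch of audio. `taller` halves the bin spacing only by doubling the window duration.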

Another option is to change the shading parameter in `specshow`.  The default is nearest-neighbor, which results in the chunky, pixelated display.  If you change this to shading="gouraud", it will change the rendering to look a bit "nicer".  This is more computationally expensive at display-time, but the analysis parameters do not have to change.  The following image differs only in the shading= parameter to specshow using your example clip:
example-shading.png

Clark Case

May 23, 2023, 10:43:28 AM
to librosa
That does make a prettier picture - thanks! I'll give it a try.