Questions about sound descriptors

Lily

Jun 26, 2024, 8:28:55 AM

to Freesound API

Dear Freesound team,

I hope this message finds you well. I am a PhD student at the University of Porto conducting research on sound descriptors. Recently, we have been working with sounds and their corresponding descriptors obtained through the Freesound API.

Additionally, we have used Essentia to extract the same sound descriptors for these sounds. However, we have noticed discrepancies between the values obtained from your API and those derived from Essentia.

We would like to understand whether specific parameters or settings are applied when extracting these descriptors that could account for these differences. For instance, are there particular configurations for handling silence, or other settings of that kind?

Below is an example for 'lowlevel.spectral_complexity' only, showing my code and the results I got:

Code:

    import numpy as np
    import essentia.standard as es

    # Load the sound as mono audio at 44.1 kHz
    # (path assumed: a local copy of the preview listed in the results below)
    audio = es.MonoLoader(filename='411460_5121236-lq.mp3', sampleRate=44100)()

    # Frame size and hop size
    frame_size = 2048
    hop_size = 1024

    # Use the frame generator
    frames = es.FrameGenerator(audio, frameSize=frame_size, hopSize=hop_size, startFromZero=True)

    # Define windowing, spectrum, and spectral complexity computation
    windowing = es.Windowing(type='hann')
    spectrum = es.Spectrum()
    spectral_complexity = es.SpectralComplexity()

    # Store the spectral complexity for each frame
    spectral_complexity_sequence = []
    for frame in frames:
        spec = spectrum(windowing(frame))
        spectral_complexity_sequence.append(spectral_complexity(spec))

    # Convert to a NumPy array for statistical calculations
    spectral_complexity_sequence_np = np.array(spectral_complexity_sequence)

    # Calculate statistical values
    min_value = np.min(spectral_complexity_sequence_np)
    max_value = np.max(spectral_complexity_sequence_np)
    mean_value = np.mean(spectral_complexity_sequence_np)
    var_value = np.var(spectral_complexity_sequence_np)

    # "d" statistics use the first derivative (frame-to-frame differences);
    # "2" statistics use the second derivative, not the squared first derivative
    first_diff = np.diff(spectral_complexity_sequence_np)         # empty if < 2 frames
    second_diff = np.diff(spectral_complexity_sequence_np, n=2)   # empty if < 3 frames
    dmean_value = np.mean(first_diff) if first_diff.size else None
    dvar_value = np.var(first_diff) if first_diff.size else None
    dmean2_value = np.mean(second_diff) if second_diff.size else None
    dvar2_value = np.var(second_diff) if second_diff.size else None
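For comparison, here is a minimal NumPy sketch of the derivative statistics under one possible assumption: that dmean/dvar are computed on absolute first differences and dmean2/dvar2 on absolute second differences. This assumption is only suggested by the positive dmean_value reported by the API in the results below; it is not confirmed anywhere in this thread, and `derivative_stats` is a hypothetical helper name.

```python
import numpy as np


def derivative_stats(values):
    """Hypothetical helper: derivative statistics on absolute differences.

    Assumption (not confirmed in this thread): dmean/dvar are taken over
    |first differences| and dmean2/dvar2 over |second differences|.
    A statistic is None when there are too few frames to compute it.
    """
    x = np.asarray(values, dtype=float)
    stats = {"dmean": None, "dvar": None, "dmean2": None, "dvar2": None}
    if x.size >= 2:
        d1 = np.abs(np.diff(x))        # |x[i+1] - x[i]|
        stats["dmean"] = float(np.mean(d1))
        stats["dvar"] = float(np.var(d1))
    if x.size >= 3:
        d2 = np.abs(np.diff(x, n=2))   # |x[i+2] - 2*x[i+1] + x[i]|
        stats["dmean2"] = float(np.mean(d2))
        stats["dvar2"] = float(np.var(d2))
    return stats


# Toy per-frame values (made up, not data from this thread) that start and end
# at 0, like a sound whose first and last frames are silent.
toy = [0, 3, 5, 2, 0]

# Signed differences telescope: their mean is (x[-1] - x[0]) / (N - 1) = 0
signed_mean = float(np.mean(np.diff(toy)))
print(signed_mean)  # 0.0

# Absolute differences stay positive, so dmean is nonzero
print(derivative_stats(toy))
```

Rather than guessing the definitions, Essentia's PoolAggregator algorithm (whose defaultStats parameter accepts names such as 'dmean', 'dvar', 'dmean2' and 'dvar2') could be run on a Pool holding the per-frame values, so the statistics follow whatever definition the installed Essentia version uses.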

Results:

Test 1 (sound ID 411460, preview: https://cdn.freesound.org/previews/411/411460_5121236-lq.mp3)

    Statistic      Our results          Freesound API
    min_value      0.0                  0
    max_value      19.0                 18.9999993842421
    mean_value     2.255813953488372    2.28110641958725
    var_value      23.54851270957274    23.6215618071424
    dmean_value    0.0                  0.462963069115548
    dvar_value     1.6542056074766356   1.47084283155535
    dmean2_value   1.6542056074766356   0.78139524787494
    dvar2_value    38.73089352781902    3.71501518255123
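Two identities help read this table. Let $x_1, \dots, x_N$ be the per-frame values and $d_i = x_{i+1} - x_i$ their frame-to-frame differences. The mean of the signed differences telescopes, so it is exactly zero whenever the first and last frames have equal values (for example, both silent). With a zero mean, the variance of the differences equals the mean of their squares, which is why dvar_value and dmean2_value coincide in our results: dmean2_value there was computed as the mean of the squared first differences. If the API's dmean is instead a mean of absolute differences (an assumption, not confirmed in this thread), combining our mean of squared differences with the API's dmean gives a rough estimate of the variance of $|d|$, close to but not equal to the API's 1.4708; the two frame sequences may also differ.

$$
\frac{1}{N-1}\sum_{i=1}^{N-1} d_i = \frac{x_N - x_1}{N-1}
$$

$$
\operatorname{Var}(|d|) = \overline{d^{2}} - \overline{|d|}^{\,2} \approx 1.6542 - 0.4630^{2} \approx 1.440
$$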

I look forward to your reply!

Best regards,

Lily

Frederic Font Corbera

Jul 8, 2024, 3:59:33 AM

to freeso...@googlegroups.com

Hi Lily,

Good to hear you are carrying out this research; it definitely sounds interesting.

The descriptors you obtain through the API are calculated using an old version of Essentia. Newer versions of Essentia include the FreesoundExtractor algorithm (which I guess is the one you're using); it tends to deliver results similar to what you get through the API, but not the same. We will update Freesound to a newer version of Essentia at some point, but we have not done that yet.

If you would like to reproduce results that almost exactly match those obtained from the API, we have a Docker image that compiles an old version of Essentia and can do that. Let me know if you are interested.

Cheers,

frederic

--

Frederic Font - ffont.github.io

Phonos - phonos.upf.edu

Music Technology Group, UPF - mtg.upf.edu

Freesound - freesound.org

--

---
You received this message because you are subscribed to the Google Groups "Freesound API" group.
To unsubscribe from this group and stop receiving emails from it, send an email to freesound-ap...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/freesound-api/a8245ee1-f37a-4ea8-a446-2c79bba2b758n%40googlegroups.com.

Lily C

Jul 8, 2024, 9:53:26 AM

to freeso...@googlegroups.com

Hi!

Good afternoon!
Thank you for your reply.
Yes, we are interested in reproducing results that are nearly identical to those obtained from the API. If you could provide a Docker image that compiles an older version of Essentia, that would be fantastic!

Best regards,

Lily

On Mon, Jul 8, 2024 at 08:59, 'Frederic Font Corbera' via Freesound API <freeso...@googlegroups.com> wrote: