Dear Freesound team,
I hope this message finds you well. I am a PhD student at the University of Porto, conducting research on sound descriptors. Recently, we have been working with sounds and their corresponding descriptors obtained through the Freesound API.
Additionally, we have used Essentia to extract the same sound descriptors for these sounds. However, we have noticed discrepancies between the values obtained from your API and those derived from Essentia.
We would like to understand if there are specific parameters or settings applied to extract these descriptors that could account for these differences. For instance, are there particular configurations for handling silence or other such parameters?
Below is one example, for 'lowlevel.spectral_complexity' only, with my code followed by the results I got:
code:
import numpy as np
import essentia.standard as es

# `audio` is assumed to be a mono signal that has already been loaded
# (the loading step is not shown here)

# Frame size and hop size
frame_size = 2048
hop_size = 1024

# Use the frame generator
frames = [frame for frame in es.FrameGenerator(audio, frameSize=frame_size,
                                               hopSize=hop_size, startFromZero=True)]

# Define windowing and spectrum computation
windowing = es.Windowing(type='hann')
spectrum = es.Spectrum()
spectral_complexity = es.SpectralComplexity()

# Initialize the sum of statistical values
total_min = 0
total_max = 0
total_mean = 0
total_var = 0
total_dmean = 0
total_dvar = 0
total_dmean2 = 0
total_dvar2 = 0

# Store the spectral complexity for each frame
spectral_complexity_sequence = []
for frame in frames:
    windowed_frame = windowing(frame)
    spec = spectrum(windowed_frame)
    complexity = spectral_complexity(spec)
    spectral_complexity_sequence.append(complexity)

# Convert to a NumPy array for statistical calculations
spectral_complexity_sequence_np = np.array(spectral_complexity_sequence)

# Calculate statistical values
min_value = np.min(spectral_complexity_sequence_np)
max_value = np.max(spectral_complexity_sequence_np)
mean_value = np.mean(spectral_complexity_sequence_np)
var_value = np.var(spectral_complexity_sequence_np)

# Derivative statistics need at least two frames; otherwise they are None
if len(spectral_complexity_sequence_np) >= 2:
    diff = np.diff(spectral_complexity_sequence_np)
    dmean_value = np.mean(diff)
    dvar_value = np.var(diff)
    dmean2_value = np.mean(diff ** 2)
    dvar2_value = np.var(diff ** 2)
else:
    dmean_value = dvar_value = dmean2_value = dvar2_value = None
Results:
Test 1 (Sound test 1 / id: 411460, https://cdn.freesound.org/previews/411/411460_5121236-lq.mp3)

Statistic       Our results             Original data from Freesound
min_value       0.0                     0
max_value       19.0                    18.9999993842421
mean_value      2.255813953488372       2.28110641958725
var_value       23.54851270957274       23.6215618071424
dmean_value     0.0                     0.462963069115548
dvar_value      1.6542056074766356      1.47084283155535
dmean2_value    1.6542056074766356      0.78139524787494
dvar2_value     38.73089352781902       3.71501518255123
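
One thing we noticed while preparing these numbers: the signed mean of a first difference telescopes to (last - first) / (n - 1), so it is 0 whenever the first and last frames have the same complexity. That would explain our dmean_value of 0.0, and also why our dvar_value and dmean2_value are identical (with a zero mean, the variance of the difference equals the mean of its square). The sketch below contrasts our definitions with an alternative that is purely our assumption and not something we have confirmed from the Freesound or Essentia documentation: that dmean and dmean2 are the mean absolute first and second differences. The function names and the toy sequence are hypothetical, and it is written in plain Python so it runs without Essentia or NumPy.

```python
def first_difference(values):
    """Return the differences between consecutive elements."""
    return [b - a for a, b in zip(values, values[1:])]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def derivative_stats(sequence):
    """Compare two candidate definitions of the derivative statistics.

    Requires at least three values so the second difference is non-empty.
    """
    d1 = first_difference(sequence)  # first difference
    d2 = first_difference(d1)        # second difference
    return {
        # Definitions used in our code above: signed and squared first difference.
        "dmean_signed": mean(d1),
        "dmean2_squared": mean([d * d for d in d1]),
        # ASSUMPTION (unconfirmed): absolute first and second differences.
        "dmean_abs": mean([abs(d) for d in d1]),
        "dmean2_abs": mean([abs(d) for d in d2]),
    }


# Hypothetical per-frame values that start and end in silence (value 0).
toy = [0.0, 3.0, 1.0, 4.0, 0.0]
stats = derivative_stats(toy)
for name, value in stats.items():
    print(name, value)
```

On this toy sequence the signed mean is exactly 0 even though the values change at every frame, while the absolute version is not. If the Freesound statistics follow the absolute definition, or use a second difference for dmean2 and dvar2, that alone could account for much of the gap in the last four rows.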
I look forward to your reply!
Best regards,
Lily