How can I extract log-mel energies features with librosa?


mkj...@gmail.com

Jul 9, 2018, 10:20:05 PM
to librosa

As far as I know, log-mel energies are among the most commonly used features in sound recognition tasks.
So I wonder how I can extract log-mel energies from an audio file with librosa.

I think I can use the function below with the parameter "power" set to 1. That would probably give just the "mel energies". Is that right?


    librosa.feature.melspectrogram(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, power=2.0, **kwargs)

    power : float > 0 [scalar]  (Exponent for the magnitude melspectrogram, e.g., 1 for energy, 2 for power, etc.)



Then how can I transform it into log-mel energies? There is no function like "energy_to_db()".


If the procedure I described above is totally wrong, please let me know the right way.

Brian McFee

Jul 11, 2018, 6:49:52 PM
to librosa
There are several functions for this, depending on which value you use for power: amplitude_to_db (power=1) and power_to_db(power=2) are probably what you want.

More generally, see this section of the documentation: https://librosa.github.io/librosa/core.html#magnitude-scaling

mkj...@gmail.com

Jul 11, 2018, 9:09:21 PM
to librosa
Then I can think of two ways as follows.

1. y, sr = librosa.load('example.wav')
   energy = librosa.feature.melspectrogram(y=y, sr=sr, power=1)
   result1 = librosa.core.amplitude_to_db(energy)

2. y, sr = librosa.load('example.wav')
   power = librosa.feature.melspectrogram(y=y, sr=sr, power=2)
   result2 = librosa.core.power_to_db(power)

In my opinion, 'result1' is the log-mel energies and 'result2' is the log-mel power spectrogram.
Is that right? Or are they the same?

I'm really confused by this terminology.


Brian McFee

Aug 2, 2018, 1:00:43 PM
to librosa
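Yes, both of those are correct, and should give comparable results. (For a plain spectrogram the two would agree down to floating-point round-off; because the mel filterbank is applied after the exponent, the mel versions are not guaranteed to be identical.)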

Again, I apologize for any confusing terminology.  The key thing here is that both versions convert "to_db", that is, the outputs are in decibels.  The fact that one went through energy (power=1) and the other through power (power=2) doesn't matter so much once you're in dB.