How can I extract log-mel energies features with librosa?

mkj...@gmail.com

Jul 9, 2018, 10:20:05 PM

to librosa

As far as I know, log-mel energies are among the most commonly used features in sound recognition tasks.

So I wonder how I can extract log-mel energies from an audio file with librosa.

I think I can use the function below with the "power" parameter set to 1, which should give plain mel energies. Is that right?

librosa.feature.melspectrogram(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, power=2.0, **kwargs)

power: float > 0 [scalar] (Exponent for the magnitude melspectrogram, e.g., 1 for energy, 2 for power, etc.)

Then how can I convert the result to log-mel energies? There is no function like "energy_to_db()".

If the procedure I described above is wrong, please let me know the right way.

Brian McFee

Jul 11, 2018, 6:49:52 PM

to librosa

There are several functions for this, depending on which value you use for power: amplitude_to_db (for power=1) and power_to_db (for power=2) are probably what you want.

More generally, see this section of the documentation: https://librosa.github.io/librosa/core.html#magnitude-scaling
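To make the difference between the two conversions concrete, here is a minimal sketch of the arithmetic behind them, assuming the usual decibel definitions (20 * log10 for amplitude-like values, 10 * log10 for power-like values). The helper names `amplitude_to_db_sketch` and `power_to_db_sketch` are hypothetical and are not librosa functions. As far as I know, the real librosa versions additionally floor small values (`amin`) and clip the dynamic range (`top_db`), which this sketch leaves out.

```python
import math


def amplitude_to_db_sketch(amplitude, ref=1.0):
    """Decibels for an amplitude-like value: 20 * log10(amplitude / ref)."""
    return 20.0 * math.log10(amplitude / ref)


def power_to_db_sketch(power, ref=1.0):
    """Decibels for a power-like value: 10 * log10(power / ref)."""
    return 10.0 * math.log10(power / ref)


# Hypothetical amplitude values, chosen only for illustration.
amplitudes = [0.5, 1.0, 2.0, 10.0]

# Converting an amplitude directly, or squaring it first and treating it
# as power, lands on the same decibel value.
db_from_amplitude = [amplitude_to_db_sketch(a) for a in amplitudes]
db_from_power = [power_to_db_sketch(a ** 2) for a in amplitudes]

for a, d1, d2 in zip(amplitudes, db_from_amplitude, db_from_power):
    print(f"amplitude={a:5.1f}  20*log10(a)={d1:7.3f} dB  10*log10(a^2)={d2:7.3f} dB")
```

The point is only that the function you pick should match the exponent you used: values computed with power=1 go through the 20 * log10 form, values computed with power=2 go through the 10 * log10 form.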

mkj...@gmail.com

Jul 11, 2018, 9:09:21 PM

to librosa

Then I can think of two ways, as follows.

1. y, sr = librosa.load('example.wav')
   energy = librosa.feature.melspectrogram(y=y, sr=sr, power=1)
   result1 = librosa.core.amplitude_to_db(energy)

2. y, sr = librosa.load('example.wav')
   power = librosa.feature.melspectrogram(y=y, sr=sr, power=2)
   result2 = librosa.core.power_to_db(power)

As I understand it, 'result1' is the log-mel energies and 'result2' is the log-mel power spectrogram.

Is that right? Or are both the same?

I'm really confused by this terminology.

Brian McFee

Aug 2, 2018, 1:00:43 PM

to librosa

Yes, both of those are correct, and should give comparable results (down to floating point round-off, anyway).

Again, I apologize for any confusing terminology. The key thing here is that both versions convert "to_db", that is, the outputs are in decibels. The fact that one went through energy (power=1) and the other through power (power=2) doesn't matter so much once you're in dB.
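If you want to see how close "comparable" is, here is a toy sketch of a single mel band. It assumes the mel filterbank weights are applied after the exponent, which is my reading of how melspectrogram works and is not stated in this thread. The weights, magnitudes, and function names are made up for illustration. Under that assumption the two routes agree exactly when a band covers one frequency bin, and can differ by a fraction of a dB when a band sums several bins.

```python
import math


def band_db_via_energy(weights, magnitudes):
    """power=1 route: weighted sum of magnitudes, then 20 * log10."""
    band = sum(w * m for w, m in zip(weights, magnitudes))
    return 20.0 * math.log10(band)


def band_db_via_power(weights, magnitudes):
    """power=2 route: weighted sum of squared magnitudes, then 10 * log10."""
    band = sum(w * m ** 2 for w, m in zip(weights, magnitudes))
    return 10.0 * math.log10(band)


# Hypothetical triangular filter over three STFT bins (illustrative values).
weights = [0.25, 0.5, 0.25]
magnitudes = [1.0, 4.0, 2.0]

multi_bin_energy_db = band_db_via_energy(weights, magnitudes)
multi_bin_power_db = band_db_via_power(weights, magnitudes)

# A band that covers a single bin: both routes reduce to the same value.
single_bin_energy_db = band_db_via_energy([1.0], [4.0])
single_bin_power_db = band_db_via_power([1.0], [4.0])

print(f"three bins: {multi_bin_energy_db:.3f} dB vs {multi_bin_power_db:.3f} dB")
print(f"one bin:    {single_bin_energy_db:.3f} dB vs {single_bin_power_db:.3f} dB")
```

Either way the output is a log-scaled mel spectrogram in dB, so for most recognition tasks the practical advice is to pick one route and use it consistently for training and inference.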
