Extract logged mel filterbank energies from libROSA


Carlton Banks

Apr 24, 2017, 9:35:37 AM
to librosa
I am currently trying to extract log mel filter bank energies from a framed audio signal. As is usual in speech recognition, the frames should overlap.

In libROSA, this framing can be done using:

    librosa.util.frame(y, frame_length=2048, hop_length=512)



But how do I extract the log mel filter bank energies from a framed audio signal? There seems to be a way to compute the filters needed:

    librosa.filters.mel(sr, n_fft, n_mels=128, fmin=0.0, fmax=None, htk=False)



But how do I apply the filters to the framed audio signal?

Justin Salamon

Apr 24, 2017, 9:51:03 AM
to Carlton Banks, librosa
Librosa has a built-in melspectrogram function that will take you directly from the audio signal to the mel spectrogram. You'd just have to log scale it, which you can do with logamplitude.

Cheers,

Justin

--
You received this message because you are subscribed to the Google Groups "librosa" group.
To unsubscribe from this group and stop receiving emails from it, send an email to librosa+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/librosa/72c1d8ab-1d1a-4704-94c3-d65f3e88e69c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Justin Salamon, PhD
Senior Research Scientist
Music and Audio Research Laboratory (MARL)
& Center for Urban Science and Progress (CUSP)
New York University, New York, NY

Carlton Banks

Apr 24, 2017, 9:53:58 AM
to Justin Salamon, librosa
What if I want to specify the number of filter banks I want to use?
And I guess I have to frame my audio beforehand, since this is not done by melspectrogram?

Justin Salamon

Apr 24, 2017, 10:00:38 AM
to Carlton Banks, librosa
To specify the number of bands you can give n_mels=X to melspectrogram() (it'll get passed down to librosa.filters.mel()). The function does frame the signal, and you can control the framing parameters via the n_fft and hop_length arguments to melspectrogram() (the defaults are n_fft=2048, hop_length=512).

Cheers

Carlton Banks

Apr 24, 2017, 10:50:28 AM
to librosa, nof...@gmail.com
So I managed to make a plot from this code:

    import librosa
    import librosa.display
    import numpy as np
    import matplotlib.pyplot as plt

    audio_path = "/Users/carl/Desktop/SA2.WAV"

    y, sr = librosa.core.load(audio_path, sr=16000)
    print(y.shape)
    print(sr)

    specto = librosa.feature.melspectrogram(y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
    log_specto = librosa.core.logamplitude(specto)

    plt.figure(figsize=(12,4))
    librosa.display.specshow(log_specto, sr=sr, x_axis='time', y_axis='mel')
    plt.title('mel power spectrogram')
    plt.colorbar(format='%+02.0f dB')
    plt.tight_layout()
    plt.show()

    print(specto.shape)
    print(log_specto.shape)

The time axis seems to be incorrect.

This is the audio file: https://clyp.it/b4aozg0d

Its duration is only 2 seconds, but the plot shows it as 6?



Brian McFee

Apr 24, 2017, 11:01:30 AM
to librosa, nof...@gmail.com
Please refer to the documentation for specshow: https://librosa.github.io/librosa/generated/librosa.display.specshow.html#librosa.display.specshow

It looks like you forgot to set the hop length in the call to specshow.
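
A short worked calculation suggests why the plot showed about 6 seconds. specshow converts frame indices to time using its own hop_length argument, which defaults to 512, while the spectrogram was computed with hop_length=160. The sketch below assumes the file is exactly 2.0 seconds long and that frames are centered (one frame per hop, plus one); both are assumptions, not facts from the thread.

```python
sr = 16000            # sampling rate used in load()
hop_used = 160        # hop_length passed to melspectrogram()
hop_default = 512     # specshow's default when hop_length is omitted
duration_s = 2.0      # assumed file duration

# Number of frames produced by a centered STFT.
n_frames = 1 + int(duration_s * sr) // hop_used

# Time extent the axis shows = frames * hop / sr.
shown_with_default_hop = n_frames * hop_default / sr
shown_with_actual_hop = n_frames * hop_used / sr

print(n_frames)                 # 201
print(shown_with_default_hop)   # 6.432
print(shown_with_actual_hop)    # 2.01
```

The ratio 512 / 160 = 3.2 stretches the axis, so passing the same value to the plot, for example librosa.display.specshow(log_specto, sr=sr, hop_length=160, x_axis='time', y_axis='mel'), should restore the roughly 2-second axis.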

Carlton Banks

Apr 24, 2017, 11:10:50 AM
to Brian McFee, librosa
Thanks, that fixed it.
One last thing: is it possible to use specshow to display the x-axis in frames instead of time?

Carlton Banks

Apr 24, 2017, 11:29:55 AM
to Brian McFee, librosa
Ahh, sorry, found it. Thanks for the help :)

Reza Habibi

May 31, 2019, 12:38:40 PM
to librosa
Hi,
By any chance, do you know how I can get MFCCs if I don't have the WAV file and only have a spectrogram (frequency over time)?
The code you posted takes a WAV file as input, but I'm looking for a way to start from a spectrogram.

Thanks.
R



Vincent Lostanlen

May 31, 2019, 3:19:14 PM
to Reza Habibi, librosa
Dear Reza Habibi,

The function librosa.feature.mfcc can take a spectrogram representation as input.
Use the keyword argument "S" for the spectrogram, and leave the keyword argument "y" (the audio time series) unspecified.


Sincerely,
Vincent.
