Problems with long audiofile: width=9 cannot exceed data.shape[axis]=7


Marcello Lussana

Apr 25, 2018, 6:32:28 AM
to librosa
hi all,

I am trying out the librosa library and I'm following this tutorial:
https://github.com/ml4a/ml4a-guides/blob/master/notebooks/audio-tsne.ipynb

Everything works fine until I try to analyse a file longer than a few seconds.
This is the error I always get:
  File "/usr/local/lib/python3.5/dist-packages/librosa/feature/utils.py", line 110, in delta
    "cannot exceed data.shape[axis]={}".format(width, data.shape[axis]))
librosa.util.exceptions.ParameterError: when mode='interp', width=9 cannot exceed data.shape[axis]=7

The value of data.shape[axis] printed in this error changes depending on the input file I analyse.
My script is attached.

I understand the error, but I don't understand what I should change in my script in order to be able to analyse longer files.
Any ideas?

thanks!
audio-tsne_forum.py

Brian McFee

Apr 26, 2018, 7:46:06 AM
to librosa
It looks to me like the problem is that your audio signal is too short, not too long.

You're using feature.delta with a width-9 filter, but your input features only have 7 frames.

I'd suggest either shortening the filter, or padding out your signal to a minimum duration before processing.
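The padding option can be sketched with plain numpy. This is a minimal sketch, assuming librosa's defaults (hop_length=512, centered frames); the clip length and pad target below are illustrative, not taken from the original script:

```python
import numpy as np

hop_length = 512   # librosa's default hop length
delta_width = 9    # default width of librosa.feature.delta

# With centered frames, an n-sample clip yields 1 + n // hop_length feature
# frames, so we need at least (delta_width - 1) * hop_length samples.
min_samples = (delta_width - 1) * hop_length

y = np.zeros(2048)  # stand-in for a clip that is too short (only 5 frames)
if len(y) < min_samples:
    y = np.pad(y, (0, min_samples - len(y)))  # zero-pad at the end

print(len(y))  # → 4096, i.e. 1 + 4096 // 512 = 9 frames
```

After padding like this, feature.delta with width=9 has enough frames to work with.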

Marcello Lussana

May 4, 2018, 9:39:17 PM
to librosa
Yes, I understand the issue, but then I don't understand the example I am following:
https://github.com/ml4a/ml4a-guides/blob/master/notebooks/audio-tsne.ipynb

Theoretically I am selecting the portion of the audio signal based on the onsets array, so the length should be correct.
I am not setting the width of the filter. Should I set it based on the number of frames? If so, how can I do that?

Brian McFee

May 5, 2018, 9:09:31 AM
to librosa
If you want to change the filter length (as opposed to padding your input signals), I'd recommend keeping it constant so that the feature extraction is standardized across all clips.

You could also use `mode='nearest'` (instead of 'interp', the default) to relax the width requirement.  The results will be a bit less accurate at the boundaries, but it will side-step the error.  See https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html#scipy.signal.savgol_filter for details about that.
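For illustration: librosa.feature.delta delegates to scipy.signal.savgol_filter, so the effect of the mode switch can be sketched directly with scipy (the polynomial order and the feature shape below are illustrative assumptions, not librosa's exact internals):

```python
import numpy as np
from scipy.signal import savgol_filter

feat = np.random.randn(13, 7)  # e.g. 13 MFCC coefficients over only 7 frames

# mode='interp' (the default) requires window_length <= number of frames,
# so a width-9 filter over 7 frames raises ValueError. mode='nearest'
# repeats the edge frames instead, so the same width goes through:
delta = savgol_filter(feat, window_length=9, polyorder=1, deriv=1,
                      axis=-1, mode='nearest')
print(delta.shape)  # → (13, 7), same shape as the input
```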

Marcello Lussana

May 5, 2018, 11:36:07 PM
to librosa
Thanks, using `mode='nearest'` works well enough for now.

Anna

Jul 21, 2020, 8:32:44 AM
to librosa
I am completely new to the topic and also have some very short recordings. I would like to change the filter, but I'm not sure where or when that happens.

librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)

Where do I change the filter options?
Thanks!

Madhurananda Pahar

Jul 21, 2020, 9:35:41 AM
to librosa
Which filter did you mean? The mel filters? You can change those in the following way:

S = np.abs(librosa.stft(audio_data))**2  # power spectrogram
M = librosa.feature.melspectrogram(S=S, sr=sr, n_mels=128)  # here, n_mels sets the number of mel filters
# Get MFCCs from the log mel spectrogram; n_mfcc keeps 39 coefficients
# of the discrete cosine transform (DCT)
mfcc = librosa.feature.mfcc(S=librosa.power_to_db(M), sr=sr, n_mfcc=39)

Also, check how many rows and columns S, M and mfcc have. The STFT output has 1025 rows, which is (2048/2)+1; the number of columns depends on your audio length. M has 128 rows, one per mel filter, and mfcc has 39 rows, one per coefficient. The number of columns stays the same throughout.
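Those shape relationships follow from a little arithmetic (assuming librosa's defaults n_fft=2048 and hop_length=512, with centered frames; the 3-second clip is just an example):

```python
n_fft, hop_length = 2048, 512   # librosa's defaults
n_samples = 22050 * 3           # e.g. a 3-second clip at 22.05 kHz

n_freq_bins = n_fft // 2 + 1            # rows of the STFT output
n_frames = 1 + n_samples // hop_length  # columns (centered frames)

print(n_freq_bins, n_frames)  # → 1025 130
```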

Hope this helps.  