Convert pydub AudioSegment to ndarray for librosa

Rui Guo

Jun 12, 2018, 5:54:47 AM

to librosa

How can I convert a pydub AudioSegment to the float32 ndarray that librosa uses? I want to change pitch and tempo, and I think librosa may be better suited to that than AudioSegment.

Brian McFee

Jun 12, 2018, 9:20:59 AM

I haven't used pydub before, but skimming the docs, it looks like raw_data (to get a bytestring) plus np.frombuffer (to convert to ndarray) should do what you want.
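
The raw_data plus np.frombuffer suggestion could look roughly like the sketch below. It is numpy-only: `pcm_bytes_to_int_array` is a hypothetical helper, not part of pydub or librosa, and `fake_raw` stands in for `AudioSegment.raw_data`. It assumes the integer width should follow the segment's `sample_width` (2 bytes for int16, 4 bytes for int32) and that the bytes are in native byte order; other widths are deliberately not handled.

```python
import numpy as np

# Hypothetical helper for illustration; not a pydub or librosa function.
def pcm_bytes_to_int_array(raw_data, sample_width):
    """Interpret signed PCM bytes as a 1-D integer ndarray."""
    # Assumption: only 16-bit and 32-bit samples are supported here.
    dtypes = {2: np.int16, 4: np.int32}
    if sample_width not in dtypes:
        raise ValueError(f"unsupported sample width: {sample_width}")
    return np.frombuffer(raw_data, dtype=dtypes[sample_width])

# Stand-in for AudioSegment.raw_data: four 16-bit samples as bytes.
fake_raw = np.array([0, 1000, -1000, 32767], dtype=np.int16).tobytes()
samples = pcm_bytes_to_int_array(fake_raw, sample_width=2)
print(samples.dtype, samples)
```

With a real segment `seg`, the call would presumably be `pcm_bytes_to_int_array(seg.raw_data, seg.sample_width)`. Choosing a dtype wider than the actual sample width would silently merge neighbouring samples into one value.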

Rui Guo

Jun 12, 2018, 10:30:10 PM

The pydub raw data is int32, and if I convert it to an ndarray with

new_audio = np.frombuffer(raw_audio_data, dtype='int32')

the length of the ndarray matches the length of the ndarray I get when I read the audio directly with librosa. However, new_audio is int32, so its values fall outside the -1..1 range of the float32 arrays librosa uses. How can I bring the values of new_audio into the range librosa accepts?

Rui Guo

Jun 12, 2018, 11:29:22 PM

I also tried

audio_back = librosa.util.buf_to_float(raw_audio_data, n_bytes=4,
                                       dtype=np.float32)

but audio_back is noisy, while the original sound in the AudioSegment is not.

Rui Guo

Jun 12, 2018, 11:42:13 PM

As an experiment, I tested:

import numpy as np
import librosa
import IPython.display as ipd

y, sr = librosa.load(librosa.util.example_audio_file())
ipd.Audio(y, rate=sr)
new_result = librosa.util.buf_to_float(y.tobytes(), n_bytes=4,
                                       dtype=np.float32)
ipd.Audio(new_result, rate=sr)

However, new_result is noisy and y is not when I play them with ipd.Audio. I am not sure whether I am using librosa.util.buf_to_float correctly.
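
One likely explanation for the noise in this experiment: librosa.util.buf_to_float is documented as converting an integer buffer to floating point, so passing it the bytes of an array that is already float32 makes it read float bit patterns as integers. The numpy-only sketch below imitates that step rather than calling librosa, and the sine wave is just a stand-in for the loaded audio; both are assumptions made for illustration.

```python
import numpy as np

# Stand-in signal: a quiet 440 Hz sine in float32, similar in type to
# what librosa.load returns.
sr = 22050
t = np.arange(sr) / sr
y = (0.1 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# Imitation of an integer-buffer conversion with n_bytes=4: read the
# bytes as 32-bit integers, then scale by 1 / 2**31.
as_int = np.frombuffer(y.tobytes(), dtype=np.int32)
reinterpreted = (as_int / float(1 << 31)).astype(np.float32)

# Reading the same bytes back as float32 recovers the signal exactly.
recovered = np.frombuffer(y.tobytes(), dtype=np.float32)

print("float32 view matches original:", np.array_equal(recovered, y))
print("int32 reinterpretation matches original:", np.allclose(reinterpreted, y))
```

The bytes are undamaged in both cases; only the interpretation differs, which is why a float array needs no buf_to_float round trip at all.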

Brian McFee

Jun 13, 2018, 10:02:51 AM

librosa doesn't necessarily assume +-1 bounds, but it does assume that audio signals are float-valued. If your int-typed ndarray is correct, then you can always just cast it to float by doing `new_audio.astype(np.float32)` and then scale it however you see fit by dividing out MAX_INT.
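
The cast-and-scale step might be sketched as follows. `int_pcm_to_float` is a hypothetical helper name, and dividing by 2**(bits - 1) instead of the maximum integer value is a choice made here so that the most negative sample maps to exactly -1; dividing by the maximum would work as well.

```python
import numpy as np

# Hypothetical helper for illustration; not a librosa function.
def int_pcm_to_float(samples):
    """Scale a signed-integer ndarray to float32 values in [-1, 1)."""
    info = np.iinfo(samples.dtype)
    # -info.min is 2**(bits - 1), for example 32768 for int16.
    return samples.astype(np.float32) / float(-info.min)

x = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scaled = int_pcm_to_float(x)
print(scaled.dtype, scaled)
```

This only gives sensible audio if the integer dtype matches the real sample width of the data.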

Rui Guo

Jun 13, 2018, 9:55:32 PM

I tried that, but the new sound has a lot of noise, similar to the result I get below:

y, sr = librosa.load(librosa.util.example_audio_file())
ipd.Audio(y, rate=sr)
new_result = librosa.util.buf_to_float(y.tobytes(),n_bytes=4,
                                       dtype=np.float32)
ipd.Audio(new_result, rate=sr)

Rui Guo

Jun 14, 2018, 2:48:24 AM

By trial and error, I wrote two functions that convert between a pydub AudioSegment and the ndarray that librosa uses. They should be useful for anyone using pydub and librosa together.

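import numpy as np
import pydub
import librosa

def ndarray_to_audiosegment(y, frame_rate):
    # pydub stores integer PCM. Float input (assumed to lie in [-1, 1], as
    # returned by librosa.load) is converted to 16-bit integers first.
    if np.issubdtype(y.dtype, np.floating):
        y = (np.clip(y, -1.0, 1.0) * 32767).astype(np.int16)

    if y.ndim == 2:
        # librosa uses shape (channels, samples); pydub expects the
        # left and right samples interleaved.
        new_array = np.zeros(y.shape[1] * 2, dtype=y.dtype)
        new_array[::2] = y[0]
        new_array[1::2] = y[1]
    else:
        new_array = y

    audio_segment = pydub.AudioSegment(
        new_array.tobytes(),
        frame_rate=frame_rate,
        sample_width=new_array.dtype.itemsize,
        channels=y.ndim,
    )
    return audio_segment

def audiosegment_to_ndarray(audiosegment):
    samples = audiosegment.get_array_of_samples()
    # Use the segment's own sample width instead of assuming 16-bit audio.
    samples_float = librosa.util.buf_to_float(samples,
                                              n_bytes=audiosegment.sample_width,
                                              dtype=np.float32)
    if audiosegment.channels == 2:
        # De-interleave into shape (2, samples), the layout librosa uses.
        sample_left = np.copy(samples_float[::2])
        sample_right = np.copy(samples_float[1::2])
        sample_all = np.array([sample_left, sample_right])
    else:
        sample_all = samples_float

    return [sample_all, audiosegment.frame_rate]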

Brian McFee

Jun 14, 2018, 10:48:07 AM

Ah, it looks like the problem was that pydub audio segments are using interleaved left and right channels.

You can probably get the same effect (more efficiently) in audiosegment_to_ndarray by reshaping the array, rather than copying the data and reconstructing it. But anyway, I'm glad you got it working!

Abhimanyu Singhal

Jul 1, 2020, 6:11:31 AM

Hey, could you please post a sample for that? How do I reshape the array? I am trying to convert AudioSegment to Librosa. Thanks
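
A minimal sketch of the reshape approach mentioned earlier in the thread, using numpy only. `deinterleave` is a hypothetical helper, and the short integer array stands in for the interleaved samples a stereo AudioSegment would provide; it assumes the samples arrive as [L0, R0, L1, R1, ...].

```python
import numpy as np

# Hypothetical helper for illustration; not a pydub or librosa function.
def deinterleave(samples, channels):
    """Turn interleaved samples into an array of shape (channels, n_frames)."""
    samples = np.asarray(samples)
    if channels == 1:
        return samples
    # reshape gives (n_frames, channels); .T is a view with the
    # (channels, n_frames) layout, so no sample data is copied.
    return samples.reshape(-1, channels).T

# Stand-in for interleaved stereo samples: left is positive, right negative.
interleaved = np.array([1, -1, 2, -2, 3, -3], dtype=np.int16)
stereo = deinterleave(interleaved, channels=2)
print(stereo.shape)
print(stereo)
```

With pydub, the input would presumably come from `np.array(seg.get_array_of_samples())` with `seg.channels` as the channel count, followed by a conversion to float. The transposed result is a view rather than a contiguous array, so if a downstream function complains about memory layout, `np.ascontiguousarray(stereo)` makes a contiguous copy.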
