mfcc gives different results on the exact same audio segment in different environments?


Dante Zhou

Dec 2, 2024, 4:07:42 AM12/2/24
to librosa
Hi,
     I am using the latest librosa (0.10.2.post1) to extract MFCCs from audio segments, and I use the MFCC results to train a model for audio classification.
     I found that the MFCC results differ between macOS and Linux, and even between Linux machines with different CPUs.
     Is this normal, or is there some setting I need to change? How can I train a model on MFCCs and then use that model in another environment?
     Below is the code I used to compare the different outputs.

import hashlib

import librosa
import numpy as np

def segment_mfcc_hashes(audio_file, sr, interval, hop_length):
    y, _ = librosa.load(audio_file, sr=sr)
    duration = librosa.get_duration(y=y, sr=sr)

    hashes = []
    for start in np.arange(0, duration, interval):
        end = min(start + interval, duration)
        y_segment = y[int(start * sr):int(end * sr)]

        mfcc = librosa.feature.mfcc(y=y_segment, sr=sr, hop_length=hop_length,
                                    n_mfcc=13, n_fft=200, n_mels=26)

        mfcc_hash = hashlib.md5(mfcc.tobytes()).hexdigest()
        hashes.append(mfcc_hash)
    return hashes

Brian McFee

Dec 2, 2024, 11:03:48 AM12/2/24
to librosa
Yes, this is normal, and a consequence of floating point arithmetic. Even on the same platform and architecture, you may see different results in these computations simply due to how the underlying basic linear algebra package was compiled.
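A minimal sketch of the underlying issue: floating-point addition is not associative, so two computations that are mathematically identical can differ in their low-order bits depending on the order in which a compiled BLAS library happens to accumulate terms.

```python
# The same sum, grouped two different ways, gives bitwise-different
# double-precision results.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)

print(left == right)      # False
print(abs(left - right))  # a tiny nonzero difference near machine epsilon
```

The discrepancy is on the order of 1e-16 per operation; after the FFTs and matrix products inside an MFCC computation, the accumulated differences are what break a byte-level checksum.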

For a deep dive on this, I highly recommend reading through this classic write-up: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Checksums will fail, but the results ought to be very close (i.e., within 1e-10 of each other), which should be plenty good enough for machine learning, where a bit of statistical noise can be helpful anyway.  In general, you should not use brittle checksums on floating-point calculations.  Instead, I recommend using the NumPy function "allclose", which lets you test for equivalence within a specified numerical tolerance.
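To illustrate the suggested comparison, here is a small sketch. The two arrays stand in for MFCC matrices computed on different machines; the cross-platform discrepancy is simulated with tiny random noise rather than produced by an actual second environment.

```python
import hashlib

import numpy as np

# Stand-ins for MFCC matrices from two machines: identical values plus
# simulated floating-point noise well below the 1e-10 tolerance.
rng_a, rng_b = np.random.default_rng(0), np.random.default_rng(1)
mfcc_machine_a = rng_a.normal(size=(13, 100))
mfcc_machine_b = mfcc_machine_a + 1e-12 * rng_b.normal(size=(13, 100))

# Byte-level hashes disagree, because the raw bits differ...
hash_a = hashlib.md5(mfcc_machine_a.tobytes()).hexdigest()
hash_b = hashlib.md5(mfcc_machine_b.tobytes()).hexdigest()
print(hash_a == hash_b)  # False

# ...but the arrays are numerically equivalent within tolerance.
print(np.allclose(mfcc_machine_a, mfcc_machine_b, atol=1e-10))  # True
```

The same `np.allclose` call works directly on the real MFCC arrays from two environments, replacing the MD5 comparison in the original snippet.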