Hello,
I am trying to extract information from audio files that are basically impulses, or bursts, and to figure out the best way to combine MFCCs with their Delta and Delta-Delta features.
With 40 coefficients each (n_mfcc=40), I tried this approach:
    import librosa
    import numpy as np

    def extract_features(file_name, n_mfcc=40):
        duration_seconds = 1
        try:
            # Load the audio (librosa resamples to 22050 Hz by default)
            audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
            # Zero-pad or truncate to exactly duration_seconds
            trimmed = librosa.util.fix_length(audio, size=int(sample_rate * duration_seconds))
            mfccs = librosa.feature.mfcc(y=trimmed, sr=sample_rate, n_mfcc=n_mfcc)
            mfcc_delta = librosa.feature.delta(mfccs)
            mfcc_delta2 = librosa.feature.delta(mfccs, order=2)
            # Per-coefficient standard deviation over time, concatenated: 3 * n_mfcc values
            features = np.hstack((np.std(mfccs, axis=1),
                                  np.std(mfcc_delta, axis=1),
                                  np.std(mfcc_delta2, axis=1)))
        except Exception as e:
            print("Error encountered while parsing file:", file_name, e)
            return None
        return features
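As a sanity check on the aggregation step in the function above, here is a minimal sketch of what the output shape should be. It makes two assumptions for illustration only: random data stands in for real MFCCs, and np.gradient is a crude stand-in for librosa.feature.delta (it is not the same filter). The frame count of 44 is roughly what one second at 22050 Hz would give with librosa's default hop length.

```python
import numpy as np

def summarize(mfccs, delta, delta2):
    """Collapse three (n_mfcc, n_frames) matrices into one 1-D vector
    holding the per-coefficient standard deviation over time."""
    return np.hstack((np.std(mfccs, axis=1),
                      np.std(delta, axis=1),
                      np.std(delta2, axis=1)))

# Hypothetical stand-in data: 40 coefficients by 44 frames
rng = np.random.default_rng(0)
n_mfcc, n_frames = 40, 44
mfccs = rng.normal(size=(n_mfcc, n_frames))

# Crude stand-ins for the first- and second-order deltas (assumption:
# simple finite differences along time, not librosa's delta filter)
delta = np.gradient(mfccs, axis=1)
delta2 = np.gradient(delta, axis=1)

features = summarize(mfccs, delta, delta2)
print(features.shape)  # (120,)
```

So with n_mfcc=40 each file should come out as a single 120-element vector: 40 values for the MFCCs, 40 for the Deltas, and 40 for the Delta-Deltas.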
However, when I use this approach, my accuracy score is very poor, on the order of 65%.
As a starting point, I just wanted to verify that this approach to extracting the MFCCs and their Deltas is correct and that the output makes sense. If not, what is the best practice for extracting the Delta and Delta-Delta features alongside the MFCCs?
Any guidance will be sincerely appreciated!
Thank you!