The input is a large audio file (~50 minutes). In the paper, the audio has a 22 kHz sampling rate, and Mel spectrograms are extracted with Librosa using a window width of 20 ms and a hop length of 2.5 ms. Each resulting spectrogram covers two seconds of audio and is 128×800 pixels. The frequency domain is then augmented with zero-crossing rate (ZCR) information.
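Because the paper gives the window and hop in milliseconds, their size in samples scales with the sampling rate. A minimal sketch of the conversion, assuming simple rounding to whole samples (the paper does not say how, or whether, it rounded):

```python
def ms_to_samples(ms: float, sr: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * sr / 1000)


def frames_in(seconds: float, hop_ms: float) -> int:
    """Number of hops of length hop_ms that fit into `seconds` of audio."""
    return round(seconds * 1000 / hop_ms)


for sr in (22050, 44100):
    win = ms_to_samples(20, sr)   # 20 ms analysis window
    hop = ms_to_samples(2.5, sr)  # 2.5 ms hop
    print(f"sr={sr}: window={win} samples, hop={hop} samples")
# sr=22050: window=441 samples, hop=55 samples
# sr=44100: window=882 samples, hop=110 samples

# Independent of sr: 2 s at a 2.5 ms hop gives the 800 time frames of a 128x800 image.
print("frames per 2 s segment:", frames_in(2.0, 2.5))  # 800
```

So at 44.1 kHz the equivalent settings are roughly an 882-sample window and a 110-sample hop (or `n_fft=1024` with `win_length=882` if a power-of-two FFT size is preferred), with 128 mel bands for the 128 rows.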
I have a 44.1 kHz audio file, for which I have written this code:
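import librosa

# librosa.stream does not resample, so use the file's native rate (44100 here)
sr = librosa.get_samplerate('final.wav')
frame_length = 4096
hop_length = 1024
# Process the ~50-minute file in blocks of 128 frames instead of loading it all at once
stream = librosa.stream('final.wav', block_length=128,
                        frame_length=frame_length, hop_length=hop_length)
mel_specs_log = []
zcr = []
for y in stream:
    # Reuse the stream's framing (center=False) so frames line up across blocks
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=frame_length,
                                         hop_length=hop_length, center=False)
    log_mel = librosa.power_to_db(mel)
    mel_specs_log.append(log_mel)
    # Same frame/hop as the spectrogram, so ZCR has one value per time frame
    zcr.append(librosa.feature.zero_crossing_rate(y, frame_length=frame_length,
                                                  hop_length=hop_length, center=False))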
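One way to sanity-check the streaming parameters is to work out how much audio each block covers. A block spanning `block_length` frames of `frame_length` samples, spaced `hop_length` apart, needs `frame_length + (block_length - 1) * hop_length` samples. A minimal sketch of that arithmetic; the second parameter set (800 frames, 882-sample window, 110-sample hop) is a hypothetical choice derived from the paper's 20 ms / 2.5 ms settings, not something the paper specifies:

```python
def block_samples(block_length: int, frame_length: int, hop_length: int) -> int:
    """Samples needed for `block_length` frames of `frame_length`, `hop_length` apart."""
    return frame_length + (block_length - 1) * hop_length


def block_seconds(block_length: int, frame_length: int, hop_length: int, sr: int) -> float:
    """Duration in seconds covered by one full block."""
    return block_samples(block_length, frame_length, hop_length) / sr


sr = 44100
# Parameters from the code above: ~3.04 s per block, hop of ~23 ms (paper: 2.5 ms).
print(round(block_seconds(128, 4096, 1024, sr), 2), "s per block")  # 3.04 s per block
# Hypothetical paper-like settings: 800 frames, 882-sample window, 110-sample hop.
print(round(block_seconds(800, 882, 110, sr), 2), "s per block")    # 2.01 s per block
```

With a 1024-sample hop, two seconds of audio yield only about 86 frames instead of 800. With `block_length=800`, a 110-sample hop and 128 mel bands, each full block would instead map directly onto one 128×800 image.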
Can someone please help me with:
1. checking whether my parameters are suitable for 44.1 kHz audio, and
2. adding ZCR as another image channel to each spectrogram cell?
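For the second question, one option is to compute ZCR with the same frame and hop length as the spectrogram, so both have one value per time frame, then repeat the single ZCR row across all mel bands and stack it as a second channel. This is an assumption; the paper may instead append ZCR as extra rows along the frequency axis. A minimal NumPy sketch: `add_zcr_channel` is a hypothetical helper, and the per-channel min-max scaling is an added choice, since dB values and ZCR (between 0 and 1) live on very different scales:

```python
import numpy as np


def _minmax(x: np.ndarray) -> np.ndarray:
    """Scale an array to [0, 1]; a constant array maps to zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def add_zcr_channel(log_mel: np.ndarray, zcr: np.ndarray) -> np.ndarray:
    """Stack a log-mel spectrogram (n_mels, T) and ZCR (1, T) into (2, n_mels, T).

    Hypothetical helper: the ZCR row is repeated across every mel band so that
    each spectrogram cell gets a second value, like a second image channel.
    """
    if zcr.ndim == 1:
        zcr = zcr[np.newaxis, :]
    if zcr.shape[-1] != log_mel.shape[-1]:
        raise ValueError("ZCR and spectrogram must have the same number of frames")
    zcr_plane = np.broadcast_to(zcr, log_mel.shape)
    return np.stack([_minmax(log_mel), _minmax(zcr_plane)], axis=0)


# Synthetic stand-ins for one 2-second segment (128 mel bands x 800 frames).
rng = np.random.default_rng(0)
log_mel = rng.uniform(-80.0, 0.0, size=(128, 800))
zcr = rng.uniform(0.0, 0.5, size=(1, 800))

image = add_zcr_channel(log_mel, zcr)
print(image.shape)  # (2, 128, 800)
```

If the model expects channels last, stacking with `axis=-1` gives a 128×800×2 array instead. In the streaming loop, passing the same `frame_length`, `hop_length` and `center=False` to both `melspectrogram` and `zero_crossing_rate` keeps their frame counts aligned.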
If someone can help me with this, I'd be really grateful.
Thanks!