Hello Yannick,
Thanks for making this library!
I noticed that after calling `sound.to_pitch`, the number of frames in the result is not `total_samples / hop_length` but rather `total_samples / hop_length - 3`.
With `very_accurate=True`, I get `total_samples / hop_length - 6`.
This seems to be related to the window length: "Very accurate (standard value: off) -- if off, the window is a Hanning window with a physical length of 3 / (pitch floor). If on, the window is a Gaussian window with a physical length of 6 / (pitch floor), i.e. twice the effective length."
My guess is that the pitch extractor places its first frame about 1.5 hops into the signal (for `very_accurate=False`).
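To test that guess, here is a minimal sketch that predicts the frame count. It assumes (my reading of Praat's short-term analysis, not something confirmed by the parselmouth docs) that `to_pitch_ac` only keeps frames whose whole window fits inside the signal, i.e. `floor((duration - window) / time_step) + 1`, with the window lengths from the manual text quoted above and the default 75 Hz pitch floor. The helper `expected_num_frames` is hypothetical.

```python
import math

def expected_num_frames(num_samples, fs, hop_length, pitch_floor=75.0, very_accurate=False):
    """Predict the frame count of an AC pitch analysis (assumed Praat rule)."""
    # Assumption: the physical window spans 3 periods of the pitch floor,
    # or 6 periods with very_accurate=True (per the manual text quoted above).
    periods_per_window = 6 if very_accurate else 3
    window_samples = periods_per_window * fs / pitch_floor
    # Assumption: only frames whose whole window fits inside the signal are
    # kept, i.e. floor((duration - window) / time_step) + 1, expressed in samples.
    return math.floor((num_samples - window_samples) / hop_length) + 1


n = 100 * 256  # 100 hops of 256 samples
print(expected_num_frames(n, 22050, 256))                      # 97
print(expected_num_frames(n, 22050, 256, very_accurate=True))  # 94
print(expected_num_frames(n, 16000, 256))                      # 98
```

If this rule is right, it reproduces the -3 / -6 shortfall at 22.05 kHz, but the shortfall then depends on the sampling rate and pitch floor rather than on `hop_length` alone (only -2 at 16 kHz).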
I need the frame count to match `total_samples / hop_length` so that the pitch track aligns with my mel-spectrograms.
Currently I'm padding both ends of the signal with `1.5 * hop_length` samples (`3.5 * hop_length` with `very_accurate=True`) before extracting pitch. Is this solution correct?
Code:
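```python
import numpy as np
import parselmouth

hop_length = 256

def check_f0_praat(audio, fs, very_accurate=False):
    # Padding per side, in multiples of hop_length
    multiple = 3.5 if very_accurate else 1.5
    padding = int(multiple * hop_length)
    x = np.pad(audio, (padding, padding))  # zero-pad both ends
    sound = parselmouth.Sound(x, fs)
    time_step = hop_length / fs  # one pitch frame per mel-spectrogram hop
    pitch = sound.to_pitch_ac(time_step=time_step, very_accurate=very_accurate)
    # selected_array is a structured array with 'frequency' and 'strength' fields
    f0 = pitch.selected_array['frequency']
    return f0
```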
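Instead of a fixed multiple of `hop_length`, the padding could perhaps be derived from the window length. This is a sketch under assumptions: Praat is assumed to produce `floor((num_samples + 2*pad - window) / hop_length) + 1` frames, the signal length is assumed to be a multiple of `hop_length`, and the helper `padding_for_exact_frames` is hypothetical. Any `2*pad` in `[window - hop_length, window)` would then give exactly `num_samples / hop_length` frames.

```python
import math

def padding_for_exact_frames(fs, hop_length, pitch_floor=75.0, very_accurate=False):
    """Zero padding (samples per side) aimed at num_samples // hop_length frames.

    Assumes num_samples is a multiple of hop_length and the Praat rule
    num_frames = floor((num_samples + 2*pad - window) / hop_length) + 1.
    """
    periods_per_window = 6 if very_accurate else 3
    window_samples = periods_per_window * fs / pitch_floor
    # Any 2*pad in [window - hop, window) gives the target count; aim for the
    # middle of that interval so small rounding differences don't matter.
    return max(round(window_samples / 2 - hop_length / 4), 0)


fs, hop = 22050, 256
n = 100 * hop
for accurate in (False, True):
    pad = padding_for_exact_frames(fs, hop, very_accurate=accurate)
    window_samples = (6 if accurate else 3) * fs / 75.0
    frames = math.floor((n + 2 * pad - window_samples) / hop) + 1
    print(accurate, pad, frames)  # False 377 100, then True 818 100
```

Under the same assumption (frames centred in the padded signal), frame k would sit at `(k + 0.5) * hop_length / fs` in the unpadded signal regardless of the padding, which could be checked against `pitch.xs()` after subtracting `pad / fs`.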
Regards
Perry