Hello,
I'm currently analyzing the fundamental frequency (f0) of relatively long audio recordings (about 7 min each) of conversations between two people. I've used both librosa and parselmouth for the analysis, and the resulting fundamental frequency arrays differ in length:
parselmouth:
Could you direct me to the source of this difference and advise me on which method I should rely on? I'm new to audio analysis.
Moreover, I need to determine the temporal correspondence of these fundamental frequency values. Given that my recordings feature dialogues between two individuals, my objective is to filter out the segments where the second participant speaks, thereby retaining only the fundamental frequency values of the first participant. Subsequently, I aim to compute means over specific time windows, such as 20 or 30 seconds.
To achieve this, I need clarity on the "sampling frequency" (sorry for the potentially inaccurate term) of the f0 values, i.e. how much time separates consecutive values in each array.
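To make the goal concrete, here is a minimal sketch of the filtering and windowed averaging I have in mind. It assumes I already have one timestamp per f0 value, which is exactly the part I'm unsure about. The function name `windowed_f0_means`, the speaker intervals, and all numbers are made up for illustration, and I'm assuming unvoiced frames show up as NaN or 0 depending on the library.

```python
import math


def windowed_f0_means(times, f0, keep_intervals, window_s=30.0):
    """Mean f0 per fixed-length window, using only frames inside keep_intervals.

    times:          one timestamp in seconds per f0 value (assumed known)
    f0:             f0 values in Hz; NaN or 0 is treated as unvoiced
    keep_intervals: (start, end) pairs in seconds where participant 1 speaks
    window_s:       window length in seconds
    """
    sums = {}
    counts = {}
    for t, value in zip(times, f0):
        # Skip unvoiced frames (assumption: marked as NaN or 0).
        if math.isnan(value) or value <= 0:
            continue
        # Skip frames outside the first participant's speaking intervals.
        if not any(start <= t < end for start, end in keep_intervals):
            continue
        idx = int(t // window_s)  # index of the window this frame falls into
        sums[idx] = sums.get(idx, 0.0) + value
        counts[idx] = counts.get(idx, 0) + 1
    # Map each window's start time to the mean f0 of the frames kept in it.
    return {idx * window_s: sums[idx] / counts[idx] for idx in sorted(sums)}


# Made-up data: one f0 value every 10 s, purely to keep the example readable.
times = [0, 10, 20, 30, 40, 50]
f0 = [100.0, 110.0, float("nan"), 200.0, 120.0, 130.0]
speaker1 = [(0, 25), (35, 60)]  # hypothetical intervals for participant 1

result = windowed_f0_means(times, f0, speaker1, window_s=30.0)
print(result)
```

Windows in which the first participant has no voiced frames are simply absent from the result, which seems safer to me than reporting a mean of 0.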
Thank you for your help!
Pia