How to get and display pitches of a wav audio file with librosa?


skyman...@gmail.com

Oct 24, 2016, 11:44:29 AM
to librosa
Here is my source code:

    import librosa
    import matplotlib.pyplot as plt

    wave_data, samplerate = librosa.load(librosa.util.example_audio_file())
    pitches, magnitudes = librosa.piptrack(y=wave_data, sr=samplerate)
    plt.subplot(212)
    plt.plot(pitches)

The attached file is the output figure, which looks strange and wrong.
Could you show me how to display the pitches calculated by the librosa.piptrack function?


Dan Ellis

Oct 24, 2016, 1:33:24 PM
to skyman...@gmail.com, librosa
librosa.piptrack returns two 2D arrays with frequency and time axes.  The "pitches" array gives the interpolated frequency estimate of a particular harmonic, and the corresponding value in the "magnitudes" array gives the energy of the peak.  Many values in both arrays are zero, indicating that there was no local maximum in the spectrum at that time-frequency cell.
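
As a rough illustration of how these two arrays relate, here is a minimal sketch that collapses them to one value per frame by taking the frequency of the strongest peak. The helper name `strongest_peak_per_frame` and the small arrays are hypothetical stand-ins for real piptrack output, and the strongest peak is not necessarily the fundamental, so this is not a robust pitch tracker.

```python
import numpy as np

def strongest_peak_per_frame(pitches, magnitudes):
    """Return, for each frame (column), the frequency of the peak
    with the largest magnitude. Frames with no peak yield 0."""
    # Index of the highest-magnitude frequency bin in every frame.
    best_bin = magnitudes.argmax(axis=0)
    frames = np.arange(pitches.shape[1])
    return pitches[best_bin, frames]

# Hypothetical stand-in for piptrack output: 4 frequency bins x 3 frames.
# Zeros mean "no spectral peak in this time-frequency cell".
pitches = np.array([[0.0,   0.0,   0.0],
                    [110.0, 0.0,   0.0],
                    [0.0,   220.0, 0.0],
                    [330.0, 0.0,   0.0]])
magnitudes = np.array([[0.0, 0.0, 0.0],
                       [0.9, 0.0, 0.0],
                       [0.0, 0.5, 0.0],
                       [0.2, 0.0, 0.0]])

track = strongest_peak_per_frame(pitches, magnitudes)
```

With real data, the same function could be called on the two arrays returned by `librosa.piptrack`, and the resulting 1D array plotted with `plt.plot(track)`.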

To see what "pitches" is really returning, you want to do something more like:

  plt.imshow(pitches[:100, :], aspect="auto", interpolation="nearest", origin="lower")

or perhaps:

  plt.plot(np.tile(np.arange(pitches.shape[1]), [100, 1]).T, pitches[:100, :].T, '.')

Then you can see that the frequencies being returned are the harmonics; the part that is missing is the search for one or a few fundamental frequencies that explain the majority of the harmonic frequencies observed, i.e. the approximate greatest common divisor.  This is a long-standing problem in pitch tracking, solved with things like Duifhuis's "harmonic sieve".  
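
To make the "approximate greatest common divisor" idea concrete, here is a crude brute-force sketch for a single frame. It is only in the spirit of a harmonic sieve, not Duifhuis's actual algorithm, and the function name `estimate_f0`, the candidate grid, and the 3% tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def estimate_f0(peaks_hz, f0_min=50.0, f0_max=500.0, step=1.0, tol=0.03):
    """Pick the candidate fundamental whose integer multiples best
    explain the observed peak frequencies of one frame."""
    peaks = np.asarray(peaks_hz, dtype=float)
    peaks = peaks[peaks > 0]  # zeros mean "no peak" in piptrack output
    if peaks.size == 0:
        return 0.0

    candidates = np.arange(f0_min, f0_max + step, step)
    # Nearest harmonic number of each peak under each candidate,
    # arranged as (n_candidates, n_peaks).
    ratio = peaks[np.newaxis, :] / candidates[:, np.newaxis]
    harmonic = np.maximum(np.round(ratio), 1.0)
    # Relative distance between each peak and its nearest harmonic.
    rel_err = np.abs(peaks[np.newaxis, :]
                     - harmonic * candidates[:, np.newaxis]) / peaks[np.newaxis, :]
    matched = rel_err <= tol

    # Reward every explained peak, penalize mistuning slightly.
    score = matched.sum(axis=1) - np.where(matched, rel_err, 0.0).sum(axis=1)
    # Sub-multiples (f0/2, f0/3, ...) explain the same peaks equally well,
    # so among the top-scoring candidates keep the highest frequency.
    best = candidates[score >= score.max() - 1e-9]
    return float(best.max())

f0_full = estimate_f0([220.0, 440.0, 660.0])   # all harmonics present
f0_missing = estimate_f0([440.0, 660.0])       # fundamental itself absent
```

The tie-break toward the highest candidate is one simple way to avoid octave-down errors on clean input; real recordings with noisy or inharmonic peaks need something considerably more careful.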

It would make a lot of sense to include some functions for this in librosa, but it's not easy to make something robust.  AFAIK we don't have anything.

  DAn.

--
You received this message because you are subscribed to the Google Groups "librosa" group.
To unsubscribe from this group and stop receiving emails from it, send an email to librosa+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/librosa/d2a9f2e2-5c6a-4f49-9e3c-f81458211b1e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

skyman...@gmail.com

Oct 25, 2016, 10:30:50 AM
to librosa, skyman...@gmail.com, dp...@ee.columbia.edu
   I have changed my source code as follows:

    wave_data, samplerate=librosa.load(librosa.util.example_audio_file())

    plt.subplot(211)
    plt.plot(wave_data) 
    plt.title('wave')
    pitches, magnitudes = librosa.piptrack(y=wave_data, sr=samplerate)
    plt.subplot(212)
    plt.imshow(pitches[:100, :], aspect="auto", interpolation="nearest", origin="lower")
    plt.title('pitches')

and these are the resulting figures.

The shape of the pitches array is [1025, 2647].
2647 is the number of audio frames, but what does 1025 mean?
I thought each frame should have only one pitch, so why does the piptrack function output 1025 values for each frame? Shouldn't the shape of the pitches array be [1, 2647]?
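
For reference, both dimensions follow from the STFT settings, assuming piptrack was called with its defaults (n_fft=2048 and a hop of n_fft // 4 samples with centered frames): the first axis has one row per frequency bin, not one pitch per frame. A small arithmetic sketch, where the helper `n_frames` and the signal length are hypothetical and chosen only to reproduce the reported shape:

```python
# Assumed piptrack defaults: n_fft=2048, hop_length=n_fft // 4.
n_fft = 2048
hop_length = n_fft // 4

# A real-input FFT of length n_fft yields 1 + n_fft // 2 frequency bins.
n_bins = 1 + n_fft // 2

def n_frames(n_samples, hop_length=hop_length):
    """Number of centered STFT frames for a signal of n_samples samples."""
    return 1 + n_samples // hop_length

# Hypothetical signal length that gives the frame count seen above.
example_shape = (n_bins, n_frames(2646 * 512))
```

Each of the 1025 rows is a candidate location for a spectral peak, which is why most entries in a column are zero.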

