librosa.piptrack returns two 2D arrays with frequency and time axes. The "pitches" array gives the interpolated frequency estimate of a particular harmonic, and the corresponding value in the "magnitudes" array gives the energy of the peak. Many values in both arrays are zero, indicating that there was no local maximum in the spectrum at that time-frequency cell.
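As a rough sketch of how to read those two arrays, the snippet below pulls the nonzero peaks out of one frame. The helper name frame_peaks is made up for illustration, and the small synthetic arrays stand in for real output from something like pitches, magnitudes = librosa.piptrack(y=y, sr=sr).

```python
import numpy as np

def frame_peaks(pitches, magnitudes, t):
    """Return (frequencies, magnitudes) of the nonzero peaks in frame t,
    sorted by increasing frequency. Hypothetical helper, not part of librosa."""
    col = pitches[:, t]
    mask = col > 0                 # zero means "no local maximum in this bin"
    freqs = col[mask]
    mags = magnitudes[mask, t]
    order = np.argsort(freqs)
    return freqs[order], mags[order]

# Synthetic stand-in for piptrack output: 6 frequency bins, 2 frames.
pitches = np.zeros((6, 2))
magnitudes = np.zeros((6, 2))
pitches[1, 0], magnitudes[1, 0] = 220.3, 0.9
pitches[3, 0], magnitudes[3, 0] = 440.1, 0.5
pitches[2, 1], magnitudes[2, 1] = 330.0, 0.7

freqs, mags = frame_peaks(pitches, magnitudes, 0)
```

Each frame then gives a short list of peak frequencies with their magnitudes, rather than a single pitch value.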
To see what "pitches" is really returning, you want to do something more like:
plt.imshow(pitches[:100, :], aspect="auto", interpolation="nearest", origin="lower")
or perhaps:
plt.plot(np.tile(np.arange(pitches.shape[1]), [100, 1]).T, pitches[:100, :].T, '.')
Then you can see that the frequencies being returned are the individual harmonics. The part that is missing is the search for one or a few fundamental frequencies that explain the majority of the harmonic frequencies observed, i.e. their approximate greatest common divisor. This is a long-standing problem in pitch tracking, addressed with things like Duifhuis's "harmonic sieve".
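To make the "approximate greatest common divisor" idea concrete, here is a minimal brute-force sketch in the spirit of a harmonic sieve. It is not Duifhuis's actual algorithm and not anything in librosa; the function name sieve_f0, the search range, the step, and the 3% tolerance are all assumptions chosen for illustration. It takes the peak frequencies from one frame and picks the candidate fundamental whose harmonic grid catches the most peaks, preferring the candidate that needs the fewest harmonic slots (to avoid subharmonics) and then the smallest total mismatch.

```python
def sieve_f0(peaks, fmin=50.0, fmax=500.0, step=0.5, tol=0.03):
    """Brute-force harmonic-sieve sketch (illustrative only).

    A peak p passes the sieve for candidate f0 if it lies within a relative
    tolerance tol of the nearest integer multiple of f0.
    Returns the best candidate f0 in Hz, or None if peaks is empty.
    """
    if not peaks:
        return None
    top = max(peaks)
    best_f0, best_key = None, None
    n_steps = int((fmax - fmin) / step) + 1
    for i in range(n_steps):
        f0 = fmin + i * step
        hits, err = 0, 0.0
        for p in peaks:
            h = max(1, round(p / f0))          # nearest harmonic number
            rel = abs(p - h * f0) / (h * f0)   # relative mismatch
            if rel <= tol:
                hits += 1
                err += rel
        slots = max(1, round(top / f0))        # harmonics needed to reach the top peak
        # Most hits first, then fewest slots, then smallest mismatch.
        key = (hits, -slots, -err)
        if best_key is None or key > best_key:
            best_f0, best_key = f0, key
    return best_f0

f0_clean = sieve_f0([220.0, 440.0, 660.0])
f0_missing = sieve_f0([440.0, 660.0, 880.0])   # fundamental itself absent
f0_noisy = sieve_f0([221.0, 439.0, 661.5])
```

Even this toy version shows why robustness is hard: the result depends on the tolerance and the tie-breaking rule, and it ignores peak magnitudes and multiple simultaneous pitches entirely.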
It would make a lot of sense to include some functions for this in librosa, but it's not easy to make something robust. AFAIK we don't have anything.
Dan.