Why does the Constant Q Transform give a single coefficient per bin and frame in the result matrix?

Robin MONTFERME

Jul 13, 2021, 4:58:14 AM

to librosa

Hi, I'm currently trying to understand how librosa generates the Constant Q Transform (CQT) results.

From what I read in the paper by Schörkhuber and Klapuri, both proposed methods generate multiple CQT values per frequency bin and time frame. Shouldn't the result then have three axes?

Brian McFee

Jul 13, 2021, 11:05:52 AM

to librosa

If I understand your question correctly, you're referring to the idea of using multiple phase offsets for each basis (section 3.2; figure 3 in SK10), which is helpful for capturing portions of the signal that would be missed by short analysis windows with large hop lengths.

We did not implement this feature, primarily because it produces a representation where the output frequencies are not guaranteed to be unique. This causes all kinds of subtle problems with downstream analyses (e.g., visualization, chroma features, and so on). If frame overlap is a serious concern for your application, I would recommend either using a smaller hop length or increasing the "filter_scale" parameter to use longer basis functions.
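To see why a smaller hop or a larger filter_scale helps, one can compare each bin's basis length with the hop length. The sketch below is a rough illustration, not librosa's code: it assumes geometrically spaced center frequencies and the textbook constant-Q length N_k = Q * sr / f_k with Q = filter_scale / (2**(1/bins_per_octave) - 1), and the helper names (cqt_frequencies, basis_lengths, uncovered_bins) are hypothetical local functions. librosa's own length computation may differ in detail.

```python
import numpy as np

def cqt_frequencies(n_bins, fmin, bins_per_octave):
    """Geometrically spaced center frequencies f_k = fmin * 2**(k / bins_per_octave)."""
    return fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)

def basis_lengths(freqs, sr, bins_per_octave, filter_scale=1.0):
    """Constant-Q basis lengths in samples: N_k = Q * sr / f_k."""
    Q = filter_scale / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return Q * sr / freqs

def uncovered_bins(lengths, hop_length):
    """Bins whose basis is shorter than the hop: consecutive frames leave
    samples that the bin never sees (ignoring the window taper)."""
    return np.flatnonzero(lengths < hop_length)

sr, fmin, bpo, n_bins = 22050, 32.70, 12, 84
freqs = cqt_frequencies(n_bins, fmin, bpo)

# Compare a large hop, a smaller hop, and the large hop with longer bases.
for hop, scale in [(4096, 1.0), (512, 1.0), (4096, 4.0)]:
    lengths = basis_lengths(freqs, sr, bpo, scale)
    gaps = uncovered_bins(lengths, hop)
    print(f"hop={hop:5d} filter_scale={scale}: {gaps.size} of {n_bins} bins leave gaps")
```

In this toy setup the number of gap-leaving bins drops when the hop shrinks or filter_scale grows. That is the trade-off in the recommendation: a smaller hop costs more frames, and longer bases cost time resolution.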

It is worth noting that a third axis would not work here, as the redundancy used by SK10 is variable: low frequencies may have little redundancy, high frequencies may have more (see figure 3).
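A toy calculation makes the variable-redundancy point concrete. As a simplifying assumption (this is not SK10's exact atom-placement scheme), suppose each bin places just enough atoms of its own length N_k to tile one hop of H samples without gaps, i.e. ceil(H / N_k) atoms. That count changes with frequency, so the coefficients form a ragged list of per-bin arrays rather than a rectangular bins x atoms x frames tensor. The function name atoms_per_frame is hypothetical.

```python
import math
import numpy as np

def atoms_per_frame(lengths, hop_length):
    """Atoms of length N_k needed to tile one hop without gaps.
    ceil() of a positive ratio is always >= 1, so every bin gets at least one atom."""
    return [math.ceil(hop_length / n) for n in lengths]

sr, fmin, bpo, n_bins, hop = 22050, 32.70, 12, 84, 2048
freqs = fmin * 2.0 ** (np.arange(n_bins) / bpo)
Q = 1.0 / (2.0 ** (1.0 / bpo) - 1.0)
lengths = Q * sr / freqs                     # long bases at low bins, short at high bins

counts = atoms_per_frame(lengths, hop)
n_frames = 10
# One (count_k, n_frames) array per bin: a ragged structure, not a 3-D array.
coeffs = [np.zeros((c, n_frames), dtype=complex) for c in counts]
print("atoms per frame, lowest bin:", counts[0], "highest bin:", counts[-1])
```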

Robin MONTFERME

Jul 15, 2021, 5:09:05 AM

to librosa

Okay, I see it now.

As for my needs, I'm trying to understand how librosa handles the CQT because I am tasked with re-implementing an algorithm that uses CQT coefficients as a descriptor.

That algorithm was implemented in Python using librosa, but I need to re-implement it in C++, so I'm trying to understand how librosa handles the CQT and compare it with existing C++ CQT libraries.
That aside, should I assume that librosa's CQT implementation follows section 3.2 of SK10, or is it something else?

Brian McFee

Jul 15, 2021, 9:24:22 AM

to librosa

It "follows" the SK10 algorithm in the sense of applying a single-octave transform to recursively downsampled versions of the input, but we make no guarantees about exact compatibility or numerical equivalence. Over the years, we've made several modifications, generalizations, and extensions to the core function to support various use cases. I'd be hard-pressed to enumerate all of the changes in one go, but the most important ones for checking compatibility are probably basis sparsification (mentioned briefly in SK10, but not elaborated), early downsampling, and the low-pass filter (a bidirectional Butterworth filter in SK10; several options in librosa, the default being an approximate continuous-time FIR filter, though other modes are supported as well). Other features like tuning adaptation, variable-Q support (added later in the CQT toolbox; see the reference in the librosa.vqt docstring), Q-factor scaling (filter_length), hybrid/pseudo-CQT, different basis window shapes, and normalization are all less important, and can be ignored or disabled if you just need a basic CQT.
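For a C++ port, the bookkeeping of the recursive scheme is a reasonable first thing to mirror: the top octave is analysed at the full rate, then the signal is low-pass filtered and decimated by 2 before the next octave down, so each lower octave runs at half the sample rate and half the hop. The sketch below only computes that schedule (which bins, what rate, what hop). It assumes bins are grouped into whole octaves from the top down and that the hop must stay an integer; octave_schedule is a hypothetical helper, and it does not reproduce librosa's filters, early downsampling, or sparsification.

```python
import numpy as np

def octave_schedule(sr, hop_length, fmin, n_bins, bins_per_octave):
    """Return (bin_indices, sample_rate, hop) per octave, highest octave first."""
    n_octaves = int(np.ceil(n_bins / bins_per_octave))
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    schedule = []
    rate, hop = float(sr), hop_length
    for octave in range(n_octaves):
        hi = n_bins - octave * bins_per_octave
        lo = max(0, hi - bins_per_octave)       # lowest octave may be partial
        idx = np.arange(lo, hi)
        if freqs[idx].max() >= rate / 2:
            raise ValueError("bins in this octave exceed the Nyquist frequency")
        schedule.append((idx, rate, hop))
        if octave < n_octaves - 1:
            if hop % 2:
                raise ValueError(f"hop {hop} cannot be halved again; "
                                 "use a hop divisible by 2**(n_octaves - 1)")
            # In a real implementation, the low-pass filter + decimation by 2 go here.
            rate, hop = rate / 2, hop // 2
    return schedule

for idx, rate, hop in octave_schedule(22050, 512, 32.70, 84, 12):
    print(f"bins {idx[0]:2d}-{idx[-1]:2d}: sr={rate:8.2f} Hz, hop={hop}")
```

Raising an error for an odd hop is just one design choice for this sketch; an implementation could instead stop downsampling early and analyse the remaining octaves at a higher rate.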

Admittedly, the librosa implementation is probably a bit difficult to untangle, since it is factored out into a dozen or so different functions. Having spent some time looking through the CQT toolbox, I'm not sure it would be a better or simpler starting point, but that will depend on your comfort level with Python vs. MATLAB.
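Whichever codebase ends up as the reference, a deliberately naive implementation can be a useful baseline for a port. The sketch below computes a direct (non-recursive) CQT: one Hann-windowed complex exponential of length N_k per bin, evaluated at frames centered every hop_length samples, giving exactly one complex coefficient per (bin, frame), the same 2-D layout librosa returns. It is slow and unsparsified; the Hann window, the 1/N_k normalization, and the zero padding are assumptions, and naive_cqt is a hypothetical function that is not numerically equivalent to librosa.cqt.

```python
import numpy as np

def naive_cqt(y, sr, hop_length, fmin, n_bins, bins_per_octave, filter_scale=1.0):
    """Direct CQT: returns a complex array of shape (n_bins, n_frames)."""
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    Q = filter_scale / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    lengths = np.ceil(Q * sr / freqs).astype(int)

    # Zero-pad so every frame can be centered on its sample, including the edges.
    pad = lengths.max() // 2 + 1
    y_pad = np.pad(y, pad)
    n_frames = 1 + len(y) // hop_length
    C = np.zeros((n_bins, n_frames), dtype=complex)

    for k, (f_k, n_k) in enumerate(zip(freqs, lengths)):
        t = np.arange(n_k) - n_k // 2           # sample offsets around the frame center
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f_k * t / sr) / n_k
        for m in range(n_frames):
            start = pad + m * hop_length - n_k // 2
            C[k, m] = np.dot(y_pad[start:start + n_k], kernel)
    return C

sr = 8000
t = np.arange(sr) / sr                          # one second of audio
y = np.sin(2 * np.pi * 440.0 * t)               # A4 test tone
C = naive_cqt(y, sr, hop_length=512, fmin=55.0, n_bins=60, bins_per_octave=12)
print(C.shape, np.argmax(np.abs(C[:, C.shape[1] // 2])))
```

With fmin=55 Hz and 12 bins per octave, 440 Hz sits three octaves up, at bin 36, which is where the strongest response in the middle frame should appear.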

At a high level, getting numerical equivalence across languages and implementations for this kind of thing is going to be pretty tough. I recommend starting simple, defining an acceptable tolerance for what you consider "close enough" (e.g., matching magnitudes to within 1e-6 or so; phases can be tricky for all kinds of reasons, but phase differentials should be easier to match), and going from there.
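One way to encode such an acceptance test when comparing a port against saved reference output is sketched below. The 1e-6 magnitude tolerance follows the suggestion above; "phase differential" is taken here to mean the frame-to-frame phase difference per bin, wrapped to [-pi, pi]. The phase tolerance, the magnitude floor (below which phase is ignored as numerically meaningless), and the function names are illustrative assumptions.

```python
import numpy as np

def phase_diff(C):
    """Frame-to-frame phase difference per bin, wrapped to [-pi, pi]."""
    d = np.angle(C[:, 1:]) - np.angle(C[:, :-1])
    return np.angle(np.exp(1j * d))

def cqt_close_enough(C_ref, C_test, mag_atol=1e-6, phase_atol=1e-4, mag_floor=1e-3):
    """Compare two complex (n_bins, n_frames) CQT matrices."""
    if C_ref.shape != C_test.shape:
        return False
    if not np.allclose(np.abs(C_ref), np.abs(C_test), rtol=0.0, atol=mag_atol):
        return False
    # Only compare phase where both neighbouring frames carry real energy.
    mag = np.abs(C_ref)
    mask = (mag[:, 1:] > mag_floor) & (mag[:, :-1] > mag_floor)
    err = np.angle(np.exp(1j * (phase_diff(C_ref) - phase_diff(C_test))))
    return bool(np.all(np.abs(err[mask]) <= phase_atol))

rng = np.random.default_rng(0)
C = rng.standard_normal((84, 20)) + 1j * rng.standard_normal((84, 20))
# A constant phase offset changes raw phases but not magnitudes or phase differentials.
print(cqt_close_enough(C, C * np.exp(1j * 0.7)))
```

Comparing differentials rather than raw phases makes the check insensitive to a global phase offset, such as one caused by a different time origin convention, which is one of the ways raw phases can disagree between otherwise matching implementations.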

I hope that helps -- I'd be curious to hear how it goes!

Robin MONTFERME

Jul 16, 2021, 1:20:14 PM

to librosa

Thanks, this helps me understand the inner workings better, but it won't get me to my goal, at least for now.

Sadly, my current project is time-limited, and with the little time I have left I'll have to drop the Constant Q Transform for now.

That said, given where my career seems to be heading, it's not out of the question that I'll work on it again.

Thank you very much for all the information.
