If it's essential, the easiest thing would be to put pitch-shifted
versions of the originals in the database.
A small amount of pitch shifting shouldn't prevent recognition. You
should do some experiments to find the threshold: try pitch shifting a
set of queries by 0.125%, 0.25%, 0.5%, 1%, and measure how retrieval
accuracy varies with pitch shift.
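
A rough sketch of that experiment in Python, in case it helps. The
names shift_fn and match_fn are placeholders I made up: plug in
whatever pitch shifter and fingerprint matcher you are actually
using. The helper percent_to_semitones is only there because many
pitch-shifting tools take semitones or cents rather than percent.

```python
import math

def percent_to_semitones(pct):
    """Convert a frequency change in percent to semitones (12 per octave)."""
    return 12.0 * math.log2(1.0 + pct / 100.0)

def accuracy_by_shift(queries, shifts_pct, shift_fn, match_fn):
    """Return {shift_pct: fraction of queries still matched correctly}.

    queries  : list of (audio, true_id) pairs
    shift_fn : hypothetical callable (audio, pct) -> pitch-shifted audio
    match_fn : hypothetical callable (audio) -> best-matching id, or None
    """
    results = {}
    for pct in shifts_pct:
        # Count the queries that still retrieve the right track after shifting.
        hits = sum(1 for audio, true_id in queries
                   if match_fn(shift_fn(audio, pct)) == true_id)
        results[pct] = hits / len(queries)
    return results
```

Calling accuracy_by_shift(queries, [0.125, 0.25, 0.5, 1.0], ...) gives
one accuracy figure per shift, which is enough to see where things
start to fall apart.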
Let's say 1% causes an unacceptable drop in performance. Now you
could store every reference track at 0, +/- 1, 2, and 3% pitch shift
(i.e. 7 versions total), so that any query shifted by up to about
3.5% is within roughly 0.5% of one of the stored versions.
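
To make the arithmetic concrete, here is a small sketch that lists
the shifts to store and checks how far a given query can end up from
the nearest stored version. The function names and the 1% / 3%
defaults are just illustrative, taken from the example numbers above.

```python
def shift_grid(step_pct=1.0, max_pct=3.0):
    """Pitch shifts (in percent) at which to store each reference track."""
    n = int(round(max_pct / step_pct))
    return [k * step_pct for k in range(-n, n + 1)]

def residual_pct(query_pct, ref_pct):
    """Pitch mismatch (percent) of a shifted query relative to a shifted reference."""
    return ((1.0 + query_pct / 100.0) / (1.0 + ref_pct / 100.0) - 1.0) * 100.0

def nearest_mismatch(query_pct, grid):
    """Smallest absolute mismatch between the query and any stored version."""
    return min(abs(residual_pct(query_pct, g)) for g in grid)
```

With the defaults, shift_grid() has 7 entries, and a query shifted by
2.5% lands a little under 0.5% away from its closest stored version.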
The database gets about 7 times bigger and matching gets slower, but
it's still probably better than trying to perform major surgery on
the innards of the engine.
DAn.