Identifying tracks with altered pitch

Steven Robertson

Jan 26, 2012, 9:18:55 AM
to echoprint
Hello

Is it currently possible to identify tracks with altered pitch, using
Echoprint?

I've just read some info on Chromaprint and it doesn't seem well
suited to that, though I wonder if those fingerprints
could be stretched (within a range that would be reasonable for a DJ
to do) and still achieve a match under those conditions. It would be
fair to say that a DJ would alter the pitch but keep it mostly
constant.

Is the algorithm for Echoprint similar to Chromaprint?

If it is not possible to detect tracks with altered pitch in Echoprint
currently, are there ideas about how this might be achieved?

Cheers

Steve

Laurent

Jun 25, 2012, 3:02:55 PM
to echo...@googlegroups.com
Hi Steve,

I was searching the net for info on this topic as I am very interested to know if Echoprint does indeed support pitch bend. 
Were you able to find the answer? 

Regards,

Laurent

stever

Jul 31, 2012, 10:24:43 AM
to echo...@googlegroups.com
Hi Laurent

I don't believe Echoprint does. I assume it would be difficult to add, though not impossible. That's fine, but I was really interested in obtaining tracklists from DJ mixes. Maybe it's better that this isn't so easy ;)

Regards

Steve

Laurent Novatin

Jul 31, 2012, 10:29:27 AM
to echo...@googlegroups.com
Hi Steve,

Thanks for getting back to me. 
I actually did some tests in the meantime and found that it does not work with any pitch alteration... even a small one.
Shazam and Soundhound, on the other hand, do, if you are trying to ID tracks from mixes ;)

Regards,

Laurent

stever

Jul 31, 2012, 10:35:55 AM
to echo...@googlegroups.com
Hi Laurent

Interesting. If those have APIs then I would be interested in trying them again, but with Echoprint the advantage would be a community-built database to ID any tracks. Most of the tracks I'd like to ID are unlikely to be recognised by those commercial services. At least I didn't have much success when I tried them before.

Cheers

Steve

Brian Whitman

Jul 31, 2012, 10:37:51 AM
to echo...@googlegroups.com
I am curious about this. Echoprint should be OK with small adjustments to pitch (nothing huge; maybe Dan or Andrew can weigh in with the actual math), but I can't see how Soundhound or Shazam, based on what little I know of their algorithms, can be better. Is there an example track within a mix we can look at?

Brian

Dan Ellis

Jul 31, 2012, 10:42:06 AM
to echo...@googlegroups.com
It's possible that Shazam has a coarser time quantization than
Echoprint, because in general they have more information from the
spectrum (since they pick out narrow spectral lines), so they can
afford to get less information from the time domain.

It's also possible that they include explicit compensation for the
kinds of time scaling that occur in the wild. For instance, they
could ingest time-scaled versions of popular tracks. I have no reason
to believe that, but that's what I'd do faced with the same problem.

DAn.

Andrew Nesbit

Jul 31, 2012, 10:46:24 AM
to echo...@googlegroups.com
We haven't explicitly tested with large pitch adjustments, but you're right that with small frequency changes Echoprint should, in principle, cope - as long as the detected onsets remain within the same subbands in the filter bank decomposition and the timing of the onsets stays within the same quantization level (23.2 ms).
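
For a rough sense of the numbers, here is a minimal sketch of the timing side of that condition only. It takes the 23.2 ms step quoted above and assumes the quantity being quantized is the interval between two onsets; the function names are hypothetical and this is not code from the Echoprint codegen. A playback-rate change by a factor `rate` scales every interval by `1 / rate`.

```python
QUANT_S = 0.0232  # quantization step quoted above (23.2 ms)


def quantize(t, step=QUANT_S):
    """Index of the quantization bin that a time value (in seconds) falls into."""
    return int(t // step)


def interval_survives(interval_s, rate, step=QUANT_S):
    """True if an onset interval stays in the same bin after a speed change.

    Assumption: playing the audio `rate` times faster shortens every
    interval to interval_s / rate.
    """
    return quantize(interval_s, step) == quantize(interval_s / rate, step)


def max_safe_interval(rate, step=QUANT_S):
    """Longest interval whose drift under the speed change is below one step."""
    drift_per_second = abs(1.0 - 1.0 / rate)
    if drift_per_second == 0:
        return float("inf")
    return step / drift_per_second


if __name__ == "__main__":
    for percent in (0.5, 1, 2, 4):
        rate = 1 + percent / 100
        print(percent, round(max_safe_interval(rate), 3))
```

Under these assumptions, a 2% speed change shifts any interval longer than roughly 1.2 s by at least one full step, and shorter intervals can still cross a bin boundary. The subband condition is not modelled here.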

Best,

Andrew

Laurent Novatin

Jul 31, 2012, 11:01:59 AM
to echo...@googlegroups.com
We tried with a song that exists in your catalogue, Manta by MJ Cole, and removed the ID3 tags to change the MD5 checksum and make sure it wasn't reading the tags. It worked at 0%, but at -4, -3 and -2 percent it did not work.

On the other hand, Shazam and Soundhound worked up to about +/- 4.5% with another track I tested.

Regards,

Laurent

Andrew Nesbit

Jul 31, 2012, 11:04:35 AM
to echo...@googlegroups.com
Did you change the pitch by adjusting the playback rate, or did you keep the speed constant?

Thanks,

Andrew

Laurent Novatin

Jul 31, 2012, 11:07:33 AM
to echo...@googlegroups.com
We increased the speed of the track.

thkim

Sep 20, 2012, 2:27:51 AM
to echo...@googlegroups.com
I guess it is possible if you use only the hashcodes, without the timestamps. That means relying only on the original_score from the Solr result.
In the Solr result, original_score represents how many features (hashcodes) two tracks have in common, and does not consider the time factor.
In the actual_matches result, actual_score then counts how many of the matched hashcodes have the same interval between the two tracks.
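
To make the distinction concrete, here is a toy sketch of the two kinds of score. It is not the Echoprint server code: codes are represented as made-up `(hashcode, time)` pairs, the function names simply reuse the score names for readability, and the time-aware score is simplified to the size of the largest group of matches sharing one time offset.

```python
from collections import Counter


def original_score(query, reference):
    """Count query hashcodes that also occur in the reference, ignoring time."""
    ref_hashes = {h for h, _ in reference}
    return sum(1 for h, _ in query if h in ref_hashes)


def actual_score(query, reference):
    """Count matched hashcodes that agree on a single time offset.

    Simplification (an assumption): take the most common value of
    reference_time - query_time over all matching hashcodes.
    """
    ref_times = {}
    for h, t in reference:
        ref_times.setdefault(h, []).append(t)
    offsets = Counter()
    for h, t in query:
        for rt in ref_times.get(h, ()):
            offsets[rt - t] += 1
    return max(offsets.values(), default=0)


# Made-up data: six hashcodes, ten time units apart in the reference.
reference = [(h, 10 * i) for i, h in enumerate(range(1, 7))]
same_speed = list(reference)
# Same hashcodes, but the spacing shrinks from 10 to 9 time units.
sped_up = [(h, 9 * i) for i, h in enumerate(range(1, 7))]

if __name__ == "__main__":
    print(original_score(same_speed, reference), actual_score(same_speed, reference))
    print(original_score(sped_up, reference), actual_score(sped_up, reference))
```

With this toy data the hash-only score is 6 in both cases, while the time-aware score falls from 6 to 1 for the sped-up query, because every matched hashcode lands on a different offset. It also assumes the hashcodes themselves survive the speed change, which the tests reported above suggest is often not the case.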

Taehong Kim.



Xavier Vdb

Dec 7, 2012, 5:46:49 AM
to echo...@googlegroups.com
Hey,

It's essential to achieve detection with pitch-shifted tracks.

Has anyone tested Taehong Kim's suggestion?
Is actual_score the key?

Xavier.

Dan Ellis

Dec 7, 2012, 7:37:12 AM
to echo...@googlegroups.com
If it's essential, the easiest thing would be to put pitch-shifted
versions of the originals in the database.

A small amount of pitch shifting shouldn't prevent recognition. You
should do some experiments to find the threshold: try pitch shifting a
set of queries by 0.125%, 0.25%, 0.5%, 1%, and measure how retrieval
accuracy varies with pitch shift.

Let's say 1% causes an unacceptable drop in performance. Now you
could duplicate every reference track to include +/- 1, 2, 3% pitch
shift (i.e. 7 versions total).

The database gets bigger, matching gets slower, but it's still
probably better than trying to perform major surgery on the innards of
the engine.
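
A minimal sketch of generating those seven variants follows. It assumes the shift is a plain playback-rate change (pitch and tempo move together, as on a DJ deck) and uses naive linear interpolation on a list of samples; the function names are hypothetical, and a real pipeline would use a proper resampler with anti-alias filtering before fingerprinting each variant.

```python
def change_speed(samples, rate):
    """Resample by linear interpolation so playback is `rate` times faster."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_out = int(len(samples) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate          # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append((1 - frac) * samples[j] + frac * nxt)
    return out


def reference_variants(samples, percents=(-3, -2, -1, 0, 1, 2, 3)):
    """Map each shift in percent to a speed-changed copy of the track."""
    return {p: change_speed(samples, 1 + p / 100) for p in percents}


if __name__ == "__main__":
    track = [float(n) for n in range(1000)]  # stand-in for decoded audio
    for percent, audio in reference_variants(track).items():
        print(percent, len(audio))
```

Each variant would then be fingerprinted and ingested under the same track ID, so a match on any of the seven resolves to the same song.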

DAn.

Xavier Vdb

Dec 7, 2012, 12:26:54 PM
to echo...@googlegroups.com, dp...@ee.columbia.edu
And what about doing the opposite?
Generate several fingerprints of the query at different pitches...

I want to use the Echo Nest database.
In any case, optimizing the server would be welcome.
I'll look into it, it's my job :)

Joe Andrews

Jan 26, 2013, 10:17:01 AM
to echo...@googlegroups.com, dp...@ee.columbia.edu
Hi,

I plan on attempting to make the above. My current plan is to break DJ mixes into arbitrary overlapping samples of, say, 1 minute. I will then create 5 versions of each mix section, each at a different pitch: -2, -1, 0, 1, 2 (tones). It is unusual for a DJ to mix outside this range, as audio quality tends to deteriorate. I will then analyse these samples through the Remix API and retrieve the resulting sections. These musical sections will be analysed using the ENMFP fingerprinter and, if all goes to plan, I will receive a large data set of codes which I can rank by frequency of occurrence and by how well two overlapping or adjacent sections compare in terms of the musical analysis.

I'm not sure how well this will work in practice but intend to give it a try. In reality, I feel that the best solution will need to take into account both different pitches and different tempos, and each of these separately. This would involve changing the source code to match only by either tempo and beat patterns or pitch, timbre and chord progressions, and then combining the results.

My final question: is the JavaScript Remix API a direct port of the Python Remix API? Does it have all the same methods? From what I have read so far, it seems far simpler and lacks many of the conversion methods etc...
Any thoughts would be appreciated.


Thanks 

Joe

Andrew Nesbit

Jan 26, 2013, 1:26:48 PM
to echo...@googlegroups.com
As this question is a lot more about Remix than it is about fingerprinting, I would suggest that you ask in The Echo Nest Remix API Google Group: http://groups.google.com/forum/#!forum/remix-api to get a better chance of a helpful reply.

However, as far as fingerprinting is concerned, by the time the audio reaches the fingerprinting stage it will have been heavily processed, so I would not expect stellar results from ENMFP.


Thomas Elstrøm

Aug 8, 2014, 9:41:10 AM
to echo...@googlegroups.com, stevero...@gmail.com, Espen K. Nilsen
*Bump!*

Has anyone solved this issue without generating multiple fingerprints or storing multiple versions in the database? I could probably solve it by sending multiple queries and analyzing the total confidence per suggested track, but this should really be unnecessary.
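
As a minimal sketch of that multiple-query workaround: assume each query returns a list of `(track_id, score)` pairs (a hypothetical shape, not the actual server response format), then sum the scores per track and rank the totals.

```python
from collections import defaultdict


def aggregate(results_per_query):
    """Sum scores per track across queries; return tracks ranked by total."""
    totals = defaultdict(float)
    for results in results_per_query:
        for track_id, score in results:
            totals[track_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up results for three queries built from the same mix excerpt.
    results = [
        [("A", 10), ("B", 40)],
        [("B", 35)],
        [("A", 5), ("C", 50)],
    ]
    for track_id, total in aggregate(results):
        print(track_id, total)
```

Summing rewards tracks that come back consistently across queries; whether a sum, a maximum, or a vote count ranks best is something to check against real mixes.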

I came across this study, which proposes a solution for matching under tempo and pitch alterations, but it's a bit over my head when it comes to implementation, hehe... http://homepage.fudan.edu.cn/weili/files/2011/06/LW-2010MMShortZhu.pdf

I'm also trying to match songs from DJ sets and I'm getting horrible detection confidence with minor tempo, tempo+pitch, or pitch adjustments. Anything over ±2% gives me no match at all most of the time. Could any of the developers shed some light on what the tempo/pitch capabilities of the current matching engine are?

Thank you

henrique matias

Jul 23, 2015, 12:04:04 AM
to echoprint
It would definitely be interesting to see a solution that could work with DJ mixes!

Richard Catalano

Jul 29, 2015, 12:51:25 AM
to echoprint, stevero...@gmail.com
I too am incredibly interested in updating the code base so that it can at least provide some viable competition to Shazam and Soundhound... I would like to use this in my project's software, but we cannot allow duplicates to pop into the database. Any suggestions on what I should be reading up on in order to help contribute?