Hello there. I'm very interested in helping build up the fingerprint database (with the open codegen). I DJ and help out at CHIRP Radio (http://chirpradio.org/), a non-commercial radio station in Chicago; we broadcast live 18 hours a day, 7 days a week. As an experiment, I started running a daemon that takes 40-second samples of our live stream and posts the fingerprint and song identification to the hosted EchoNest API. A live DJ constantly updates which song is currently playing, so this data is pretty accurate. CHIRP plays a lot of new music, so this could be a good way to keep EchoNest up to date.
I have a few questions:
- What exactly happens when I post the song identification to EchoNest with a fingerprint? Does it eventually help make a match the next time someone queries for a similar fingerprint?
- The DJs sometimes have a hard time looking up the song, so their identification can be delayed. Is it OK to send a small number of misidentifications?
- The station goes off the air late at night and broadcasts silence. If I left the daemon running, it would post the silence fingerprints with the last identified song (which is wrong). Is this OK?
- Since I am taking 40-second samples, sometimes a DJ is talking or two songs are in transition, so the identification is not 100% accurate.
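To cut down on the bad submissions described in the questions above, the daemon could check each sample before posting. Here is a rough sketch of that idea, not what my script currently does. The function names, the silence threshold, and the 10-minute staleness cutoff are all made up for illustration, and none of this is part of the EchoNest API. It assumes the sample is already decoded to floats between -1.0 and 1.0.

```python
import math
import time

# Both thresholds are assumptions picked for illustration; tune against real audio.
SILENCE_RMS = 0.01          # below this level the sample is treated as dead air
MAX_ID_AGE_SECONDS = 600    # DJ identifications older than this are treated as stale


def rms(samples):
    """Root-mean-square level of a sequence of floats in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_post(samples, identified_at, now=None):
    """Decide whether a sample and its song identification are worth posting.

    samples       -- decoded audio for the sample window
    identified_at -- Unix timestamp of the DJ's most recent song update
    now           -- current Unix timestamp (defaults to time.time())
    """
    if now is None:
        now = time.time()
    if rms(samples) < SILENCE_RMS:
        # Station is off the air (or the stream is silent): skip.
        return False
    if now - identified_at > MAX_ID_AGE_SECONDS:
        # The DJ has not updated the playlist recently: the label may be wrong.
        return False
    return True
```

This would not catch a DJ talking over a track or a transition between two songs; it only filters out silence and stale identifications.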
My main interest is that I'd like to seed the EchoNest database with new song data so that I can use the API for other projects to identify lots of songs. Would my data seeding lead to this eventually? Or would I have to de-duplicate the data myself, analyze it, and make the queries resolve for new fingerprints myself?
Here is the source of the script I've been running in case anyone is curious:
https://github.com/chirpradio/chirpradio-echo
If this sounds like a good idea, we have broadcast archives with metadata going all the way back to 2010. That's more than 19,710 hours of music that I could use for seeding.
Kumar