The main idea will not work by design. Even though the fingerprints
can be used to match audio excerpts, the Acoustid server application
is not designed to do that and will not be able to find such
fingerprints. You would have to write a different application.
Another thing is that the fingerprint algorithm is optimized for
musical content. It might work on speech to some degree, but I've not
tested that and I don't believe the results will be very good because
of the limited frequency ranges.
Additionally, the algorithm is not designed to handle the kind of
external noise you get when recording something using a phone/laptop
microphone. The training set used for the filter selection included
only transcoded samples, not samples with any external noise.
Lukas
To be honest, I don't know. :) I knew next to nothing about this kind
of audio analysis before I started working on this project and I still
can't claim that I know that much. I have a list of open source audio
fingerprinting projects here, but none of them is specialized in
speech matching:
https://github.com/lalinsky/acoustid-index/wiki/Links
Chromaprint is designed more as a framework than a specific
fingerprint implementation. If you read the source code, you will
notice that there are a couple of "standalone" modules that can be
configured and connected together to extract different audio features.
Especially if you go through the history, you will find several
configurations that I evaluated in the past. So yes, I believe that
implementing a different fingerprint algorithm using the Chromaprint
source code should be fairly easy, but I personally don't know much
about speech audio features.
Lukas
I'm afraid there is no simple change that you can make to make it
better. The algorithm was designed with the intention of identifying
unmodified files with minimal hardware resources. At each point in
time, the algorithm looks at almost two seconds of audio data and it
looks at the whole frequency spectrum, not just peaks. That makes it
very hard for the algorithm to identify noisy audio, but it makes it
very efficient to identify unmodified audio, because most fingerprint
items are unique.
> I am also playing around with the openfp code and echonest code.
> openfp is pretty impressive, but echonest is really subpar.
I'm surprised, because the Echoprint algorithm is modeled after Shazam
and it was designed specifically for this situation. I haven't
actually read the OpenFP code, so I don't know what they are using.
> I am hoping that by making some simple changes to acoustid, it will
> beat openfp as well. Please let me know which files I need to take a
> look at.
See above, I'm afraid it won't be that easy.
Lukas
This is a pretty standard setup for any hashed fingerprints. You have an
index where you are looking for exact matches on atomic parts of the
fingerprint. Once you get these matches, you retrieve the full
fingerprints and compare against the query. Acoustid does the same;
the only difference is how you compare the full fingerprints.
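
To make the two-stage lookup concrete, here is a minimal sketch in
Python. It is not the Acoustid server code. It assumes fingerprints are
lists of 32-bit integers, and it compares full fingerprints with a
simple bit error rate at offset zero, which is a simplification (a real
comparison would also have to align the two fingerprints in time). The
names build_index, bit_error_rate and search, and the thresholds
min_hits and max_error, are made up for this example.

```python
from collections import defaultdict


def build_index(fingerprints):
    """Map each fingerprint item (an integer) to the ids that contain it."""
    index = defaultdict(set)
    for fp_id, items in fingerprints.items():
        for item in items:
            index[item].add(fp_id)
    return index


def bit_error_rate(a, b, bits=32):
    """Fraction of differing bits over the overlapping prefix of a and b.

    Assumption: both fingerprints start at the same point in time.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a[:n], b[:n]))
    return diff / (n * bits)


def search(query, fingerprints, index, min_hits=2, max_error=0.25):
    """Stage 1: exact item matches in the index. Stage 2: full comparison."""
    hits = defaultdict(int)
    for item in set(query):
        for fp_id in index.get(item, ()):
            hits[fp_id] += 1

    results = []
    for fp_id, count in hits.items():
        if count < min_hits:
            continue  # too few exact matches to be worth a full comparison
        err = bit_error_rate(query, fingerprints[fp_id])
        if err <= max_error:
            results.append((fp_id, err))
    return sorted(results, key=lambda r: r[1])
```

The index stage only works if enough items of the query match exactly,
which is why noisy recordings are a problem for this kind of lookup no
matter how tolerant the second stage is.
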
Regarding the algorithm, the Echoprint fingerprints are basically
sequences of timestamped hashes (20 bits if I remember correctly). The
hashes describe pairs of peaks on the spectrogram. You can read about
the basic idea in the old Shazam paper ("An Industrial-Strength Audio
Search Algorithm").
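
As a rough illustration of the peak-pair idea from the Shazam paper,
here is a sketch in Python. It is not the Echoprint code, and the bit
layout is an assumption chosen only to end up with 20 bits: 7 bits for
each of the two frequency bins and 6 bits for the time difference. The
function name pair_hashes and the parameters fan_out and max_dt are
hypothetical.

```python
def pair_hashes(peaks, fan_out=3, max_dt=63):
    """Turn spectrogram peaks into timestamped 20-bit hashes.

    peaks: list of (frame_index, frequency_bin) sorted by frame_index,
           with frequency_bin in the range 0..127 (assumed layout).
    Returns a list of (frame_index_of_first_peak, hash) tuples.
    """
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are sorted, so later ones are even further away
            if dt <= 0:
                continue  # skip peaks in the same frame
            # 7 bits f1 | 7 bits f2 | 6 bits time difference = 20 bits
            h = ((f1 & 0x7F) << 13) | ((f2 & 0x7F) << 6) | (dt & 0x3F)
            hashes.append((t1, h))
            paired += 1
            if paired == fan_out:
                break
    return hashes
```

Because each hash depends only on the two frequencies and the time
difference between the peaks, an excerpt taken from the middle of a
recording produces the same hash values, just with shifted timestamps.
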
> At this point, I'm thinking that I use the openfp code to generate the
> fingerprints and use your postgres backend. In theory this should work
> as both openfp and acoustid fingerprints are compared using hamming
> distances unlike echonest. This will also overcome limitations of
> openfp_server.
I don't think you can easily do this. The OpenFP fingerprints are a
mix of various approaches. The hashes that you can search are
generated using the Philips Robust Hashing algorithm, which is very
similar in structure to the Acoustid fingerprints, but then you have
the MFCCs which you have to compare differently. Theoretically, I
think that the Echoprint implementation should work the best for you,
but I've not tried it practically.
Lukas