Content-based image databases like the one in isk-daemon, on the other
hand, use data structures that do not map easily onto the relational
data model and would not benefit much from the indexing and
optimizations it allows. They rely on optimized binary data structures
(linked lists, hash maps, and so on) for storing image signatures and
retrieving them quickly.
--
Ricardo Niederberger Cabral
http://www.imgseek.net/
ricardo.cabral at imgseek.net
skype: rnc000
>
> Hi,
>
> I asked because I am looking for a way to store a large number of
> images (signatures) in the system.
>
> As I understand the code, you use a linear search querying the whole
> index, which is held in RAM.
> If the data volume increases, the search time and RAM usage will
> increase as well.
You're right about that.
>
>
> What about using a system like Lucene?
The same things I said about MySQL apply here: Lucene is
document-oriented, so it's optimized for storing text-based documents
and quickly retrieving the documents associated with given keywords.
>
> Do you have any other suggestions for how to solve this problem? Or
> is that not a problem, and am I on the wrong track?
>
The best approach would be to run isk-daemon on a cluster of
machines, each with plenty of RAM. Each machine would have its own
isk-daemon image database. Images would be added to the cluster by
assigning each one to a specific isk-daemon instance in round-robin
fashion, while a query would be sent to every node and the similarity
results from each node combined to form the final answer. For example,
on a cluster with 3 machines, querying for the 30 most similar images
to a given one would mean asking each machine for the 10 most similar
images it knows about and combining the results. One could also scale
even further with a hierarchy of isk-daemon nodes/clusters.
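
To make the cluster idea concrete, here is a rough sketch in Python.
It is only an illustration under assumptions: Node is a hypothetical
in-memory stand-in for one isk-daemon instance (not its real API),
signatures are plain tuples of floats, and similarity is a made-up
negative L1 distance.

```python
import heapq
from itertools import cycle


class Node:
    """Hypothetical stand-in for one isk-daemon instance (not its real API)."""

    def __init__(self):
        self.signatures = {}  # image id -> signature (here: a tuple of floats)

    def add(self, image_id, signature):
        self.signatures[image_id] = signature

    def query(self, signature, k):
        # Linear scan over everything this node holds.
        # Score is the negative L1 distance, so higher means more similar.
        scored = (
            (-sum(abs(a - b) for a, b in zip(signature, other)), image_id)
            for image_id, other in self.signatures.items()
        )
        return heapq.nlargest(k, scored)


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes
        self._next_node = cycle(nodes)  # round-robin assignment of new images

    def add(self, image_id, signature):
        next(self._next_node).add(image_id, signature)

    def query(self, signature, k):
        # Ask every node for its local top k, then keep the global top k.
        partial = [hit for node in self.nodes for hit in node.query(signature, k)]
        return heapq.nlargest(k, partial)


cluster = Cluster([Node() for _ in range(3)])
for i in range(9):
    cluster.add(i, (float(i),))

print([image_id for _, image_id in cluster.query((4.2,), 3)])  # [4, 5, 3]
```

In this sketch each node returns k hits rather than k divided by the
number of nodes. That keeps the merged answer exact even when the best
matches all happen to sit on one node, at the cost of moving more
results over the network.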
Another approach would be to modify the internal image bucket
algorithms and data structures so that they are split across several
machines that know where to look for images with a given wavelet
feature and score them accordingly.
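
A very simplified sketch of this second idea, again in Python and
again under assumptions: a bucket is taken to be the set of image ids
sharing one wavelet feature, a feature is modelled as a hypothetical
(channel, coefficient index, sign) tuple of integers, and every shared
feature counts for one point. BucketShard and ShardedIndex are
illustrative names, not isk-daemon code.

```python
from collections import Counter, defaultdict


class BucketShard:
    """Hypothetical machine holding a subset of the feature buckets."""

    def __init__(self):
        self.buckets = defaultdict(set)  # feature -> ids of images having it

    def add(self, feature, image_id):
        self.buckets[feature].add(image_id)

    def partial_scores(self, features):
        # One point per shared feature; a real scorer would weight features.
        scores = Counter()
        for feature in features:
            for image_id in self.buckets.get(feature, ()):
                scores[image_id] += 1
        return scores


class ShardedIndex:
    def __init__(self, n_shards):
        self.shards = [BucketShard() for _ in range(n_shards)]

    def _shard_index(self, feature):
        # Any machine can compute where a feature lives. hash() is stable
        # for tuples of ints; a real deployment would pick an explicit
        # hash function shared by all machines.
        return hash(feature) % len(self.shards)

    def add(self, image_id, features):
        for feature in features:
            self.shards[self._shard_index(feature)].add(feature, image_id)

    def query(self, features, k):
        # Group the query's features by owning shard, so each shard is
        # contacted once, then sum the partial scores per image.
        by_shard = defaultdict(list)
        for feature in features:
            by_shard[self._shard_index(feature)].append(feature)
        total = Counter()
        for index, shard_features in by_shard.items():
            total.update(self.shards[index].partial_scores(shard_features))
        return total.most_common(k)
```

Unlike the round-robin cluster, a query here only touches the machines
that own the buckets for its features, but adding one image writes to
several machines.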
Regards,
--
Ricardo Niederberger Cabral