SQL Storage instead of Imgseek Storage System

Sebastian

Feb 16, 2008, 3:29:06 PM

to imgseek_dev

Hi,

I am wondering whether it would be useful to store the image signatures
in MySQL, for example.

What are the pros and cons?
What about performance?

Sebastian

Ricardo Niederberger Cabral

Feb 16, 2008, 5:52:47 PM

to imgse...@googlegroups.com

It wouldn't make much sense, since relational database management
systems like MySQL are good at (as their name says) storing and
retrieving relational data, that is, tables arranged by columns and
rows, storing tuples consisting of integers, floats and strings.

Content-based image databases like the one in isk-daemon, on the other
hand, use data structures that do not map easily onto the relational
data model and would not benefit much from the indexing and
optimizations an RDBMS offers. Content-based image databases rely on
optimized binary data structures (linked lists, hash maps and so on)
for storing image signatures and quickly retrieving them.

--
Ricardo Niederberger Cabral
http://www.imgseek.net/
ricardo.cabral at imgseek.net
skype: rnc000

Sebastian

Feb 17, 2008, 11:11:06 AM

to imgseek_dev

Hi,

I asked because I am looking for a way to store a large number of
images (signatures) in the system.

As I understand the code, you use a linear search over the whole
index, which is held in RAM.
As the data volume increases, both the search time and the RAM usage
will increase.

What about using a system like Lucene?
Do you have any other suggestions on how to solve this problem? Or is
this not a problem, and am I on the wrong track?

Best Wishes,
Sebastian

Ricardo Niederberger Cabral

Feb 17, 2008, 2:10:52 PM

to imgse...@googlegroups.com

On 17/02/2008, at 13:11, Sebastian wrote:

>
> Hi,
>
> I asked because I am looking for a way to store a large number of
> images (signatures) in the system.
>
> As I understand the code, you use a linear search over the whole
> index, which is held in RAM.
> As the data volume increases, both the search time and the RAM usage
> will increase.

You're right about that.

>
>
> What about using a system like Lucene?

The same things I said about MySQL apply here: Lucene is
document-oriented, so it's optimized for storing text-based documents
and quickly retrieving the documents associated with given keywords.

>
> Do you have any other suggestions on how to solve this problem? Or is
> this not a problem, and am I on the wrong track?
>

The best approach would be to run isk-daemon on a cluster of
machines, each with plenty of RAM and its own isk-daemon image
database. Each new image would be added to one specific isk-daemon
instance in the cluster, chosen in round-robin fashion. A query would
be sent to every node, and the similarity results from each node would
be combined to form the final answer. For example, on a cluster of 3
machines, querying for the 30 most similar images to a given one would
mean asking each node for the 10 most similar images it knows about
and combining the results. One could also scale even further with a
hierarchy of isk-daemon nodes/clusters.
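
As a rough illustration of the cluster idea, here is a minimal Python
sketch. It does not use the real isk-daemon API: Node, Cluster, the
per_node parameter and the toy distance-based similarity are all
assumptions made up for this example, with each Node standing in for
one isk-daemon instance.

```python
import heapq
from itertools import cycle


class Node:
    """Hypothetical stand-in for one isk-daemon instance."""

    def __init__(self):
        self.signatures = {}  # image id -> signature (a tuple of numbers)

    def add(self, image_id, signature):
        self.signatures[image_id] = signature

    def query(self, signature, k):
        # Toy similarity (an assumption): negative squared distance,
        # so a higher score means a more similar image.
        scored = (
            (-sum((a - b) ** 2 for a, b in zip(signature, sig)), image_id)
            for image_id, sig in self.signatures.items()
        )
        return heapq.nlargest(k, scored)


class Cluster:
    """Round-robin inserts, scatter-gather queries."""

    def __init__(self, n_nodes):
        self.nodes = [Node() for _ in range(n_nodes)]
        self._next_node = cycle(self.nodes)

    def add(self, image_id, signature):
        # Each new image goes to the next node in turn.
        next(self._next_node).add(image_id, signature)

    def query(self, signature, k, per_node=None):
        # Ask every node for its best matches, then merge the partial
        # results and keep the overall top k.
        per_node = k if per_node is None else per_node
        partial = []
        for node in self.nodes:
            partial.extend(node.query(signature, per_node))
        return heapq.nlargest(k, partial)


cluster = Cluster(3)
for i in range(9):
    cluster.add(i, (float(i),))
best = cluster.query((0.0,), k=3)
print([image_id for _, image_id in best])
```

One design note: asking each node for only k/n results (10 per node
for 30 overall, as in the example above) keeps the merge small, but it
can miss matches when the best images happen to sit on the same node.
Asking each node for the full k, which is the default in this sketch,
guarantees the merged top k is exact, at the cost of more results to
transfer.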

Another approach would be to modify the internal image-bucket
algorithms and data structures so that they are split across several
machines that know where to look for images with a given wavelet
feature and score them accordingly.
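
A very simplified sketch of that second idea, again in Python and not
taken from the isk-daemon code: BucketShard and ShardedIndex are
hypothetical names, features are assumed to be plain integers (for
example, a wavelet coefficient index), placement by hash is just one
possible choice, and every matching feature adds the same weight,
whereas a real scoring function would weight features differently.

```python
from collections import defaultdict


class BucketShard:
    """Hypothetical machine holding the buckets for some of the features."""

    def __init__(self):
        self.buckets = defaultdict(set)  # feature -> ids of images having it

    def add(self, feature, image_id):
        self.buckets[feature].add(image_id)

    def partial_scores(self, features, weight=1.0):
        # Score only the images found in this shard's buckets.
        scores = defaultdict(float)
        for feature in features:
            for image_id in self.buckets.get(feature, ()):
                scores[image_id] += weight
        return scores


class ShardedIndex:
    def __init__(self, n_shards):
        self.shards = [BucketShard() for _ in range(n_shards)]

    def _shard_no(self, feature):
        # Every feature has exactly one home shard.
        return hash(feature) % len(self.shards)

    def add(self, image_id, features):
        for feature in features:
            self.shards[self._shard_no(feature)].add(feature, image_id)

    def query(self, features, k):
        # Route each query feature to the shard that owns its bucket.
        by_shard = defaultdict(list)
        for feature in features:
            by_shard[self._shard_no(feature)].append(feature)
        # Sum the partial scores returned by the shards involved.
        total = defaultdict(float)
        for shard_no, shard_features in by_shard.items():
            partial = self.shards[shard_no].partial_scores(shard_features)
            for image_id, score in partial.items():
                total[image_id] += score
        # Highest score first; ties broken by image id.
        return sorted(total.items(), key=lambda item: (-item[1], item[0]))[:k]


index = ShardedIndex(2)
index.add("a", [1, 2, 3, 4])
index.add("b", [1, 2, 9])
index.add("c", [7])
print(index.query([1, 2, 3], k=2))
```

Unlike the round-robin cluster, a query here only touches the shards
that own the features it contains, but each image's score has to be
assembled from several machines.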

Regards,
--
Ricardo Niederberger Cabral
