Redis realtime indexing speed

Roger Braun

Feb 4, 2013, 3:18:46 PM

to picky...@googlegroups.com

Hi everyone,

I have a somewhat unusual setup running where Picky runs ONLY via
realtime updating and does not do any upfront indexing at all. I have
been using the memory backend and updating speed has been quite good.
For stupid reasons (Passenger not running at_exit) I want to switch to
a Redis backend. This works, but is super slow, several seconds for
one update. I guess this is because replacing an ID actually has to
walk through all the entries in all categories to figure out if the
current object id is in there. I have rebuilt a test version of my
situation at https://gist.github.com/4709109. Running app.rb will do a
benchmark with Memory and Redis.

Is there anything that can be done about this, like keeping track
where an id goes? I could switch to indexing most things upfront, but
it would not change that updating a single object in Redis takes
several seconds.

--
Roger Braun
rogerbraun.net | humoralpathologie.de

Message has been deleted

Picky / Florian Hanke

Feb 4, 2013, 7:24:31 PM

to picky...@googlegroups.com

Hi Roger,

Thanks for the feedback.

I have removed my last post: I had forgotten that I had changed the client to send PUTs instead of POSTs, and so fooled myself into thinking everything was fine when it was not. Apologies.

Your guess is good, but I use yet more inverted indexes to find the entry, so an index walk is not necessary.

Let's see what happens with Redis.

> brew unlink redis

> brew install redis

redis-2.6.9 installed

In another terminal:

> redis-server

In yet another terminal:

> redis-cli

redis 127.0.0.1:6379> MONITOR

OK

1360021379.857702 "MONITOR"

(Running your program, but with just a single entry, i.e. @data << rand_data(0), takes 2.4 seconds. MONITORing, we see that it blasts out 10000 lines, i.e. 10K operations on Redis.)

Shock. What is happening here?

When indexing, Picky basically does the same as if it was a memory index, just on Redis – with some added operations.

So, for each category (45 in your case), for each word (7 in your case) for each partial data element (eg. lorem/lore/lor, ~3 for each token), for the inverted and weights index (2), it does about 5-6 operations.

In total: 45*7*2.8*2*5.5 = 9702. Whoops.
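The back-of-the-envelope estimate above can be sketched as a small helper. This is only an illustration of the arithmetic: the method name and keyword arguments are hypothetical, and the 5.5 operations per write is the rough "5-6 operations" figure from this post, not a measured constant.

```ruby
# Rough estimate of Redis operations needed to index one entry.
# All names here are hypothetical; the factors come from the post above.
def estimated_redis_ops(categories:, tokens:, partials_per_token:,
                        indexes: 2, ops_per_write: 5.5)
  (categories * tokens * partials_per_token * indexes * ops_per_write).round
end

# 45 categories, 7 words each, ~2.8 partials per word:
estimated_redis_ops(categories: 45, tokens: 7, partials_per_token: 2.8) # => 9702

# Same setup without partials (one form per word):
estimated_redis_ops(categories: 45, tokens: 7, partials_per_token: 1)   # => 3465
```

The estimate is multiplicative, so cutting any one factor (fewer categories, no partials) reduces the operation count proportionally.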

Let's do some experiments!

Let's make all categories non-partial (partial: Picky::Partial::None.new).

Still 3174 operations taking 0.76s per entry. That makes (more or less) sense.
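To see where the factor of roughly 3 comes from, here is a minimal sketch of partial expansion. It is a hypothetical helper, not Picky's implementation, and it assumes a default that keeps the token plus its two shorter prefixes (which matches the lorem/lore/lor example earlier in this post).

```ruby
# Hypothetical helper: lists the forms of a token that would be indexed.
# from: -3 keeps the token and its two shorter prefixes (an assumption
# about the default); from: -1 keeps only the token itself (no partials).
def partial_forms(token, from: -3)
  shortest = [token.length + from + 1, 1].max
  token.length.downto(shortest).map { |len| token[0, len] }
end

partial_forms("lorem")           # => ["lorem", "lore", "lor"]
partial_forms("lorem", from: -1) # => ["lorem"]
```

Each form needs its own writes to the inverted and weights indexes, so dropping partials removes about two thirds of the work per token.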

Let's only use a single category (the title).

"Only" 180 operations taking 0.06s per entry.

With a single category, no partial:

0.03s.

Why isn't it faster?

The goal with Picky and Redis was to make searching fast (using the Redis 2.6+ scripting feature, see https://github.com/floere/picky/blob/master/server/lib/picky/backends/redis.rb#L167), but I did not look at how to make indexing fast.

Using scripting, we could send a block of data – the whole entry – to Redis, after installing a custom script and have it be executed on each subsequent run. However, that is not yet implemented.
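As a rough sketch of that idea (not something Picky implements), one Lua script could receive all tokens of an entry and write them in a single round trip. The script, the key layout and the `index_entry` method are assumptions for illustration; only the `eval(script, keys, argv)` call follows the redis-rb client.

```ruby
# Hypothetical Lua script: adds one id to a set per token, all in one call.
# The key layout ("<prefix>:<token>") is an assumption, not Picky's schema.
INDEX_ENTRY_SCRIPT = <<~LUA
  local prefix = KEYS[1]
  local id = ARGV[1]
  for i = 2, #ARGV do
    redis.call('SADD', prefix .. ':' .. ARGV[i], id)
  end
  return #ARGV - 1
LUA

# `client` is anything that responds to redis-rb's eval(script, keys, argv),
# for example Redis.new from the redis gem.
def index_entry(client, prefix, id, tokens)
  client.eval(INDEX_ENTRY_SCRIPT, [prefix], [id.to_s, *tokens])
end

# Usage, assuming a running Redis 2.6+ server and the redis gem:
#   index_entry(Redis.new, "books:title", 1, %w[lorem lore lor])
```

To avoid resending the script text each time, it could be installed once with SCRIPT LOAD and then invoked by its SHA via EVALSHA.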

Also, that does not solve your problem right now. One recommendation would be to see if you really need the default partial on all the categories.

Also, do you really have to run it on Passenger? If yes, would periodically dumping the index be an idea? (I know, "periodically" is a problem.) Would periodic indexing be an option, with live reloading of the index?

In any case, with Redis 2.6+ being released, we should look into making Picky Redis indexing script based :)

Sorry I don't have a snazzy solution ready for use – except maybe making some categories non-partial, which would at least make it up to 3x faster.

Cheers,

Florian

Roger Braun

Feb 4, 2013, 8:24:35 PM

to picky...@googlegroups.com

Hi Florian,

Wow, thank you so much for looking at this so quickly and for explaining everything. Having a problem or finding a bug is really not that bad with Picky, because I always learn something ;-)

It's okay if only the Memory backend is fast enough right now. I can work around it and just not use Passenger; I just wanted to keep everything as simple as possible, and we use Passenger for everything else. But it's good to know there is still potential for speed improvement in the future.

Thanks again, now stop helping your freeloading users and work on your PhD ;-)

Picky / Florian Hanke

Feb 5, 2013, 7:19:10 PM

to picky...@googlegroups.com

Glad you liked it, my pleasure :) And yes, there is a lot of potential for speed improvement!

Cheers and all the best
