Thanks for the feedback.
I have removed my last post: I had forgotten that I had changed the client to send PUTs instead of POSTs, and so I had fooled myself into thinking everything was fine when it was not. Apologies.
Your guess is good, but I use yet more inverted indexes to find the entry, so an index walk is not necessary.
Let's see what happens with Redis.
> brew unlink redis
> brew install redis
redis-2.6.9 installed
In another terminal:
> redis-server
In yet another terminal:
> redis-cli
redis 127.0.0.1:6379> MONITOR
OK
1360021379.857702 "MONITOR"
(Running your program with just a single entry, i.e. @data << rand_data(0), takes 2.4 seconds. MONITORing it, we see that it blasts out 10000 lines, i.e. 10K operations on Redis.)
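If you want to see which commands make up those lines rather than scroll through them, a small tally helps. This is a sketch that assumes the first double-quoted field on each MONITOR line is the command name (lines without quotes, such as "OK", are skipped); the log file name in the usage comment is hypothetical.

```ruby
# Counts Redis commands in MONITOR output, e.g. captured with
#   redis-cli monitor > monitor.log
# ASSUMPTION: the first double-quoted field on each line is the command name.
# Lines without any quoted field (such as "OK") are skipped.
def tally_commands(lines)
  counts = Hash.new(0)
  lines.each do |line|
    command = line[/"([^"]*)"/, 1] # first quoted field, or nil
    counts[command.upcase] += 1 if command
  end
  counts
end

# Usage (hypothetical file name):
#   counts = File.open("monitor.log") { |f| tally_commands(f.each_line) }
#   counts.sort_by { |_, n| -n }.each { |cmd, n| puts "#{cmd}: #{n}" }
```

Note that the echoed "MONITOR" line itself is counted once as well.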
Shock. What is happening here?
When indexing, Picky basically does the same as it does for a memory index, just on Redis, with some added operations.
So, for each category (45 in your case), for each word (7 in your case), for each partial (e.g. lorem/lore/lor, ~3 per token), and for both the inverted and the weights index (2), it does about 5-6 operations.
In total, with ~2.8 partials per token on average and ~5.5 operations: 45*7*2.8*2*5.5 = 9702. Whoops.
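The nested loops behind that count can be sketched as a small cost model. This is not Picky's indexing code: the 5.5 operations per index write is the "about 5-6" from above, the partials rule is a hypothetical one that reproduces the lorem/lore/lor example (every prefix from two characters short of the word up to the whole word), and the seven words are made up.

```ruby
# Rough model of how many Redis operations indexing one entry costs.
# ASSUMPTION: 5.5 operations per index write ("about 5-6").
OPS_PER_INDEX_WRITE = 5.5
INDEXES = %i[inverted weights].freeze

# Hypothetical partial rule mirroring lorem -> lor/lore/lorem:
# every prefix from (length - 2) characters up to the whole word.
def partials(word)
  shortest = [word.length - 2, 1].max
  (shortest..word.length).map { |len| word[0, len] }
end

# Walks the same nested loops as described: category, word, partial, index.
def estimated_ops(categories, words, partial: true)
  total = 0.0
  categories.times do
    words.each do |word|
      tokens = partial ? partials(word) : [word]
      tokens.each do
        INDEXES.each { total += OPS_PER_INDEX_WRITE }
      end
    end
  end
  total.round
end

words = %w[lorem ipsum dolor sit amet consectetur adipiscing] # hypothetical entry
puts estimated_ops(45, words)                 # => 10395 (3 partials per word)
puts estimated_ops(45, words, partial: false) # => 3465
puts estimated_ops(1, words)                  # => 231
```

With exactly 3 partials per word the model lands a bit above the 9702 figure; the measured counts further down (3174 and 180) are lower still, so treat it as an order-of-magnitude estimate only.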
Let's do some experiments!
Let's make all categories non-partial (partial: Picky::Partial::None.new).
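Still 3174 operations, taking 0.76s per entry. That makes (more or less) sense: dropping the ~2.8 partials per token should cut the estimate above to about 9702/2.8 = 3465 operations.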
Let's only use a single category (the title).
"Only" 180 operations taking 0.06s per entry.
With a single category and no partials:
0.03s.
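Dividing the reported times by the reported operation counts gives a roughly constant cost per operation: about 0.24 ms for both 45-category runs and about 0.33 ms for the title-only run. That suggests indexing time here scales with the number of Redis calls rather than with the amount of data. The sketch below only uses the numbers from this thread (the 0.03s run has no operation count, so it is left out).

```ruby
# Timings and operation counts reported above (one entry each).
MEASUREMENTS = [
  ["45 categories, default partials", 2.4, 10_000],
  ["45 categories, no partials", 0.76, 3174],
  ["title only, default partials", 0.06, 180]
].freeze

# Milliseconds spent per Redis operation, rounded to two decimals.
def ms_per_op(seconds, ops)
  (seconds * 1000.0 / ops).round(2)
end

MEASUREMENTS.each do |label, seconds, ops|
  puts format("%-32s %.2f ms/op", label, ms_per_op(seconds, ops))
end
```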
Why isn't it faster?
Using Redis scripting (Lua scripts, available since Redis 2.6), we could install a custom script in Redis once and then, for each entry, send the whole entry as one block of data for the script to process on the server. However, that is not yet implemented.
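A minimal sketch of that idea, under several loud assumptions: the key layout ("sketch:...") and the use of the id count as a weight are invented for illustration and are not Picky's actual Redis schema, and the Ruby side assumes the redis-rb client, whose script(:load, ...) returns a SHA that evalsha then runs. The point is that one entry, with all its categories and partials, costs a single round trip.

```ruby
# Hypothetical Lua script that indexes a whole entry in one call.
# Keys are built inside the script for brevity; Redis recommends passing
# keys via KEYS (needed for Redis Cluster), so treat this as a sketch only.
INDEX_SCRIPT = <<~'LUA'
  -- ARGV layout: id, then per category: name, token count, tokens...
  local id = ARGV[1]
  local i = 2
  local written = 0
  while i <= #ARGV do
    local category = ARGV[i]
    local n = tonumber(ARGV[i + 1])
    for j = i + 2, i + 1 + n do
      local token = ARGV[j]
      local ids_key = 'sketch:' .. category .. ':inverted:' .. token
      redis.call('SADD', ids_key, id)
      -- stand-in "weight": current number of ids for this token
      redis.call('HSET', 'sketch:' .. category .. ':weights', token,
                 tostring(redis.call('SCARD', ids_key)))
      written = written + 1
    end
    i = i + 2 + n
  end
  return written
LUA

# Packs one entry ({ category => tokens incl. partials }) into the ARGV layout.
def script_argv(id, categories)
  argv = [id.to_s]
  categories.each do |category, tokens|
    argv.push(category.to_s, tokens.size.to_s, *tokens.map(&:to_s))
  end
  argv
end

# One EVALSHA call per entry. `redis` is assumed to be a redis-rb client.
def index_entry(redis, sha, id, categories)
  redis.evalsha(sha, keys: [], argv: script_argv(id, categories))
end

# Usage with redis-rb (needs a running Redis >= 2.6):
#   require "redis"
#   redis = Redis.new
#   sha = redis.script(:load, INDEX_SCRIPT) # install once
#   index_entry(redis, sha, 1, { title: %w[lor lore lorem], author: %w[bob] })
```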
That does not solve your problem right now, though. One recommendation would be to check whether you really need the default partial on all the categories.
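To get a feel for what that buys, here is a back-of-the-envelope sketch using the same rough model as the estimate above (~2.8 partial tokens per word with partials, 1 without). Words per category and operations per index write cancel out, so only the number of categories that keep partials matters. It is a model, not a measurement; the measured drop from 10000 to 3174 operations is in the same ballpark as its upper bound.

```ruby
# Average number of partial tokens per word, as used in the estimate above.
AVG_PARTIALS = 2.8

# Estimated reduction in Redis operations versus "all categories partial"
# when only `partial_count` of `total` categories keep partial indexing.
def estimated_speedup(partial_count, total = 45, avg_partials = AVG_PARTIALS)
  unless (0..total).cover?(partial_count)
    raise ArgumentError, "partial_count must be between 0 and #{total}"
  end

  all_partial = total * avg_partials
  mixed = partial_count * avg_partials + (total - partial_count)
  (all_partial / mixed).round(2)
end

[0, 1, 5, 45].each do |n|
  puts "#{n} partial categories: ~#{estimated_speedup(n)}x fewer operations"
end
```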
Also, do you really have to run it on Passenger? If so, would periodically dumping the index be an idea? (I know, "periodically" is a problem.) Or would periodic indexing, with live reloading of the index, be an option?
In any case, now that Redis 2.6 has been released, we should look into making Picky's Redis indexing script-based :)
Sorry I don't have a snazzy solution ready for use, except maybe making some categories non-partial, which would at least make indexing up to ~3x faster.
Cheers,
Florian