Hi guys,
Thanks for the discussion. It makes for riveting reading.
I believe I have an explanation for your troubles.
Lengthy explanation to follow… (technical notes in italics)
First of all, an apology for not mentioning the problems with the memory index option when running the Sinatra server.
If Picky is run with memory indexes, it keeps the index data exclusively in the process's memory. This is why you load/dump the indexes from/to files at startup and shutdown.
This also means that Picky is bound by the limitations of Ruby and of the OS it runs on. Read on for what I mean.
Let's assume at server startup you load the indexes.
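Concretely, that startup flow looks roughly like this – just a sketch, assuming Picky's index DSL; the :books index and the Book struct are made up for illustration:

    require 'picky'

    # Illustrative data class – anything responding to #id and the
    # category names will do.
    Book = Struct.new :id, :title, :author

    BooksIndex = Picky::Index.new :books do
      source   [Book.new(1, 'Momo', 'Ende')]
      category :title
      category :author
    end

    # After an indexing run has dumped the index to files (BooksIndex.dump),
    # each server process loads that dump into its own memory at startup:
    BooksIndex.load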
If you now use a server that forks off worker children, they each have separate indexes.
They actually access the same memory, initially. With Ruby < 2.0, though, the garbage collector writes its mark bits directly into the objects on every GC run (even though the indexes are only ever read), and COW (copy on write) will then separate the memory pages associated with a specific index. So gradually, in actual memory, each worker ends up with its own full copy of the indexes. In Ruby 2.0, with bitmap-marking garbage collection, I expect this to behave differently, since the mark bits live outside the objects and the pages can stay shared.
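You can see the underlying effect with a few lines of plain Ruby (no Picky involved) – a write in one forked child is invisible to the parent and to its siblings:

    # A hash standing in for an in-memory index.
    index = { 'pod' => [1, 2] }

    pid = fork do
      index['pod'] << 3  # COW quietly gives this child its own copy
      puts "child:  #{index['pod'].inspect}"   # => [1, 2, 3]
    end
    Process.wait pid

    puts "parent: #{index['pod'].inspect}"     # => [1, 2] – unchanged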
So what is happening in your Unicorn example, Andi? Let's say we have 3 workers, A, B, and C.
A will receive the replace-item request and duly replace the item in its index. However, B and C never hear of this change, which now exists exclusively in A.
That explains why you get correct results only about 1/3 of the time, sometimes dump the right index, and sometimes see stale results: it all depends on which worker handles the request.
So what are the solutions?
- Evaluate whether you really need multiple workers – in terms of raw speed, Picky is normally fast enough – and if not, just use a single one. (This is not a solution if you have many users, as congestion looms.)
- Use a separately handled index (a single point of access), for example Redis, which all workers access; see the sketch right after this list. In the case of Redis, loading/dumping is not necessary. However, you lose speed compared to the in-memory solution.
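Roughly, the Redis variant would look like this – a sketch assuming Picky's backend API (the exact options may differ; check the Picky docs), with the index definition again purely illustrative:

    require 'picky'

    Book = Struct.new :id, :title, :author

    BooksIndex = Picky::Index.new :books do
      backend  Picky::Backends::Redis.new  # single point of access for all workers
      category :title
      category :author
    end

    # A replace goes straight to Redis, so every worker sees the change
    # immediately, and no load/dump at startup/shutdown is needed.
    BooksIndex.replace Book.new(1, 'Momo', 'Ende')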
In addition to this, several users who wanted the speed of the memory indexes have come up with funky solutions for exactly this scenario:
- Store the data in Redis and run a pubsub queue: every worker publishes updates to the queue, every worker receives them from it, and each updates its own index separately (as laut.fm does); a sketch follows after this list.
- If a worker receives an update request, it tells the master via IPC (interprocess communication), using e.g. the Cod gem. The master then either:
  - tells each worker to update its index, or
  - updates its own index and gradually restarts the workers (the workers commit harakiri via Rack::Harakiri, code included in Picky).
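Here is a rough sketch of the pubsub variant, using the plain redis gem – the channel name and payload format are made up, and BooksIndex/Book are the illustrative definitions from above:

    require 'redis'
    require 'json'

    # Whichever worker handles an update publishes it...
    def publish_update(book)
      Redis.new.publish 'picky-updates',
        { id: book.id, title: book.title, author: book.author }.to_json
    end

    # ...and every worker runs a subscriber thread that applies each
    # update to its own copy of the in-memory index.
    Thread.new do
      Redis.new.subscribe('picky-updates') do |on|
        on.message do |_channel, payload|
          attrs = JSON.parse payload
          BooksIndex.replace Book.new(attrs['id'], attrs['title'], attrs['author'])
        end
      end
    end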
Again, sorry for not explicitly noting that in my first suggestion – I hope this explanation was at least informative :) The people who came up with the funky solutions were happy about the learning experience, but if you don't have the time or interest, perhaps using a single server or moving to Redis (Roger: Perhaps your SQLite backend also works?) is best for you. Let me know what you think/are going to do, please :)
Looking ahead:
I believe we may need to look into building a standalone server that uses *gulp* threads for its workers, so that all of them access the same indexes correctly. I'm not incredibly keen on implementing that, but perhaps it will be necessary.
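The core of the idea, very roughly: threads share one heap, so a single index instance behind a lock is visible to every worker (BooksIndex again being the illustrative index from above):

    require 'thread'

    INDEX_LOCK = Mutex.new

    # Every worker thread funnels writes through the same lock,
    # so all of them always see one consistent index.
    def replace_item(book)
      INDEX_LOCK.synchronize { BooksIndex.replace book }
    end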
I'd love to work on making this all much easier, but Picky has been eating a lot of time, and that needs to be invested in my company and my PhD currently. However, baby steps. Also, I am very interested in getting people like you, Andi, to work on it. Roger is already helping out quite a bit!
Florian