Hi guys,
Thanks for the discussion. It makes for riveting reading.
I believe I have an explanation for your troubles.
Lengthy explanation to follow… (technical notes in italics)
First of all, an apology for not mentioning the problems with the memory index option when running the Sinatra server.
If Picky is run with memory indexes, it keeps the index data exclusively in the process's memory. This is why you load/dump the indexes from/to files at startup and shutdown.
This also means that Picky is bound by the limitations of Ruby and of the OS it runs on. Read on for what I mean.
Let's assume at server startup you load the indexes.
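Concretely, that startup flow looks roughly like this – just a sketch, assuming Picky's index DSL; the :books index and the Book struct are made up for illustration:

    require 'picky'

    # Illustrative data class – anything responding to #id and the
    # category names will do.
    Book = Struct.new :id, :title, :author

    BooksIndex = Picky::Index.new :books do
      source   [Book.new(1, 'Momo', 'Ende')]
      category :title
      category :author
    end

    # After an indexing run has dumped the index to files (BooksIndex.dump),
    # each server process loads that dump into its own memory at startup:
    BooksIndex.load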
If you now use a server that forks off worker children, they each have separate indexes.
They actually access the same memory, initially. With Ruby < 2.0, though, the garbage collector writes its mark bits directly into the objects on every GC run (even though the indexes are only ever read), and COW (copy on write) will then separate the memory pages associated with a specific index. So gradually, in actual memory, each worker ends up with its own full copy of the indexes. In Ruby 2.0, with bitmap-marking garbage collection, I expect this to behave differently, since the mark bits live outside the objects and the pages can stay shared.
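You can see the underlying effect with a few lines of plain Ruby (no Picky involved) – a write in one forked child is invisible to the parent and to its siblings:

    # A hash standing in for an in-memory index.
    index = { 'pod' => [1, 2] }

    pid = fork do
      index['pod'] << 3  # COW quietly gives this child its own copy
      puts "child:  #{index['pod'].inspect}"   # => [1, 2, 3]
    end
    Process.wait pid

    puts "parent: #{index['pod'].inspect}"     # => [1, 2] – unchanged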
So what is happening in your Unicorn example, Andi? Let's say we have 3 workers, A, B, and C.
A will receive the replace-item request and duly replace the item in its index. However, B and C never hear of this change, which now exists exclusively in A.
That explains why you get correct results only about 1/3 of the time, sometimes dump the right index, and sometimes see stale results: it all depends on which worker handles the request.
So what are the solutions?
- Evaluate whether you really need multiple workers – in terms of raw speed, Picky is normally fast enough – and if not, just use a single one. (This is not a solution if you have many users, as congestion looms.)
- Use a separately handled index (a single point of access), for example Redis, which all workers access; see the sketch right after this list. In the case of Redis, loading/dumping is not necessary. However, you lose speed compared to the in-memory solution.
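Roughly, the Redis variant would look like this – a sketch assuming Picky's backend API (the exact options may differ; check the Picky docs), with the index definition again purely illustrative:

    require 'picky'

    Book = Struct.new :id, :title, :author

    BooksIndex = Picky::Index.new :books do
      backend  Picky::Backends::Redis.new  # single point of access for all workers
      category :title
      category :author
    end

    # A replace goes straight to Redis, so every worker sees the change
    # immediately, and no load/dump at startup/shutdown is needed.
    BooksIndex.replace Book.new(1, 'Momo', 'Ende')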
In addition to this, several users who wanted the speed of the memory indexes have come up with funky solutions for exactly this scenario:
- Store the data in Redis and run a pubsub queue: every worker publishes updates to the queue, every worker receives them from it, and each updates its own index separately (as laut.fm does); a sketch follows after this list.
- If a worker receives an update request, it tells the master via IPC (interprocess communication), using e.g. the Cod gem. The master then either:
  - tells each worker to update its index, or
  - updates its own index and gradually restarts the workers (the workers commit harakiri via Rack::Harakiri, code included in Picky).
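Here is a rough sketch of the pubsub variant, using the plain redis gem – the channel name and payload format are made up, and BooksIndex/Book are the illustrative definitions from above:

    require 'redis'
    require 'json'

    # Whichever worker handles an update publishes it...
    def publish_update(book)
      Redis.new.publish 'picky-updates',
        { id: book.id, title: book.title, author: book.author }.to_json
    end

    # ...and every worker runs a subscriber thread that applies each
    # update to its own copy of the in-memory index.
    Thread.new do
      Redis.new.subscribe('picky-updates') do |on|
        on.message do |_channel, payload|
          attrs = JSON.parse payload
          BooksIndex.replace Book.new(attrs['id'], attrs['title'], attrs['author'])
        end
      end
    end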
Again, sorry for not explicitly noting that in my first suggestion – I hope this explanation was at least informative :) The people who came up with the funky solutions were happy about the learning experience, but if you don't have the time or interest, perhaps using a single server or moving to Redis (Roger: Perhaps your SQLite backend also works?) is best for you. Let me know what you think/are going to do, please :)
Looking ahead:
I believe we may need to look into building a standalone server that uses *gulp* threads for its workers, so that all of them access the same indexes correctly. I'm not incredibly keen on implementing that, but perhaps it will be necessary.
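The core of the idea, very roughly: threads share one heap, so a single index instance behind a lock is visible to every worker (BooksIndex again being the illustrative index from above):

    require 'thread'

    INDEX_LOCK = Mutex.new

    # Every worker thread funnels writes through the same lock,
    # so all of them always see one consistent index.
    def replace_item(book)
      INDEX_LOCK.synchronize { BooksIndex.replace book }
    end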
I'd love to work on making this all much easier, but Picky has been eating a lot of time, and that needs to be invested in my company and my PhD currently. However, baby steps. Also, I am very interested in getting people like you, Andi, to work on it. Roger is already helping out quite a bit!
Florian