Starting multiple engines at once


Aaron Lee

Aug 24, 2010, 8:46:39 AM
to ruote
We noticed some odd behavior when trying to start more than one engine
at once. I posted an issue about it to the github issue tracker, but
John suggested it would be better discussed here. I'm going to copy in
the start of our discussion:

Starting multiple engines at once can cause corruption of
"configurations/participant_list". You can cause this to happen on one
machine by putting a 'sleep (rand 1000)/1000.0' at line 140 of
storage.rb.

We first noticed this when trying to run several engines on separate
machines talking to the same Redis db. After tracing it down to the
setup of the participants, I noticed that storage.rb tries to set the
index directly instead of using Redis's built-in INCR command. Using
INCR should prevent engines from grabbing the same id and
corrupting each other's data. I posted a possible fix here:
http://github.com/wwkeyboard/ruote-redis/commit/d838ac61d1564172464374e8b2551741b806f8f2
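
To illustrate the race described above, here is a minimal, self-contained Ruby sketch. It is not the ruote-redis storage code or the linked commit: FakeRedis is a hypothetical in-memory stand-in that runs each command atomically (as a Redis server does), and racy_next_id, atomic_next_id and claim_ids are illustrative names. The read-then-set version mimics setting the index directly, with a short random sleep to widen the window (the same trick as the sleep mentioned above); the INCR-style version lets the store do the read and the write in one step.

```ruby
# Hypothetical in-memory stand-in for a Redis server: every command runs
# under one mutex, the way Redis executes one command at a time.
class FakeRedis
  def initialize
    @data = {}
    @mutex = Mutex.new
  end

  def get(key)
    @mutex.synchronize { @data[key] }
  end

  def set(key, value)
    @mutex.synchronize { @data[key] = value.to_s }
  end

  # Like Redis INCR: read, add one and write back as a single command.
  def incr(key)
    @mutex.synchronize { @data[key] = (@data[key].to_i + 1).to_s }.to_i
  end
end

# Read-then-set: the GET and the SET are separate commands, so two
# engines can read the same value and both claim the same id.
def racy_next_id(redis, key)
  current = redis.get(key).to_i
  sleep(rand(1000) / 100_000.0) # widen the race window (up to about 10 ms)
  redis.set(key, current + 1)
  current + 1
end

# Atomic: the store does the read and the write in one step.
def atomic_next_id(redis, key)
  redis.incr(key)
end

# Simulate several engines starting at once against one shared store.
def claim_ids(engines, &strategy)
  redis = FakeRedis.new
  threads = Array.new(engines) { Thread.new { strategy.call(redis, 'rev') } }
  threads.map(&:value)
end

racy = claim_ids(10) { |r, k| racy_next_id(r, k) }
atomic = claim_ids(10) { |r, k| atomic_next_id(r, k) }
puts "read-then-set: #{racy.sort.inspect}"  # usually contains duplicates
puts "incr:          #{atomic.sort.inspect}" # always 1 through 10
```

With read-then-set, several threads usually read the same counter value and so claim duplicate ids; with the atomic increment every id is distinct. The output of the racy version varies from run to run.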

John replied with:

Nice, I will test this ASAP.

Should we remove the following @redis.set(key, nrev) as well?

Hello,

Unfortunately, your patch is [intermittently] breaking my concurrence
tests (test/functional/ct_0_concurrence.rb and test/functional/
ct_2_cancel.rb).

I run these tests like this:

. test/functional/crunner.sh X -- --redis

where X is 0, 1 or 2 (ct_0, ct_1 or ct_2). I let each test run at
least 100 times.

Could you please tell me how exactly "configurations/participant_list"
gets corrupted? Error messages, symptoms...? Does it only ever
happen to the participant list, with no "corruption" for other types?

I have tried with "sleep (rand 1000)/10000.0" (10k) but it only breaks
ct_2. I will continue investigating; I thought my setnx technique was
OK, but it seems not.

I suggest continuing the exchange on http://groups.google.com/group/openwferu-users
Those issue tracker text boxes are too constraining...

Many thanks for your feedback so far.

John Mettraux

Aug 24, 2010, 8:53:09 AM
to openwfe...@googlegroups.com

On Tue, Aug 24, 2010 at 05:46:39AM -0700, Aaron Lee wrote:
>
> We noticed some odd behavior when trying to start more than one engine
> at once. I posted an issue about it to the github issue tracker, but
> John suggested it would be better discussed here. I'm going to copy in
> the start of our discussion:

Hello Aaron,

welcome to the mailing list.

I think I understand what's wrong. I should hopefully have a fix very soon; your incr() suggestion helps a lot.


Thanks for the feedback so far,

--
John Mettraux - http://jmettraux.wordpress.com

John Mettraux

Aug 24, 2010, 8:32:29 PM
to openwfe...@googlegroups.com

On Tue, Aug 24, 2010 at 09:53:09PM +0900, John Mettraux wrote:
>
> I think I understand what's wrong. I should hopefully have a fix very soon, your incr() suggestion helps a lot.

It's taking me a bit more time: I'm totally reconsidering this ruote-redis storage implementation.


Sorry for the delay,

Aaron Lee

Aug 24, 2010, 8:56:50 PM
to ruote
We have a workaround at the moment (start the engines one at a time).
Let me know if I can do anything else to help.

Aaron

John Mettraux

Aug 24, 2010, 9:05:00 PM
to openwfe...@googlegroups.com
On Wed, Aug 25, 2010 at 9:56 AM, Aaron Lee <wwkey...@gmail.com> wrote:
> We have a workaround at the moment(start the engines one at a time).
> Let me know if I can do anything else to help.

When I have a decent re-implementation that passes all my tests, I'll
ask you to "greenlight" it.

Maybe I'll have to change the way it stores data; if so, I'll make
sure to warn you about it.

BTW, in what kind of context / what kind of processes do you intend to
run with ruote? I've seen you're connected with the ThoughtWorks
guys.

Best regards,

John Mettraux

Aug 25, 2010, 1:10:49 AM
to ruote

On Wed, Aug 25, 2010 at 10:05:00AM +0900, John Mettraux wrote:
> On Wed, Aug 25, 2010 at 9:56 AM, Aaron Lee <wwkey...@gmail.com> wrote:
> > We have a workaround at the moment(start the engines one at a time).
> > Let me know if I can do anything else to help.

BTW, which version of Redis are you using?

I'm considering upgrading from 1.2.6 to 2.0 in order to get blpop and co.

John Mettraux

Aug 25, 2010, 11:10:18 AM
to openwfe...@googlegroups.com

On Wed, Aug 25, 2010 at 02:10:49PM +0900, John Mettraux wrote:
>
> On Wed, Aug 25, 2010 at 10:05:00AM +0900, John Mettraux wrote:
> > On Wed, Aug 25, 2010 at 9:56 AM, Aaron Lee <wwkey...@gmail.com> wrote:
> > > We have a workaround at the moment(start the engines one at a time).
> > > Let me know if I can do anything else to help.
>
> BTW, which version of Redis are you using ?
>
> I'm considering upgrading from 1.2.6 to 2.0 in order to get blpop and co.

In the end, I'm sticking with 1.2.6 and using setnx only for a classical locking mechanism:

http://github.com/jmettraux/ruote-redis/commit/468a3a89f02149c09a0a9634ceedac32b678fc32

Please tell me how it works for you (engine startup issue).

I still have to implement lock expiration; so far I've only done the preliminary work (storing the timestamp as the lock value).
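
For readers following along, here is a minimal Ruby sketch of a SETNX lock with expiration, assuming (as described above) that the lock value is the acquisition timestamp. It is a variant of the locking pattern described in the Redis SETNX documentation of that era (which stores an expiry time rather than the acquisition time), not the code from the linked commit. FakeRedis, LOCK_TTL (5 seconds here), try_lock and unlock are hypothetical names, and the in-memory store only mimics the semantics of SETNX, GET, GETSET and DEL. A stale lock is taken over with GETSET so that, when several engines notice the same expired lock, only the one whose GETSET returns the stale timestamp it just read wins.

```ruby
# Hypothetical in-memory stand-in for the Redis 1.2 commands used below.
# Each method holds one mutex, like Redis running one command at a time.
class FakeRedis
  def initialize
    @data = {}
    @mutex = Mutex.new
  end

  def get(key)
    @mutex.synchronize { @data[key] }
  end

  # SETNX: set only if absent; true when this call created the key.
  def setnx(key, value)
    @mutex.synchronize do
      if @data.key?(key)
        false
      else
        @data[key] = value.to_s
        true
      end
    end
  end

  # GETSET: store the new value and return the previous one atomically.
  def getset(key, value)
    @mutex.synchronize do
      previous = @data[key]
      @data[key] = value.to_s
      previous
    end
  end

  def del(key)
    @mutex.synchronize { @data.delete(key) ? 1 : 0 }
  end
end

LOCK_TTL = 5.0 # seconds before a lock counts as stale (assumed value)

# One attempt to take the lock; returns the stored timestamp on success,
# nil when another engine holds it (the caller retries later).
def try_lock(redis, key, now = Time.now.to_f)
  return now if redis.setnx(key, now)

  held_since = redis.get(key)
  # Lock released meanwhile, or held and still fresh: give up this round.
  return nil if held_since.nil? || now - held_since.to_f < LOCK_TTL

  # Stale lock: GETSET swaps in our timestamp; only the contender that
  # gets back the stale value it just read has won the takeover.
  previous = redis.getset(key, now)
  previous == held_since ? now : nil
end

# Release by deleting the key (only safe while the holder is within LOCK_TTL).
def unlock(redis, key)
  redis.del(key)
end

redis = FakeRedis.new
t0 = 1_000.0
a = try_lock(redis, 'lock', t0)        # free: acquired
b = try_lock(redis, 'lock', t0 + 1.0)  # held and fresh: nil
c = try_lock(redis, 'lock', t0 + 10.0) # stale: taken over
```

The usual caveat of this pattern applies: a holder must finish before LOCK_TTL elapses, otherwise another engine may take the lock over while the first one is still working, and this unlock deletes the key unconditionally.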


I will continue tomorrow, cheers,
