How to resume process execution after a sudden engine death - let's say a power supply outage?


mex048

unread,
Mar 30, 2012, 3:23:22 AM3/30/12
to openwfe...@googlegroups.com
Suppose a file-system storage is used...
What is the 'proper' way to do it - if any?

John Mettraux

unread,
Mar 30, 2012, 3:32:22 AM3/30/12
to openwfe...@googlegroups.com

On Fri, Mar 30, 2012 at 12:23:22AM -0700, mex048 wrote:
>
> Suppose a file-system storage is used...
> What is the 'proper' way to do it - if any?

Hello Mex,

welcome to ruote's mailing list.

== step 1

Restart the ruote system
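
A minimal sketch of that, assuming the engine and worker are simply rebuilt on
the very same storage location that was in use before the outage:

@ruote_engine = Ruote::Engine.new(
  Ruote::Worker.new(
    Ruote::FsStorage.new('same_disk_location')))
# (re-register participants here if your setup requires it)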

== step 2

If there are stalled processes, kill them or re-apply them at
their stalled leaves (processes always stall at the leaves anyway)
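
To see what is sitting in the storage after the restart, a small sketch that
just lists each process and its leaves (processes, wfid and leaves are plain
ruote calls):

@ruote_engine.processes.each do |ps|
  puts "#{ps.wfid}: #{ps.leaves.size} leaves"
end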


Best regards,

--
John Mettraux - http://lambda.io/processi

mex048

unread,
Mar 30, 2012, 4:01:30 AM3/30/12
to openwfe...@googlegroups.com
Thanks for the VERY quick response, John,
sorry for not being clear enough.
I mean - resume THE process, e.g.
@ruote_engine = Ruote::Engine.new(Ruote::Worker.new(Ruote::FsStorage.new('some_disk_location', 'cloche_nolock' => true)))
pdef = Ruote.process_definition :name => 'something_to_be_interrupted'...
@ruote_engine.register_participant ...
@ruote_engine.register_participant ...
...
@ruote_engine.register_participant ...
exactly_this_process_wfid = @ruote_engine.launch(pdef , fields, vars)
#process wfid working
# XXXXX BANG - POWER OUTAGE XXXXXX
# XXXXX 5 MINS PASSED XXXXXX
# XXXXX POWER RESTORED XXXXXX
@ruote_engine = Ruote::Engine.new(Ruote::Worker.new(Ruote::FsStorage.new('same_disk_location', 'cloche_nolock' => true)))
#How to resume THE process (exactly_this_process_wfid) ?
#Is it somehow preserved in storage?
#Is it resurrectable at all?
#or - it died and 'dead means dead'?


On Friday, March 30, 2012 at 10:23:22 UTC+3, mex048 wrote:

John Mettraux

unread,
Mar 30, 2012, 4:37:11 AM3/30/12
to openwfe...@googlegroups.com
Hello again,

== short answer

On Fri, Mar 30, 2012 at 01:01:30AM -0700, mex048 wrote:
>
> #How to resume THE process (exactly_this_process_wfid) ?

Simply restart ruote, the rest is taken care of for you.

> #Is it somehow preserved in storage?

Yes (ruote takes care of preserving the state of each process for every
change it goes through (that's why it's much slower than vanilla
interpreters))
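
You can look at that preserved state yourself after the restart; a sketch,
assuming the process did leave a trace (current_tree and leaves are regular
ProcessStatus readers):

ps = @ruote_engine.ps(exactly_this_process_wfid)
p ps.current_tree   # the process definition tree as currently persisted
p ps.leaves.size    # how many leaf expressions are sitting in storage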

> #Is it resurrectable at all?

Yes

> #or - it died and 'dead means dead'?


== long answer

On Fri, Mar 30, 2012 at 01:01:30AM -0700, mex048 wrote:
>
> #How to resume THE process (exactly_this_process_wfid) ?

Simply restart ruote, the rest is taken care of for you.

> #Is it somehow preserved in storage?

It depends on when the interruption occurred.

> #Is it resurrectable at all?

It depends.

> #or - it died and 'dead means dead'?

If the interruption occurred right after the worker picked up the launch
message from the 'msgs' folder and right before it saved and applied the first
expression, the process might die. You end up with a "wfid" that points to
nothing. It's a rare case, but not an impossible one.
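
If you want to guard against that rare case right after the restart, a sketch
(relaunching simply starts a fresh instance, the old wfid stays dead):

if @ruote_engine.ps(exactly_this_process_wfid).nil?
  # the launch msg was consumed but no expression got saved: launch again
  exactly_this_process_wfid = @ruote_engine.launch(pdef, fields, vars)
end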

If the interruption occurred right after an expression was saved and right
before it's applied, the process might stall. You can simply call re-apply on
the stalled leaf and it will go on.

You could do that [automatically] with:

ps = @ruote_engine.ps(exactly_this_process_wfid)
ps.leaves.each { |fexp| @ruote_engine.re_apply(fexp) }

or (maybe better):

ps = @ruote_engine.ps(exactly_this_process_wfid)
ps.leaves.each { |fexp|
  @ruote_engine.re_apply(fexp) unless fexp.is_a?(Ruote::Exp::ParticipantExpression)
}

Where we trust the participant implementations to handle interruptions
gracefully.

(note to self: I should package that routine into

@ruote_engine.zap(wfid)
# or
@ruote_engine.shock(wfid)

it could come in handy)

So, it's rare that a process dies; it's more likely it becomes a vegetable,
but in that case you can shock it to bring its anima back.
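
For what it's worth, a hypothetical sketch of that routine (zap is not part of
ruote's API, it's just the snippets above wrapped into a method):

def zap(engine, wfid)
  ps = engine.ps(wfid)
  return if ps.nil?
  ps.leaves.each do |fexp|
    next if fexp.is_a?(Ruote::Exp::ParticipantExpression)
    engine.re_apply(fexp)
  end
end

zap(@ruote_engine, exactly_this_process_wfid)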


I hope it helps, cheers,

mex048

unread,
Mar 30, 2012, 5:56:05 AM3/30/12
to openwfe...@googlegroups.com
Thanks, I owe you an hour ;)
A brief test of the suggested course of action looks OK so far.
