Processes get stuck in dying or cancelling


R Law

May 25, 2012, 12:09:44 AM
to openwfe...@googlegroups.com
Sometimes I can't kill a process, even when there are no errors and no workitems left.

Here is the json output from Ruote-Kit for one of the stuck processes: http://www.hastebin.com/gosiqosoti.pl

All I know how to do is to remove the process manually from the storage, but is there any way I can prevent this?

John Mettraux

May 25, 2012, 12:20:45 AM
to openwfe...@googlegroups.com
Hello Reed,

what version of Ruby are you using?
What version of ruote are you using?
What's the context in which you're using ruote? (Rails, Sinatra, Passenger,
...)
What storage are you using for ruote?
What's behind the storage? (Database, version)?
What is your operating system? What version?
What does your process definition look like?
What does your "kill" scenario read like?
What more should I know?

Cheers,

--
John Mettraux - http://lambda.io/jmettraux

Reed Law

Jun 12, 2012, 2:38:36 AM
to openwfe...@googlegroups.com

> Hello Reed,
>
> what version of Ruby are you using?

Ruby 1.9.3

> What version of ruote are you using?

I'm on the latest Github HEAD (a9d289744cc84c816c66f59ad061f08ba1b8f748) but it's been happening for several months.

> What's the context in which you're using ruote? (Rails, Sinatra, Passenger,
> ...)

Rails 3.2.5 with ruote-kit (also latest HEAD)

> What storage are you using for ruote?

Was using ruote-redis, now ruote-mon

> What's behind the storage? (Database, version)?

MongoDB 2.0.6

> What is your operating system? What version?

Ubuntu 12.04

> What does your process definition look like?

See http://www.hastebin.com/gosiqosoti.pl

> What does your "kill" scenario read like?

I am trying to kill or cancel the stuck process through ruote-kit's web interface (http://0.0.0.0:3000/_ruote)

> What more should I know?

This issue is not limited to my present setup but I've experienced it on each platform I've used. In the fluo representation the stuck processes have a green marker that isn't on any participant but on the edge of an expression. In my app I delete processes with RuoteKit.engine.kill_process(wfid). That method sometimes leaves them in a stuck state. In ruote-kit sometimes killing a process produces errors that, once cleared, allow the process to be deleted. Other times clearing the errors still leaves me with a stuck process.

John Mettraux

Jun 12, 2012, 5:14:17 AM
to openwfe...@googlegroups.com

On Mon, Jun 11, 2012 at 11:38:36PM -0700, Reed Law wrote:
>
> > What version of ruote are you using?
>
> I'm on the latest Github HEAD (a9d289744cc84c816c66f59ad061f08ba1b8f748)
> but it's been happening for several months.

Hello,

Ouch, I wish you had reported the issue earlier.

> > What's the context in which you're using ruote? (Rails, Sinatra,
> > Passenger,
> > ...)
>
> Rails 3.2.5 with ruote-kit (also latest HEAD)

You forgot to mention Unicorn (you did in
https://github.com/jmettraux/ruote-redis/issues/3 ;-) )

> > What storage are you using for ruote?
>
> Was using ruote-redis, now ruote-mon

Ouch, according to the issue mentioned above, you switched to ruote-mon
today. Is it happening with ruote-mon too?
If yes, could you answer the questions at the end of this email ("What
gets the process stuck..." and so on)?

> > What's behind the storage? (Database, version)?
>
> MongoDB 2.0.6

Which Redis version was it?

> > What does your process definition look like?
>
> See http://www.hastebin.com/gosiqosoti.pl

OK, I'll reverse engineer the definition.

Do you have the output for the two expressions in that process?

> > What does your "kill" scenario read like?
>
> I am trying to kill or cancel the stuck process through ruote-kit's web
> interface (http://0.0.0.0:3000/_ruote)

How does a process end up stuck?

> > What more should I know?
>
> This issue is not limited to my present setup but I've experienced it on
> each platform I've used. In the fluo representation the stuck processes
> have a green marker that isn't on any participant but on the edge of an
> expression. In my app I delete processes
> with RuoteKit.engine.kill_process(wfid). That method sometimes leaves them
> in a stuck state. In ruote-kit sometimes killing a process produces errors
> that, once cleared, allow the process to be deleted. Other times clearing
> the errors still leaves me with a stuck process.

I wish I could see one of those errors.

I wish I could see one of those stuck processes.

What gets the process stuck in the first place?
Is it always getting stuck around the same expression?
What do those "sometimes errors" look like (message, type, backtrace)?
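
A minimal sketch of one way to collect that error information from a console. It assumes the engine responds to `errors(wfid)` and that each error responds to `message` and `trace`, which is how I recall ruote's dashboard and `ProcessError` behaving. `describe_errors` is a hypothetical helper name, not part of ruote.

```ruby
# Hypothetical helper (not part of ruote): formats each error of a process
# as "message" followed by its backtrace.
# Assumptions: engine.errors(wfid) returns a list of error objects, and each
# error responds to #message and #trace (a string or an array of lines).
def describe_errors(engine, wfid)
  engine.errors(wfid).map do |err|
    trace = Array(err.trace).join("\n") # accept either a string or an array
    "#{err.message}\n#{trace}"
  end
end

# Intended use from a Rails console (assumes ruote-kit is loaded):
#
#   puts describe_errors(RuoteKit.engine, wfid)
```

Pasting that output into a reply, together with the wfid, would cover the message and backtrace asked for above.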


For the redis issue you reported I'll try to come up with a reconnect thing,
stay tuned.


Thanks in advance,

Reed Law

Jun 12, 2012, 10:07:29 PM
to openwfe...@googlegroups.com
> Hello,
>
> Ouch, I wish you had reported the issue earlier.

Sorry, I thought maybe I wasn't using Ruote correctly.

> > > What's behind the storage? (Database, version)?
> >
> > MongoDB 2.0.6
>
> Which Redis version was it?

It's version 2.3.9. While I was waiting on you for the redis reconnect code, I went ahead and tested with ruote-mon. It seems to be working fine so far.

> > > What does your process definition look like?
> >
> > See http://www.hastebin.com/gosiqosoti.pl
>
> OK, I'll reverse engineer the definition.
>
> Do you have the output for the two expressions in that process?

Not sure what you mean by output.

> I wish I could see one of those errors.
>
> I wish I could see one of those stuck processes.
>
> What gets the process stuck in the first place?
> Is it always getting stuck around the same expression?
> What do those "sometimes errors" look like (message, type, backtrace)?


I just tried killing 7 processes on my dev machine and couldn't get the errors to recur. With ruote-mon I was getting the errors at first, with the connection string set up as in the README on GitHub. Later I switched to sharing the Mongoid connection to MongoDB like so:

RuoteKit.engine = Ruote::Engine.new(
  Ruote::Worker.new(
    Ruote::Mon::Storage.new(
      Mongoid.database,
      {})))

This was the only way I could get it to work in production, because our production db requires authentication. When I tried authenticating like so:

config = YAML.load_file(Rails.root + 'config' + 'mongoid.yml')[Rails.env]

RuoteKit.engine = Ruote::Engine.new(
  Ruote::Worker.new(
    Ruote::Mon::Storage.new(
      Mongo::Connection.new(config["host"], config["port"])[config["database"]].authenticate(config["username"], config["password"]),
      {})))

I was getting connection errors.

But today I tried killing processes again and none of them got stuck. I will try to reproduce the earlier errors for you when I get a chance.
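
A possible explanation for the connection errors with the authenticated setup, offered as an assumption rather than a confirmed diagnosis: as far as I recall, `authenticate` in the 1.x mongo driver returns a boolean, so chaining it inline would hand `true` to `Ruote::Mon::Storage.new` instead of the database object. The sketch below authenticates first and then returns the database itself. `authenticated_db` is a hypothetical helper name, not part of ruote, ruote-mon or the mongo driver.

```ruby
# Hypothetical helper (not part of ruote or the mongo driver): authenticate
# first, then return the database object itself.
# Assumptions: connection[name] returns a database object, and
# db.authenticate(user, password) returns a boolean or raises.
def authenticated_db(connection, config)
  db = connection[config['database']]
  db.authenticate(config['username'], config['password'])
  db # return the database, not authenticate's return value
end

# Intended use in the initializer (requires the mongo, ruote and ruote-mon gems):
#
#   config = YAML.load_file(Rails.root + 'config' + 'mongoid.yml')[Rails.env]
#   connection = Mongo::Connection.new(config['host'], config['port'])
#   RuoteKit.engine = Ruote::Engine.new(
#     Ruote::Worker.new(
#       Ruote::Mon::Storage.new(authenticated_db(connection, config), {})))
```

If the errors persist with the database object passed in this way, the cause lies elsewhere and the error message and backtrace would be the next thing to look at.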

