Hello,
We've been running the latest version of ruote with ruote-mon and 2 workers for about 6 months now.
Over time, as our process definitions have become more complex and longer running, they have also become less reliable.
Many processes are getting 'stuck': they never enter the error state, and they also fail to respond to cancel.
I've been using ruote-kit to monitor and clean up these processes, which usually works.
However, when a process is 'stuck' and I try to kill it, the process moves to the 'dying' state and is never removed from the list.
This seems to happen around calls to subprocesses where I use 'pass' as the :on_error and :on_timeout handler:
cursor :timeout => '${v:timeout}', :on_timeout => :pass, :tag => 'wait_for_fqdn_discovery' do
  get_machine_fqdn
  sequence :unless => '${f:machine_fqdn}' do
    log 'waiting 60s' => '${f:machine.machine_id}'
    wait '60s'
    rewind
  end
end
refresh_state :on_error => 'pass'
sequence :unless => '${f:machine.remote_id}' do
  # ... (rest of the definition not included in this message)
end
2014-02-28 19:20:26 Env: 5310bda1e1d14826e00000bc Thread: 22454140 - Participants::Log: {"CREATING MACHINE"=>{"availability_zone"=>"nova", "flags"=>{"migrate"=>true, "wipe"=>true}, "flavor_id"=>"18", "image_id"=>"de0bf0b8-8f16-4e0e-bc23-cfc27c52283c", "machine_id"=>"5310be96749f06d9ef000051", "name"=>"mlb14-goo-balancer4", "puppet_role"=>"role::sdod::playerconnect", "remote_id"=>"020235ef-9541-46b3-9f05-b1832daf440d", "security_groups"=>["server.balancer"], "services"=>{"playcore"=>{"balancer"=>{}}}, "state"=>"ECO_CREATED", "tenant_name"=>"GPAD_SD1", "user_data"=>"application=mlb14&environment=production pe_eco_environment=ote pe_eco_message_broker=eco-ote-messaging.eco.usw1.cld.scea.com", "_id"=>"5310e13de1d1483dd90000c0"}, "ref"=>"log"}
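For reference, this is roughly the control flow I expect from the snippet above, modeled in plain Ruby without ruote. It is only a sketch under assumptions: wait_for_fqdn and run_passing_errors are hypothetical names, the lambda stands in for the get_machine_fqdn participant, time is simulated instead of slept, and the timeout is assumed to fire during a 60s wait rather than at the exact moment ruote's scheduler would trigger it.

```ruby
# Hypothetical plain-Ruby model of the intended flow; it does NOT use ruote.

# Models the cursor: call fetch_fqdn; if it returns nothing, "wait 60s"
# (simulated) and "rewind". If the timeout would expire during the next
# wait, give up without raising, as :on_timeout => :pass is meant to do.
def wait_for_fqdn(timeout:, poll_interval:, fetch_fqdn:)
  elapsed = 0   # simulated seconds spent inside the cursor
  attempts = 0  # how many times the FQDN lookup ran
  loop do
    attempts += 1
    fqdn = fetch_fqdn.call
    return { fqdn: fqdn, attempts: attempts } if fqdn
    # Assumption: the timeout fires during the next wait, so stop here.
    return { fqdn: nil, attempts: attempts } if elapsed + poll_interval >= timeout
    elapsed += poll_interval # 'wait 60s', then 'rewind'
  end
end

# Models :on_error => 'pass': run the step, swallow any StandardError,
# and let the flow continue.
def run_passing_errors
  yield
  :ok
rescue StandardError
  :passed
end

# Usage: the FQDN shows up on the third lookup.
answers = [nil, nil, 'host.example.com']
p wait_for_fqdn(timeout: 600, poll_interval: 60, fetch_fqdn: -> { answers.shift })

# Usage: the lookup never succeeds, so the cursor gives up after 3 tries.
p wait_for_fqdn(timeout: 180, poll_interval: 60, fetch_fqdn: -> { nil })

# Usage: a failing refresh step is passed over instead of halting the flow.
p run_passing_errors { raise 'refresh_state failed' }
```

In this model neither step can leave the flow waiting forever: the cursor either finds the FQDN or gives up, and a failing refresh step is skipped.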
John
--
you received this message because you are subscribed to the "ruote users" group.
to post : send email to openwfe...@googlegroups.com
to unsubscribe : send email to openwferu-use...@googlegroups.com
more options : http://groups.google.com/group/openwferu-users?hl=en