Ruote consuming 100% CPU

Dan Kilman

Apr 22, 2014, 5:20:16 AM
to openwfe...@googlegroups.com
Hi,
I am using ruote as the workflow engine for Cloudify, the opensource cloud orchestration system (https://github.com/cloudify-cosmo/cloudify-manager).
We have recently started using ruote to provision larger pools of cloud resources and noticed that ruote uses more and more of the available CPU, to the point where the host was maxed out on ruote processing for several minutes. Note that the entire execution is local, and in our testing no long-running tasks are executed as part of this workflow. You can see the workflow at the following link.
The workflow contains approximately 100 node elements.
Is there something we are doing wrong with this workflow?
I would greatly appreciate any help.
Thanks, Dan.

John Mettraux

Apr 22, 2014, 5:53:38 AM
to openwfe...@googlegroups.com
On Tue, Apr 22, 2014 at 02:20:16AM -0700, Dan Kilman wrote:
>
> I am using ruote as the workflow engine for cloudify, the opensource cloud
> orchestration system (https://github.com/cloudify-cosmo/cloudify-manager)
> We have recently started using ruote to provision larger pools of cloud
> (...)
> Is there something we are doing wrong with this workflow?
> I would greatly appreciate any help.

Hello Dan,

sorry, but did you notice that ruote is not maintained anymore?

https://github.com/jmettraux/ruote
https://groups.google.com/forum/#!topic/openwferu-users/g0jZuWeoXOA

Best regards,

John

Dan Kilman

Apr 22, 2014, 6:09:19 AM
to openwfe...@googlegroups.com
To my great sorrow, I have noticed this. We do plan to make a transition in the future, but ruote currently serves as an integral part of our architecture, so we're going to stick with it for now.

Dan Kilman

Apr 22, 2014, 6:48:23 AM
to openwfe...@googlegroups.com
To expand a little.
I would greatly appreciate it if you could still take a quick look at the radial link I've attached to see if there is anything blatantly wrong with it. We have put a lot of effort on our side into optimizing the workflow, but these efforts have not produced sufficient results yet.

John Mettraux

Apr 22, 2014, 6:59:59 AM
to openwfe...@googlegroups.com

On Tue, Apr 22, 2014 at 03:48:23AM -0700, Dan Kilman wrote:
>
> To expand a little.
> I would greatly appreciate it if you could still have a small look at the
> radial link I've attached to see if there is anything blatantly wrong with
> it. We have put a lot of effort on our side trying to optimize the
> workflow, but these efforts did not produce sufficient results yet.

Hello,

I see nothing wrong with your process definition. It seems a bit convoluted
with lots of set/unset and I don't understand the "state" participant.

I have to say that you are not putting much effort into issue reporting. So
many things to guess on our side.

This is a must read by the way:
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

I admire you putting a lot of effort into optimizing your workflow, but
I cannot help notice that nowhere in "the opensource cloud orchestration
system" text is there any "thanks to the ruote guys for the years of effort
they put into their tool" (and ruote isn't the only tool you're using).

Kind regards,

John

Barak Merimovich

Apr 23, 2014, 7:53:58 AM
to openwfe...@googlegroups.com
Hi,

My name is Barak and I work with Dan on the Cloudify project. 
I just wanted to clarify a few things.

Cloudify is in the process of a major redesign effort. As part of this redesign we are replacing our existing infrastructure with best-of-breed technologies, and we selected Ruote as our workflow engine. At the time, Ruote was still actively developed and it seemed a good choice for our requirements. The web site and documentation for the new release of Cloudify are not online yet, so any Cloudify related materials you see on the web relate to previous releases.

As we approach the release of this new version, we encountered the issues Dan described above, and are looking for someone to consult us about resolving these issues. We have a standalone test which shows the maxed out CPU described previously, and will upload it shortly. We would appreciate any type of support you can provide.

Thanks,
Barak

Barak Merimovich

Apr 23, 2014, 8:14:41 AM
to openwfe...@googlegroups.com
Following Dan's note above, there is a test case available here:
https://github.com/dankilman/mock-workflow-service

Would it be possible to discuss this over IRC?

Regards,
Barak

John Mettraux

Apr 23, 2014, 5:05:25 PM
to openwfe...@googlegroups.com

On Wed, Apr 23, 2014 at 04:53:58AM -0700, Barak Merimovich wrote:
>
> My name is Barak and I work with Dan on the Cloudify project.
> I just wanted to clarify a few things.
>
> Cloudify is in the process of a major redesign effort. As part of this
> redesign we are replacing our existing infrastructure with best-of-breed
> technologies, and we selected Ruote as our workflow engine. At the time,
> Ruote was still actively developed and it seemed a good choice for our
> requirements.

Hello Barak,

sorry, when was that? I announced the "end" of ruote in November 2013... Half
a year ago.

> The web site and documentation for the new release of
> Cloudify are not online yet, so any Cloudify related materials you see on
> the web relate to previous releases.
>
> As we approach the release of this new version, we encountered the issues
> Dan described above, and are looking for someone to consult us about
> resolving these issues. We have a standalone test which shows the maxed out
> CPU described previously, and will upload it shortly. We would appreciate
> any type of support you can provide.

OK.

Best regards,

John

John Mettraux

Apr 23, 2014, 5:06:45 PM
to openwfe...@googlegroups.com

On Wed, Apr 23, 2014 at 05:14:41AM -0700, Barak Merimovich wrote:
>
> Following Dan's note above, there is a test case available here:
> https://github.com/dankilman/mock-workflow-service

OK, thanks.

> Would it be possible to discuss this over IRC?

OK, I'm GMT+9.


Best regards,

John

John Mettraux

Apr 23, 2014, 9:27:24 PM
to openwferu-users
Hello,

I noticed you are using merge-type="ignore", which is good. I guess
that, at 100 iterations, the concurrent iterator is not performing as
you expect. Unfortunately, ruote was not designed for such "wide"
c-iterations. There is an emphasis on holding on to the data. Others
have been bitten by this in the past; with some searching, the list
archive might yield those conversations.
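A simplified Python model of the merge choice, written for illustration and not taken from ruote's source. Workitems are plain dicts, and the three merge functions are assumptions about the general behaviour: an "override"-style merge keeps the last reply, a "mix"-style merge combines every reply, and "ignore" keeps the entry workitem unchanged. It shows that only a combining merge makes the result grow with the number of branches.

```python
from typing import Dict, List

# A workitem is modelled as a plain dict of fields (simplification).
Workitem = Dict[str, object]


def run_branches(entry: Workitem, nodes: List[str]) -> List[Workitem]:
    """Each hypothetical branch copies the entry workitem and records a result."""
    replies = []
    for node in nodes:
        reply = dict(entry)
        reply[f"result_{node}"] = "ok"
        replies.append(reply)
    return replies


def merge_override(entry: Workitem, replies: List[Workitem]) -> Workitem:
    """Assumed 'override'-style merge: the last reply becomes the result."""
    return dict(replies[-1]) if replies else dict(entry)


def merge_mix(entry: Workitem, replies: List[Workitem]) -> Workitem:
    """Assumed 'mix'-style merge: fields from every reply are combined."""
    merged = dict(entry)
    for reply in replies:
        merged.update(reply)
    return merged


def merge_ignore(entry: Workitem, replies: List[Workitem]) -> Workitem:
    """Assumed 'ignore' merge: replies are dropped, the entry workitem is kept."""
    return dict(entry)


entry = {"deployment": "d1"}
replies = run_branches(entry, [f"node_{i}" for i in range(100)])
print(len(merge_override(entry, replies)),
      len(merge_mix(entry, replies)),
      len(merge_ignore(entry, replies)))
```

For 100 branches this prints `2 101 1`. Under this model "ignore" keeps the merged result small, though it does nothing about the cost of tracking the branches themselves.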

I know it's not the same thing, but have you thought about firing a
workflow per node instead of the current approach?
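One way to read the per-node suggestion, sketched in Python: the caller launches a small, independent workflow for each node instead of one process definition that iterates over every node. `launch_per_node` and `fake_launch` are hypothetical names. In practice `launch` would wrap whatever launch call the engine provides.

```python
import uuid
from typing import Callable, Dict, List


def fake_launch(fields: Dict[str, str]) -> str:
    """Hypothetical stand-in for the engine's launch call; returns a new workflow id."""
    return uuid.uuid4().hex


def launch_per_node(nodes: List[str], deployment_id: str,
                    launch: Callable[[Dict[str, str]], str]) -> Dict[str, str]:
    """Launch one small workflow per node instead of one wide concurrent iterator.

    Returns a mapping of node id -> workflow id so each run can be tracked.
    """
    wfids: Dict[str, str] = {}
    for node in nodes:
        # Each workflow only receives the fields it needs for its own node.
        fields = {"deployment_id": deployment_id, "node_id": node}
        wfids[node] = launch(fields)
    return wfids


wfids = launch_per_node([f"node_{i}" for i in range(100)], "d1", fake_launch)
print(len(wfids))
```

The assumed benefit is that each workflow's saved state stays small and does not depend on the total node count. The trade-off is that any coordination across nodes, such as waiting for all of them to finish, has to happen outside the individual workflows.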

Also, reading diagonally, I see that you have an emphasis on "state".
Wouldn't you guys be better served by a state machine library? I know
you have "orchestration" in your product name, and I understand you
ultimately want workflow.
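If most of the per-node logic is really about tracking "state", a small explicit state machine may cover it. The Python sketch below models a hypothetical node lifecycle. The state names and the transition table are placeholders, not Cloudify's actual model, and no particular state machine library is assumed.

```python
from typing import Dict, Set

# Hypothetical node lifecycle; substitute the states your nodes really go through.
TRANSITIONS: Dict[str, Set[str]] = {
    "uninitialized": {"creating"},
    "creating": {"created", "failed"},
    "created": {"starting"},
    "starting": {"started", "failed"},
    "started": set(),
    "failed": set(),
}


class NodeStateMachine:
    """Tracks one node's state and rejects transitions not in TRANSITIONS."""

    def __init__(self, node_id: str, state: str = "uninitialized") -> None:
        self.node_id = node_id
        self.state = state

    def transition(self, new_state: str) -> None:
        allowed = TRANSITIONS.get(self.state, set())
        if new_state not in allowed:
            raise ValueError(
                f"{self.node_id}: illegal transition {self.state} -> {new_state}")
        self.state = new_state


node = NodeStateMachine("node_1")
for step in ("creating", "created", "starting", "started"):
    node.transition(step)
print(node.state)
```

Each node's state lives in its own small object, so nothing here grows with the number of nodes being provisioned.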

Best regards,

John

Dan Kilman

Apr 24, 2014, 5:02:04 AM
to openwfe...@googlegroups.com
Following your advice, we are going to try the following:
1. Execute a workflow for each node.
2. Minimize the workitem to the bare minimum and fetch data only on demand.
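Point 2 can be sketched in Python as a workitem that carries only identifiers, with the bulky node data looked up when a participant needs it. `NODE_STORE`, `slim_workitem`, `fetch_node` and `payload_size` are hypothetical. The in-memory dict stands in for whatever database or service actually holds node data, and JSON length is used only as a rough proxy for how much gets serialized at each save.

```python
import json
from typing import Dict

# Hypothetical store standing in for wherever node data actually lives.
NODE_STORE: Dict[str, Dict[str, object]] = {
    "node_1": {"ip": "10.0.0.1", "properties": {"size": "small", "image": "base"}},
}


def slim_workitem(deployment_id: str, node_id: str) -> Dict[str, str]:
    """Carry only identifiers in the workitem fields."""
    return {"deployment_id": deployment_id, "node_id": node_id}


def fetch_node(workitem: Dict[str, str],
               store: Dict[str, Dict[str, object]] = NODE_STORE) -> Dict[str, object]:
    """Look up the full node data only when a participant actually needs it."""
    return store[workitem["node_id"]]


def payload_size(workitem: Dict[str, object]) -> int:
    """Rough proxy for how much is serialized each time the workitem is saved."""
    return len(json.dumps(workitem))


slim = slim_workitem("d1", "node_1")
fat = dict(slim, **NODE_STORE["node_1"])
print(payload_size(slim), payload_size(fat))
```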

Did you happen to run the test case we supplied? Is there anything you can suggest to investigate further?
We know about noisy mode, but we are having a hard time interpreting what is going on: there are many subprocesses and many different branches, and it's very hard to follow.

John Mettraux

Apr 24, 2014, 6:26:32 AM
to openwfe...@googlegroups.com

On Thu, Apr 24, 2014 at 02:02:04AM -0700, Dan Kilman wrote:
>
> Following your advice, we are going to try the following:
> 1. execute a workflow for each node
>
> 2. try and minimize the workitem to the bare minimum and fetch data only on
> demand.

Hello,

unfortunately, regarding point 2, I think you already did the best you could by
using merge-type="ignore". Of course, if the workitem is super fat, shrinking it
will save time.

> Did you happen to run the test case we supplied? Is there anything you can
> suggest to further investigate?

No, I'm sorry, we're in the middle of a "going live with the product" session
at my day job. These days I can only dedicate one hour per day to open source
work.

> We know about noisy mode but we are having a hard time interpreting
> what is going on as there are many subprocesses and many different branches
> and it's very hard to follow.

I don't think noisy mode will help. As said in my previous email, ruote was
not meant to go "wide" like this. It simply does its job, and that's too slow
for you. No bug, it seems, just a bad performance surprise.

That's one reason why I'm not diving into running your example.

Best regards,

John

Dan Kilman

Apr 24, 2014, 6:44:34 AM
to openwfe...@googlegroups.com
I really do appreciate your help so far, so thanks again for clearing things up.
One question to help clarify things: why are we seeing non-linear growth in execution time with these "wide" concurrent iterators?
Do you think splitting these into smaller batches could make a difference?

John Mettraux

Apr 24, 2014, 8:35:04 AM
to openwfe...@googlegroups.com

On Thu, Apr 24, 2014 at 03:44:34AM -0700, Dan Kilman wrote:
>
> (...)
>
> One question to help clarify things. Why are we seeing non-linear
> growth in execution time with these "wide" concurrent iterators?

Hello Dan,

I can't remember exactly. It's mostly a matter of "do a small amount of work
then save the state" compounded in the concurrent-iterator expression.
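A back-of-the-envelope cost model of that explanation, in Python. It assumes, without confirmation from ruote's source, that every branch reply does a small amount of work and then re-saves an iterator expression whose size is proportional to the number of branches it tracks. Under that assumption the total written state grows with the square of the width.

```python
def iterator_save_cost(branches: int, unit_cost: int = 1) -> int:
    """Units of state written by one concurrent iterator under an assumed model.

    Assumption: each branch reply does a little work, then re-saves the
    iterator expression, whose size is proportional to the number of branches.
    """
    total = 0
    for _reply in range(branches):
        total += branches * unit_cost
    return total


for width in (10, 50, 100):
    print(width, iterator_save_cost(width))
```

This prints costs of 100, 2500 and 10000 for widths 10, 50 and 100: ten times the branches gives a hundred times the modelled cost. That is one possible source of the kind of non-linear growth Dan describes.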

> Do you think splitting these into smaller batches can make a difference?

Yes, it could.

You have a test bench now, so you can experiment.
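The batching idea can be tried against the same simplified model. In this Python sketch the 100 nodes are split into consecutive batches, and each batch is treated as its own narrower concurrent iterator, run one after another. `batches` and `batched_cost` are hypothetical helpers. The quadratic `iterator_save_cost` is the same assumption as before, not measured ruote behaviour.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def iterator_save_cost(branches: int) -> int:
    """Assumed model: each reply re-saves state proportional to the iterator width."""
    return branches * branches


def batched_cost(n: int, size: int) -> int:
    """Modelled cost when n nodes run as a sequence of narrower iterators."""
    return sum(iterator_save_cost(len(chunk)) for chunk in batches(range(n), size))


for size in (100, 25, 10):
    print(size, batched_cost(100, size))
```

With batches of 10 the model gives 1000 units instead of 10000 for a single iterator over 100 nodes, at the price of running the batches sequentially. Whether real timings follow this curve is what the test bench can show.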


Best regards,

John
