performance question


Suhrawardi

Jun 17, 2013, 9:11:16 AM
to openwfe...@googlegroups.com
Hi,

We are building a workflow system for a process that queries some other systems (databases, and services over HTTP) and uses ruote-redis for storage. When I start a single workflow process, the whole process finishes within 10 seconds. But when I start more processes (15) at the same time, the processing time increases to 30 seconds. When we start even more, we see that eventually all the workflow processes get stuck.

To make sure that the database or HTTP calls are not the problem, I removed all these calls and replaced the code in the :on_workitem calls with dummy code that adds a dummy value to the workitem and replies. That speeds up the workflow process only a little, to 20 seconds, while a single workflow at a time, with DB and HTTP calls, finishes in 10 seconds.

This seems similar to the problem described in this post: https://groups.google.com/forum/#!topic/openwferu-users/0shOiU7ab8M

We didn't manage to speed up the process with more instances of the workflow application (we use AMQP to send the messages, so we can easily add more instances), or by running multiple workers of engines within a single instance of the workflow application.
Switching to the FS Storage didn't solve the problem either.

So far, we didn't have any luck solving this problem. Do you have any ideas what the best approach would be to pinpoint this problem?
Any ideas where to look?

We use ruby-2.0.0-p195, ruote (2.3.0.2), ruote-redis (2.3.0).

Thanks a lot,
Jarra


John Mettraux

Jun 17, 2013, 8:46:50 PM
to openwfe...@googlegroups.com

On Mon, Jun 17, 2013 at 06:11:16AM -0700, Suhrawardi wrote:
> Hi,

Hello Jarra,

welcome to the ruote mailing list.

Thanks for the excellent analysis work and issue report, very much appreciated.

Sorry for the problems.


> We are building a workflow system for a process that queries some other
> systems (DB's and via http) and use ruote-redis for storage. When I start a
> single workflow process, the whole process is finished within 10 seconds.
> But when starting more processes (15) at the same time, the processing time
> increases to 30 seconds. When we start even more, we see that eventually
> all the workflow processes get stuck.
>
> To make sure that the database or http calls are not the problem, I removed
> all these calls, and replaced the code in the :on_workitem calls with a
> dummy code that adds a dummy value to the workitem and does a reply. That
> speeds up the workflow process very little to 20 seconds, while a single
> workflow at a time with db and http calls is finished in 10 seconds.

Well done.


> This seems similar to the problem described in
> this https://groups.google.com/forum/#!topic/openwferu-users/0shOiU7ab8M
> post.
>
> We didn't manage to speed up the process with more instances of the
> workflow application (we use AMQP to send the messages, so we can easily
> add more instances), or by running multiple workers of engines within a
> single instance of the workflow application.

Multiple worker threads, one GIL. Independent Ruby worker processes should
be better.
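
A minimal sketch of the GIL point, with nothing ruote-specific in it. It assumes MRI and a platform where `fork` is available; `cpu_work`, `run_in_threads` and `run_in_processes` are hypothetical names invented for the illustration. Threads inside one interpreter take turns on the GIL for CPU-bound Ruby code, while forked processes each get their own interpreter.

```ruby
# CPU-bound stand-in for the work a worker does per message (hypothetical).
def cpu_work(n)
  (1..n).reduce(0) { |sum, i| sum + i * i }
end

# Run the jobs in threads of one Ruby process. On MRI the GIL lets only one
# thread execute Ruby code at a time, so CPU-bound jobs do not overlap.
def run_in_threads(jobs)
  jobs.map { |n| Thread.new { cpu_work(n) } }.map(&:value)
end

# Run each job in its own forked process. Each child has its own interpreter
# (and its own GIL); the result comes back to the parent over a pipe.
def run_in_processes(jobs)
  children = jobs.map do |n|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(cpu_work(n).to_s)
      writer.close
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
    writer.close
    [pid, reader]
  end
  children.map do |pid, reader|
    result = reader.read.to_i
    reader.close
    Process.wait(pid)
    result
  end
end
```

Wrapping both calls in `Benchmark.realtime` with a few large jobs, for example `[2_000_000] * 4`, is one way to compare the two on a multi-core machine. The numbers depend on the machine, so none are given here.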


> Switching to the FS Storage didn't solve the problem either.

Do you mean that when launching 20 ruote processes simultaneously (with one
ruote worker), the system jams (with Redis and with Fs too)?


> So far, we didn't have any luck solving this problem. Do you have any ideas
> what the best approach would be to pinpoint this problem?
> Any ideas where to look?
>
> We use ruby-2.0.0-p195, ruote (2.3.0.2), ruote-redis (2.3.0).

If I understand correctly, it's purely ruote's fault.

May I see what the process looks like? I'd like to try myself with the Redis
storage and the fs storage.

I tend to run with noisy on, and/or, since it's Ruby code, I'd do "bundle
open ruote" and add debug output to the #step method in the lib/ruote/worker
to see what is happening.
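
As an alternative to editing the gem source in place, timing output can be wrapped around a `step` method from the outside. The sketch below is generic Ruby: `DemoWorker` and `StepTracer` are stand-ins made up for the example, not ruote classes, and whether the same wrapping suits the real worker depends on its internals.

```ruby
# Stand-in for a worker class with a #step method. This is NOT ruote's
# worker class; it only exists so that the sketch runs on its own.
class DemoWorker
  def initialize(messages)
    @messages = messages
  end

  # Process one message; return false when there is nothing left to do.
  def step
    msg = @messages.shift
    return false unless msg
    msg['done'] = true
    true
  end
end

# Wraps #step with timing output, without editing the class itself.
module StepTracer
  def step
    started = Time.now
    result = super
    elapsed_ms = ((Time.now - started) * 1000).round(3)
    (@trace ||= []) << elapsed_ms
    $stderr.puts "step took #{elapsed_ms}ms (processed: #{result})"
    result
  end

  # Elapsed times (in milliseconds) recorded so far.
  def trace
    @trace || []
  end
end

# Module#prepend is private on older Rubies (2.0 included), hence the send.
DemoWorker.send(:prepend, StepTracer)
```

Every call to `DemoWorker#step` now logs its duration to stderr and records it in `trace`, which makes slow steps stand out without touching the original method.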

I'd really like to see the process to see what kind of stress (independently
of participant implementations) is put on the ruote worker and the storages.


Thanks in advance, sorry for the nuisance,

--
John Mettraux - http://lambda.io/jmettraux

Suhrawardi

Jun 18, 2013, 6:00:36 AM
to openwfe...@googlegroups.com
Hi John,

Thanks for your prompt reply. No problem... :-)

I made a very minimal version of the workflow application I'm currently working on, with all the participants stubbed out: They simply return an ok.
I put this application on GitHub, so you can try it out for yourself:
https://github.com/suhrawardi/ruote_poc

You can run the bin/test.rb script to start the workflow (see the README.md). The first command line parameter is the number of workflows to start.
In the bin/app.rb file, you can change the number of workflow engines that you want to start to handle the requests (within the same process). In order to handle the requests with multiple processes, just start bin/app.rb a few times.

For me, there is not much difference whether I run the application with multiple engines, a single engine, or handled by multiple processes.

Besides that, there is no gain when I switch to another storage implementation (which you can change in lib/yaap/engine.rb).

Mind that there is a 'progress' key in the workitem hash that is dumped by the test script. It is really weird, but some steps seem to be started and finished more than once, while that should not be the case (the workflow is in participants/yaap_mobile_auditor/workflows/main.conf.rb).

Hope you can help us pinpoint where the problem is. If you have any questions, or would like me to try something else, please let me know! :-)

Thanks a lot so far!

Jarra

John Mettraux

Jun 19, 2013, 12:00:56 AM
to openwfe...@googlegroups.com

On Tue, Jun 18, 2013 at 03:00:36AM -0700, Suhrawardi wrote:
>
> I made a very minimal version of the workflow application I'm currently
> working on, with all the participants stubbed out: They simply return an ok.
> I put this application on Github, so you can try it out for yourself:
> https://github.com/suhrawardi/ruote_poc
>
> You can run the bin/test.rb script to start the workflow (see the
> README.md). The first command line param is the number of workflows that is
> started.
> In the bin/app.rb file, you can change the nr of workflow engines that you
> want to start to handle the requests (within the same process). In order to
> handle the requests with multiple processes, just start bin/app.rb a few
> times.

Hello Jarra,

sorry, I will not use your "poc", I don't want to have Bunny/AMQP putting
sticks in my wheels. I will try to build an example that really states "hey
it's ruote's fault".

In your "poc" readme, you don't mention ruote grinding to a halt.
The performance section doesn't tell how many ruote workers you are running
and which storage is used for the result.

> For me, there is not much difference whether I run the application with
> multiple engines, a single engine, or handled by multiple processes.

Multiple workers?

> Besides that, there is no gain when I switch to another storage
> implementation (which you can change in lib/yaap/engine.rb).
>
> Mind that there is a 'progress' key in the workitem hash that is dumped by
> the test script. It is really weird, but some steps seem to be started and
> finished more than once... While that should not be the case (workflow is
> in participants/yaap_mobile_auditor/workflows/main.conf.rb).

That's an interesting piece of feedback.

Your whole example is superbly packaged, but I find it confusing that a
workflow definition is placed under participants/.

In your tests, the concurrent_iterator after "step5", how many branches does
it "spawn"?

> Hope you can help us pinpoint where the problem is. If you have any
> questions, or like me to try something else, please let me know! :-)

OK, I will try building my own load bench around your process definition.


Thanks in advance for the answer about the width of the concurrent-iterator,

Suhrawardi

Jun 19, 2013, 2:04:05 AM
to openwfe...@googlegroups.com
Hi John,

Thanks again for your reply!

> sorry, I will not use your "poc", I don't want to have Bunny/AMQP putting
> sticks in my wheels. I will try to build an example that really states "hey
> it's ruote's fault".

Ok, fair enough...
 
> In your "poc" readme, you don't mention ruote grinding to a halt.

I don't know exactly what causes our workflow application to grind to a halt. In my minimal ruote poc it is not happening, so it needs further investigation on my part. It seems related to failing participants (participants that raise an error in the concurrence block), but only when running a lot of flows simultaneously. It does not happen when I run 5 flows. Those errors are raised when the expected information is not found in other systems (or these systems are simply not available). I will investigate that further and let you know.
 
> The performance section doesn't tell how many ruote workers you are running
> and which storage is used for the result.

The benchmarks mentioned in the README.md file are all done with a single engine, single worker, and Redis storage. I tried FsStorage, but that was even slower.
 
> > For me, there is not much difference whether I run the application with
> > multiple engines, a single engine, or handled by multiple processes.
>
> Multiple workers?

I made an EngineDistributor (it's in lib/yaap/engine_distributor.rb) that starts one or more workflow engines, each with its own database.
As there is nothing to share between different workflows, that seemed to me like a good idea to speed things up. But it does not make a difference.
That is what I mean by multiple engines (and thus multiple workers). I could not see the benefit of using EngineParticipants to use multiple workers in our case.

Because we use AMQP, we can easily start more than one process (bin/app.rb) to handle the workflow requests that come in. The load will then be divided among the different processes that are running. That's what I mean by multiple processes.
 
> > Mind that there is a 'progress' key in the workitem hash that is dumped by
> > the test script. It is really weird, but some steps seem to be started and
> > finished more than once... While that should not be the case (workflow is
> > in participants/yaap_mobile_auditor/workflows/main.conf.rb).
>
> That's an interesting piece of feedback.

However, I have the feeling that it is caused by the :merge_type => 'concat'
 
> Your whole example is superbly packaged, but I find it confusing that a
> workflow definition is placed under participants/.

I agree... :-) That's for historical reasons. But rest assured, it's like that only while we are working on it. The idea is to have different workflows, each of which resides in its own gem. That makes it very easy to update a single workflow.
 
> In your tests, the concurrent_iterator after "step5", how many branches does
> it "spawn"?
 
In the example app, it does not spawn, so step 6 is never executed. In the application we are building, it is running once in 85% of the cases, and 2 or maybe max 3 times in the rest of the cases.

In our case, we would like to start a lot of workflows at the same time (100 / 200) and have them run all at the same time (as much as possible) and return asap. We do a lot of IO stuff, databases / http calls etc, so that should speed things up.
But I have a feeling that when I start 200 workflows with the example poc application, the first step is started 200 times. When that first step is finished for all of them, it looks like it is waiting for 10 seconds, and only then continues with the second step.
It looks like it is doing reply and apply actions for 10 seconds, before it starts with the second step.
The problem is still there when I remove the concurrence.

Hope this provides some more information.
Thanks a lot for the good work and all the help!! :-)

Jarra


John Mettraux

Jun 19, 2013, 2:28:24 AM
to openwfe...@googlegroups.com

On Tue, Jun 18, 2013 at 11:04:05PM -0700, Suhrawardi wrote:
>
> (...)
>
> I don't know what exactly causes our workflow application grinding to a
> halt. In my minimal ruote poc it is not happening. Thus it needs further
> investigation from my part. It seems to relate to failing participants /
> participants that raise an error in the concurrence block. But only when
> running a lot of flows simultaneously. It does not happen when I run 5
> flows. Those errors are raised when the expected information is not found
> in other systems (or these systems are simply not available). I will
> investigate that further and let you know.
>
>
> > The performance section doesn't tell how many ruote workers you are running
> > and which storage is used for the result.
> >
>
> The benchmarks mentioned in the README.md file are all done with a single
> engine, single worker, and Redis storage. I tried FsStorage, but that was
> even slower.

Hello Jarra,

understood, yes, the fs storage should be slower than the Redis storage.

> > > For me, there is not much difference whether I run the application with
> > > multiple engines, a single engine, or handled by multiple processes.
> >
> > Multiple workers?
>
> I made an EngineDistributor (it's in lib/yaap/engine_distributor.rb) that
> starts 1 or more workflow engines, each with their own database.
> As there is nothing to share between different workflows, that seemed like
> a good idea to speed things up to me. But that does not make a difference.
> That is what I mean with multiple engines (and == multiple workers). I could
> not see the benefits for using EngineParticipants to use multiple workers
> in our case.

OK, understood.

> Because we use AMQP, we can easily start more than one process (bin/app.rb)
> to handle the workflow requests that come in. The load will then be divided
> by the different processes that are running. That's what I mean with
> multiple processes.

So you want to distribute the work among "engines" and not among "workers of
a single engine".

> > > Mind that there is a 'progress' key in the workitem hash that is dumped by
> > > the test script. It is really weird, but some steps seem to be started and
> > > finished more than once... While that should not be the case (workflow is
> > > in participants/yaap_mobile_auditor/workflows/main.conf.rb).
> >
> > That's an interesting piece of feedback.
> >
>
> However, I have the feeling that it is caused by the :merge_type => 'concat'

I couldn't see any problems (started and finished more than once) in my test
bench (see below).

> (...)
>
>
> > In your tests, the concurrent_iterator after "step5", how many branches does
> > it "spawn"?
> >
>
> In the example app, it does not spawn, so step 6 is never executed. In the
> application we are building, it is running once in 85% of the cases, and 2
> or maybe max 3 times in the rest of the cases.

In my test bench, I went for 4 branches.

> In our case, we would like to start a lot of workflows at the same time
> (100 / 200) and have them run all at the same time (as much as possible)
> and return asap. We do a lot of IO stuff, databases / http calls etc, so
> that should speed things up.
> But I have a feeling that when I start 200 workflows with the example poc
> application, the first step is started 200 times. When that first step is
> finished for all of them, it looks like it is waiting for 10 seconds, and
> only then continues with the second step.

Yes, ruote 2.x does exactly that: it does the 200 first steps and then the
next steps all kick in. If there are multiple workers (and slow/human
participants afterwards), that changes...

> It looks like it is doing reply and apply actions for 10 seconds, before it
> starts with the second step.
> The problem is still there when I remove the concurrence.

It's not a problem, it's like that by design.


Here is my test bench:

https://github.com/jmettraux/ruote_poc

Here are the stats for one set of runs:

https://github.com/jmettraux/ruote_poc/blob/master/results/stats.txt

At no point does the engine jam. I see no "steps start multiple times and
doesn't finish" issues.

For ruote 3.0, I'd love to go with a worker processing as much of a workflow
as it can before handing it back. Ruote-asw works like that but it's
unfinished. I don't have much time to work on open source these days...


It would help if you could pinpoint what goes wrong with your participants.

I strongly think that your "multiple engines" option should perform better
than "multiple workers". I don't know what your participants look like, I
have the impression they are accessing the same set of resources and that may
induce a synchronization cost...


Hope this can help, thanks for the feedback and the nice test bench,

Suhrawardi

Jun 19, 2013, 3:33:44 PM
to openwfe...@googlegroups.com
Hi John,

Thanks a lot for your reply.

> > Because we use AMQP, we can easily start more than one process (bin/app.rb)
> > to handle the workflow requests that come in. The load will then be divided
> > by the different processes that are running. That's what I mean with
> > multiple processes.
>
> So you want to distribute the work among "engines" and not among "workers of
> a single engine".

I think that distributing the work among different engines should perform better in our case.

> > In our case, we would like to start a lot of workflows at the same time
> > (100 / 200) and have them run all at the same time (as much as possible)
> > and return asap. We do a lot of IO stuff, databases / http calls etc, so
> > that should speed things up.
> > But I have a feeling that when I start 200 workflows with the example poc
> > application, the first step is started 200 times. When that first step is
> > finished for all of them, it looks like it is waiting for 10 seconds, and
> > only then continues with the second step.
>
> Yes, ruote 2.x does exactly that: it does the 200 first steps and then the
> next steps all kick in. If there are multiple workers (and slow/human
> participants afterwards), that changes...

So, if I understand you correctly: when I start 50 workflow processes at the same time, the first participant is handled for all of them, and only when all have replied is the second participant handled? That means that if one of these 50 workflows takes a long time (e.g. gets a timeout because the required resource is unavailable), it will block all the other workflow processes?

Hmm, in our case that is not an option... :-) We'll have to avoid that. But using different engines, that should not be a problem.
 
> Here is my test bench:
>
>   https://github.com/jmettraux/ruote_poc
>
> Here are the stats for one set of runs:
>
>   https://github.com/jmettraux/ruote_poc/blob/master/results/stats.txt

Thanks a lot for your take on our poc. It is very insightful and helpful.
We saw you used Yajl. When we changed to Yajl, that halved the processing time... :-)
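
A rough way to check how much of the per-message cost is JSON work, assuming that encoding and decoding workitems is a meaningful share of it (which the Yajl result suggests). The sketch uses only the stdlib `json`; the workitem shape is made up, and `sample_workitem` and `json_round_trip_seconds` are hypothetical helper names.

```ruby
require 'json'
require 'benchmark'

# A made-up workitem-like document; real ones depend on the application.
def sample_workitem
  {
    'fei' => { 'wfid' => '20130619-demo', 'expid' => '0_1' },
    'participant_name' => 'step1',
    'fields' => { 'progress' => (1..50).map { |i| "step#{i} ok" } }
  }
end

# Encode and decode the document `rounds` times, return the elapsed seconds.
def json_round_trip_seconds(doc, rounds)
  Benchmark.realtime do
    rounds.times { JSON.parse(JSON.generate(doc)) }
  end
end

seconds = json_round_trip_seconds(sample_workitem, 10_000)
puts "10000 round trips: #{seconds.round(3)}s"
```

Running the same loop with another encoder in place of `JSON.generate` and `JSON.parse` gives a like-for-like comparison on the same document.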

> At no point does the engine jam. I see no "steps start multiple times and
> doesn't finish" issues.
 
We're investigating further. And it looks like it's caused by Bunny losing messages under high load.

So, still some more investigation to do. But you helped us a lot!
Thanks a lot!

Jarra

John Mettraux

Jun 19, 2013, 11:52:12 PM
to openwfe...@googlegroups.com

On Wed, Jun 19, 2013 at 12:33:44PM -0700, Suhrawardi wrote:
>
> (...)
>
> > > In our case, we would like to start a lot of workflows at the same time
> > > (100 / 200) and have them run all at the same time (as much as possible)
> > > and return asap. We do a lot of IO stuff, databases / http calls etc, so
> > > that should speed things up.
> > > But I have a feeling that when I start 200 workflows with the example poc
> > > application, the first step is started 200 times. When that first step is
> > > finished for all of them, it looks like it is waiting for 10 seconds, and
> > > only then continues with the second step.
> >
> > Yes, ruote 2.x does exactly that: it does the 200 first steps and then the
> > next steps all kick in. If there are multiple workers (and slow/human
> > participants afterwards), that changes...
> >
>
> So, if I understand you correctly: When I start 50 workflow processes at
> the same time, for all of them the first participant is handled. When all
> replied, only then the second participant is handled? That means that if we
> have one of these 50 workflows that takes a long time (eg gets a timeout
> because the required resource is unavailable), this will block all the 50
> other workflow processes?
>
> Hmm, in our case that is not an option... :-) We'll have to avoid that. But
> using different engines, that should not be a problem.

Hello,

sorry, I didn't explain it right.

(speaking of ruote 2.x) When you start 50 workflow processes together, the
first node of their expression tree is handled, 50 in a row. As soon as a
participant node is hit, a workitem is dispatched to the participant. When
the reply comes back to the engine, the reply is scheduled for execution. The
workers pick up the messages as they come, so when a participant takes some
time, its answer is delayed and work for other processes gets executed
beforehand.

Ruote workers are very dumb, they just pick up available work and perform.
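
A toy model of that behaviour: one worker draining one shared first-in, first-out message queue. Nothing here is ruote's actual API; `simulate_fifo` and the `[wfid, step]` message shape are invented for the sketch, and every step is assumed to complete immediately.

```ruby
# Toy model of a single worker draining one shared FIFO message queue.
# Each "process" is just an id plus a number of steps.
def simulate_fifo(process_count, step_count)
  queue = (1..process_count).map { |wfid| [wfid, 1] } # the launch messages
  order = []
  until queue.empty?
    wfid, step = queue.shift # pick the oldest available message
    order << [wfid, step]    # "execute" it
    # The follow-up message goes to the back of the queue.
    queue.push([wfid, step + 1]) if step < step_count
  end
  order
end

simulate_fifo(3, 2).each { |wfid, step| puts "process #{wfid} step #{step}" }
```

With 3 processes of 2 steps each, this prints step 1 for processes 1, 2 and 3, and only then step 2 for processes 1, 2 and 3: all instances advance on one front.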

What you see is 50 messages getting processed one after the other. If you
look carefully at the output of the workers from my ruote_poc, you'll notice
that even if it looks like a pack of 50 sheep advancing on one front at the
beginning, it then trickles and some processes get processed faster. The
participants in the ruote_poc are very simple, but each executes in its
own Ruby thread (this default behaviour can be changed), so as the executions
move on, the processes whose successive participants executed faster (at the
whim of the thread scheduler) finish faster. It can be thought of as somehow
simulating participants calling out to external services.

Branches of workflows calling out to participants place themselves in wait
for the participant answer. That waiting time may vary.
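
A sketch of why one slow participant does not hold up the others, under the assumption stated above that each participant runs in its own thread. The delays are invented stand-ins for DB or HTTP calls, and `dispatch_all` is a hypothetical name, not a ruote method.

```ruby
require 'thread'

# Toy model: each participant runs in its own thread and pushes its reply
# onto a queue; the worker handles the replies in arrival order.
def dispatch_all(delays)
  replies = Queue.new
  threads = delays.map do |wfid, delay|
    Thread.new do
      sleep(delay)    # the participant "calls out" to an external service
      replies << wfid # the reply comes back to the engine
    end
  end
  # The worker picks replies up as they arrive; a slow one blocks nobody.
  handled = Array.new(delays.size) { replies.pop }
  threads.each(&:join)
  handled
end

order = dispatch_all({ 'slow' => 0.3, 'fast-1' => 0.01, 'fast-2' => 0.05 })
puts order.inspect
```

The process waiting on the slow call is simply handled last; the two fast ones are done long before it replies.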

For ruote 3.0, as I did for some storages and as was requested/suggested by
some people on this list, I'd like to try to keep a worker processing the same
workflow instance until it hits a participant. At that point it'd schedule
the participant work and yield, probably considering other workflow
instances. That would alter the "50 sheep walking on a single front" effect
you've been observing *right after the launch*.
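
A toy model of that scheduling idea, for comparison with the plain first-in, first-out behaviour. It is not ruote code: the step lists and the `:participant` marker are invented, `simulate_sticky` is a hypothetical name, and participant replies are assumed to be available as soon as the worker comes back to an instance.

```ruby
# Toy model: the worker keeps advancing one workflow instance until it
# reaches a participant, then moves on to the next instance.
def simulate_sticky(processes)
  pending = processes.map { |wfid, steps| [wfid, steps.dup] }
  order = []
  until pending.empty?
    wfid, steps = pending.shift
    until steps.empty?
      step = steps.shift
      order << [wfid, step]
      break if step == :participant # dispatch, then yield to other instances
    end
    pending.push([wfid, steps]) unless steps.empty?
  end
  order
end

flow = [:sequence, :set, :participant, :set, :participant]
simulate_sticky({ 'a' => flow, 'b' => flow }).each do |wfid, step|
  puts "#{wfid} #{step}"
end
```

Instance "a" runs its first three expressions back to back before "b" gets a turn, instead of the two instances alternating message by message.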

Wow, that is long.

If you want to observe the work in each worker of my ruote_poc:

https://github.com/jmettraux/ruote_poc/commit/d822e5a65f39


> (...)


All the best, kind regards,

Suhrawardi

Jun 20, 2013, 12:20:28 AM
to openwfe...@googlegroups.com
Hi John,

> (speaking of ruote 2.x) When you start 50 workflow processes together, the
> first node of their expression tree is handled, 50 in a row. As soon as a
> participant node is hit, a workitem is dispatched to the participant. When
> the reply comes back to the engine, the reply is scheduled for execution. The
> workers pick up the messages as they come, so when a participant takes some
> time, its answer is delayed and work for other processes gets executed
> beforehand.
>
> Ruote workers are very dumb, they just pick up available work and perform.
>
> What you see is 50 messages getting processed one after the other. If you
> look carefully at the output of the workers from my ruote_poc, you'll notice
> that even if it looks like a pack of 50 sheep advancing on one front at the
> beginning, it then trickles and some processes get processed faster. The
> participants in the ruote_poc are very simple, but each executes in its
> own Ruby thread (this default behaviour can be changed), so as the executions
> move on, the processes whose successive participants executed faster (at the
> whim of the thread scheduler) finish faster. It can be thought of as somehow
> simulating participants calling out to external services.
>
> Branches of workflows calling out to participants place themselves in wait
> for the participant answer. That waiting time may vary.

Ah, that is indeed closer to the behaviour we saw. When 50 participants finish right after each other, ruote does the housekeeping for all those processes first and only afterwards finds the first participants to be executed on the queue. In our first poc that explains the 10 seconds that passed before ruote continued.
This was reduced to 5 seconds when we switched to Yajl.
 
> For ruote 3.0, as I did for some storages and as was requested/suggested by
> some people on this list, I'd like to try to keep a worker processing the same
> workflow instance until it hits a participant. At that point it'd schedule
> the participant work and yield, probably considering other workflow
> instances. That would alter the "50 sheep walking on a single front" effect
> you've been observing *right after the launch*.

In our case that would be more desirable.

Thanks a lot for the thorough explanation!

Best,
Jarra 