[akka-user] Making workflow graphs from Akka Actors?

Alex Averbuch

Nov 9, 2010, 12:17:10 PM
to akka...@googlegroups.com
Hey,
I have a question regarding the suitability of Akka for a certain application: Workflows.

Assume you have a number of "tasks" that need to be performed, where the input of some tasks is the output of other tasks, and that these tasks can be "piped" together to form a "workflow graph" of tasks. Ultimately, the execution of this workflow graph results in a workflow being performed.

For example, a number of filtering tasks may be chained together to filter a stream of incoming data.
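Because some tasks consume the output of others, a workflow graph is a directed acyclic graph (DAG). As a minimal, framework-independent sketch (the `WorkflowGraph` class and task names are hypothetical, and Kahn's topological sort is a standard technique rather than something proposed in this thread), the tasks can be ordered so that each one runs only after everything it depends on:

```java
import java.util.*;

// Hypothetical workflow graph: each task lists the tasks whose output it consumes.
final class WorkflowGraph {
    private final Map<String, List<String>> dependents = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    // Register a task together with the tasks it depends on.
    void addTask(String task, String... dependsOn) {
        dependents.putIfAbsent(task, new ArrayList<>());
        inDegree.putIfAbsent(task, 0);
        for (String dep : dependsOn) {
            dependents.putIfAbsent(dep, new ArrayList<>());
            inDegree.putIfAbsent(dep, 0);
            dependents.get(dep).add(task);
            inDegree.merge(task, 1, Integer::sum);
        }
    }

    // Kahn's algorithm: repeatedly emit tasks that have no unfinished dependencies.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((task, deg) -> { if (deg == 0) ready.add(task); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            for (String next : dependents.get(task)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("workflow graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        WorkflowGraph g = new WorkflowGraph();
        g.addTask("read");
        g.addTask("filterA", "read");
        g.addTask("filterB", "read");
        g.addTask("merge", "filterA", "filterB");
        System.out.println(g.executionOrder()); // [read, filterA, filterB, merge]
    }
}
```

Tasks that become ready at the same time (here `filterA` and `filterB`) are independent of each other, which is exactly where a parallel executor could run them concurrently.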

I'm working with such a library at the moment, but it's single threaded. Basically, at the moment tasks expose iterators that get chained together and then executed. 
An attempt to parallelize this library was made in the past, but it assigned each task to its own thread, which resulted in poor performance when the workflow jobs were short. I guess this was due to the cost of creating the threads.
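For comparison, the single-threaded iterator chaining described above might look roughly like the sketch below. The `FilterIterator` class is hypothetical (the library's actual task API isn't shown in the thread); each stage pulls elements from the previous stage's iterator, so the whole chain runs on the caller's thread.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical filter task that exposes an Iterator and wraps its upstream task.
final class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> upstream;
    private final Predicate<? super T> keep;
    private T pending;            // next element that passed the filter
    private boolean havePending;  // whether 'pending' holds a valid element

    FilterIterator(Iterator<T> upstream, Predicate<? super T> keep) {
        this.upstream = upstream;
        this.keep = keep;
        advance();
    }

    // Pull from upstream until an element passes the filter or the input runs out.
    private void advance() {
        havePending = false;
        while (upstream.hasNext()) {
            T candidate = upstream.next();
            if (keep.test(candidate)) {
                pending = candidate;
                havePending = true;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return havePending; }

    @Override public T next() {
        if (!havePending) throw new NoSuchElementException();
        T result = pending;
        advance();
        return result;
    }

    public static void main(String[] args) {
        Iterator<Integer> source = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).iterator();
        // Chain two filters: keep even numbers, then keep numbers greater than 4.
        Iterator<Integer> chain = new FilterIterator<>(
                new FilterIterator<>(source, n -> n % 2 == 0), n -> n > 4);
        List<Integer> out = new ArrayList<>();
        chain.forEachRemaining(out::add);
        System.out.println(out); // [6, 8, 10]
    }
}
```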

What I'd like to do is assign tasks to Akka actors and have them connect via message passing, rather than via iterators (shared state).
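A minimal sketch of the message-passing alternative, assuming plain `java.util.concurrent` rather than Akka's actor API: each stage owns an inbox queue, processes one message at a time, and forwards results to the next stage's inbox. The `FilterStage` class and the `END` end-of-stream marker are hypothetical. Note that an actor framework goes further than this, multiplexing many stages over a small thread pool instead of parking one blocked thread per stage.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Predicate;

// Hypothetical pipeline stage: takes messages from its own inbox and forwards
// the ones that pass its filter to the next stage's inbox.
final class FilterStage implements Runnable {
    static final Object END = new Object(); // assumed end-of-stream marker

    final BlockingQueue<Object> inbox = new LinkedBlockingQueue<>();
    private final Predicate<Integer> keep;
    private final BlockingQueue<Object> downstream;

    FilterStage(Predicate<Integer> keep, BlockingQueue<Object> downstream) {
        this.keep = keep;
        this.downstream = downstream;
    }

    @Override public void run() {
        try {
            while (true) {
                Object msg = inbox.take();      // wait for the next message
                if (msg == END) {               // pass end-of-stream along, then stop
                    downstream.put(END);
                    return;
                }
                Integer n = (Integer) msg;
                if (keep.test(n)) downstream.put(n);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Wire two stages together, feed them the input, and collect what reaches the sink.
    static List<Integer> runPipeline(List<Integer> input) throws InterruptedException {
        BlockingQueue<Object> sink = new LinkedBlockingQueue<>();
        FilterStage greaterThanFour = new FilterStage(n -> n > 4, sink);
        FilterStage even = new FilterStage(n -> n % 2 == 0, greaterThanFour.inbox);

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.execute(even);
            pool.execute(greaterThanFour);
            for (Integer n : input) even.inbox.put(n);
            even.inbox.put(END);

            List<Integer> out = new ArrayList<>();
            for (Object msg = sink.take(); msg != END; msg = sink.take()) out.add((Integer) msg);
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runPipeline(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))); // [6, 8, 10]
    }
}
```

Because each stage has a single consumer of a FIFO inbox, message order is preserved end to end, just as with the iterator chain.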

My questions are:
  • Can Akka efficiently stream a reasonably large number of messages between actors? (as fast/efficient as connecting tasks via iterators)
  • How much faster is it to create Akka actors vs creating Java Threads?
  • How much flexibility/control does Akka provide with respect to mapping actors to threads? For example, would it give me the ability to say: "actors A & B may be executed in different threads (unit of concurrency), but actors C & D must be executed within the same thread (unit of concurrency)"?
  • Do you think this problem is well suited to Akka?
If you'd like to know anything else about my use case please ask.

Thanks in advance for all help!

Regards,
Alex

Paul Pacheco

Nov 9, 2010, 5:58:01 PM
to akka...@googlegroups.com
I would like to throw an idea on this:

If Akka actors had "outboxes", then using something like Camel or Spring Integration you could connect the outbox of one actor to the inbox of another in a very loosely coupled way.
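The "outbox" idea might be sketched as follows. This is a hypothetical, synchronous illustration in plain Java (`OutboxComponent` and `connectOutbox` are made-up names, not an Akka, Camel, or Spring Integration API): each component only writes to its outbox, and separate wiring code decides where that outbox leads, so components hold no references to each other.

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical component that never knows its downstream; it only writes to its outbox.
final class OutboxComponent<I, O> {
    private final Function<I, O> work;
    private Consumer<O> outbox = msg -> { };   // an unconnected outbox drops messages

    OutboxComponent(Function<I, O> work) { this.work = work; }

    // The "inbox": handle one incoming message and emit the result via the outbox.
    void receive(I msg) { outbox.accept(work.apply(msg)); }

    // External wiring (the role Camel or Spring Integration might play) connects outboxes.
    void connectOutbox(Consumer<O> target) { this.outbox = target; }

    public static void main(String[] args) {
        OutboxComponent<String, String> trim = new OutboxComponent<>(String::trim);
        OutboxComponent<String, Integer> length = new OutboxComponent<>(String::length);
        List<Integer> results = new ArrayList<>();

        // The wiring lives outside the components, so either end can be swapped freely.
        trim.connectOutbox(length::receive);
        length.connectOutbox(results::add);

        trim.receive("  hello  ");
        System.out.println(results); // [5]
    }
}
```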



--
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To post to this group, send email to akka...@googlegroups.com.
To unsubscribe from this group, send email to akka-user+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/akka-user?hl=en.

Peter Veentjer

Nov 9, 2010, 6:20:37 PM
to akka...@googlegroups.com
On Tue, Nov 9, 2010 at 6:17 PM, Alex Averbuch <alex.a...@gmail.com> wrote:
> My questions are:
>   • Can Akka efficiently stream a reasonably large number of messages between actors? (as fast/efficient as connecting tasks via iterators)
It all depends on the definition of efficient. Akka provides a general-purpose actor framework, and every form of framework/abstraction adds overhead. If you really want to squeeze the most out of performance, doing it yourself is in most cases the only way. On the other hand, that also consumes a lot of effort, and perhaps the performance provided by a general-purpose framework is more than good enough.
 
>   • How much faster is it to create Akka actors vs creating Java Threads?
Creating a thread for every task is a big no-no, but if you can pool the threads (or let the actor framework do it), you can squeeze much more performance out of it. But as I already said, nothing will beat the performance of a hand-crafted solution; just be prepared to spend a serious amount of time on tuning. (Big tip: for extremely high performance, e.g. millions of operations a second, GC is performance killer number 1.)
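To make the thread-pooling point concrete, here is a small plain-Java sketch (the class name, method names, and task counts are arbitrary assumptions) that runs the same batch of short tasks two ways: one new `Thread` per task, and a fixed pool that reuses a few threads. It only shows the structure of such a comparison; actual timings depend on the JVM, OS, and hardware, so measure on your own workload rather than relying on it.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

final class PoolVsThreads {
    // A deliberately tiny unit of work, standing in for a short workflow task.
    static void shortTask(AtomicLong sum, int i) { sum.addAndGet(i); }

    // Thread-per-task: every task pays for creating, starting and joining a thread.
    static long threadPerTask(int tasks) throws InterruptedException {
        AtomicLong sum = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            Thread t = new Thread(() -> shortTask(sum, n));
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) t.join();
        return sum.get();
    }

    // Pooled: poolSize threads are created once and reused for all tasks.
    static long pooled(int tasks, int poolSize) throws InterruptedException {
        AtomicLong sum = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            pool.execute(() -> shortTask(sum, n));
        }
        pool.shutdown();                               // no new tasks; queued ones still run
        pool.awaitTermination(1, TimeUnit.MINUTES);    // wait for the queue to drain
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int tasks = 2_000;
        long t0 = System.nanoTime();
        long a = threadPerTask(tasks);
        long t1 = System.nanoTime();
        long b = pooled(tasks, Runtime.getRuntime().availableProcessors());
        long t2 = System.nanoTime();
        // Timings vary by JVM, OS and hardware; only the sums are deterministic.
        System.out.printf("thread-per-task: sum=%d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("pooled:          sum=%d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```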

 
>   • How much flexibility/control does Akka provide with respect to mapping actors to threads? For example, would it give me the ability to say: "actors A & B may be executed in different threads (unit of concurrency), but actors C & D must be executed within the same thread (unit of concurrency)"?

I think you need to have a look at the dispatchers. They provide the ability to customize the threading behavior. And if you really need something special that the current dispatchers don't provide, you can always write your own.
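The "A & B on any thread, C & D on the same thread" requirement maps naturally onto giving groups of actors different executors. The following is a plain-Java analogy, not Akka's dispatcher API (which isn't shown in the thread): A and B are submitted to a multi-threaded pool, while C and D share a single-threaded executor, so C and D always run on the same thread and never at the same time. The `ThreadAffinitySketch` name is hypothetical.

```java
import java.util.*;
import java.util.concurrent.*;

// Plain-Java analogy (not Akka's dispatcher API): pick an executor per group of "actors".
final class ThreadAffinitySketch {
    // Run one step on the given executor and report the name of the thread that ran it.
    static Future<String> runOn(ExecutorService executor) {
        return executor.submit(() -> Thread.currentThread().getName());
    }

    static Map<String, String> threadsUsed() throws Exception {
        ExecutorService sharedPool = Executors.newFixedThreadPool(4);   // A and B: any pool thread
        ExecutorService pinned = Executors.newSingleThreadExecutor();   // C and D: always one thread
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            pending.put("A", runOn(sharedPool));
            pending.put("B", runOn(sharedPool));
            pending.put("C", runOn(pinned));
            pending.put("D", runOn(pinned));

            Map<String, String> used = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                used.put(e.getKey(), e.getValue().get());
            }
            return used;
        } finally {
            sharedPool.shutdown();
            pinned.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> used = threadsUsed();
        System.out.println(used);
        System.out.println("C and D share a thread: " + used.get("C").equals(used.get("D"))); // true
    }
}
```

A and B may or may not land on different pool threads (that is the "may" in the question); only the C/D grouping is guaranteed here. Whether Akka's built-in dispatchers express this exact grouping is worth checking against their documentation.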
 
>   • Do you think this problem is well suited to Akka?
It all depends on the performance criteria. As I already said, if you really want to squeeze out the most performance, writing it yourself to fit your needs is the only way to go. But I can also tell you that it is going to consume a lot of time (I have been working on the STM implementation used by Akka for 2 years, and I spend most of my time on either testing or performance tuning; writing features is only a small percentage of the total effort). So the first thing I would do is define your criteria and check whether Akka can meet them (e.g. by creating a PoC).
 

Peter Veentjer

Nov 9, 2010, 6:26:31 PM
to akka...@googlegroups.com
PS:

You could have a look at the 'processors' functionality provided by an old project I did (before I knew anything about actors).

http://prometheus.codehaus.org/guide-processors.html

The idea was to glue processes together to create complex workflows. I think it is in the same direction you are going.

Peter Vlugter

Nov 9, 2010, 10:17:27 PM
to akka...@googlegroups.com
Hi Alex,

Something that comes to mind and would be worth a look is the new task engine in the development version of sbt. Some details can be found here:

http://code.google.com/p/simple-build-tool/wiki/090p2tour

- Peter

Alex Averbuch

Nov 10, 2010, 3:14:51 AM
to akka...@googlegroups.com
Thanks for the comments so far.

Peter,
Thanks for the suggestions. 
Regarding Prometheus, it seems like it's not under active development, and would be difficult to get support for. Also, I'm not sure how stable I can expect it to be. Am I right in these assumptions?
Regarding SBT, this is Scala only right? I'd prefer to work with Java, as the current code base is in Java. I know Scala/Java interop is good, but it complicates the code base and would require that the project maintainers (not me) are capable Scala developers.

To be a bit more clear...
  • I am not the project owner.
  • I would be able to spend 6 months on this project, but not more. Additionally, some of those 6 months will be used to evaluate the resulting code. For this reason I don't want to hand craft/tune/debug something, if a "good enough" solution already exists.
  • This would be a proof of concept to see how much performance gain (if any) can be achieved, over the sequential version.
  • The current code base is in Java, so I would like to stick to Java: For simplicity on my part, for maintainability on the part of the project owner, and because I'm still very new to Scala. Akka is appealing because it has both Java and Scala APIs.
  • The eventual goal is for this library to not only be parallel, but also distributed (not completely unlike Microsoft Dryad). So messages/streams (vs shared state) will eventually be a necessity. For that reason, I'd like to go in that direction from the beginning.
Given these points, is there a reason I shouldn't use Akka?
Also, are any of the Akka add-on modules well suited to this use case?

I'm still deciding whether or not to take this project on. I'd really like to, but I want to get a feeling of what it involves first.

Thanks a lot,
Alex

Alex Averbuch

Nov 10, 2010, 3:15:24 AM
to akka...@googlegroups.com
Correction, thanks to both Peters :-)

Peter Veentjer

Nov 10, 2010, 4:06:02 AM
to akka...@googlegroups.com
On Wed, Nov 10, 2010 at 9:14 AM, Alex Averbuch <alex.a...@gmail.com> wrote:
> Thanks for the comments so far.
>
> Peter,
> Thanks for the suggestions.
> Regarding Prometheus, it seems like it's not under active development, and would be difficult to get support for. Also, I'm not sure how stable I can expect it to be. Am I right in these assumptions?

Yes.

But you can harvest any ideas/code from it that you need. Especially the Repeater (ThreadPoolRepeater) could be useful for pooling 'workflow' threads.
 
> Regarding SBT, this is Scala only right? I'd prefer to work with Java, as the current code base is in Java. I know Scala/Java interop is good, but it complicates the code base and would require that the project maintainers (not me) are capable Scala developers.
>
> To be a bit more clear...
>   • I am not the project owner.
>   • I would be able to spend 6 months on this project, but not more. Additionally, some of those 6 months will be used to evaluate the resulting code. For this reason I don't want to hand craft/tune/debug something, if a "good enough" solution already exists.
A good decision.
>   • This would be a proof of concept to see how much performance gain (if any) can be achieved, over the sequential version.
One of the big things you need to look at is how big the tasks of the workflow are. If they are 'big' (a lot of CPU or IO), the overhead of some kind of framework can become negligible. But if the tasks are very small, the overhead will be more visible. One advantage of actors is that they make it easier to create a pipeline that takes advantage of multiple available CPUs.
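The point about task size can be put as a rough back-of-the-envelope model (a simplifying assumption, not a measurement): if each task does w microseconds of useful work and the framework adds a fixed o microseconds of overhead per message, the share of time spent on overhead is o/(o + w). With a hypothetical overhead of 1 µs per message:

```latex
f_{\text{overhead}} = \frac{o}{o + w}, \qquad
o = 1\,\mu\text{s}:\quad
w = 100\,\mu\text{s} \Rightarrow f = \tfrac{1}{101} \approx 1\%, \qquad
w = 1\,\mu\text{s} \Rightarrow f = \tfrac{1}{2} = 50\%
```

So the same per-message cost that is negligible for coarse-grained tasks dominates for very fine-grained ones.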
 
>   • The current code base is in Java, so I would like to stick to Java: For simplicity on my part, for maintainability on the part of the project owner, and because I'm still very new to Scala. Akka is appealing because it has both Java and Scala APIs.
>   • The eventual goal is for this library to not only be parallel, but also distributed (not completely unlike Microsoft Dryad). So messages/streams (vs shared state) will eventually be a necessity. For that reason, I'd like to go in that direction from the beginning.
> Given these points, is there a reason I shouldn't use Akka?

I would create a small PoC, timebox it to a few weeks, and see whether spending more effort on the Akka path is likely to pay off or whether a different solution is needed.

Jonas Bonér

Nov 10, 2010, 10:26:30 AM
to akka...@googlegroups.com
Akka's dataflow concurrency could be worth exploring. It is excellent
for modeling workflows.

https://github.com/jboner/akka/blob/master/akka-actor/src/main/scala/dataflow/DataFlowVariable.scala
https://github.com/jboner/akka/blob/master/akka-actor/src/test/scala/dataflow/DataFlowSpec.scala
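The core idea of dataflow concurrency is a single-assignment variable: it can be bound exactly once, and anyone reading it waits until it has been bound. The sketch below approximates that with Java's `CompletableFuture`; it illustrates the concept only and is not Akka's `DataFlowVariable` API (see the sources linked above for that). The `DataflowVar` class name is hypothetical.

```java
import java.util.concurrent.*;

// Hypothetical single-assignment dataflow variable built on CompletableFuture.
final class DataflowVar<T> {
    private final CompletableFuture<T> value = new CompletableFuture<>();

    // Bind the variable; binding it a second time is an error.
    void bind(T v) {
        if (!value.complete(v)) throw new IllegalStateException("dataflow variable already bound");
    }

    // Block until the variable is bound, then return its value.
    T get() { return value.join(); }

    public static void main(String[] args) {
        DataflowVar<Integer> x = new DataflowVar<>();
        DataflowVar<Integer> y = new DataflowVar<>();
        DataflowVar<Integer> z = new DataflowVar<>();

        // This "task" can start immediately; it simply waits until x and y are bound.
        new Thread(() -> z.bind(x.get() + y.get())).start();

        // The producers of x and y can run in any order.
        new Thread(() -> y.bind(2)).start();
        new Thread(() -> x.bind(40)).start();

        System.out.println(z.get()); // 42
    }
}
```

In workflow terms, each task's output becomes one such variable and each task simply reads its inputs, so the execution order falls out of the data dependencies instead of being scheduled explicitly.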


--
Jonas Bonér
Specialist at Large
work: http://scalablesolutions.se
code: http://akka.io
blog: http://jonasboner.com
twtr: @jboner

Alex Averbuch

Nov 10, 2010, 11:21:53 AM
to akka...@googlegroups.com
Thanks Jonas,
I tried to find documentation on the website http://doc.akkasource.org/ but couldn't see any.
Here are my basic questions:
  • Is there a Java API to Akka Dataflow?
  • Can dataflow actors be connected remotely?
Thanks,
Alex

Alex Averbuch

Nov 10, 2010, 11:23:39 AM
to akka...@googlegroups.com
I see the Java API for Dataflow was added in Milestone 2, that's great!

So my remaining questions are: can I connect remote dataflows, and is there any documentation anywhere?

Thanks

Jonas Bonér

Nov 10, 2010, 11:42:49 AM
to akka...@googlegroups.com
On 10 November 2010 17:21, Alex Averbuch <alex.a...@gmail.com> wrote:
> Thanks Jonas,
> I tried to find documentation on the website http://doc.akkasource.org/ but
> couldn't see any.

We have not written any yet. Bad. Will do for 1.0.
Take a look at GPars docs. Their dataflow stuff is taken from Akka:
http://www.gpars.org/guide/guide/7.%20Dataflow%20Concurrency.html

> Here are my basic questions:
>
> Is there a Java API to Akka Dataflow?

Yes.

> Can dataflow actors be connected remotely?

Not yet. But that can be done easily since it is based on Actors, so
just basing it on Remote Actors would not be too hard. Do you need
that?

Alex Averbuch

Nov 10, 2010, 11:52:35 AM
to akka...@googlegroups.com
> Take a look at GPars docs. Their dataflow stuff is taken from Akka:
Cool, thanks, will take a look at it

> Not yet. But that can be done easily since it is based on Actors, so
> just basing it on Remote Actors would not be too hard. Do you need
> that?
If possible, that would be awesome!
My first iteration of this project will only utilize multiple cores on one machine.
My second iteration will be to connect a distributed workflow graph (a bit like Microsoft Dryad http://research.microsoft.com/en-us/projects/dryad/eurosys07.pdf) so I would then need the remote feature.

Jonas Bonér

Nov 10, 2010, 11:59:22 AM
to akka...@googlegroups.com
On 10 November 2010 17:52, Alex Averbuch <alex.a...@gmail.com> wrote:
> Take a look at GPars docs. Their dataflow stuff is taken from Akka:
> http://www.gpars.org/guide/guide/7.%20Dataflow%20Concurrency.html
>
> Cool, thanks, will take a look at it
>
> Not yet. But that can be done easily since it is based on Actors, so
> just basing it on Remote Actors would not be too hard. Do you need
> that?
>
> If possible, that would be awesome!

https://www.assembla.com/spaces/akka/tickets/520-docs-for-dataflow-concurrency
https://www.assembla.com/spaces/akka/tickets/521-remote-dataflow-variables

> My first iteration of this project will only utilize multiple cores on
> one machine.
> My second iteration will be to connect a distributed workflow graph (a bit
> like Microsoft
> Dryad http://research.microsoft.com/en-us/projects/dryad/eurosys07.pdf) so I
> would then need the remote feature.

Cool. Will you build it as Open Source?

Alex Averbuch

Nov 10, 2010, 7:03:40 PM
to akka...@googlegroups.com
> Cool. Will you build it as Open Source?
Definitely
I'll post progress here once it gets started

Jonas Bonér

Nov 11, 2010, 12:51:33 AM
to akka...@googlegroups.com
Sounds great. Thanks. 
