Utility of SubFlow generating methods confusing to me

387 views
Skip to first unread message

Tim Harper

unread,
Jan 4, 2016, 12:50:34 AM1/4/16
to Akka User List
A big reason why groupBy, splitWhen (and other split* friends) were useful was the ability to apply different behavior to the bifurcated streams. With the recent (and, what seems to been a bit sudden) change to SubFlow, I'm having a difficulty understanding the value of these methods.

It seems like the only thing it offers you, now, is to provide a grouping for parallelizing the streams, which I'll confess to be somewhat useful for groupBy, but for the split* family of methods there seems to be little tenable benefit.

I scanned the gitter backlog for more backstory on why this change happened. Being acquainted with Roland Kuhn's work, I entirely trust that it was made thoughtfully and with good reason. Is there anything I can read to understand more of the motivation for the change? Also, are there any examples of circumstances in which split{When,After} are useful, now that bifurcated streams have homogenous processing behavior?

Thanks!

Tim

Roland Kuhn

unread,
Jan 7, 2016, 5:12:58 PM1/7/16
to akka...@googlegroups.com
Hi Tim,

This has not been forgotten, but I'm on vacation now, will reply next week.

Regards,

Roland 

Sent from my iPhone
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Tim Harper

unread,
Jan 7, 2016, 5:34:23 PM1/7/16
to akka...@googlegroups.com
Roland, your thoughtfulness exceeds my expectations. I was hoping there was something already written that I was just failing to find. I know there's a bunch of stuff to do and explaining the rational on everything is not the highest priority.

Tim

You received this message because you are subscribed to a topic in the Google Groups "Akka User List" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/akka-user/_blLOcIHxJ4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to akka-user+...@googlegroups.com.

Roland Kuhn

unread,
Jan 14, 2016, 9:55:51 AM1/14/16
to akka-user
Hi Tim,

sorry again for the delay. You raise a valid question that is not answered in the documentation, perhaps it should be.

The handling of substreams has been a concern in our design from day 1, the most prominent problem being that users may decide to not consume all of them (using combinators like filter() or take() that drop elements containing substream Publishers to the floor). This has an associated cost because the substreams bind resources that need to be released explicitly, plus it is very easy to create deadlocks by never consuming the data that would need to be fed into these streams. Another pitfall for users is that the substreams are coupled and must therefore be drained in a fashion that is compatible with their source, in particular groupBy would deadlock easily when merging the resulting streams with a breadth that is exceeded at runtime.

The idea for the change came from the Gearpump team, they mentioned that the signature of groupBy in big data analytics does not usually return a stream of streams, it returns a restricted substream abstraction that can be handled specially by the platform. The issue for these engines is that they are designed for flow graphs with a stable (i.e. non-dynamic) stream layout, the network of combinators must be known before elements flow through it. This is exactly what the new representation of splitWhen/splitAfter/groupBy achieves, the materializer knows up-front what shall be materialized for the substreams; this allows a wider range of optimizations to be applied.

Looking at the current signatures we see that they prevent users from expressing most of the dangerous (deadlocking) stream layouts: groupBy will limit the number of open streams and configure the merge such that it will not deadlock, the split combinators make it impossible to write code that tries to consume the substreams out-of-order. Dynamic processing can still be expressed through stateful combinators like fold and scan that configure the contained computation according to the first processed element—the normal use of split is to factor out boundary detection from the rest of the stream transformation.

We left an escape hatch for those use-cases which require true substreams, but we intentionally do not document it as such: using prefixAndTail(0) you can lift a stream into a single element that contains a sub-source that you can transform to your heart’s content—with all the pitfalls.

Looking ahead I see several more interesting applications of the SubFlow infrastructure: we might offer a retry mechanism for pieces of a Flow, or we might parallelize a piece of a Flow, or we might dynamically insert processing steps depending on the first element of a Flow.

I hope this explains the rationale and gives you some ideas on how to exploit the new features, and if you see use-cases that are currently not covered then please let us know!

Regards,

Roland

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.



Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM.
twitter: @rolandkuhn


Tim Harper

unread,
Jan 15, 2016, 4:42:06 PM1/15/16
to Akka User List
Fantastic explanation. I can work your answer into the docs. Any place you'd best like it go? I'm thinking here: http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.1/scala/migration-guide-1.0-2.x-scala.html#groupby-splitwhen-and-splitafter-now-return-subflow

Roland Kuhn

unread,
Jan 16, 2016, 2:08:03 PM1/16/16
to akka-user
Yes, thanks! (and to the Java version as well, please)

Tim Harper

unread,
Jan 18, 2016, 3:37:47 AM1/18/16
to akka...@googlegroups.com
Okay! Here it is!

https://github.com/akka/akka/pull/19501

Working on this led me need to upgrade sbt-pgp, which led me to discover a bug in sbt's handling of AutoPlugins, which led me to work on SBT and submit a patch for that... finally to get to this.

Thanks for being nice and helpful. Your interactions have rippled and inspired me.

Tim

You received this message because you are subscribed to a topic in the Google Groups "Akka User List" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/akka-user/_blLOcIHxJ4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to akka-user+...@googlegroups.com.

Roland Kuhn

unread,
Jan 18, 2016, 4:13:53 AM1/18/16
to akka-user
Wow, nice! Thanks a lot for your enthusiasm, Tim!

Viktor Klang

unread,
Jan 18, 2016, 4:35:22 AM1/18/16
to Akka User List
+1!
--
Cheers,
Reply all
Reply to author
Forward
0 new messages