Hi Tim,
sorry again for the delay. You raise a valid question that is not answered in the documentation; perhaps it should be.
The handling of substreams has been a concern in our design from day 1, the most prominent problem being that users may decide not to consume all of them (for example by using combinators like filter() or take() that drop elements containing substream Publishers on the floor). This has a cost: the dropped substreams bind resources that need to be released explicitly, and it is very easy to create deadlocks by never consuming the data that would need to be fed into these streams. Another pitfall is that the substreams are coupled and must therefore be drained in a fashion that is compatible with their source; in particular, groupBy would deadlock easily when the resulting streams were merged with a breadth smaller than the number of groups actually open at runtime.
The idea for the change came from the Gearpump team: they mentioned that the signature of groupBy in big data analytics does not usually return a stream of streams; it returns a restricted substream abstraction that the platform can handle specially. The issue for these engines is that they are designed for flow graphs with a stable (i.e. non-dynamic) stream layout, where the network of combinators must be known before elements flow through it. This is exactly what the new representation of splitWhen/splitAfter/groupBy achieves: the materializer knows up-front what shall be materialized for the substreams, which allows a wider range of optimizations to be applied.
Looking at the current signatures we see that they prevent users from expressing most of the dangerous (deadlocking) stream layouts: groupBy limits the number of open substreams and configures the merge so that it cannot deadlock, and the split combinators make it impossible to write code that tries to consume the substreams out of order. Dynamic processing can still be expressed through stateful combinators like fold and scan, which configure the contained computation according to the first processed element; the normal use of split is to factor out boundary detection from the rest of the stream transformation.
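To make this concrete, here is a minimal sketch of how the new combinators compose in the scaladsl. Treat it as illustrative rather than authoritative: the object and value names are made up, and the materializer setup may look different depending on which akka-stream version you are on.

    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{Sink, Source}

    object SubFlowDemo extends App {
      implicit val system: ActorSystem = ActorSystem("subflow-demo")
      implicit val materializer: ActorMaterializer = ActorMaterializer()

      // groupBy yields a SubFlow: the steps below are applied to every
      // substream, and mergeSubstreams folds them back into a single stream
      // with a breadth bounded by the maxSubstreams argument (here 2),
      // so the merge cannot be starved at runtime.
      Source(1 to 10)
        .groupBy(2, _ % 2)
        .map(_ * 10)
        .mergeSubstreams
        .runWith(Sink.foreach(println))

      // splitWhen factors out boundary detection: each substream is folded
      // separately and the per-group sums are emitted in their original
      // order via concatSubstreams.
      Source(1 to 9)
        .splitWhen(_ % 3 == 1)
        .fold(0)(_ + _)
        .concatSubstreams
        .runWith(Sink.foreach(println))
    }

Note that the merge breadth is fixed when the graph is constructed, which is exactly the up-front knowledge the materializer needs.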
We left an escape hatch for those use-cases which require true substreams, but we intentionally do not document it as such: using prefixAndTail(0) you can lift a stream into a single element that contains a sub-source that you can transform to your heart’s content—with all the pitfalls.
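As a rough illustration of that escape hatch (again just a sketch, with made-up value names): prefixAndTail(0) emits exactly one element, a (prefix, tail) pair whose prefix is empty and whose tail is the entire remaining stream as a real Source.

    import akka.stream.scaladsl.Source

    // Lift the whole remainder of the stream into a single element that
    // contains a true sub-Source.
    val lifted =
      Source(1 to 100)
        .prefixAndTail(0)
        .flatMapConcat { case (_, tail) =>
          // `tail` is a full Source; transform it however you like, but you
          // are responsible for actually draining it, with all the pitfalls
          // mentioned above.
          tail.fold(0)(_ + _)
        }
    // when run, `lifted` emits the single value 5050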
Looking ahead I see several more interesting applications of the SubFlow infrastructure: we might offer a retry mechanism for pieces of a Flow, parallelize a piece of a Flow, or dynamically insert processing steps depending on the first element of a Flow.
I hope this explains the rationale and gives you some ideas on how to exploit the new features, and if you see use-cases that are currently not covered then please let us know!
Regards,
Roland