Support for multiple token set joins for the same token set

Vlad Averchenkov

Jan 23, 2014, 8:50:55 PM

to sarasvat...@googlegroups.com

Hi Paul

I'd like to propose a feature that Sarasvati does not currently support: multiple token set join nodes for the same token set, each joining the token set tokens that were forked after a token set 'thread' was split.
This may be useful, e.g., for joining and processing/reporting intermediate results without waiting for the whole token set to finish processing and reach the final token set join.
The attached picture may explain better what I mean :-)

The current Sarasvati engine cannot execute such graphs: the process 'hangs' forever on the join nodes.
This happens because, under the current token set join strategy, every join node waits for ALL active tokens of the token set, and in such graphs that never happens.
Indeed, after a token set token is split into tokens A and B, token A can reach only join node 1, while token B arrives at join node 2 (see picture).
Neither join node 1 nor join node 2 will ever see all active tokens.
Instead, each join node should wait only for the tokens that can reach it, given the graph structure.

This allows us to reformulate the join strategy around the concept of reachability:
- A token set join is complete when every active token of the token set is either on a direct arc into the join node, or on a node/arc from which the join node cannot be reached.

Under this strategy, join node 2 should not wait for token A, because join node 2 cannot be reached from A's position, but it should wait for tokens B and C (see picture).
Reachability can be determined by static analysis of the graph.
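The rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code from the attached patches: the graph is a hypothetical stand-in for picture 1 (every node name is invented), a token position is modelled either as a node name or as an `(src, dst)` arc tuple, and reachability is a plain breadth-first search over the static graph.

```python
from collections import deque

# Hypothetical stand-in for "picture 1" (node names are assumptions):
# "split" creates the token set, "fork" splits a token into A and B,
# and token C travels on a separate path to join2.
GRAPH = {
    "split": ["fork", "taskC"],
    "fork": ["taskA", "taskB"],
    "taskA": ["join1"],
    "taskB": ["join2"],
    "taskC": ["join2"],
    "join1": [],
    "join2": [],
}


def reachable_from(graph, start):
    """Return every node reachable from `start` (inclusive) via a BFS over arcs."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def join_complete(graph, join, tokens):
    """Reachability-based join rule (hypothetical helper).

    `tokens` holds the positions of all active tokens of the token set: a node
    name for a node token, or an (src, dst) tuple for an arc token. The join is
    complete when each token is on an arc into `join`, or `join` cannot be
    reached from where the token is.
    """
    for pos in tokens:
        if isinstance(pos, tuple):      # arc token
            if pos[1] == join:
                continue                # already on a direct arc to the join
            origin = pos[1]             # the token will next be at the arc's target
        else:                           # node token
            origin = pos
        if join in reachable_from(graph, origin):
            return False                # this token may still arrive: keep waiting
    return True


# Token A is still at taskA; B and C already sit on arcs into join2.
print(join_complete(GRAPH, "join2", ["taskA", ("taskB", "join2"), ("taskC", "join2")]))  # True
# B has not left taskB yet, so join2 must keep waiting for it.
print(join_complete(GRAPH, "join2", ["taskA", "taskB", ("taskC", "join2")]))  # False
```

When every active token can reach the join (the single-join case), this rule waits for all of them, which matches the current strategy.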

However, there is a hidden problem here. Consider the same graph with an extra arc from join node 1 back to the token set splitting node (picture 2):

Now static graph analysis reports that join node 2 is reachable from point A (A --> join 1 --> split node --> join 2), so the join hangs waiting for token A.
This happens because static reachability analysis extends beyond the 'token set area', the only region where it makes sense.
Unfortunately, we cannot determine the token set area statically (from the graph alone): we know where the token set area ends, but not where it starts, because the graph does not carry that information.
On the other hand, we can obtain the token set's starting node dynamically, when the token set is created.
We can then use that starting node as a cut-off point in the reachability analysis.
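A minimal sketch of the cut-off in Python, under stated assumptions: the graph is a hypothetical stand-in for picture 2 (invented node names, plus the extra `join1 -> split` arc), and `split` is assumed to be the node where the token set is created. The search refuses to follow arcs into the cut-off node, which is one way to keep it inside the token set area; the attached patches may do this differently.

```python
from collections import deque

# Hypothetical stand-in for "picture 2" (node names are assumptions).
# "split" is assumed to be the node where the token set is created.
GRAPH = {
    "split": ["fork", "taskC"],
    "fork": ["taskA", "taskB"],
    "taskA": ["join1"],
    "taskB": ["join2"],
    "taskC": ["join2"],
    "join1": ["split"],  # extra back-to-start-after-join arc
    "join2": [],
}


def reachable_from(graph, start, cutoff=None):
    """BFS over arcs from `start` (inclusive). If `cutoff` is given, arcs
    leading into the cutoff node are not followed, so the search stays
    inside the token set area."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == cutoff or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen


def join_complete(graph, join, tokens, cutoff=None):
    """Reachability-based join rule, optionally bounded by `cutoff`.

    Tokens are node names (node tokens) or (src, dst) tuples (arc tokens).
    """
    for pos in tokens:
        if isinstance(pos, tuple):      # arc token
            if pos[1] == join:
                continue                # already on a direct arc to the join
            origin = pos[1]
        else:                           # node token
            origin = pos
        if join in reachable_from(graph, origin, cutoff):
            return False                # token may still arrive: keep waiting
    return True


tokens = [("taskA", "join1"), ("taskB", "join2"), ("taskC", "join2")]
# Unbounded static analysis: join2 looks reachable from A via join1 -> split.
print(join_complete(GRAPH, "join2", tokens))                  # False
# With the token set start node as cut-off, A no longer "reaches" join2.
print(join_complete(GRAPH, "join2", tokens, cutoff="split"))  # True
```

Without the cut-off, this version behaves like the restricted variant described below: it still hangs on graphs like picture 2.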

I have attached two patches that implement the token set join strategy described above.
The first is the full version: it supports any graph with multiple token set join points, including graphs with back-to-start-after-join arcs like the one in picture 2.
However, it requires a minor database schema change: the name of the token set's starting node must be persisted (for the Hibernate engine).

The second patch is a reduced version of the full one that does not require any schema changes.
It does not stop the reachability analysis at the token set's starting node, and hence does not support graphs like the one in picture 2.

The patches are prepared against version 2.0.2.

Both patches work correctly with all currently supported graphs (those with a single token set join).

Please let me know what you think about this change.

Thanks,
Vlad

multiple-join-points-full.patch

multiple-join-points-restricted.patch
