Performance of distributed edges

Martin Krc
Aug 24, 2020, 9:52:12 AM
to hazelcast-jet
Hi,

We have a manually configured DAG for a Hazelcast Jet job, with one distributed edge, like this:

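// Distributed, partitioned edge: items with the same ID are routed to the same
// vertexB processor, which may live on another cluster member.
dag.edge(from(vertexA).to(vertexB).distributed()
        .partitioned(m -> m.getId()));
// Default edge: local, with unicast routing.
dag.edge(from(vertexB).to(vertexC));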

We just noticed in production that in one case it took more than 50 seconds for an item to get from vertexA to vertexB, even though it didn't have to travel to another node: everything happened on a single node.

The node was under some load at that moment, caused by another running job, but I don't see any direct reason why that should slow down the first job so much. My question: what can slow down "the journey" of an item over a distributed edge? I understand that busy cooperative threads can, for example. Is there anything else?

Martin

Marko Topolnik
Aug 24, 2020, 10:11:07 AM
to hazelcast-jet
If you measure the time from the point where tryEmit(item) succeeds: at that point Jet has already placed the item into a concurrent queue, so it's immediately available to the destination vertex. The destination vertex's logic will take it from the queue at some later point.
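A rough model of that hand-off, using a plain `java.util.concurrent.ArrayBlockingQueue` as a stand-in for Jet's internal queue. This is an assumption for illustration only, not Jet's actual implementation or API: a non-blocking `offer()` plays the role of a successful `tryEmit(item)`. Once it returns `true` the item is visible to the consumer, and it returns `false` when the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a bounded hand-off queue between two tasklets.
// NOT Jet's internal implementation; it only models the semantics described:
// a successful non-blocking offer (like a successful tryEmit) makes the item
// immediately visible to the consumer, and a full queue makes the offer fail.
public class HandOffSketch {

    // Hypothetical capacity chosen for the example.
    static final int CAPACITY = 2;

    /** Tries to emit each item; returns how many were accepted before the queue filled up. */
    static int tryEmitAll(BlockingQueue<Long> queue, long... items) {
        int accepted = 0;
        for (long item : items) {
            // offer() never blocks: it returns false when the queue is full,
            // much like tryEmit() returning false under backpressure.
            if (!queue.offer(item)) {
                break;
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(CAPACITY);
        int accepted = tryEmitAll(queue, 1L, 2L, 3L);
        System.out.println("accepted = " + accepted);        // accepted = 2
        // Accepted items are available to the consumer right away; how soon
        // they are actually taken depends only on when the consumer runs.
        System.out.println("consumer took " + queue.poll()); // consumer took 1
    }
}
```

In other words, once the offer succeeds, any further delay is spent waiting for the consumer side to run, not in transit.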

So there doesn't seem to be anything other than the destination vertex being unable to make progress, but that can happen for a number of reasons: either it experiences backpressure at its output edge, or it doesn't get the CPU time to do the work. Any other tasklet on the same cooperative worker thread could starve it of CPU time.
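Both causes can be illustrated with a toy simulation of a single cooperative worker thread that calls its tasklets round-robin. Everything here is hypothetical (the `Tasklet` interface, the tick-based clock, the `hogTicks` and `outboxFreeAt` parameters). It is not Jet's scheduler, only a model of why an item already sitting in the queue can still wait: the consumer runs only after the neighbouring tasklet's `call()` returns, and it won't take the item while its own output is blocked.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Toy model of ONE cooperative worker thread running tasklets round-robin.
// Hypothetical: not Jet's scheduler. Time is simulated in abstract ticks,
// so the results are deterministic.
public class CooperativeStarvationSketch {

    interface Tasklet {
        /** Runs one slice of work starting at the given tick; returns the ticks it used. */
        long call(long tick);
    }

    /**
     * An item is already in the consumer's inbox at tick 0.
     *
     * @param hogTicks     ticks the neighbouring tasklet (e.g. from another job) uses per call
     * @param outboxFreeAt tick from which the consumer's output edge has room again
     *                     (0 means no backpressure)
     * @return the tick at which the consumer finally takes the item
     */
    static long latencyWithNeighbour(long hogTicks, long outboxFreeAt) {
        Queue<String> inbox = new ArrayDeque<>();
        inbox.add("item");
        long[] takenAt = {-1};

        // The neighbour does heavy work and only yields after hogTicks.
        Tasklet neighbour = tick -> hogTicks;

        // The consumer is cheap, but it only takes the item if its output has room.
        Tasklet consumer = tick -> {
            if (!inbox.isEmpty() && tick >= outboxFreeAt) {
                inbox.poll();
                takenAt[0] = tick;
            }
            return 1;
        };

        List<Tasklet> tasklets = Arrays.asList(neighbour, consumer);
        long now = 0;
        while (takenAt[0] < 0) {
            // One round: each tasklet gets one call(), in order, on the same thread.
            for (Tasklet t : tasklets) {
                now += t.call(now);
            }
        }
        return takenAt[0];
    }

    public static void main(String[] args) {
        System.out.println(latencyWithNeighbour(1, 0));      // 1: well-behaved neighbour
        System.out.println(latencyWithNeighbour(50_000, 0)); // 50000: starved by the neighbour
        System.out.println(latencyWithNeighbour(1, 5));      // 5: held back by backpressure
    }
}
```

With a neighbour that holds the thread for 50,000 ticks per call, the item waits 50,000 ticks even though it was available at tick 0. This is why a cooperative processor that blocks or does long stretches of work hurts every other tasklet sharing its worker thread, including tasklets of other jobs.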