Hi Viliam,
There is no backpressure, and CPU utilization is around 60-80%. Queue sizes vary between 0 and 1, with a maximum of 4 in rare cases.
The job is basically a large DAG consisting of 80% AsyncTransformUsingContextUnorderedP-like processors and 20% cooperative processors. There are 1400 processor instances in total, distributed equally across TaskletExecutionService$CooperativeWorker.trackers.
Time spent inside processors (measured separately and written into the processed item as a start timestamp plus elapsed time) is typically < 1 ms. This lets me observe the latency between individual processors as the difference between their start timestamps. The time spent passing items between processors increases under load from 3-6 ms up to 120 ms, which limits overall throughput, currently 240 end-to-end tps.
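To make the measurement concrete, here is a minimal sketch of how the per-hop time can be derived from the stamps carried by an item. It is not the actual job code: the `Stamp` class, the `transitMicros` helper, and the microsecond units are hypothetical names and choices made for illustration. It assumes each processor appends its start timestamp and elapsed time to the item, and that timestamps from different nodes come from synchronized wall clocks (`System.nanoTime()` values are only comparable within a single JVM).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hazelcast Jet.
class HopLatency {

    // One entry per processor the item has passed through.
    static final class Stamp {
        final String processor;   // label of the processor that wrote the stamp
        final long startMicros;   // wall-clock time when processing started
        final long elapsedMicros; // time spent inside the processor

        Stamp(String processor, long startMicros, long elapsedMicros) {
            this.processor = processor;
            this.startMicros = startMicros;
            this.elapsedMicros = elapsedMicros;
        }
    }

    // Time between the end of one processor's work and the start of the next:
    // the start-to-start difference minus the time spent inside the first one.
    static List<Long> transitMicros(List<Stamp> stamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < stamps.size(); i++) {
            Stamp prev = stamps.get(i - 1);
            Stamp next = stamps.get(i);
            long startToStart = next.startMicros - prev.startMicros;
            gaps.add(startToStart - prev.elapsedMicros);
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Made-up stamps for a single item, for illustration only.
        List<Stamp> stamps = List.of(
                new Stamp("p1", 0, 400),
                new Stamp("p2", 3_400, 600),
                new Stamp("p3", 124_000, 500));
        List<Long> gaps = transitMicros(stamps);
        for (int i = 0; i < gaps.size(); i++) {
            System.out.println(stamps.get(i).processor + " -> "
                    + stamps.get(i + 1).processor + ": " + gaps.get(i) + " us");
        }
    }
}
```

Since the time inside a processor is typically under 1 ms, the plain start-to-start difference and the value computed here differ by less than 1 ms per hop.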
Some statistics: 26000 Hazelcast operations per second, CPU 6,7 cores out of 8, 14000 processor executions per second (4600 per node), 150 live threads. The expectation is to reach 2x the current throughput.
Regards