Hello hAkkers!
We're seeing a very strange pattern in message delivery times between actors:
We have a system with ~2000 type "A" worker actors, each of which has 1 to 50 type "B" sub-workers (these do the actual work, but very fast: <1ms between request and response).
Every second, each type A actor receives 1 to 50 payloads (one per sub-worker) of 1 to 10 messages each (1 to 500 messages in total). For each payload it chooses one type B actor, forwards the payload's 1 to 10 messages to it, and then interprets the B actor's result.
The number of messages per payload depends on the second within the minute: most messages arrive in the 29th and 59th seconds.
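To put the workload above in numbers, here is a rough upper bound using only the maxima quoted in this post. It assumes (this is an assumption, not something stated above) that every A actor hits its maximum in the same second, which is most likely during the 29th and 59th second bursts, and that each message is sent A -> B and answered B -> A individually. If a payload is forwarded as a single message instead, the payload count is the relevant send rate.

```java
// Back-of-the-envelope peak load from the upper bounds in the post.
// Assumption (hypothetical): all A actors peak in the same second, and every
// message makes one A -> B hop and one B -> A hop.
final class PeakLoad {
    static final int A_ACTORS = 2_000;
    static final int MAX_PAYLOADS_PER_A = 50;       // one payload per B sub-worker
    static final int MAX_MESSAGES_PER_PAYLOAD = 10;

    // Payloads arriving at all A actors per second (also the number of B actors busy).
    static long payloadsPerSecond() {
        return (long) A_ACTORS * MAX_PAYLOADS_PER_A;
    }

    // Individual messages arriving at all A actors per second.
    static long messagesPerSecond() {
        return payloadsPerSecond() * MAX_MESSAGES_PER_PAYLOAD;
    }

    // Actor-to-actor deliveries if each message goes A -> B and back B -> A.
    static long hopsPerSecond() {
        return messagesPerSecond() * 2;
    }

    public static void main(String[] args) {
        System.out.println("payloads/s:   " + payloadsPerSecond());
        System.out.println("messages/s:   " + messagesPerSecond());
        System.out.println("A<->B hops/s: " + hopsPerSecond());
    }
}
```

Under these assumptions the peak is 100,000 payloads, 1,000,000 messages and 2,000,000 A/B deliveries per second.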
Usually the total processing time for a full round trip (in -> A -> B -> A -> out) is around 1 to 5ms.
But at "high load" times, processing time quickly escalates to as much as 2.5 _seconds_.
After some investigation, we gave type A and type B actors separate dispatchers (both fork-join executors with parallelism-min 8, parallelism-max 64 and parallelism-factor 3.0). With that in place we were able to determine that type A actors still receive messages at a very fast rate, but type B actors...
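For reference, Akka's reference configuration documents the fork-join-executor pool size as ceil(available processors * parallelism-factor), bounded by parallelism-min and parallelism-max. A minimal sketch of that arithmetic for the settings above; the class and method names are hypothetical, and it assumes the JVM reports 24 available processors on this machine.

```java
// Sketch of the documented fork-join-executor sizing rule:
// ceil(availableProcessors * parallelism-factor), clamped to [min, max].
final class FjParallelism {
    static int threads(int processors, double factor, int min, int max) {
        int raw = (int) Math.ceil(processors * factor);
        return Math.min(max, Math.max(min, raw));
    }

    public static void main(String[] args) {
        // Assumed to be 24 on the server described in the post.
        int processors = Runtime.getRuntime().availableProcessors();
        int perDispatcher = threads(processors, 3.0, 8, 64);
        System.out.println("threads per dispatcher:       " + perDispatcher);
        System.out.println("threads for A + B dispatchers: " + 2 * perDispatcher);
    }
}
```

With 24 processors, ceil(24 * 3.0) = 72 is clamped to 64, so the A and B dispatchers together may run up to 128 threads on 24 hardware threads, on top of the default dispatcher. Whether that oversubscription matters here is not established, but it is worth measuring before raising the limits further.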
At the start of processing they receive payloads almost instantly (0 to 1ms latency), but as processing continues, the time to deliver a message from A to B keeps increasing, up to the 2.5 seconds mentioned above.
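One way to get A -> B delivery latency per message without a monitoring tool is to timestamp each message in A and have B compute the elapsed time as the first thing it does when the message arrives. This is a minimal plain-Java sketch: Timestamped and LatencyStats are hypothetical helper names, not Akka classes, and it assumes A and B run in the same JVM so that System.nanoTime() values are comparable. The number it yields covers mailbox wait plus the time until the dispatcher schedules B, not B's own work.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical envelope: A wraps each outgoing message with its send time.
final class Timestamped<P> {
    final P payload;
    final long sentAtNanos;

    Timestamped(P payload, long sentAtNanos) {
        this.payload = payload;
        this.sentAtNanos = sentAtNanos;
    }

    static <Q> Timestamped<Q> now(Q payload) {
        return new Timestamped<>(payload, System.nanoTime());
    }
}

// Hypothetical shared recorder; B calls recordArrival() on receipt.
// count and total are updated separately, so a concurrent reader may see a
// slightly inconsistent mean; good enough for a diagnostic.
final class LatencyStats {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    void record(long latencyNanos) {
        count.incrementAndGet();
        totalNanos.addAndGet(latencyNanos);
        maxNanos.accumulateAndGet(latencyNanos, Math::max);
    }

    void recordArrival(Timestamped<?> msg) {
        record(System.nanoTime() - msg.sentAtNanos);
    }

    long count() { return count.get(); }

    double meanMillis() {
        long n = count.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1e6 / n;
    }

    double maxMillis() { return maxNanos.get() / 1e6; }

    public static void main(String[] args) throws InterruptedException {
        LatencyStats stats = new LatencyStats();
        Timestamped<String> msg = Timestamped.now("payload");
        Thread.sleep(5); // stand-in for time spent waiting in B's mailbox
        stats.recordArrival(msg);
        System.out.printf("n=%d mean=%.3f ms max=%.3f ms%n",
                stats.count(), stats.meanMillis(), stats.maxMillis());
    }
}
```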
We tried tweaking the type B dispatcher and switching the B actors to SingleConsumerOnlyUnboundedMailbox, with no effect at all.
On the hardware side we have two 6-core server-class Intel processors (24 logical cores in total). The JVM has 32GB of RAM (no swapping) and uses G1GC; GC pauses do not exceed 100ms and usually happen every 10-15 seconds.
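The GC figures above can be turned into a quick bound. Assuming one pause of at most 100 ms per 10 s interval (the shorter end of the quoted range), the JVM spends at most about 1% of wall-clock time paused, and the longest single pause is 25 times shorter than the observed 2.5 s delay. This suggests, without proving, that GC pauses alone are unlikely to explain the delays. GcBudget is a hypothetical name.

```java
// Rough GC budget from the figures in the post: pauses of at most 100 ms,
// every 10-15 s. Uses the 10 s end of the range as the worst case and
// assumes one pause per interval.
final class GcBudget {
    // Fraction of wall-clock time spent paused.
    static double pauseFraction(double pauseMillis, double intervalMillis) {
        return pauseMillis / intervalMillis;
    }

    public static void main(String[] args) {
        double maxPauseMs = 100;
        double worstIntervalMs = 10_000;
        double observedDelayMs = 2_500;
        System.out.printf("worst-case time paused: %.1f%%%n",
                100 * pauseFraction(maxPauseMs, worstIntervalMs));
        System.out.printf("observed delay / longest pause: %.0fx%n",
                observedDelayMs / maxPauseMs);
    }
}
```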
Is there anything else we can tweak, or use to pinpoint the problem? Maybe some metric for average actor queue size and per-actor dispatcher time?
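As a stopgap for the queue-size question, Little's law (average number waiting L = arrival rate * average wait W) can estimate the average mailbox depth from two numbers that are easier to measure: the per-actor arrival rate and the mean time a message waits before B starts processing it. It assumes a roughly steady state, which the 29th/59th-second bursts violate, so it is best applied over a short window. LittlesLaw is a hypothetical helper, and the numbers below are purely illustrative.

```java
// Little's law: L = lambda * W.
// lambda: messages arriving at one actor per second; W: mean wait in seconds.
final class LittlesLaw {
    static double averageQueueLength(double arrivalsPerSecond, double meanWaitSeconds) {
        return arrivalsPerSecond * meanWaitSeconds;
    }

    public static void main(String[] args) {
        // Illustrative only: a B actor receiving 10 messages/s whose messages
        // wait 2.5 s on average holds about 25 messages in its mailbox.
        System.out.println(averageQueueLength(10, 2.5));
    }
}
```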