[2018-12-05T22:54:58.034+0000][safepoint] Thread: 0x00007f23840d1800 [0x17e4] State: _running _has_called_back 0 _at_poll_safepoint 0
[2018-12-05T22:54:58.034+0000][safepoint] Thread: 0x00007f23840d1800 [0x17e4] State: _running _has_called_back 0 _at_poll_safepoint 0
[2018-12-05T22:54:58.034+0000][safepoint] Thread: 0x00007f23840d1800 [0x17e4] State: _running _has_called_back 0 _at_poll_safepoint 0
... (repeated continually for approx 2 seconds)
[2018-12-05T22:55:00.548+0000][safepoint] Thread: 0x00007f23840d1800 [0x17e4] State: _running _has_called_back 0 _at_poll_safepoint 0
[2018-12-05T22:55:00.549+0000][safepoint] Thread: 0x00007f23840d1800 [0x17e4] State: _running _has_called_back 0 _at_poll_safepoint 0
[2018-12-05T22:55:00.550+0000][safepoint] Thread: 0x00007f23840d1800 [0x17e4] State: _running _has_called_back 0 _at_poll_safepoint 0
[2018-12-05T22:55:00.551+0000][safepoint] ... found polling page return exception at pc = 0x00007f3397c7deff, stub =0x00007f338fb4c900
[2018-12-05T22:55:00.551+0000][safepoint] Thread: 0x00007f23840d1800 [0x17e4] State: _call_back _has_called_back 0 _at_poll_safepoint 0
[2018-12-05T22:55:00.551+0000][safepoint] Waiting for 3 thread(s) to block
[2018-12-05T22:55:00.551+0000][safepoint] Entering safepoint region: G1IncCollectionPause
[2018-12-05T22:55:00.566+0000][safepoint] Leaving safepoint region
[2018-12-05T22:55:00.566+0000][safepoint] Total time for which application threads were stopped: 2.5331635 seconds, Stopping threads took: 2.5180973 seconds
ReplicaFetcherT 6116 [007] 70951.936265: 47425865 cycles:ppp: 7f338fc9ad8a register_finalizer Runtime1 stub (/tmp/perf-3239.map)
ReplicaFetcherT 6116 [007] 70951.987993: 61736469 cycles:ppp: 7f33977d3165 Ljava/util/ArrayList;::add (/tmp/perf-3239.map)
** big time gap here
ReplicaFetcherT 6116 [007] 70954.862422: 70423359 cycles:ppp: 7f3397b26fe0 Ljava/util/LinkedHashMap;::afterNodeInsertion (/tmp/perf-3239.map)
ReplicaFetcherT 6116 [007] 70954.955571: 58601064 cycles:ppp: 7f3397782118 Ljava/lang/String;::charAt (/tmp/perf-3239.map)
ReplicaFetcherT 6116 [007] 70955.010857: 56837522 cycles:ppp: 7f33977347b6 Ljava/util/concurrent/ConcurrentHashMap;::tabAt (/tmp/perf-3239.map)
ReplicaFetcherT 6116 [007] 70951.864747: 52073629 cycles:ppp: 7f3397f784f4 Lorg/apache/kafka/common/TopicPartition;::equals (/tmp/perf-3239.map)
swapper 0 [007] 70951.889555: 52939697 cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70951.914775: 45114901 cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])
ReplicaFetcherT 6116 [007] 70951.936265: 47425865 cycles:ppp: 7f338fc9ad8a register_finalizer Runtime1 stub (/tmp/perf-3239.map)
swapper 0 [007] 70951.958858: 61162448 cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])
ReplicaFetcherT 6116 [007] 70951.987993: 61736469 cycles:ppp: 7f33977d3165 Ljava/util/ArrayList;::add (/tmp/perf-3239.map)
swapper 0 [007] 70952.017399: 66816847 cycles:ppp: ffffffff9f39ccb4 cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70952.049231: 63438867 cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
... (repeated continually for approx 2 seconds)
swapper 0 [007] 70954.228216: 201536401 cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70954.324191: 216277840 cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70954.427185: 227564188 cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70954.535554: 236205951 cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70954.648051: 137191740 cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70954.713384: 159810740 cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70954.789490: 153089075 cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])
ReplicaFetcherT 6116 [007] 70954.862422: 70423359 cycles:ppp: 7f3397b26fe0 Ljava/util/LinkedHashMap;::afterNodeInsertion (/tmp/perf-3239.map)
swapper 0 [007] 70954.895969: 65525177 cycles:ppp: ffffffff9f39cca9 cpu_idle_poll ([kernel.kallsyms])
swapper 0 [007] 70954.927183: 59590550 cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])
ReplicaFetcherT 6116 [007] 70954.955571: 58601064 cycles:ppp: 7f3397782118 Ljava/lang/String;::charAt (/tmp/perf-3239.map)
swapper 0 [007] 70954.983485: 57455478 cycles:ppp: ffffffff9f39ccab cpu_idle_poll ([kernel.kallsyms])
ReplicaFetcherT 6116 [007] 70955.010857: 56837522 cycles:ppp: 7f33977347b6 Ljava/util/concurrent/ConcurrentHashMap;::tabAt (/tmp/perf-3239.map)
/usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx4G -Xms4G \
-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
-XX:InitiatingHeapOccupancyPercent=35 \
-XX:+ExplicitGCInvokesConcurrent \
-Djava.awt.headless=true \
-XX:+PreserveFramePointer \
-XX:+UnlockDiagnosticVMOptions \
-XX:+DebugNonSafepoints \
-Xlog:safepoint=trace:safepoint.log:utctime,tags:filecount=20,filesize=102400000 \
-XX:+UseCountedLoopSafepoints \
-XX:LoopStripMiningIter=10 \
-XX:-Inline \
...
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi,
When running Kafka servers on Linux, the zookeeper sessions have been expiring frequently.
After lots of investigation, I discovered some threads (not the zookeeper threads)
taking a very long time to get to the safepoint (frequently > 2 seconds)
A quick google shows that Kafka is using writeable mmaps. By
itself this is already terrible if you have a large amount of data
to write, but in combination with the JVM it is horrendous. An
innocent memory access can cause the thread to be blocked on I/O,
and since the dirtying rate can be much higher that cleaning rate,
you can end up with the thread blocked for large amounts of time.
The JVM doesn't know anything about it, so as far as it knows the
thread is at a non-safe point.
If Kafka has a setting that disables mmap, I recommend trying it.
The thread will still block, but at least it won't prevent Java
from reaching a safepoint.
Take a look at the safepoint profiler set-up analysis I did on this here:"JVM detection of thread at safepoint"
The only thing I would like to know is how many cores are you using and what is the total thread count?
A quick google shows that Kafka is using writeable mmaps. By itself this is already terrible if you have a large amount of data to write, but in combination with the JVM it is horrendous.