Guaranteed Safepoints - GuaranteedSafepointInterval

Martin Thompson

Aug 7, 2015, 12:14:04 PM

to mechanical-sympathy

Given Mark's nice post on guaranteed safepoints I wondered if any of our JVM friends would like to comment on what is a suitable default for GuaranteedSafepointInterval? Is it safe to make this many seconds?

http://epickrram.blogspot.co.uk/2015/08/jvm-guaranteed-safepoints.html

Regards,

Martin...

Vitaly Davidovich

Aug 7, 2015, 12:25:25 PM

to mechanical-sympathy

I had a couple of emails on this topic a few months back that you may find interesting:

http://openjdk.5641.n7.nabble.com/GuaranteedSafepointInterval-clarification-td228438.html

http://openjdk.5641.n7.nabble.com/InlineCacheBuffer-GuaranteedSafepointInterval-td229138.html

I think the short answer is it's safe to make it longer if you're ok with slightly slower performance (one extra jump, AFAIU) until the inline cache (IC) jump is removed (at the next safepoint). This GSI (GuaranteedSafepointInterval) is quite annoying, particularly because it ends up piggybacking additional housekeeping tasks if it actually enters the safepoint (and I think Java 8 has additional tasks, such as nmethod hotness marking).

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gil Tene

Aug 7, 2015, 12:36:30 PM

to mechanical-sympathy

Monitor deflation is one of the other things to watch for. While it is "rare" to see applications that produce high rates of newly inflated monitors per second, there are some common ones that do just that. E.g. Cassandra can generate tens of thousands of newly inflated monitors per second, and I've seen some trading frameworks do this too. If you let this accumulate for many seconds without running a monitor deflation pass, "interesting things" start to happen. At best you start to see big pauses when millions of monitors get deflated in a single safepoint. At worst... Boom.

I think that relaxing the rate of forced safepoints can be "safe" if you apply enough testing (read "verify stable day-long runs, even if you just increased the interval to 5 minutes").
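
To put rough numbers on the "millions of monitors" remark, here is a back-of-the-envelope sketch. The 20,000 per second rate is an assumed stand-in for "tens of thousands", and the model is a deliberate worst case: it assumes no other safepoint (a GC pause, for example) runs a deflation pass during the interval.

```java
public class MonitorBacklogEstimate {

    /** Upper bound on monitors left inflated if no deflation pass runs for the whole interval. */
    static long backlog(long inflationsPerSecond, long intervalSeconds) {
        return Math.multiplyExact(inflationsPerSecond, intervalSeconds);
    }

    public static void main(String[] args) {
        long rate = 20_000; // assumed rate, standing in for "tens of thousands per second"
        for (long seconds : new long[] {1, 10, 60, 300}) {
            System.out.printf("%4d s without a deflation pass -> up to %,d inflated monitors%n",
                    seconds, backlog(rate, seconds));
        }
    }
}
```

At the assumed rate, a 5-minute interval gives an upper bound of 6,000,000 monitors, which is the scale at which a single deflation pass could turn into a long pause.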

Gil Tene

Aug 7, 2015, 12:37:54 PM

to mechanical-sympathy

That's why Zing has a concurrent monitor deflator that uses no safepoints, BTW...

Vitaly Davidovich

Aug 7, 2015, 12:44:05 PM

to mechanical-sympathy

I think the presumption is that if you care about tuning GSI then you're not heavily contending on monitors; otherwise there are bigger fish to fry. What's particularly bothersome is the polling aspect. I know we've all done (and still do) polling, as it simplifies signalling, but I'd think a lot of the things the JVM checks for periodically could be signaled more explicitly. If the GSI value is increased and the JVM blows up because too many inflated monitors haven't been deflated, I'd say that's a JVM bug; if it *needs* a safepoint to safely continue, it should explicitly force one.

Gil Tene

Aug 7, 2015, 1:04:40 PM

to mechanical-sympathy

As noted, I've seen trading apps produce many thousands of newly inflated monitors per second.... It's not a hypothetical: we ended up building a concurrent monitor deflator into Zing because we ran into this multiple times in the wild... And that's after giving up on the incremental safepoint approach as a temporary bandaid... It's weird to see, but it happens. Our working assumption is that it's not contention that drives these rates (hard to contend that fast on so many unique monitors), but wait/notify patterns that (for whatever reason) do their wait/notifies on fast moving ephemeral objects. So not your classic queueing pattern.

As for the polling part, I fully agree that rather than a periodic safepoint (assuming you actually need safepoints to do this stuff ;-) ), it would be much cleaner for there to be a periodic (not in a safepoint) poll on the VM thread that would kick off a safepoint if something that needs a safepoint is pending (like IC site cleanups, or code cache scanning for some other reasons, or monitor deflation because some threshold has been crossed, etc.). The rate of polling would need to remain as frequent-as-is-needed-for-safety-or-speed, but the rate of actual safepoints would be much lower in most practical applications. If it were done like that, nobody would have needed to discuss the subject...

As for the comment that a "JVM crash on monitor inflation" would be a bug: I'd agree that it would be a bug if the default settings did that, and even if normally settable flags did that. But not if someone turns on "-XX:+UnlockDiagnosticVMOptions" and forcibly sets an internal timing value to an unplanned level... Since monitor inflation occurs at monitor_enter sites, and various optimizations often like to avoid making those sites safepoint-safe (which would be required to successfully deflate when an inflation is required and pools are exhausted), this gets "complicated". The approach of periodic polling with a high enough frequency to make it practically impossible to exhaust the pool before it gets deflated is a very valid one, as long as nobody forcibly messes with the frequency and makes it into a bug...
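
For anyone wanting to check which value a running JVM actually uses before or after touching the flag, a minimal sketch follows. It relies on the `com.sun.management.HotSpotDiagnosticMXBean` interface that ships with HotSpot-based JDKs. The class and method names `ShowSafepointInterval` and `vmOption` are made up for illustration, and whether a diagnostic flag such as GuaranteedSafepointInterval is visible without "-XX:+UnlockDiagnosticVMOptions" is an assumption worth verifying on your own JDK, so the code treats "not available" as a normal outcome.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class ShowSafepointInterval {

    /** Returns the flag's value and origin, or empty if this JVM does not expose such a flag. */
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty(); // not a HotSpot-based JVM
            }
            VMOption option = bean.getVMOption(name);
            return Optional.of(option.getValue() + " (origin: " + option.getOrigin() + ")");
        } catch (IllegalArgumentException e) {
            // Unknown flag, locked flag, or unsupported MXBean on this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String flag = "GuaranteedSafepointInterval";
        System.out.println(flag + " = " + vmOption(flag).orElse("not available on this JVM"));
    }
}
```

Running it once with default settings and once with the tuned command line is a cheap way to confirm that the override actually took effect.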

Vitaly Davidovich

Aug 7, 2015, 1:22:44 PM

to mechanical-sympathy

> As noted, I've seen trading apps produce many thousands of newly inflated monitors per second.... It's not a hypothetical: we ended up building a concurrent monitor deflator into Zing because we ran into this multiple times in the wild... And that's after giving up on the incremental safepoint approach as a temporary bandaid... It's weird to see, but it happens. Our working assumption is that it's not contention that drives these rates (hard to contend that fast on so many unique monitors), but wait/notify patterns that (for whatever reason) do their wait/notifies on fast moving ephemeral objects. So not your classic queueing pattern.

What do you mean by "fast moving ephemeral objects"? Fast moving as in newly allocated or many objects being used for wait/notify concurrently? In either case, I'd find it hard to believe that these are the types of systems where one would want to tune GSI to decrease latency/jitter. Or am I missing something?

> As for the comment that a "JVM crash on monitor inflation" would be a bug: I'd agree that it would be a bug if the default settings did that, and even if normally settable flags did that. But not if someone turns on "-XX:+UnlockDiagnosticVMOptions" and forcibly sets an internal timing value to an unplanned level... Since monitor inflation occurs at monitor_enter sites, and various optimizations often like to avoid making those sites safepoint-safe (which would be required to successfully deflate when an inflation is required and pools are exhausted), this gets "complicated". The approach of periodic polling with a high enough frequency to make it practically impossible to exhaust the pool before it gets deflated is a very valid one, as long as nobody forcibly messes with the frequency and makes it into a bug...

Well, the reason I still think it's a bug is that the default GSI is just a hardcoded value; I don't think (correct me if I'm wrong) it's picked ergonomically in any way or dynamically adjusted. So how can it be tuned generically without knowing the workload/hardware/etc. it will be subjected to? This would be analogous to a deopt being scheduled at the next GSI and not done explicitly when it's not safe to continue anymore. I fully agree that it makes things simpler though.

Kirk Pepperdine

Aug 7, 2015, 2:05:27 PM

to mechanica...@googlegroups.com

I have seen a lot of unexplained safe-pointing behavior that this interval doesn’t explain. I need to dig up some graphs to show you.

— Kirk

Vitaly Davidovich

Aug 7, 2015, 2:18:30 PM

to mechanical-sympathy

This interval only defines when the VM thread wakes up to check if a SP is needed. It's currently predicated on ICBuffer having entries to clean up. Once it enters the safepoint, it then also performs a bunch of other cleanup tasks. So the actual times at which safepoints occur will not be as regular as the GSI. If you look here, http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/1aef080fd28d/src/share/vm/runtime/vmThread.cpp#l429, it'll ask if cleanup is needed, and that's predicated on the ICBuffer reporting having entries. If that cleanup is needed, the safepoint is then initiated with all cleanup tasks: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/1aef080fd28d/src/share/vm/runtime/safepoint.cpp#l488

So it seems like if compiled code, e.g., sees a new receiver and creates an IC, the next poll will enter a safepoint.
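
The wake-up-then-check behaviour described above can be modelled in a few lines. This is only a toy model written in Java, not HotSpot's actual C++ code: `GuaranteedSafepointModel`, `queueIcCleanup` and `onIntervalElapsed` are hypothetical names, and a simple counter stands in for the ICBuffer.

```java
import java.util.ArrayList;
import java.util.List;

public class GuaranteedSafepointModel {
    private int pendingIcEntries;                 // stand-in for ICBuffer entries
    private final List<Integer> safepointTicks = new ArrayList<>();

    /** Compiled code did something that queued inline-cache cleanup work. */
    void queueIcCleanup() {
        pendingIcEntries++;
    }

    /** One wake-up of the VM thread; returns true if it entered a safepoint. */
    boolean onIntervalElapsed(int tick) {
        if (pendingIcEntries == 0) {
            return false;                         // nothing queued: no safepoint this time
        }
        pendingIcEntries = 0;                     // IC cleanup, plus any piggybacked housekeeping
        safepointTicks.add(tick);
        return true;
    }

    List<Integer> safepointTicks() {
        return safepointTicks;
    }

    public static void main(String[] args) {
        GuaranteedSafepointModel vm = new GuaranteedSafepointModel();
        for (int tick = 0; tick < 10; tick++) {
            if (tick == 3 || tick == 7) {
                vm.queueIcCleanup();              // e.g. a call site meets a new receiver
            }
            vm.onIntervalElapsed(tick);
        }
        System.out.println("safepoints at ticks " + vm.safepointTicks());
    }
}
```

In this model the interval only bounds how long queued work can wait; ticks with nothing pending cost a wake-up but no safepoint, which is why observed safepoints are less regular than the interval itself.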

Gil Tene

Aug 7, 2015, 8:48:35 PM

to mechanical-sympathy

On Friday, August 7, 2015 at 10:22:44 AM UTC-7, Vitaly Davidovich wrote:

> What do you mean by "fast moving ephemeral objects"? Fast moving as in newly allocated or many objects being used for wait/notify concurrently? In either case, I'd find it hard to believe that these are the types of systems where one would want to tune GSI to decrease latency/jitter. Or am I missing something?

I mean fast moving in the "allocated and discarded rapidly" sense, as in "one per message or operation". And some of the apps I've seen do this are (unfortunately) exactly the ones that would care about frequent 100-200usec hiccups, so they'd look to tune GSI. I'm noting this as a warning to such apps (per Martin's question of "is it safe?").
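
A hedged illustration of what such a "one per message" wait/notify pattern might look like. `ReplySlot` and `roundTrips` are invented names, not taken from any of the frameworks mentioned, and the claim that `wait()` forces monitor inflation reflects my understanding of HotSpot rather than anything this snippet proves.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EphemeralMonitorPattern {

    /** One-shot reply holder, allocated per request and discarded right after use. */
    static final class ReplySlot {
        private String reply;                     // guarded by this

        synchronized void complete(String value) {
            reply = value;
            notifyAll();
        }

        synchronized String await() throws InterruptedException {
            while (reply == null) {
                wait();                           // waiting on a short-lived object's monitor
            }
            return reply;
        }
    }

    /** Sends count requests, each with its own fresh ReplySlot; returns how many completed. */
    static int roundTrips(int count) throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        int completed = 0;
        try {
            for (int i = 0; i < count; i++) {
                ReplySlot slot = new ReplySlot(); // new monitor owner for every message
                int id = i;
                worker.execute(() -> slot.complete("reply-" + id));
                // If the worker finishes first, await() returns without calling wait().
                if (slot.await().equals("reply-" + id)) {
                    completed++;
                }
            }
        } finally {
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed round trips: " + roundTrips(10_000));
    }
}
```

Every request that actually blocks in `await()` leaves behind a monitor on an object that becomes garbage immediately afterwards, which is the accumulation a deflation pass has to clean up.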

Gil Tene

Aug 7, 2015, 8:50:02 PM

to mechanical-sympathy

On Friday, August 7, 2015 at 11:18:30 AM UTC-7, Vitaly Davidovich wrote:

> This interval only defines when the VM thread wakes up to check if a SP is needed.

Ok. That's much better than I thought then (and the way it should be). The question for Mark's blog entry is then "why does your application seem to decide it actually needs a safepoint so often?".

Kirk Pepperdine

Aug 8, 2015, 4:26:07 AM

to mechanica...@googlegroups.com

Hi Vitaly,

Thanks for the explanation. This certainly explains the safe-point “banding” which is commonly seen in logs. The JVM commonly reports that the reason for a safe-point is “unknown”. My question is: if the VM thread calls for a SP because it happens to find work in the ICBuffer, does it mark the reason for the SP as unknown?

IME, excessive SPing is the smaller elephant in the JVM, but it’s still an elephant and it can put a significant drag on application throughput/latencies. Sometimes there are simple things you can do to avoid putting more pressure on the sore spot, which translates to a need to better understand why the JVM needed to safe-point in order to understand if you can quiesce it.

Kind regards,

Kirk

Vitaly Davidovich

Aug 8, 2015, 12:45:19 PM

to mechanical-sympathy

Interesting. I'd expect these apps to experience more jitter due to allocations and not bother with GSI until that's sorted.

Vitaly Davidovich

Aug 8, 2015, 12:51:33 PM

to mechanical-sympathy

Yes, that is a good question. Unfortunately, the TraceIC diagnostic HotSpot flag isn't available in product builds, so you would need to run the code on a debug build. Given it's the IC that's triggering this, my assumption is that a compiled call site is seeing a new receiver. I'd try running the benchmark with PrintCompilation and maybe PrintInlining and see if anything shows up during the GSI-induced safepoints.
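
A small experiment along those lines, as a sketch: one virtual call site that first sees a single receiver type and later meets new ones. The class names are invented for illustration. Whether a given JVM handles the change through inline cache updates or through deoptimization depends on the JIT and its profile, so treat the output of the flags as something to inspect rather than predict.

```java
public class NewReceiverDemo {

    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static final class Rect implements Shape {
        private final double width, height;
        Rect(double width, double height) { this.width = width; this.height = height; }
        public double area() { return width * height; }
    }

    /** The s.area() call below is the call site whose receiver types change over time. */
    static double total(Shape[] shapes, int rounds) {
        double sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (Shape s : shapes) {
                sum += s.area();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Phase 1: only Square reaches the call site, long enough to get compiled.
        double sum = total(new Shape[] {new Square(2)}, 2_000_000);
        // Phase 2 and 3: new receiver types appear at the already compiled call site.
        sum += total(new Shape[] {new Square(2), new Circle(1)}, 2_000_000);
        sum += total(new Shape[] {new Square(2), new Circle(1), new Rect(2, 3)}, 2_000_000);
        System.out.println("total area: " + sum);
    }
}
```

Running it as `java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining NewReceiverDemo` and lining the compilation events up against the safepoint log is one way to see whether new receivers coincide with the safepoints in question.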

On Aug 7, 2015 8:50 PM, "Gil Tene" <g...@azulsystems.com> wrote:

> Ok. That's much better than I thought then (and the way it should be). The question for Mark's blog entry is then "why does your application seem to decide it actually needs a safepoint so often?".

It's currently predicated on ICBuffer having entries to clean up. Once it enters the safepoint, it then also performs a bunch of other cleanup tasks. So the actual times safepoints will occur will not be as regular as the GSI interval. If you look here, http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/1aef080fd28d/src/share/vm/runtime/vmThread.cpp#l429, it'll ask if cleanup is needed, and that's predicated on the ICBuffer reporting having entries. If that cleanup is needed, the safepoint is then initiated with all cleanup tasks: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/1aef080fd28d/src/share/vm/runtime/safepoint.cpp#l488

So it seems like if compiled code, e.g., sees a new receiver and creates an IC, the next poll will enter a safepoint.
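The behaviour described above can be sketched as a toy model. This is not HotSpot code: the class and method names (GuaranteedSafepointModel, addInlineCacheStub, onIntervalElapsed) are invented for illustration, and the real VM thread logic has more conditions than this. It only shows the shape of the decision: wake up every interval, and enter a safepoint only if the IC buffer has entries.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: every name in this class is hypothetical.
public class GuaranteedSafepointModel {

    // Stands in for the ICBuffer: pending inline cache entries.
    private final Queue<String> icBuffer = new ArrayDeque<>();
    private int safepoints = 0;

    // Compiled code sees a new receiver and queues an IC entry.
    void addInlineCacheStub(String callSite) {
        icBuffer.add(callSite);
    }

    // In this model, cleanup is needed only when the buffer has entries.
    boolean isCleanupNeeded() {
        return !icBuffer.isEmpty();
    }

    // Called each time the guaranteed safepoint interval elapses.
    // Returns true if a safepoint was entered on this wake-up.
    boolean onIntervalElapsed() {
        if (!isCleanupNeeded()) {
            return false; // VM thread woke up, nothing to do, no safepoint
        }
        safepoints++;
        icBuffer.clear(); // the cleanup that triggered the safepoint
        // The other piggybacked cleanup tasks would also run here.
        return true;
    }

    int safepointCount() {
        return safepoints;
    }

    public static void main(String[] args) {
        GuaranteedSafepointModel vm = new GuaranteedSafepointModel();
        System.out.println(vm.onIntervalElapsed()); // no entries queued yet
        vm.addInlineCacheStub("Foo::bar call site");
        System.out.println(vm.onIntervalElapsed()); // entry queued before this poll
        System.out.println(vm.safepointCount());
    }
}
```

In this model a quiet application never enters a safepoint on the timer, which matches the point that the interval only controls how often the check runs.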

On Fri, Aug 7, 2015 at 2:05 PM, Kirk Pepperdine <ki...@kodewerk.com> wrote:

I have seen a lot of unexplained safe-pointing behavior that this interval doesn’t explain. I need to dig up some graphs to show you.

— Kirk

On Aug 7, 2015, at 6:36 PM, Gil Tene <g...@azulsystems.com> wrote:

Monitor deflation is one of the other things to watch for. While it is "rare" to see applications that produce a high rate of newly inflated monitors per second, there are some common ones that do just that. E.g. Cassandra can generate 10s of thousands of newly inflated monitors per second, and I've seen some trading frameworks do this too. If you let this accumulate for many seconds without running a monitor deflation pass, "interesting things" start to happen. At best you start to see big pauses when millions of monitors get deflated in a single safepoint. At worst... Boom.

I think that relaxing the rate of forced safepoints can be "safe" if you apply enough testing (read "verify stable day-long runs, even if you just increased the interval to 5 minutes").



Vitaly Davidovich

Aug 8, 2015, 12:55:18 PM

to mechanical-sympathy

Yes, if a SP is triggered by GSI/ICBuffer then "no vm operation" is reported as the SP reason. I tried (but failed) to convince David Holmes that they should put something more informative in the log message, as someone seeing this for the first time will be confused (although a quick Google search should lead them to the answer).
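For anyone wanting to see how often this happens in their own logs, a rough sketch is below. It assumes you already have safepoint log output saved to a file (for example from a Java 8 run with -XX:+PrintSafepointStatistics) and simply counts lines containing the "no vm operation" reason. The class name and the file-path argument are made up here, and nothing is assumed about the log layout beyond that phrase appearing on the line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: counts safepoint log lines reported with the
// "no vm operation" reason. Only the phrase itself is relied upon.
public class NoVmOperationCounter {

    static final String GSI_REASON = "no vm operation";

    static long count(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(GSI_REASON))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a saved safepoint log (an assumption of this sketch).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(count(lines));
    }
}
```

Comparing this count against the run length gives a rough idea of how many wake-ups actually turned into safepoints.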

sent from my phone

Kirk Pepperdine

Aug 8, 2015, 1:26:46 PM

to mechanica...@googlegroups.com

On Aug 8, 2015, at 6:55 PM, Vitaly Davidovich <vit...@gmail.com> wrote:

Yes, if a SP is triggered by GSI/ICBuffer then "no vm operation" is reported as the SP reason. I tried (but failed) to convince David Holmes that they should put something more informative in the log message, as someone seeing this for the first time will be confused (although a quick Google search should lead them to the answer).

Well, it was confusing at first and yes, a Google search does eventually yield answers, but that’s not the point. More to the point, it would be useful to know what it’s doing. Again, it might be that one can adjust code to not rub the sore spot.

— Kirk

Vitaly Davidovich

Aug 8, 2015, 2:13:15 PM

to mechanical-sympathy

Yes, it would be useful. One of these days I'm going to look into this a bit more; if anyone has a debug HotSpot build readily available and wants to enable TraceIC for investigation, that'd be great. This issue is particularly frustrating when you already go out of your way to avoid runtime-induced pauses (e.g. GC, biased lock revocations, etc.) and yet get hit with these opaque ones. The HotSpot JIT is both a blessing and a curse at times.

sent from my phone
