CMS: abort preclean due to time

987 views
Skip to first unread message

Austin

unread,
May 3, 2017, 5:02:47 PM5/3/17
to sakai-dev
Hello Sakai Devs,

Does anyone have experience with Garbage Collection issues where the concurrent mark sweep seems to get into a loop?

We recently added some heap and GC settings to our config

-Xmn2500m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=80
-XX:+DisableExplicitGC
-XX:+DoEscapeAnalysis


Which have been working great as far as our servers have not randomly crashed since adding those options.  However, I have noticed in our logs that occasionally, the Concurrent Mark Sweep will get into a loop where it seems like it's running every second.  I don't think it affects users because no one has complained about performance during one of these CMS loops, but it does seem to spike the CPU a bit and goes on until we reboot the server, which fortunately we do every night for backups.

One thing I'm considering is increasing CMSMaxAbortablePrecleanTime

https://blogs.oracle.com/jonthecollector/did-you-know

as it does look like we are just barely going over the default 5s for the preclean.  So I'm thinking increasing the preclean time to 6 or 7 seconds?  Anyone have experience with that?

Thanks,

Austin

Here's an example our logs:

 CMS: abort preclean due to time 69143.845: [CMS-concurrent-abortable-preclean: 5.057/5.156 secs] [Times: user=6.51 sys=0.12, real=5.16 secs]
69143.847: [GC[YG occupancy: 333051 K (2304000 K)]69143.847: [Rescan (parallel) , 0.0412780 secs]69143.888: [weak refs processing, 0.0001760 secs]69143.889: [scrub string table, 0.0042310 secs] [1 CMS-remark: 6783593K(7966720K)] 7116645K(10270720K), 0.0458210 secs] [Times: user=0.25 sys=0.00, real=0.05 secs]
69143.893: [CMS-concurrent-sweep-start]
69144.399: [CMS-concurrent-sweep: 0.506/0.506 secs] [Times: user=0.70 sys=0.01, real=0.50 secs]
69144.399: [CMS-concurrent-reset-start]
69144.415: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
.
.
.
69146.418: [GC [1 CMS-initial-mark: 6782820K(7966720K)] 7218964K(10270720K), 0.2087300 secs] [Times: user=0.21 sys=0.00, real=0.21 secs]
69146.627: [CMS-concurrent-mark-start]
69147.978: [CMS-concurrent-mark: 1.351/1.351 secs] [Times: user=2.93 sys=0.03, real=1.35 secs]
69147.978: [CMS-concurrent-preclean-start]
69147.995: [CMS-concurrent-preclean: 0.017/0.017 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
69147.995: [CMS-concurrent-abortable-preclean-start]
.
.
.
 CMS: abort preclean due to time 69153.081: [CMS-concurrent-abortable-preclean: 5.084/5.086 secs] [Times: user=5.84 sys=0.09, real=5.08 secs]
69153.083: [GC[YG occupancy: 715581 K (2304000 K)]69153.083: [Rescan (parallel) , 0.3252150 secs]69153.409: [weak refs processing, 0.0008580 secs]69153.410: [scrub string table, 0.0044170 secs] [1 CMS-remark: 6782820K(7966720K)] 7498402K(10270720K), 0.3306480 secs] [Times: user=1.87 sys=0.02, real=0.34 secs]
69153.414: [CMS-concurrent-sweep-start]
69153.954: [CMS-concurrent-sweep: 0.540/0.540 secs] [Times: user=0.59 sys=0.01, real=0.54 secs]
69153.954: [CMS-concurrent-reset-start]
69153.970: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
.
.
.
69155.973: [GC [1 CMS-initial-mark: 6763068K(7966720K)] 7651204K(10270720K), 0.4351060 secs] [Times: user=0.44 sys=0.00, real=0.44 secs]
69156.408: [CMS-concurrent-mark-start]
69157.764: [CMS-concurrent-mark: 1.356/1.356 secs] [Times: user=2.98 sys=0.04, real=1.35 secs]
69157.764: [CMS-concurrent-preclean-start]
69157.791: [CMS-concurrent-preclean: 0.026/0.027 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
69157.791: [CMS-concurrent-abortable-preclean-start]



Earle Nietzel

unread,
May 5, 2017, 10:57:58 AM5/5/17
to Austin, sakai-dev
Hi Austin,

You should really post all the options your using as its hard to put into context the options your using without knowing the others.

Without knowing what Xms is its hard to know if Xmn is set correctly? the default is 1/4 the total heap but on larger heaps this can be to high. I think Sakai's does well when new is around 1G which that would be the default where Xmx=4g which are probably what most installations run with so no need tune if running a 4g heap. The key here is to make Young large enough so your not performing minor collections back to back and collection times for Young are fast typically milliseconds. Also you need leave enough room in tenured which is where most of long lived objects will live, for example this is where you will find the sakai caches.

Here are some tips:

1) Break up your configuration options into different lines so that you can more easily test different options
2) document options you have tried and their results

Here is an example of a setenv.sh

JAVA_OPTS="-server -Djava.awt.headless=true -Dhttp.agent=Sakai"

# Set memory options, this sets young to 1/8 the size of tenured where the default is 1/4. Don't make Young generation to large or to small it should be small enough that its collected fast and leave the majority of heap to tenured. Typically no need to adjust survivor.
JAVA_OPTS="$JAVA_OPTS -Xms8g -Xmx8g -Xmn=1g"

# Use compressed oops to use 32bit pointers 
JAVA_OPTS="$JAVA_OPTS XX:+UseCompressedOops -XX:+DisableExplicitGC"

# Use Concurrent Mark Sweep collector
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"

# when to start collecting tenured
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=70"

# Set Sakai's Cookie
#JAVA_OPTS="$JAVA_OPTS -Dsakai.cookieName=SAKAI2SESSIONID"

# Enable Sakai Demo for testing
#JAVA_OPTS="$JAVA_OPTS -Dsakai.demo=true"

# Uncomment the following line if using JDK 1.6 SAK-16745
#JAVA_OPTS="$JAVA_OPTS -Dsun.lang.ClassLoader.allowArraySyntax=true"

# Uncomment the following line if using JDK 1.7 SAK-21265
#JAVA_OPTS="$JAVA_OPTS -Dorg.apache.jasper.compiler.Parser.STRICT_WHITESPACE=false"

# Still need this
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false"

# Use legacy merge sort (not needed for Java 8)
#JAVA_OPTS="$JAVA_OPTS -Djava.util.Arrays.useLegacyMergeSort=true"

# Sensitive sakai config should go here and only be accessible to the user sakai runs as
JAVA_OPTS="$JAVA_OPTS -Dsakai.security=$CATALINA_HOME/sakai/"

# uncomment the following line and enter the timezone if different than server's
# some examples for the U.S.:  US/Eastern, US/Central, US/Mountain, US/Pacific, US/Arizona
JAVA_OPTS="$JAVA_OPTS -Duser.timezone=US/Eastern"

# for debugging--print garbage collection info
#JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"

# do these tasks when sakai is starting, like configuring JMX
if [[ "$1" == "start" ]]; then
  echo "Sakai Starting"
  JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8089 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
fi



--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+...@apereo.org.
To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

Leonardo Canessa

unread,
May 5, 2017, 11:11:31 AM5/5/17
to Earle Nietzel, Austin, sakai-dev
# Use legacy merge sort (not needed for Java 8)
#JAVA_OPTS="$JAVA_OPTS -Djava.util.Arrays.useLegacyMergeSort=true"

As an aside, I would argue that using Legacy Merge Sort is necessary for Java 8 due to the following:
especially on installations which have been using Sakai since before 11.0.

Leonardo Canessa
Web Developer - E-Learning

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.

To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.

Austin

unread,
May 5, 2017, 3:08:40 PM5/5/17
to Leonardo Canessa, Earle Nietzel, sakai-dev
Hello Earle, Thanks for the reply!

The options we're using are listed below.   Our Xmn is about 24.3% of our Xmx, but you mentioned that for larger heaps that that could be too high?  We decided on those numbers by copying the settings from an old doc:

https://confluence.sakaiproject.org/display/PROD/Rutgers+performance+tuning+experience

where they were using about 23.8%.

I also found another forum that talks about the "abort preclean due to time" issue.  It also mentioned increasing CMSMaxAbortablePrecleanTime.  But it also suggests setting -XX:-PrintTenuringDistribution to check for "premature promotion from the young generation to the old one".  So we may try that setting first to see if that catches anything.  Unless you see any obvious problems with our settings or if anyone thinks upping the CMSMaxAbortablePrecleanTime is a good idea.



Sakai 10.7
Tomcat 7.0.76
jdk1.7.0_131

----------------

-server
-Xmx10280m
-Xms10280m
-XX:PermSize=512m
-XX:MaxPermSize=512m

-Xmn2500m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=80
-XX:+DisableExplicitGC
-XX:+DoEscapeAnalysis
-Dhttp.agent=Sakai
-Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false
-Djava.awt.headless=true
-Dcom.sun.management.jmxremote
-Dsun.lang.ClassLoader.allowArraySyntax=true
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps





Earle Nietzel

unread,
May 5, 2017, 3:54:21 PM5/5/17
to Austin, Leonardo Canessa, sakai-dev
My feeling is that a Young generation of 2.5G is pretty large and I would expect your Young GC's to take longer as a result. The preclean phase is sampling the objects in the Young generation so a larger Young generation means potentially longer times in general. A lot of this really depends on your hardware so if your seeing high collection times (>= 1 second) then the Young generation is probably to large for the hardware your on. The fact that preclean never finishes is probably just a symptom of that.

The important thing for the Young Generation is to collect as fast as it can, but at the same time you don't want it to small where its always collecting, what this means is you need to monitor it. But I have seen the best results around 1G for sakai. 

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+...@apereo.org.

To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+...@apereo.org.

Austin

unread,
May 5, 2017, 4:08:56 PM5/5/17
to Earle Nietzel, Leonardo Canessa, sakai-dev
Thanks Earle,  We'll take a look at adjusting and monitoring Xmn before we try the other options.

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.

To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.
Reply all
Reply to author
Forward
0 new messages