Avoiding Apache Ignite crashes & tuning JVM + Ignite


Sean Horner

Sep 9, 2024, 7:01:24 PM
to sakai-dev, sakai-pr...@apereo.org

Hello,


Today one of our institution's Sakai 23 Tomcat nodes was shut down by Apache Ignite. Along with the Ignite-related errors (e.g., “TcpCommunicationSpi.error Failed to process selector key”) on the node that was shut down, catalina.out showed preceding jvm-pause-detector-worker warnings such as:


09-Sep-2024 11:59:07.558 WARN [jvm-pause-detector-worker] o.a.i.i.IgniteKernal%sakai-plu-edu.warning Possible too long JVM pause: 517 milliseconds.
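
To see how often and how long these pauses occur on each node, the warnings can be pulled out of catalina.out and tallied. Below is a minimal sketch, assuming every warning follows the exact layout of the line above; the function name `parse_pauses` and the regular expression are illustrative, not part of Sakai or Ignite.

```python
import re
from datetime import datetime

# Matches the Tomcat timestamp and the pause length in an Ignite
# jvm-pause-detector-worker warning (layout assumed from the sample line).
PAUSE_RE = re.compile(
    r"(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) WARN "
    r"\[jvm-pause-detector-worker\].*"
    r"Possible too long JVM pause: (?P<ms>\d+) milliseconds"
)

def parse_pauses(lines):
    """Return a list of (timestamp, pause_ms) for each pause warning found."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%d-%b-%Y %H:%M:%S.%f")
            pauses.append((ts, int(match.group("ms"))))
    return pauses

sample = [
    "09-Sep-2024 11:59:07.558 WARN [jvm-pause-detector-worker] "
    "o.a.i.i.IgniteKernal%sakai-plu-edu.warning "
    "Possible too long JVM pause: 517 milliseconds.",
]
pauses = parse_pauses(sample)
for ts, ms in pauses:
    print(ts.isoformat(), ms)
```

In practice the `lines` argument could be an open file handle on catalina.out; comparing the resulting timestamps with the GC log from `-Xlog:gc` may help show whether the pauses line up with collector activity.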


We don’t think a network disruption affected the Ignite connections among the 6 Tomcat nodes in the cluster during this incident. While I'm still studying the logs across nodes to assess root cause, I'm also writing to solicit feedback on our runtime configuration for memory allocation, garbage collection, etc.


Our Sakai 23 cluster consists of 6 Tomcat nodes (plus a load balancer using httpd and a node hosting MariaDB). Each cluster node is hosted on a separate virtual machine with 16 GB of RAM, running Oracle Linux 8. The Java 11 and Tomcat 9.0.93 runtime on each Tomcat node uses the following JVM options (set via CATALINA_OPTS):


export CATALINA_OPTS="-server -Xms6g -Xmx12g -Dcom.sun.management.jmxremote -Dfile.encoding=UTF-8  -Dhttp.agent=Sakai -Dhttps.protocols=TLSv1.2,TLSv1.3 -Djava.awt.headless=true -XX:+UseCompressedOops -Djava.util.Arrays.useLegacyMergeSort=true -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false  -Dsakai.cookieName=SAKAI2SESSIONID -Dsakai.home=$CATALINA_HOME/sakai/ -Duser.timezone=US/Pacific -Duser.language=en -Duser.region=US -Dsun.lang.ClassLoader.allowArraySyntax=true -Xlog:gc -XX:+UseShenandoahGC -XX:+AlwaysPreTouch -Dspring.config.location=$CATALINA_HOME/sakai/sakai.properties -Duser.home=$CATALINA_HOME"


The following snippet from ignite-components.xml defines the sizes we have allocated for the Ignite Hibernate and Spring data regions:


    <bean id="org.sakaiproject.ignite.DataRegionConfiguration.SpringRegion"
          class="org.apache.ignite.configuration.DataRegionConfiguration">
        <property name="name" value="spring_region"/>
        <property name="initialSize" value="#{10L * 1024 * 1024}"/>
        <property name="maxSize" value="#{100L * 1024 * 1024}"/>
        <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
        <property name="persistenceEnabled" value="false"/>
        <property name="metricsEnabled" value="true"/>
    </bean>


    <bean id="org.sakaiproject.ignite.DataRegionConfiguration.HibernateL2Region"
          class="org.apache.ignite.configuration.DataRegionConfiguration">
        <property name="name" value="hibernate_l2_region"/>
        <property name="initialSize" value="#{300L * 1024 * 1024}"/>
        <property name="maxSize" value="#{1024L * 1024 * 1024}"/>
        <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
        <property name="persistenceEnabled" value="false"/>
        <property name="emptyPagesPoolSize" value="10000"/>
        <property name="metricsEnabled" value="true"/>
    </bean>
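
As a rough tally of the most memory these settings can commit on one node, here is a minimal sketch that adds the `-Xmx12g` heap ceiling to the `maxSize` of the two data regions above and compares the total with the VM's RAM. It assumes the 16 GB VM has 16 GiB available, and it deliberately leaves out everything else that uses memory outside the heap (any other Ignite data regions, metaspace, thread stacks, direct buffers, and the operating system), so the real headroom is smaller than the figure it computes.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Values taken from the configuration quoted above.
host_ram = 16 * GIB                   # assumed: "16 GB" VM treated as 16 GiB
heap_max = 12 * GIB                   # -Xmx12g
spring_region_max = 100 * MIB         # spring_region maxSize
hibernate_l2_region_max = 1024 * MIB  # hibernate_l2_region maxSize

# Upper bound for heap plus the two off-heap regions only.
committed = heap_max + spring_region_max + hibernate_l2_region_max
headroom = host_ram - committed

print(f"heap + listed regions: {committed / GIB:.2f} GiB")
print(f"left for everything else: {headroom / MIB:.0f} MiB")
```

With the values above, the heap and the two regions can grow to about 13.1 GiB, leaving just under 3 GiB for all remaining native memory and the OS.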

 

Any leads, insights, or tips would be appreciated. For example, are we allocating enough memory? Are our garbage collection settings reasonable?


Thanks,

Sean


--
Sean Horner
Senior Web Developer

Information & Technology Services
Pacific Lutheran University
Tacoma, WA 98447
Pronouns: he/him

I am currently working remotely.
For quickest response for Sakai support questions, email sa...@plu.edu, or create a Help Desk ticket at https://helpdesk.plu.edu.