search nodes crashing after upgrading Elassandra from 5.5.0.24 to 6.8.4.7


Mitch Gitman

Sep 25, 2020, 11:50:46 PM
to Elassandra User Mailing List

I have a keyspace with a small data set but some heavy-duty search queries against a deeply nested index definition. I've migrated this keyspace from a cluster running Elassandra 5.5.0.24 to one running 6.8.4.7. The number of nodes in the search datacenter and their specifications are the same in both clusters, and the index definitions haven't changed. The one other difference is that this small keyspace had to share the old cluster with a couple of keyspaces with much larger data sets, whereas it has the new cluster all to itself.


So the only substantial difference, to the best of my knowledge, is the Elassandra version. But here's where the behavior differs:

  • In the new 6.8.4.7 cluster, the search nodes crash repeatedly, each at least once a day (and we've only had this cluster up for a day), despite there being no traffic for other keyspaces.
  • In the old 5.5.0.24 cluster, the search nodes never crash.


The crashes are always due to running out of memory, typically heap space. The vanilla Cassandra nodes never crash in either cluster.


Here's how the heap usage will look after I've restarted some nodes:

# nodetool -u dba_admin -pw dba_admin -h west-search-01.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 1786.30 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-02.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 1509.38 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-03.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 1718.96 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-04.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 1299.82 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-05.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 921.51 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-06.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 4570.36 / 31584.00


Here's how the heap usage will look after several hours of letting some of the nodes just run:

# nodetool -u dba_admin -pw dba_admin -h west-search-01.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 18099.37 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-02.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 1840.71 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-03.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 16959.97 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-04.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 16003.95 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-05.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 17072.32 / 31584.00

# nodetool -u dba_admin -pw dba_admin -h west-search-06.foo.bar info|grep "^Heap Memory (MB)"

Heap Memory (MB)       : 5375.93 / 31584.00


On the first node, note the increase from 1786.30 MB to 18099.37 MB. The typical pattern we've seen is that the heap usage just climbs gradually, past 50%, past 70%, until the server crashes. I haven't seen it come back down except when the server crashes. We've produced some graphs of the heap usage over time, but I haven't analyzed them closely yet.
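(For anyone who wants to watch the same trend without a dashboard, something as crude as looping nodetool works; the host name and interval below are just examples.)

# while true; do date; nodetool -u dba_admin -pw dba_admin -h west-search-01.foo.bar info | grep "^Heap Memory (MB)"; sleep 300; done >> /tmp/heap-west-search-01.log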


The RAM utilization has typically been very high, around 90%. On one occasion the RAM spiked to 100%, the heap usage dropped sharply at the same time, and then the heap spiked to 100% as well before the node crashed. That was an atypical incident, though.


Here's how the java command is being invoked, with the classpath elided:

java -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -Xms31G -Xmx31G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -Des.search_strategy_class=RackAwareSearchStrategy -Dcom.sun.management.jmxremote.access.file=/usr/lib/jvm/zulu-8/jre/lib/management/jmxremote.access -javaagent:/opt/prometheus/jmx_prometheus_javaagent-0.3.1.jar=9192:/opt/prometheus/elassandra-grafana.yml -Xmn1600M -XX:+UseCondCardMark -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar -Dcassandra.jmx.remote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -Djdk.io.permissionsUseCanonicalPath=true -Des.distribution.flavor=oss -Des.distribution.type=rpm -Djava.awt.headless=true -XX:OnOutOfMemoryError=kill -9 %p -Dlogback.configurationFile=/etc/cassandra/logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir=/var/lib/cassandra -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp ... org.apache.cassandra.service.ElassandraDaemon
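(If anyone wants to confirm what the JVM actually resolved those options to on a running node, jcmd can dump the live flag values; the PID below is a placeholder.)

# jcmd <cassandra_pid> VM.flags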


Note that I've temporarily enabled HeapDumpOnOutOfMemoryError, although I haven't checked the output of that yet.
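Once I do get a dump, my rough plan is to take a class histogram on a still-running node and open the .hprof in a heap analyzer. Nothing Elassandra-specific here, and the PID and file path are placeholders:

# jcmd <cassandra_pid> GC.class_histogram | head -40
# jmap -dump:live,format=b,file=/tmp/west-search-01.hprof <cassandra_pid>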


Thanks in advance. I'll post an update if I can figure this out on my end.


P.S. Under other circumstances I would be asking this question under a commercial Elassandra support contract. I'd like to position my team to purchase one, but we'd be in a better position to do so if we got past this blocker.

Mitch Gitman

Sep 26, 2020, 1:10:32 AM
to Elassandra User Mailing List
BTW, I have the system.log file from when a node went down. The final messages relate to garbage collection:
2020-09-25 09:09:52,884 INFO  [Service Thread] GCInspector.java:285 handleNotification ParNew GC in 326ms.  CMS Old Gen: 21063422648 -> 21063807912; Par Eden Space: 1342177280 -> 0; Par Survivor Space: 65107968 -> 69567544
2020-09-25 09:10:00,978 INFO  [Service Thread] GCInspector.java:285 handleNotification ParNew GC in 245ms.  CMS Old Gen: 29662472760 -> 29669724800; Par Eden Space: 1342177280 -> 0; Par Survivor Space: 56301280 -> 26908192

Less than a minute prior, there are these messages:
2020-09-25 09:09:42,972 WARN  [Service Thread] GCInspector.java:283 handleNotification ConcurrentMarkSweep GC in 11448ms.  CMS Old Gen: 19310358288 -> 7901908984;
...
2020-09-25 09:09:44,289 WARN  [GossipTasks:1] FailureDetector.java:296 interpret Not marking nodes down due to local pause of 13052324300 > 5000000000

This suggests the node is stalling in long garbage-collection pauses (the FailureDetector reports a 13-second local pause) shortly before it goes down.
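Since the JVM is started with -XX:+PrintGCApplicationStoppedTime, the stop-the-world pauses should also show up directly in the GC logs; a quick filter for pauses of a second or more would look something like this (assuming the gc.log location from the command line above):

# grep -E "Total time for which application threads were stopped: [1-9]" /var/log/cassandra/gc.log*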

So is there anything else interesting in the log file? The only actual ERROR entries I could find appear to come from queries against fields whose mappings have been set to:
"index": false

See:
2020-09-25 05:40:59,691 INFO  [elasticsearch[10.6.5.75][generic][T#1]] CassandraShardStateListener.java:69 afterIndexShardStarted shard [some_keyspace][0] started
2020-09-25 05:40:59,692 INFO  [elasticsearch[10.6.5.75][generic][T#1]] ElasticSecondaryIndex.java:2560 startRebuildIfNeeded start building secondary some_keyspace.some_table.elastic_some_table_idx
2020-09-25 05:40:59,851 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountduenow] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:40:59,852 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountmonthly] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:01,387 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountduenow] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:01,387 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountmonthly] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:01,447 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountduenow] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:01,448 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountmonthly] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:01,881 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountduenow] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:01,881 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountmonthly] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:02,333 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountduenow] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:02,334 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountmonthly] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:02,821 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountduenow] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:02,821 ERROR [CompactionExecutor:4] ElasticSecondaryIndex.java:492 addField submapper not found for nested field [discountmonthly] in index [some_keyspace] for metadata.version=2, not indexed
2020-09-25 05:41:32,966 INFO  [CompactionExecutor:4] NoSpamLogger.java:91 log Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
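One more note on that last NoSpamLogger line: 536870912 bytes is exactly 512 MB, which looks like Cassandra's off-heap buffer pool cap (file_cache_size_in_mb, 512 MB by default) rather than the JVM heap, so I'm treating it as a separate symptom. If I'm reading it right, the current setting can be checked with:

# grep -n "file_cache_size_in_mb" /etc/cassandra/cassandra.yaml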

Dinesh Kolappa

Oct 26, 2020, 8:09:56 PM
to Elassandra User Mailing List
Did you resolve the issue? We are seeing the same errors and issues after upgrading to Elassandra 6.8.3.

Mitch Gitman

Oct 26, 2020, 8:46:58 PM
to Dinesh Kolappa, Elassandra User Mailing List
Dinesh, let me get back with more details later this week once we've migrated our third and final keyspace over to its own dedicated 6.8.4.7 cluster.

So far we've found that the search nodes for the two single-keyspace clusters we've migrated are prone to crashing, and this was despite doubling the memory from 32GB to 64GB and doubling the heap size from 16GB to 32GB accordingly. Plus, since these were AWS EC2 instances, we upgraded the volume containing the /var/ directory from gp2 to either io1 or io2 Provisioned IOPS, depending on the AWS availability zone. The crashes are especially prone to happen during or after repairs, which is making for some really, truly miserable weekends for us.

The difference is that these two keyspaces are making heavy-duty use of Elasticsearch indexing and possibly doing some other really excessive things on the write path. The third keyspace, even though it has much, much more data, is making much more conservative use of Elasticsearch; same for its writes. If that keyspace has problems as well, then we'll know on our end there's something really seriously wrong with Elassandra 6.8.4 vs. Elassandra 5.5.0.

Sometimes I wonder if, with Elassandra, we're just bumping up against the limits of secondary indexes. At the same time, even though my evidence isn't definitive yet, the search nodes do appear to be more fragile on 6.8.4.7 than on 5.5.0.24.



Dinesh Kolappa

Oct 27, 2020, 1:36:58 AM
to Elassandra User Mailing List
Hi Mitch, thanks for your prompt response. We have a single-keyspace cluster that was previously running Elassandra 5.5.0.24. We switched over to i3 EC2 instances (SSD-backed) with 16 CPUs and 122 GB of memory, and our heap size is 30.5 GB. We have some heavy-duty Elasticsearch queries, and they performed pretty well in the old cluster. In the newly built cluster, the heap fills up when we run our usual Spark jobs and some of the Elasticsearch queries (under light load), and then the nodes start going down. So frustrating! Please let me know if you find anything that fixes this issue.