Historical nodes crashing


Rajesh MK

Jun 3, 2021, 9:56:19 AM6/3/21
to druid...@googlegroups.com
Hi team,

         I have a 4-node cluster (2 data nodes, 1 master, 1 query node), and the historical process is crashing repeatedly on both data nodes with an out-of-memory heap error, even after increasing the heap multiple times. This eventually leads to ZooKeeper cluster data corruption, and I had to remove the ZooKeeper data to fix it. The cluster had been running fine for around 3 weeks. Is the increased load causing this issue? Is there a way to fix it?

We are using Kafka native ingestion.

Error from the historical log:

2021-06-03T09:57:04,141 INFO [Announcer-0] org.apache.druid.curator.announcement.Announcer - Reinstating [/druid/segments/drulx1002:8083/drulx1002:8083_historical__default_tier_2021-06-03T09:09:46.124Z_e0
a8eab478ff4b5cbdb61cfacf4ea9f42061]
Terminating due to java.lang.OutOfMemoryError: Java heap space
2021-06-03T09:57:33,503 INFO [main] org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.2.5.Final

Total RAM 64 GB

[root@drulx1002 ~]# cat  /druid/apache-druid-0.20.1/conf/druid/cluster/data/historical/jvm.config
-server
-Xms13g
-Xmx16g
-XX:MaxDirectMemorySize=24g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager


Please let me know your thoughts on this.
Regards
Rajesh

Stelios Philippou

Jun 3, 2021, 10:37:56 AM6/3/21
to druid...@googlegroups.com
Hi Rajesh,

We just faced the same issue; our system had also been running perfectly for a couple of months.

One thing we tried was increasing the ZooKeeper ensemble to 3 nodes to help out.

But that did not work out in our case.

Our issue was more extensive: our ingestion was too aggressive and we ended up creating far too many small files.
Druid likes to have 600-700 MB segment files.

So perhaps you ended up with too many segments for your system and thus ran out of memory when trying to bring them online every time.

We ended up losing that data, but we have since changed the ingestion to collect the data more appropriately in our case, tuning segmentGranularity and intermediatePersistPeriod to not be very aggressive.
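For reference, the knobs mentioned above live in the Kafka supervisor spec. A minimal sketch of the relevant fields, with a hypothetical datasource name and placeholder values (not Rajesh's actual spec):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "my_datasource",
      "granularitySpec": {
        "segmentGranularity": "DAY",
        "queryGranularity": "MINUTE"
      }
    },
    "tuningConfig": {
      "type": "kafka",
      "maxRowsPerSegment": 5000000,
      "intermediatePersistPeriod": "PT10M"
    }
  }
}
```

A coarser segmentGranularity and a longer intermediatePersistPeriod both reduce the number of small segments produced.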






Rajesh MK

Jun 4, 2021, 2:21:27 AM6/4/21
to druid...@googlegroups.com
Hi Stelios,

          Thank you for the quick response. We have another lingering issue: of the two data nodes, only one historical process is ever running. Is there any additional load-balancing setting we need when adding a second data node?

Regards
Rajesh 

Max Gorinevsky

Jun 4, 2021, 2:40:02 AM6/4/21
to druid...@googlegroups.com
Hi Rajesh,

Druid is definitely able to have multiple data nodes, each with a historical (and middlemanager) process. What happens when you start the historical process on the second node? Are there any errors in the historical logs?

Thanks,
Max



--
Thanks,

Max Gorinevsky
Imply Support

Rajesh MK

Jun 4, 2021, 4:58:14 AM6/4/21
to druid...@googlegroups.com
Hi Max,

         Even though both nodes have the same hardware spec and Java config, one of the nodes fails with a java heap space error.

drulx1001:8081 (plain)
drulx1001:8081 (plain)
drulx1003:8888 (plain)
drulx1003:8082 (plain)
drulx1010:8083 (plain)  -  37.54 GB / 300.00 GB (12.5%), empty load/drop queues
drulx1010:8091 (plain)  -  2 / 4 slots, last completed task: 2021-06-04T08:44:27.501Z
drulx1002:8091 (plain)  -  4 / 6 slots, last completed task: 2021-06-04T08:39:59.531Z
drulx1010:8102 (plain)
drulx1010:8100 (plain)
drulx1002:8103 (plain)
drulx1002:8102 (plain)
drulx1002:8101 (plain)
drulx1002:8100 (plain)


a8eab478ff4b5cbdb61cfacf4ea9f42061]
Terminating due to java.lang.OutOfMemoryError: Java heap space
2021-06-03T09:57:33,503 INFO [main] org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.2.5.Final

Regards
Rajesh

Max Gorinevsky

Jun 4, 2021, 5:38:25 AM6/4/21
to druid...@googlegroups.com
Hi Rajesh,

If this happens right away, it could be that the node tries to cache lookups into memory and fails. It is strange that the other node does not have the issue.
What are the specs of the node and how much heap is allocated to the historical process? What is the size of any lookups?

Thanks,
Max

Rajesh MK

Jun 4, 2021, 8:22:34 AM6/4/21
to druid...@googlegroups.com
Hi Max,

          Each data node has 64 GB RAM and 16 CPU cores. Below is the JVM config for the historical nodes:


cat /druid/apache-druid-0.20.1/conf/druid/cluster/data/historical/jvm.config
-server
-Xms16g
-Xmx19g
-XX:MaxDirectMemorySize=27g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

Sometimes the warning below appears; is this something we should be worried about?

2021-06-04T12:10:09,296 WARN [main-SendThread(zoolx1003:2181)] org.apache.zookeeper.ClientCnxn - Session 0x200a170f6bc003f for server zoolx1003/10.59.108.225:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len34829732 is out of range!
at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113) ~[zookeeper-3.4.14.jar:3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf]
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) ~[zookeeper-3.4.14.jar:3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.14.jar:3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.14.jar:3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf]
2021-06-04T12:10:09,397 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2021-06-04T12:10:09,397 INFO [ZkCoordinator] org.apache.druid.server.coordination.ZkCoordinator - Ignoring event[PathChildrenCacheEvent{type=CONNECTION_SUSPENDED, data=null}]
2021-06-04T12:10:09,397 WARN [NodeRoleWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Ignored event type[CONNECTION_SUSPENDED] for node watcher of role[coordinator].

Regards
Rajesh
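The "Packet len ... is out of range" error above typically means a znode payload exceeded ZooKeeper's default ~1 MB client buffer, which can happen when a very large number of segments is announced under one path. One possible mitigation (an assumption on my part, not something confirmed in this thread) is raising jute.maxbuffer consistently on the ZooKeeper servers and on every Druid JVM, e.g. in jvm.config:

```
# Allow larger ZooKeeper packets (value in bytes; 64 MB here is a placeholder).
# Must be set on both the ZooKeeper servers and all Druid processes.
-Djute.maxbuffer=67108864
```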

Diego Lucas Jiménez

Jun 7, 2021, 4:54:46 AM6/7/21
to Druid User
Interesting.
We are facing the same problem. From time to time, one or two Historicals go down (not the service, which keeps running): the Historical starts screaming a similar message ("org.apache.druid.curator.announcement.Announcer - Reinstating ...") without the out-of-heap error, and ZooKeeper also logs the same "org.apache.zookeeper.ClientCnxn - Session xxx for server xyz/IP:PORT, unexpected error, closing socket connection and attempting reconnect" that Rajesh is seeing.

The difference on my side is that we have 20 Historicals, 2 Coordinators, 2 Overlords, 25 MiddleManagers, 3 ZK, 2 Brokers, and 2 Routers, each on a dedicated machine, with a special highlight on the Historicals (64 vCPUs, dedicated NVMe disks, 384 GB RAM).

It happens several times per day: a Historical is "kicked out" of the cluster by ZK, then comes back, all while the Historical logs read like "yeah, trying to talk to ZK but he hates me".

I wonder if it's a Druid 0.20.x bug; it never happened to us before.
What version of Druid do you have, Rajesh?

Samarth Jain

Jun 7, 2021, 1:50:36 PM6/7/21
to druid...@googlegroups.com
Diego,

We have seen this happen in the past when the historical nodes are going through "stop the world" gc cycles. What does the gc activity look like for your historical process? Wondering if you need to do some tuning there.

Rajesh MK

Jun 8, 2021, 1:48:03 AM6/8/21
to druid...@googlegroups.com
Thank you for the inputs. We are currently using version 0.20.1 and planning to upgrade to version 0.21.0.

Regards
Rajesh

Diego Lucas Jiménez

unread,
Jun 14, 2021, 5:06:40 AM6/14/21
to Druid User
@samarth, you were actually right: stop-the-world GC was the cause. In one tier of our servers, increasing the heap fixed the problem, but the other tier already has 24 GB of heap.
Should we try to increase it even further? We're already using G1GC and the hardware can't get any better... any ideas on how to tune that heap?

Joseph Mocker

Jun 14, 2021, 11:29:20 AM6/14/21
to druid...@googlegroups.com

You mention a few times in this thread that you see OutOfMemoryErrors in the logs. To me, this suggests that your heap is not large enough and/or there is a memory leak somewhere. This is probably the cause of the long stop-the-world cycles: the GC is continually scouring memory to free up enough for what it needs, but can't because the heap is nearly full. You can confirm that by turning on some GC logging, or by connecting JVisualVM/JConsole and watching heap allocation over time.

I didn't see in your config settings below that you are using G1GC but you may have added that later.

Good luck!

  --joe
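For the GC logging Joe suggests, lines like the following can be added to the historical jvm.config (a sketch; the log file path is a placeholder):

```
# JDK 8 and earlier:
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/var/log/druid/historical-gc.log

# JDK 9+ (unified logging) equivalent:
-Xlog:gc*:file=/var/log/druid/historical-gc.log:time,uptime,level,tags
```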

Max Gorinevsky

Jun 15, 2021, 4:57:40 AM6/15/21
to druid...@googlegroups.com
Hi Diego,

The historical heap is used to store the following:

-Lookups
-Unmerged query results
-Per-segment and per-column information

Very large lookups can use a lot of heap. Complex queries covering a large interval can require a lot of heap.

Typically, the last of these is not a big factor, as Druid stores maybe a few KB per segment and a few hundred bytes per segment-column in heap. However, if you have a lot of segments or very many columns per segment, this can add up. For example, with 100k segments and 1,000 columns per segment, at roughly 100 bytes per segment-column, you'd need about ~10 GB of heap for this alone.


If you are encountering OOMs, you will need to figure out where they are happening; there should be clues in the stack trace. But, most likely, you will need to increase the heap available to the historical process.

Thanks,
Max
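Max's back-of-the-envelope numbers can be sketched as follows; the per-segment and per-column byte counts are rough assumptions taken from the description above, not exact Druid internals:

```python
def metadata_heap_bytes(num_segments, columns_per_segment,
                        bytes_per_segment=2_000, bytes_per_column=100):
    """Rough estimate of historical heap used for per-segment and
    per-segment-column bookkeeping (byte counts are assumptions)."""
    per_segment = num_segments * bytes_per_segment
    per_column = num_segments * columns_per_segment * bytes_per_column
    return per_segment + per_column

# 100k segments x 1,000 columns per segment, as in Max's example:
estimate = metadata_heap_bytes(100_000, 1_000)
print(f"{estimate / 1024**3:.1f} GiB")  # on the order of 10 GiB
```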


Ben Krug

Jun 15, 2021, 4:57:40 AM6/15/21
to druid...@googlegroups.com
This is more of a G1GC answer than a Druid answer, but with that much RAM, you might try a 31 GB heap.  That's a sweet spot for G1GC.  (31 GB is better than 32 GB; e.g., see here.)
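The 31 GB figure relates to compressed ordinary object pointers (oops): the JVM can use 32-bit compressed references only while the heap stays below roughly 32 GB, so a 31 GB heap can hold more objects than a slightly larger one. A historical jvm.config along those lines, as a sketch rather than a setting verified for this cluster:

```
-server
-Xms31g
-Xmx31g
-XX:+UseG1GC
-XX:MaxDirectMemorySize=24g
-XX:+ExitOnOutOfMemoryError
```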
