Historical Memory issue


TechLifeWithMohsin

Jul 6, 2021, 7:47:07 AM
to Druid User

Hi team,

I am facing an issue with the Historical process filling up buff/cache on EC2 as the number of segments grows.

Here is my config:
druid.service=druid/historical
druid.plaintextPort=8083

# HTTP server threads
druid.server.http.numThreads=100

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=/mnt/disk2/var/druid/processing
druid.segmentCache.numLoadingThreads=50

# Segment storage
druid.segmentCache.locations=[{"path":"/mnt/disk2/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk3/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk4/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk5/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk6/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk7/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk8/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk9/var/druid/druidSegments", "maxSize": 6500000000000}]
druid.server.maxSize=50000000000000

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=memcached
druid.cache.sizeInBytes=256000000
druid.cache.hosts=<host>:11211

But when I look at free memory:

$ free -m -h
              total        used        free      shared  buff/cache   available
Mem:           747G         21G        709G        868K         17G        722G
Swap:            0B          0B          0B

The buff/cache value keeps growing, and after a certain period it fills up the whole memory and the Historical server fails.

Can someone help me figure out what I am missing here?

Thanks,

vijay narayanan

Jul 6, 2021, 8:09:23 AM
to druid...@googlegroups.com
Can you get the error from the Historical log? Druid does not load segments into memory until there is a query. Segments are memory-mapped and pulled into the page cache when a query executes; only the segments accessed by the query get loaded into the page cache.
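To see the mechanism behind this (a quick sketch, not Druid-specific): reading any file pulls it into the Linux page cache, which is what `free` reports as buff/cache. The file path and size below are arbitrary.

```shell
# Demonstration: reading a file populates the page cache (buff/cache);
# this is the same mechanism behind Druid's memory-mapped segment reads.
before=$(awk '/^Cached:/ {print $2}' /proc/meminfo)

# Create and read a 64 MB file, standing in for a segment file.
dd if=/dev/zero of=/tmp/seg.bin bs=1M count=64 2>/dev/null
cat /tmp/seg.bin > /dev/null

after=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "page cache grew by roughly $(( (after - before) / 1024 )) MB"
rm -f /tmp/seg.bin
```

The key point for this thread: that memory shows up under buff/cache, not under the JVM's heap.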

TechLifeWithMohsin

Jul 6, 2021, 8:36:57 AM
to Druid User
After around 65k+ segments load, the Historical process fails, saying there is not enough memory for the JVM to run.

TechLifeWithMohsin

Jul 6, 2021, 8:44:48 AM
to Druid User
I am observing that on the new Historical nodes the buff/cache size keeps increasing.
 total        used        free      shared  buff/cache   available
Mem:           747G         22G        706G        868K         19G        721G
Swap:            0B          0B          0B

vijay narayanan

Jul 6, 2021, 8:49:02 AM
to druid...@googlegroups.com
You need to set a larger heap. What is your current JVM heap?

TechLifeWithMohsin

Jul 6, 2021, 8:50:55 AM
to Druid User
It's 30 GB. I doubt it's a heap issue; it would have failed earlier. And still, why is buff/cache filling up?

vijay narayanan

Jul 6, 2021, 9:17:37 AM
to druid...@googlegroups.com
65k+ seems like it is reaching the limit on open files. What is your ulimit on open files? Set the ulimit on open files to a higher value. Also, once the Historicals are up, run a compaction task to get segment sizes to around 500 MB.
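A sketch of how to check and raise that limit (the service user "druid" and the chosen value are assumptions for your deployment):

```shell
# Check the open-files limit for the current shell:
ulimit -n

# For a running Historical, inspect its effective limits, e.g.:
#   cat /proc/$(pgrep -f 'druid/historical')/limits

# Raise it persistently in /etc/security/limits.conf
# (assuming the process runs as user "druid"):
#   druid  soft  nofile  200000
#   druid  hard  nofile  200000

# Or, if the Historical is managed by systemd, in a unit override:
#   [Service]
#   LimitNOFILE=200000
```

Note the new limit only takes effect for processes started after it is applied, so the Historical must be restarted.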

vijay

TechLifeWithMohsin

Jul 6, 2021, 9:31:12 AM
to Druid User
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30446
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

When I looked at the failed instance, buff/cache was at 710 GB at the time of the failure.

vijay narayanan

Jul 6, 2021, 9:49:33 AM
to druid...@googlegroups.com
I am not so sure… your open files limit is at 65535. That means that when 65k segments are loaded this limit will be reached and the Historical will go down. You want to raise open files to a much higher value and try.

Joseph Mocker

Jul 6, 2021, 10:53:52 AM
to druid...@googlegroups.com

FWIW, I don't think buff/cache is a useful metric to be looking at. Memory that is allocated there can generally be freed and reused for other things when the system is under memory pressure.

https://unix.stackexchange.com/questions/390518/what-do-the-buff-cache-and-avail-mem-fields-in-top-mean

 --joe
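A quick way to confirm this on Linux (a sketch using /proc/meminfo field names): MemAvailable already subtracts out the reclaimable part of buff/cache, so it is the number to watch rather than MemFree.

```shell
# MemFree ignores page cache; MemAvailable estimates what is actually
# usable, counting reclaimable buff/cache. A box with a huge buff/cache
# can still have plenty of available memory.
grep -E '^(MemFree|MemAvailable|Cached):' /proc/meminfo
```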

TechLifeWithMohsin

Jul 6, 2021, 2:24:40 PM
to Druid User
After some research I am also thinking the same; the buffer cache should not be the cause. I've increased the nofile limit to 200k, and ingestion is running; will see how that goes.

TechLifeWithMohsin

Jul 7, 2021, 2:33:36 AM
to Druid User
I am getting this error:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f2f9ec20000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ec2-user/apache-druid-0.19.0/hs_err_pid46466.log
#
# Compiler replay data is saved as:
# /home/ec2-user/apache-druid-0.19.0/replay_pid46466.log

My JVM settings are:
-server
-Xms300g
-Xmx300g
-Daws.region=us-east-1
-XX:MaxDirectMemorySize=300g
-Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true
-XX:MaxMetaspaceSize=100g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/mnt/disk2/var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

vijay narayanan

Jul 7, 2021, 2:56:22 AM
to druid...@googlegroups.com
This can happen if you have exceeded the maximum number of memory-mapped files (see https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#system-configuration).

Increase the memory-map limit (`vm.max_map_count`) to a larger number.
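For reference, a sketch of checking and raising that kernel limit (the target value below is illustrative; size it to the number of segments per Historical):

```shell
# Each memory-mapped segment consumes map entries; the common kernel
# default of 65530 lines up with failures at around 65k segments.
cat /proc/sys/vm/max_map_count

# Raise it at runtime (needs root):
#   sysctl -w vm.max_map_count=500000
# And persist it across reboots in /etc/sysctl.conf:
#   vm.max_map_count=500000
```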

TechLifeWithMohsin

Jul 7, 2021, 3:03:53 AM
to Druid User
And it seems to work now.