Historical Memory issue


TechLifeWithMohsin

Jul 6, 2021, 7:47:07 AM
to Druid User

Hi team,

I am facing an issue with the Historical process filling up buff/cache on EC2 as the number of segments grows.

Here is my config:
druid.service=druid/historical
druid.plaintextPort=8083

# HTTP server threads
druid.server.http.numThreads=100

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=/mnt/disk2/var/druid/processing
druid.segmentCache.numLoadingThreads=50

# Segment storage
druid.segmentCache.locations=[{"path":"/mnt/disk2/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk3/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk4/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk5/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk6/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk7/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk8/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk9/var/druid/druidSegments", "maxSize": 6500000000000}]
druid.server.maxSize=50000000000000

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=memcached
druid.cache.sizeInBytes=256000000
druid.cache.hosts=<host>:11211

But when I look at free memory:

$ free -m -h
              total        used        free      shared  buff/cache   available
Mem:           747G         21G        709G        868K         17G        722G
Swap:            0B          0B          0B

The buff/cache value keeps growing, and after a certain period it fills up the whole memory and the Historical server fails.

Can someone help me figure out what I am missing here?

Thanks,

vijay narayanan

Jul 6, 2021, 8:09:23 AM
to druid...@googlegroups.com
Can you get the error from the Historical log? Druid does not load segments into memory until there is a query. Segments are memory-mapped and pulled into the page cache when a query executes; only the segments accessed by the query get loaded into the page cache.
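To see the mechanism behind this (a quick sketch, not Druid-specific): reading any file pulls it into the Linux page cache, which is what `free` reports as buff/cache. The file path and size below are arbitrary.

```shell
# Demonstration: reading a file populates the page cache (buff/cache);
# this is the same mechanism behind Druid's memory-mapped segment reads.
before=$(awk '/^Cached:/ {print $2}' /proc/meminfo)

# Create and read a 64 MB file, standing in for a segment file.
dd if=/dev/zero of=/tmp/seg.bin bs=1M count=64 2>/dev/null
cat /tmp/seg.bin > /dev/null

after=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "page cache grew by roughly $(( (after - before) / 1024 )) MB"
rm -f /tmp/seg.bin
```

The key point for this thread: that memory shows up under buff/cache, not under the JVM's heap.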

TechLifeWithMohsin

Jul 6, 2021, 8:36:57 AM
to Druid User
After around 65k+ segments load, the Historical process fails, saying there is not enough memory for the JVM to run.

TechLifeWithMohsin

Jul 6, 2021, 8:44:48 AM
to Druid User
I am observing that on the new Historical nodes the buff/cache size keeps increasing.
 total        used        free      shared  buff/cache   available
Mem:           747G         22G        706G        868K         19G        721G
Swap:            0B          0B          0B

vijay narayanan

Jul 6, 2021, 8:49:02 AM
to druid...@googlegroups.com
You need to set a larger heap. What is your current JVM heap?

TechLifeWithMohsin

Jul 6, 2021, 8:50:55 AM
to Druid User
It's 30 GB. I doubt it's a heap issue; it would have failed earlier. And still, why is buff/cache filling up?

vijay narayanan

Jul 6, 2021, 9:17:37 AM
to druid...@googlegroups.com
65k+ seems like it is reaching the limit on open files. What is your ulimit on open files? Set the ulimit on open files to a higher value. Also, once the Historicals are up, run a compaction task to get segment sizes to around 500 MB.
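A sketch of how to check and raise that limit (the service user "druid" and the chosen value are assumptions for your deployment):

```shell
# Check the open-files limit for the current shell:
ulimit -n

# For a running Historical, inspect its effective limits, e.g.:
#   cat /proc/$(pgrep -f 'druid/historical')/limits

# Raise it persistently in /etc/security/limits.conf
# (assuming the process runs as user "druid"):
#   druid  soft  nofile  200000
#   druid  hard  nofile  200000

# Or, if the Historical is managed by systemd, in a unit override:
#   [Service]
#   LimitNOFILE=200000
```

Note the new limit only takes effect for processes started after it is applied, so the Historical must be restarted.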

vijay

TechLifeWithMohsin

Jul 6, 2021, 9:31:12 AM
to Druid User
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30446
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

When I looked at the failed instance, buff/cache was at 710 GB at the time of the failure.

vijay narayanan

Jul 6, 2021, 9:49:33 AM
to druid...@googlegroups.com
I am not so sure… your open files limit is at 65535. That means that when 65k segments are loaded this limit will be reached and the Historical will go down. You want to raise open files to a much higher value and try.

Joseph Mocker

Jul 6, 2021, 10:53:52 AM
to druid...@googlegroups.com

FWIW, I don't think buff/cache is a useful metric to be looking at. Memory that is allocated there can generally be freed and reused for other things when the system is under memory pressure.

https://unix.stackexchange.com/questions/390518/what-do-the-buff-cache-and-avail-mem-fields-in-top-mean

 --joe
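A quick way to confirm this on Linux (a sketch using /proc/meminfo field names): MemAvailable already subtracts out the reclaimable part of buff/cache, so it is the number to watch rather than MemFree.

```shell
# MemFree ignores page cache; MemAvailable estimates what is actually
# usable, counting reclaimable buff/cache. A box with a huge buff/cache
# can still have plenty of available memory.
grep -E '^(MemFree|MemAvailable|Cached):' /proc/meminfo
```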

TechLifeWithMohsin

Jul 6, 2021, 2:24:40 PM
to Druid User
After some research I am also thinking the same; the buffer cache should not be the cause. I've increased the nofile limit to 200k, and ingestion is running; will see how that goes.

TechLifeWithMohsin

Jul 7, 2021, 2:33:36 AM
to Druid User
I am getting this error:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f2f9ec20000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ec2-user/apache-druid-0.19.0/hs_err_pid46466.log
#
# Compiler replay data is saved as:
# /home/ec2-user/apache-druid-0.19.0/replay_pid46466.log

My JVM settings are:
-server
-Xms300g
-Xmx300g
-Daws.region=us-east-1
-XX:MaxDirectMemorySize=300g
-Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true
-XX:MaxMetaspaceSize=100g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/mnt/disk2/var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

vijay narayanan

Jul 7, 2021, 2:56:22 AM
to druid...@googlegroups.com
This can happen if you have exceeded the maximum number of memory-mapped files (see https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#system-configuration).

Increase the memory-map limit (`vm.max_map_count`) to a larger number.
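For reference, a sketch of checking and raising that kernel limit (the target value below is illustrative; size it to the number of segments per Historical):

```shell
# Each memory-mapped segment consumes map entries; the common kernel
# default of 65530 lines up with failures at around 65k segments.
cat /proc/sys/vm/max_map_count

# Raise it at runtime (needs root):
#   sysctl -w vm.max_map_count=500000
# And persist it across reboots in /etc/sysctl.conf:
#   vm.max_map_count=500000
```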

TechLifeWithMohsin

Jul 7, 2021, 3:03:53 AM
to Druid User
And it seems to work now.