memory allocation for spark streaming job in dataproc cluster


Nagin Narbag

Sep 2, 2020, 4:20:02 AM
to Google Cloud Dataproc Discussions
Hello team,
I am using Dataproc for a Spark streaming job. My cluster has 1 master and 4 worker nodes:
master -> 2 vCPUs and 7.5 GB memory
worker -> 8 vCPUs and 64 GB memory per node
I am submitting the Spark job in cluster mode with 16 executors at 11000 MB of memory per executor. The cluster is allowing me to use 192 GB of memory (including overhead) out of the 256 GB across the workers.
So my question is: where is the remaining 64 GB being used, and if I want to make use of that 64 GB of memory, how can I do that? The YARN minimum allocation is 1024 MB and the maximum is 49152 MB.
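For context, the 192 GB figure in the question can be reproduced from the stated settings. This is a sketch assuming Spark's default executor memory overhead (10%, minimum 384 MB) and YARN's behavior of rounding each container request up to a multiple of the minimum allocation (1024 MB here):

```python
import math

# Settings stated in the question (assumed defaults hedged in comments).
executor_memory_mb = 11000                    # spark.executor.memory
overhead_mb = max(384, math.ceil(executor_memory_mb * 0.10))  # default 10% overhead -> 1100
requested_mb = executor_memory_mb + overhead_mb               # 12100

yarn_min_allocation_mb = 1024
# YARN rounds each container request up to a multiple of the minimum allocation.
container_mb = math.ceil(requested_mb / yarn_min_allocation_mb) * yarn_min_allocation_mb

executors = 16
total_gb = executors * container_mb / 1024

print(container_mb)  # 12288 MB per executor container
print(total_gb)      # 192.0 GB total, matching the question
```

So each executor actually occupies 12288 MB of YARN's capacity, and 16 of them account for exactly the 192 GB observed.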


Thanks and regards,
Nagin Narbag.

karth...@google.com

Sep 2, 2020, 1:53:03 PM
to Google Cloud Dataproc Discussions
The extra memory is used for other daemons such as the Node Manager, Datanode, Dataproc agent, other OS daemons, and the OS page cache. We're changing our memory utilization in Dataproc 2.0 (preview) to allocate 90-95% of a worker VM's memory to YARN, rather than ~80%. But there will still need to be some buffer room.

We are rolling out that change in the next few weeks, and then you'll be able to use Dataproc 2.0 with this better memory utilization. 

Re: executor memory: you can just rely on Dataproc's defaults here. We set executor memory so that executors fit neatly inside the memory configured for YARN.
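To put rough numbers on the difference described above (the percentages are approximate, taken from the reply, not exact Dataproc settings):

```python
workers = 4
node_mem_gb = 64
cluster_mem_gb = workers * node_mem_gb  # 256 GB of raw worker memory

# Approximate share of each worker VM's memory handed to YARN:
# ~80% on current image versions, 90-95% planned for Dataproc 2.0.
for fraction in (0.80, 0.90, 0.95):
    yarn_gb = cluster_mem_gb * fraction
    print(f"{fraction:.0%} -> {yarn_gb:.1f} GB usable by YARN")
```

Under these assumptions, moving from ~80% to ~90-95% would raise cluster-wide YARN capacity from roughly 205 GB to 230-243 GB, with the remainder reserved for daemons and the OS.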