Sparkling Water: wrong number of total cores reported


Christos

May 19, 2016, 10:14:09 AM
to H2O Open Source Scalable Machine Learning - h2ostream


Hi all,

I am running Sparkling Water 1.6.3 over YARN on CDH 5.7, and although I start sparkling-shell with an explicitly defined number of executors, cores, and memory, Sparkling Water erroneously reports the total number of available cores in the Flow web UI.

Details:

$ ./sparkling-shell --num-executors 4 --executor-cores 4 --executor-memory 4g
[...]
-----
  Spark master (MASTER)     : yarn-client
  Spark home   (SPARK_HOME) : /home/ctsats/spark-1.6.1-bin-hadoop2.6
  H2O build version         : 3.8.2.3 (turchin)
  Spark build version       : 1.6.1
----
[...]
scala> import org.apache.spark.h2o._
scala> val h2oContext = H2OContext.getOrCreate(sc)
[...]
Sparkling Water Context:
 * H2O name: sparkling-water-ctsats_-856008650
 * number of executors: 4
 * list of used executors:
  (executorId, host, port)
  ------------------------
  (2,host-hd-04.corp.nodalpoint.com,54321)
  (1,host-hd-05.corp.nodalpoint.com,54323)
  (4,host-hd-05.corp.nodalpoint.com,54321)
  (3,host-hd-03.corp.nodalpoint.com,54321)
  ------------------------

  Open H2O Flow in browser: http://192.168.1.100:54321 (CMD + click in Mac OSX)

The getCloud report in H2O Flow (see first screenshot below) erroneously gives the total number of cores as 96, instead of the expected 16 (4 cores x 4 executors); moreover, the 24 cores reported per H2O node on host 192.168.1.5 is the total number of cores present on that host (on each of the 3 hosts, actually). In other words, the total number of cores across all 3 of my worker nodes is only 24 x 3 = 72, far short of 96 (see second screenshot below, from Cloudera Manager, showing the number of cores available on each host). H2O Flow seems to think it has 24 + 24 = 48 cores available on a host (192.168.1.5) that has only 24 cores.

[screenshot 1: H2O Flow getCloud output reporting 96 total cores]

[screenshot 2: Cloudera Manager showing 24 cores available on each of the 3 hosts]

The message from the R client is similar, informing me that I have a maximum of 96 allowed cores:

> h2oClient = h2o.init(ip="192.10.10.80", port=54321, strict_version_check = FALSE) # Ethernet IP here
Connection successful!

R is connected to the H2O cluster (in client mode):
    H2O cluster uptime:        12 minutes 53 seconds
    H2O cluster version:       3.8.2.3
    H2O cluster name:          sparkling-water-ctsats_-856008650
    H2O cluster total nodes:   4
    H2O cluster total memory:  15.33 GB
    H2O cluster total cores:   96
    H2O cluster allowed cores: 96
    H2O cluster healthy:       TRUE
    H2O Connection ip:         192.10.10.80
    H2O Connection port:       54321
    H2O Connection proxy:      NA
    R Version:                 R version 3.2.5 (2016-04-14)


Spark, in its own UI, correctly reports the number of available executors & cores (see 3rd screenshot below).

[screenshot 3: Spark UI correctly showing 4 executors with 4 cores each]

Any ideas?

Many thanks in advance.

mat...@0xdata.com

May 19, 2016, 11:11:40 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hey Christos!

Thanks for all the details!

--executor-cores sets the number of cores, but only for the Spark executors.

When running Sparkling Water you are also running an H2O cluster on top of Spark, and --executor-cores has nothing to do with it. What you see in the Flow UI is the H2O cloud information, not Spark's. If you don't specify the number, each H2O node will use all the cores available.

You can see how to change that setting in H2O here: http://www.h2o.ai/product/faq/#H2OLimitCPU

In Sparkling Water you can set it by passing the property "spark.ext.h2o.nthreads" to Spark.
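For example, something like this should cap each H2O node at 4 threads (a sketch based on your original command, untested here; the property name is the one above, and --conf is the standard Spark syntax):

$ ./sparkling-shell --num-executors 4 --executor-cores 4 --executor-memory 4g \
    --conf spark.ext.h2o.nthreads=4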

Regards,
Mateusz

Christos

May 19, 2016, 12:55:34 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mat...@0xdata.com



Hi Mateusz,


Thanks for the fast response & the info provided.


Firstly, the behavior you describe does not account for the 48 (i.e. 24 + 24) cores reported as "available" on host 192.168.1.5: the host has a total of only 24 cores; however, having placed 2 executors on that host, H2O goes on to assume that all the existing cores are available to each of the 2 executors, hence 2 x 24 = 48 for host 192.168.1.5 (which has only 24!). Clearly, this seems wrong.


In other words, the reported number of 96 is reached as:
- 24 cores on host 192.168.1.3 (24 cores physically available)
- 24 cores on host 192.168.1.4 (24 cores physically available)
- 48 cores on host 192.168.1.5 (only 24 cores physically available)


To stress the error, here is the result if I ask for 12 executors, i.e.:


$ ./sparkling-shell --num-executors 12 --executor-cores 2 --executor-memory 2g

[screenshot: H2O Flow getCloud output reporting 288 total cores with 12 executors]

Well, it looks like magic - I have just created an H2O cloud of 288 cores, while I actually have only 72 cores in my cluster...!!!

Please have a deeper and closer look at this issue...


Secondly, and at a more general level, I'll confess that your answer sounds puzzling: I had the impression that, exactly as you write, the H2O cluster runs "on top" of the provided Spark workers, i.e. that the H2O cloud inherits a "limitation" from the underlying Spark resources allocated to it (number of executors, memory, and cores). Now you say that, at least regarding the number of cores (threads), the H2O cloud is not confined to whatever resources have been allocated to it via the sparkling-shell arguments. This sounds strange indeed, and it raises new questions about which resources the H2O workers can actually access and how (since the underlying Spark workers know nothing about them). Can I count on one H2O worker using all, say, 24 existing threads on a host machine, despite the respective Spark worker being confined to 4 threads? And if so, is it really a good idea, i.e. can my H2O application safely consume whatever resources it sees as "available" in its cloud?

At least regarding memory allocation, we can see from the screenshots I have uploaded that this is not the case: the H2O cloud indeed sees only the amount of memory allocated to it via the --executor-memory argument of sparkling-shell. Why the same should not hold for the --executor-cores argument, too, remains an open question...

Closing, let me repeat and clarify that my initial question and remarks were made in a very specific context: Sparkling Water over YARN - no standalone H2O cluster or anything else...

Looking forward to hearing your thoughts on these issues.

Many thanks

Christos

mat...@0xdata.com

May 19, 2016, 1:27:58 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mat...@0xdata.com
Hey Christos,

1) Even though your machine has only 24 cores, each H2O node is autonomous and tries to use as many cores as you defined (or all of them by default). Spark works the same way: you can start XX executors with a lot of cores each, and each executor will try to use them - I just launched 6 executors with 4 cores each on my local MacBook :-)

2) As I mentioned, you can limit the number of cores used by the H2O nodes with the nthreads parameter. Depending on your use case, you might want a different number of CPUs available to your Spark executors than to your H2O nodes, hence the 2 separate parameters.

3) As for the memory, yes, in that case it will work: H2O nodes run in the same JVM as the Spark executors, so if you limit an executor's max memory then we cannot use more (unless we were to launch H2O in a new JVM).
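To see point 1 from inside sparkling-shell, here is a minimal sketch (the output line is hypothetical; sc is the shell's SparkContext):

import java.net.InetAddress
// One record per task: each task reports its host name and the number of cores
// its executor JVM can see. Without cgroup limits, this is the host's full
// physical core count, regardless of --executor-cores.
sc.parallelize(1 to 100, 100).map { _ =>
  (InetAddress.getLocalHost.getHostName, Runtime.getRuntime.availableProcessors)
}.distinct.collect.foreach(println)
// e.g. (host-hd-03.corp.nodalpoint.com,24)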

Hope this clears everything a bit.

Regards,
Mateusz


Tom Kraljevic

May 19, 2016, 1:36:05 PM
to mat...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream

hi, i agree the 96 number is a bug.
it is double-counting one host.
the correct number is 24*3 == 72.

tom


Christos

May 19, 2016, 1:51:06 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mat...@0xdata.com

Thanks, Tom.

To be precise, it X-counts each host, where X is the number of executors running on that particular host (this is clearer in the screenshot I uploaded in my reply, with 12 executors).

And the same holds for the disk space, too.

Christos

mat...@0xdata.com

May 19, 2016, 2:01:52 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mat...@0xdata.com
Hey,
Yes - if you set X as the number of executors and there are Y nodes, then when X > Y, Spark will run more than 1 executor per node.

I can see where the confusion starts: "cores" here is just the maximum degree of parallelism we will launch on each node; it's not the actual, physical number of cores in the cloud.

This behaviour is the same as Spark's; try running more executors than you have nodes, with a lot of cores per node.

Here's a screenshot where I started 6 workers locally with 16 CPUs each, and Spark thinks I have 96 cores:

http://oi64.tinypic.com/14o3u54.jpg

Regards,
Mateusz

Christos

May 20, 2016, 6:54:35 AM
to H2O Open Source Scalable Machine Learning - h2ostream, mat...@0xdata.com

Hey Mateusz,

Thanks for your reply and your efforts. Nevertheless, I have tried to make clear from the beginning of the discussion that I am talking about a very specific (although not at all uncommon) context: Sparkling Water over YARN; arguably, this is the context one would most expect in a production environment.

In YARN country, there is neither a Spark Master nor Spark Workers - everything is a YARN container. So, let's see what resources YARN reports in the case I have already described above, i.e. 12 executors with 2 cores & 2G each:

$ ./sparkling-shell --num-executors 12 --executor-cores 2 --executor-memory 2g

Recall that, in this case, H2O Flow reports 288 "cores" (see my last screenshot above). Here is what is reported by YARN:

[screenshot: YARN ResourceManager UI showing 13 containers and 25 vcores allocated to the application]

i.e. 13 containers (1 x 12 executors + 1 for the application master) and 25 cores (2 x 12 executors + 1 for the application master). These are your allocated resources - period.

Claiming that, regarding cores, H2O can go beyond that number, despite the fact that, as you say, it runs in the same JVM as the Spark executors, is interesting, but I really cannot see how that can be the case.

To wrap up: however one defines "cores" (even, as in your case, as "the maximum degree of parallelism we will launch on each node"), the information reported by H2O when Sparkling Water runs over YARN is arguably neither useful nor even meaningful. I would strongly suggest that you have an internal discussion on the subject.

Many thanks again for your time & efforts

Christos

Tom Kraljevic

May 20, 2016, 12:09:34 PM
to Christos, H2O Open Source Scalable Machine Learning - h2ostream, mat...@0xdata.com

hi christos,


one needs to be careful with respect to how yarn treats vcores, and about what the 'single source of truth' is for configuring/managing/reporting resources.

i would argue that the thing that does the enforcement is the single source of truth.

in the case of memory, i agree that this is yarn.  a container is assigned a memory cap, and if the physical memory used by the process exceeds this value, yarn itself will terminate the container.  (very aggressively, i might add.)
assuming no jobs are started on the system under the table, that memory is not available to other jobs.  yarn manages it.

for cpu, it’s quite a different story.  the linux kernel is the single source of truth.  (if a yarn container is assigned 1 vcore but really uses 24, yarn doesn’t enforce that limit of 1 or terminate the container.)

you can see the kernel’s point of view if you look at /proc/PID/status at the bit array for ‘Cpus_allowed’.
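for example, on a hypothetical 24-core host with no cpu pinning, a quick check looks like this (output illustrative):

$ grep -i '^cpus_allowed' /proc/self/status
Cpus_allowed:        ffffff
Cpus_allowed_list:   0-23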

H2O reads this value from the /proc filesystem in this file:

in most actual yarn configurations, there is nothing connecting the yarn concept of vcores (which didn’t even exist in early versions of yarn) with the real linux thing of Cpus_allowed.
so the yarn vcores are used very coarsely by the resource manager scheduler to place jobs, but the assumption is that each vcore will be kept 100% busy by the assigned container all the time, and not shared with any other job.
the problem with this assumption is that kernel CPU schedulers have been extensively designed over many years to seamlessly share CPUs across jobs, and many jobs have periods of heavier and lighter CPU usage.

what people usually want to do is to say "please limit this job's CPU to no more than N cores, but share them if this job is idle".  yarn can do this if cgroups are enabled (and this then propagates to Cpus_allowed).  but in practice, almost nobody does that.  the result is that the vcores are "advisory" and not enforced at runtime.

systems like spark, with its executor-cores, or H2O with its nthreads, try to help with this problem by sizing their thread pools to at least behave well and follow the honor system as best as possible.

but unless you enable cgroups, yarn's vcores don't do anything after the container is placed.


one could argue that in the sparkling water context, the "executor-cores" setting should propagate automatically to the h2o concept of nthreads.  behavior-wise, i think that would have accomplished what you expected.  that's a good question and worth a discussion.  (i'm not sure why it's not like that now…)


(note this is all unrelated to not double-counting the reported number when multiple containers are started on the same host.  to get that really accurate, i think you'd need to OR the Cpus_allowed bitsets per host and then count up all the bits across hosts.  one has to be careful with this too, because the bitset can change at runtime, since linux kernel CPU scheduler parameters are dynamic.)


thanks,
tom


mat...@0xdata.com

May 22, 2016, 12:29:34 AM
to H2O Open Source Scalable Machine Learning - h2ostream, cts...@hotmail.com, mat...@0xdata.com
Hey Christos!

As Tom mentioned, by default we take the number of cores from the underlying machine (as defined by the OS). Don't worry, we're still running inside the container, so you should be OK :-)

The problem is that we might spin up way too many threads for a container. I will talk with the others about changing the default value of H2O's nthreads when running on YARN. Maybe we can set it to 1, like Spark does?

As for setting nthreads to the same value as executor-cores, this might be even more confusing to the user, since we do not share a threadpool with Spark. In such a case, if you had 8 cores and set executor-cores to 4, both Spark and H2O would say they are using 4 cores, but if you checked the actual CPU usage, all 8 cores might be in use (4 for Spark's threadpools and 4 for H2O's). That's why I'd simply keep both --executor-cores and spark.ext.h2o.nthreads, and ask the user to set them, as this will vary on a case-by-case basis.

Separating those two values also has another benefit: you can assign a small number of cores to Spark (e.g. if you only want to do some simple ETL with Spark) and more to H2O (e.g. because you want to do some CPU-heavy computations). For example, see the sketch below.
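A sketch with illustrative numbers only (the flags are the same ones discussed above): a small Spark footprint for ETL, while each H2O node may use up to 16 threads.

$ ./sparkling-shell --num-executors 3 --executor-cores 2 --executor-memory 4g \
    --conf spark.ext.h2o.nthreads=16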

I'll also ask the other devs what they think about renaming "cores" to "threads" in the UI, as it is really confusing.

Mateusz

Christos

May 23, 2016, 1:32:17 PM
to H2O Open Source Scalable Machine Learning - h2ostream, cts...@hotmail.com, mat...@0xdata.com

Dear Tom & Mateusz,

Sincere thanks for your elaborate replies!

I think it has been a constructive & productive thread, and we all now have food for thought for our next steps.

Thanks again for your time & patience

Christos