GC overhead limit exceeded


assism...@gmail.com

Aug 14, 2018, 7:33:12 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi,

I'm working with H2O Python scripts.

I'm trying to estimate uncertainty in a process that includes DRF predictions.

This is the process:

Read training/test data (CSV). In this dataset, there are 1000 Y columns.
Read prediction database (CSV).

i = 0
repeat:
    Select training and testing sub-datasets randomly
    Generate the model using the Y[i] column
    Generate predictions using this model
    Add the predicted values to a results dataset
    Add the predicted values into a Sum column in the results dataset
    if (i % 10) == 0:
        Save results
        results = None
    i += 1
until i == 1000
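The bookkeeping in that loop can be sketched in plain Python. This is only a sketch: the H2O train/predict steps are stubbed out with a hypothetical predict_column(), and nothing here touches the actual H2O API.

```python
# Plain-Python sketch of the loop's bookkeeping: accumulate predictions,
# checkpoint every 10 models, then drop the batch so memory can be reclaimed.
# predict_column() is a hypothetical stand-in for the H2O train/predict steps.

def predict_column(i, n_rows=5):
    """Stand-in for: train a DRF on Y[i] and score the prediction frame."""
    return [float(i)] * n_rows

saved_batches = 0
results = None
for i in range(1000):
    preds = predict_column(i)
    if results is None:
        results = {"Sum": [0.0] * len(preds)}
    results["Y%d" % i] = preds
    results["Sum"] = [s + p for s, p in zip(results["Sum"], preds)]
    if i % 10 == 9:            # checkpoint every 10th model
        saved_batches += 1     # stand-in for saving results to disk
        results = None         # release the batch between checkpoints

print(saved_batches)  # prints: 100
```

One detail worth noting: with `if (i % 10) == 0` and `i` starting at 0, the first "batch" is saved after a single model; checking `i % 10 == 9` instead keeps every checkpoint at exactly ten models.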

When saving results at i == 34, I'm getting the error:

h2o.exceptions.H2OServerError: HTTP 500 Server Error:
Server error water.util.DistributedException:
Error: DistributedException from /127.0.0.1:54321: 'GC overhead limit exceeded'

How can I solve it?


Tks,

Mauro Assis

laurend

Aug 15, 2018, 1:04:56 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi Mauro,

It looks like you probably ran out of memory. Can you provide a few more details:

what is the size of your cluster?
what is the size of the datasets you are using?
what parameters are you setting to build your random forest (how many trees, how deep is each tree)?
do you have any high cardinality features?
what version of H2O are you using?


As a rule of thumb, H2O recommends sizing your cluster to at least four times the size of your dataset. So you could try to allocate more memory and potentially decrease the size of each random forest (fewer, shallower trees).

- lauren

Tom Kraljevic

Aug 15, 2018, 1:13:47 PM
to laurend, H2O Open Source Scalable Machine Learning - h2ostream

also, you typically can't loop like that "forever" without clearing out some memory.
strategically call h2o.remove() and h2o.remove_all() to free up space at the bottom of your loop.

tom



assism...@gmail.com

Aug 15, 2018, 4:18:50 PM
to H2O Open Source Scalable Machine Learning - h2ostream
My cluster has 32 GB of memory and 16 cores.

My training and prediction datasets total 12 GB.

I tried using h2o.remove_all(), but the total allocated memory size doesn't change after the call.

Do you think it is working?

Darren Cook

Aug 15, 2018, 5:15:17 PM
to h2os...@googlegroups.com
> I tried using h2o.remove_all(), but the total allocated memory size doesn't change after the call.

You need to wait for garbage collection to happen, or force it.

But there does not seem to be a way to do that in the Python API?
https://stackoverflow.com/a/45609539/841830

(If that answer is no longer valid, and there is now a gc command in
Python, can someone let me know and I will update it.)

Darren

laurend

Aug 15, 2018, 8:03:40 PM
to H2O Open Source Scalable Machine Learning - h2ostream
hi mauro,

how are you checking that the total allocated memory size hasn't changed? 

Are you using h2o.cluster_status or h2o.cluster_info? If you do an h2o.ls() before and after h2o.remove_all(), do you still see the same list of keys? (h2o.remove_all() should remove all objects from H2O.)

(side note: your cluster size may be too small since it's not 4 times the size of your data)
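Spelling that rule of thumb out with the numbers from this thread (12 GB of data on a 32 GB cluster):

```python
# H2O's rule of thumb: cluster memory should be at least ~4x the dataset size.
dataset_gb = 12
cluster_gb = 32
recommended_gb = 4 * dataset_gb
print(recommended_gb)                 # prints: 48
print(cluster_gb >= recommended_gb)   # prints: False -- the 32 GB cluster is undersized
```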

- lauren

Tom Kraljevic

Aug 15, 2018, 8:28:57 PM
to laurend, H2O Open Source Scalable Machine Learning - h2ostream

Here's a pointer to an older thread that talks about how I typically debug H2O memory problems from R...

https://groups.google.com/forum/#!msg/h2ostream/Dc6l4xzwkaU/n-w2p02mBwAJ



assism...@gmail.com

Aug 19, 2018, 4:20:36 PM
to H2O Open Source Scalable Machine Learning - h2ostream
I'm checking the amount of free memory reported by the OS (Windows, in this case).

Yes, the cluster is not ideal, but I will rent a new one in the cloud that will be.

I added a garbage-collector call to my Python scripts; however, I don't think it will help, because it runs in the client script, not at the H2O server level.

I have to run 1000 models, and I'm not sure how to do that without crashing due to lack of memory.

assism...@gmail.com

Aug 21, 2018, 5:27:03 PM
to H2O Open Source Scalable Machine Learning - h2ostream
I tested the garbage collector this way:

h2o.remove_all()
collected = gc.collect()

According to Windows Resource Monitor, the same amount of memory stays occupied as before I called those instructions.