How to process a huge dataset with H2O?


nguyentr...@gmail.com

Aug 31, 2017, 4:25:29 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I am trying to train a machine learning model with H2O (3.14). My dataset size is 4Gb and my computer RAM is 2Gb with 2Gb swap, JDK 1.8. According to this [article](https://blog.h2o.ai/2014/03/h2o-architecture/), H2O can process a huge dataset with 2Gb RAM.

* A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, i.e., you're using more Big Data than
physical DRAM. We won't die with a GC death-spiral, but we will degrade to out-of-core speeds. We'll go as fast as the disk will
allow. I've personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to run a Logistic Regression.

This link describes a case similar to mine: https://stackoverflow.com/questions/34082792/loading-data-bigger-than-the-memory-size-in-h2o . The answer mentioned that user-mode swap-to-disk was disabled because performance was so bad. However, he did not explain any alternative method, or how to enable the `--cleaner` flag in H2O.


# Workaround 1:
I started H2O with a larger Java heap: `java -Xmx10g -jar h2o.jar`. When I load the dataset, H2O reports the following information:


However, the JVM consumed all of the RAM and swap, and then the operating system killed the H2O Java process.
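
As a rough sanity check on the numbers above, here is a minimal sketch that compares the requested heap with the memory the machine actually has. The function `heap_fits` is a hypothetical helper for illustration, not part of H2O. Note that `-Xmx` is only an upper bound on the heap, so the JVM starts normally and the problem only appears once the data is loaded.

```python
# Hypothetical helper (not an H2O API): compare a requested JVM heap cap
# with the physical RAM and swap available on the machine.
def heap_fits(xmx_gb: float, ram_gb: float, swap_gb: float) -> str:
    if xmx_gb <= ram_gb:
        return "fits in RAM"
    if xmx_gb <= ram_gb + swap_gb:
        return "fits only with swap"
    return "exceeds RAM + swap"


# Figures from this post: -Xmx10g on a machine with 2 GB RAM and 2 GB swap.
print(heap_fits(10, 2, 2))  # prints "exceeds RAM + swap"
```

With these figures, the heap can grow past everything the operating system can back, which is consistent with the process being killed.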

# Workaround 2:
I installed [H2O spark](http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/0/index.html). I can load the dataset, but Spark then hangs with swap completely full and the following log output:

+ FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.377 192.168.233.133:54321 6965 Thread-47 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.377 192.168.233.133:54321 6965 Thread-48 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.381 192.168.233.133:54321 6965 Thread-45 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.3 MB + FREE:426.7 MB == MEM_MAX:2.67 GB), desiredKV=803.2 MB OOM!
09-01 02:01:12.382 192.168.233.133:54321 6965 Thread-46 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.4 MB + FREE:426.5 MB == MEM_MAX:2.67 GB), desiredKV=840.9 MB OOM!
09-01 02:01:12.384 192.168.233.133:54321 6965 #e Thread WARN: Swapping! GC CALLBACK, (K/V:1.75 GB + POJO:513.4 MB + FREE:426.5 MB == MEM_MAX:2.67 GB), desiredKV=802.7 MB OOM!
09-01 02:01:12.867 192.168.233.133:54321 6965 FJ-3-1 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.4 MB + FREE:426.5 MB == MEM_MAX:2.67 GB), desiredKV=1.03 GB OOM!
09-01 02:01:13.376 192.168.233.133:54321 6965 Thread-46 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=803.2 MB OOM!
09-01 02:01:13.934 192.168.233.133:54321 6965 Thread-45 WARN: Swapping! OOM, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=841.3 MB OOM!
09-01 02:01:12.867 192.168.233.133:54321 6965 #e Thread WARN: Swapping! GC CALLBACK, (K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), desiredKV=803.2 MB OOM!

In this case, I think the garbage collector (`gc`) is waiting to reclaim some unused memory that sits in swap.
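
To read these warnings more easily, the following is a small sketch that pulls the memory figures out of one log line and checks that they add up to `MEM_MAX`. The regular expression and the helper names (`to_mb`, `parse_memory`) are my own, written against the line format shown above, and the conversion assumes 1 GB = 1024 MB. I am also assuming that `K/V` is H2O's data store and `POJO` is the rest of the Java heap.

```python
import re

# Assumption: the log uses binary units, i.e. 1 GB = 1024 MB.
UNITS_MB = {"MB": 1.0, "GB": 1024.0}

# Matches the "(K/V:... + POJO:... + FREE:... == MEM_MAX:...)" part of a warning.
PATTERN = re.compile(
    r"K/V:(?P<kv>[\d.]+) (?P<kv_u>MB|GB) \+ "
    r"POJO:(?P<pojo>[\d.]+) (?P<pojo_u>MB|GB) \+ "
    r"FREE:(?P<free>[\d.]+) (?P<free_u>MB|GB) == "
    r"MEM_MAX:(?P<max>[\d.]+) (?P<max_u>MB|GB)"
)


def to_mb(value: str, unit: str) -> float:
    """Convert a number with an MB/GB suffix to megabytes."""
    return float(value) * UNITS_MB[unit]


def parse_memory(line: str):
    """Return the four memory figures in MB, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {k: to_mb(m[k], m[k + "_u"]) for k in ("kv", "pojo", "free", "max")}


line = (
    "09-01 02:01:12.377 192.168.233.133:54321 6965 Thread-47 WARN: Swapping! OOM, "
    "(K/V:1.75 GB + POJO:513.2 MB + FREE:426.8 MB == MEM_MAX:2.67 GB), "
    "desiredKV=841.3 MB OOM!"
)
mem = parse_memory(line)
used = mem["kv"] + mem["pojo"] + mem["free"]
# The three parts should match MEM_MAX up to rounding in the log output.
print(f"K/V + POJO + FREE = {used:.0f} MB, MEM_MAX = {mem['max']:.0f} MB")
```

Under that reading, `K/V` alone (1.75 GB) already exceeds the `desiredKV` value in every line, which would explain why the warning repeats.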

How can I process a huge dataset with limited RAM?

Darren Cook

Aug 31, 2017, 4:45:16 AM
to h2os...@googlegroups.com
> My dataset size is 4Gb and my computer RAM is 2Gb ...

Answered on StackOverflow; please don't cross post.

Darren