H2O anomaly per_feature = TRUE java.lang.OutOfMemoryError: Java heap space


fwz...@gmail.com

Jul 18, 2020, 9:01:05 AM
to H2O Open Source Scalable Machine Learning - h2ostream

I ran h2o.anomaly with per_feature = TRUE, which fails with a Java heap space error. In other posts about this error message, people suggest using h2o.remove(df) to release memory. However, my code has no loop, and I don't see anything I could remove to free memory.


Here is my code:


library(h2o)

h2o.init(min_mem_size = "10G", max_mem_size = "15G")

data.hex <- as.h2o(data)
x <- names(data.hex)
random_seed <- 42

# Deep learning model
print("Deep learning model begins ...")
model.dl <- h2o.deeplearning(x = x,
                             training_frame = data.hex,
                             autoencoder = TRUE,
                             activation = "Tanh",
                             hidden = c(5, 5, 5, 5, 5),
                             mini_batch_size = 64,
                             epochs = 100,
                             stopping_rounds = 15,
                             variable_importances = TRUE,
                             seed = random_seed)

# Calculating anomaly per feature
print('Calculating anomaly per feature ...')
errors_per_feature <- h2o.anomaly(model.dl, data.hex, per_feature = TRUE) # Anomaly detection

print('Converting from H2O frame to dataframe ...')
errors1_per_feature <- as.data.frame(errors_per_feature) # Convert back to data frame


Here is the detailed error message:


[1] "Deep learning model begins ..."

  |======================================================================| 100%

[1] "Calculating anomaly per feature ..."


ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/3/Predictions/models/DeepLearning_model_R_1594826474037_2/frames/Accesses_sid_a71f_1)


water.util.DistributedException

 [1] "DistributedException from localhost/127.0.0.1:54321: 'Java heap space', caused by java.lang.OutOfMemoryError: Java heap space"

 [2] "    water.MRTask.getResult(MRTask.java:494)"                                                                                  

 [3] "    water.MRTask.getResult(MRTask.java:502)"                                                                                  

 [4] "    water.MRTask.doAll(MRTask.java:397)"                                                                                      

 [5] "    water.MRTask.doAll(MRTask.java:403)"                                                                                      

 [6] "    hex.deeplearning.DeepLearningModel.scoreAutoEncoder(DeepLearningModel.java:761)"                                          

 [7] "    water.api.ModelMetricsHandler.predict(ModelMetricsHandler.java:469)"                                                      

 [8] "    java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"                                           

 [9] "    java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"                         

[10] "    java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"                 

[11] "    java.base/java.lang.reflect.Method.invoke(Method.java:567)"                                                               

[12] "    water.api.Handler.handle(Handler.java:60)"                                                                                

[13] "    water.api.RequestServer.serve(RequestServer.java:470)"                                                                    

[14] "    water.api.RequestServer.doGeneric(RequestServer.java:301)"                                                                

[15] "    water.api.RequestServer.doPost(RequestServer.java:227)"                                                                   

[16] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"                                                             

[17] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"                                                             

[18] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"                                                   

[19] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)"                                               

[20] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"                                       

[21] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:427)"                                                

[22] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"                                        

[23] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"                                            

[24] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"                                    

[25] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"                                          

[26] "    water.webserver.jetty8.Jetty8ServerAdapter$LoginHandler.handle(Jetty8ServerAdapter.java:119)"                             

[27] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"                                    

[28] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"                                          

[29] "    org.eclipse.jetty.server.Server.handle(Server.java:370)"                                                                  

[30] "    org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"                           

[31] "    org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"                            

[32] "    org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:984)"                                 

[33] "    org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1045)"                 

[34] "    org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)"                                                         

[35] "    org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:236)"                                                    

[36] "    org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"                                   

[37] "    org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"                             

[38] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"                                         

[39] "    org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"                                          

[40] "    java.base/java.lang.Thread.run(Thread.java:830)"                                                                          

[41] "Caused by:java.lang.OutOfMemoryError: Java heap space"                                                                        


Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 

  


ERROR MESSAGE:


DistributedException from localhost/127.0.0.1:54321: 'Java heap space'


Calls: h2o.anomaly -> .h2o.__remoteSend -> .h2o.doSafeREST

Execution halted



R and H2O version:


    H2O cluster version:        3.30.0.6  

    H2O cluster total nodes:    1 

    H2O cluster total memory:   13.43 GB 

    H2O cluster total cores:    16 

    H2O cluster allowed cores:  16 

    H2O cluster healthy:        TRUE 

    R Version:                  R version 3.6.3 (2020-02-29)


My Mac has 16 GB of memory.


There are 6 variables (columns) in data: 5 categorical variables and 1 numeric variable. The number of unique values of the 5 categorical variables is 17, 49, 52, 85 and 5032, respectively. The number of rows is ~500k. The data file size is 44 MB (before encoding within H2O).


What can I do in my case to resolve the issue? Please let me know if there is any other information I can provide. Thanks for your help!


Tom Kraljevic

Jul 18, 2020, 10:00:49 AM
to fwz...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream

I suspect the large number of categorical levels is causing the memory to blow up.
Try removing that variable and see if it at least completes.
If it does, try re-binning it into a smaller number of levels somehow.

tom
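
A rough back-of-the-envelope estimate is consistent with that suspicion. The sketch below assumes that per_feature = TRUE produces one reconstruction-error column per one-hot-expanded input, and that each value costs 8 bytes (a double) before any compression H2O may apply. Both are assumptions rather than confirmed behavior, and the helper name estimate_error_frame_gb is made up for illustration. The level counts and row count come from the original post.

```r
# Hypothetical helper: estimate the size of a per-feature error frame.
# Assumes one 8-byte double per row for every one-hot-expanded input.
estimate_error_frame_gb <- function(levels_per_categorical, n_numeric, n_rows) {
  expanded_inputs <- sum(levels_per_categorical) + n_numeric
  expanded_inputs * n_rows * 8 / 1e9
}

levels_per_categorical <- c(17, 49, 52, 85, 5032)  # from the original post
n_numeric <- 1
n_rows <- 5e5                                      # "~500k" rows

expanded_inputs <- sum(levels_per_categorical) + n_numeric
with_big_column <- estimate_error_frame_gb(levels_per_categorical, n_numeric, n_rows)
# Same estimate after dropping the 5032-level variable
without_big_column <- estimate_error_frame_gb(levels_per_categorical[-5], n_numeric, n_rows)

cat("expanded inputs:", expanded_inputs, "\n")
cat("estimated GB with the 5032-level column:", round(with_big_column, 1), "\n")
cat("estimated GB without it:", round(without_big_column, 1), "\n")
```

Under these assumptions the full output would be about 20.9 GB, more than the 15G heap, while dropping the 5032-level variable brings the estimate down to under 1 GB.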


fwz...@gmail.com

Jul 18, 2020, 10:15:05 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Thanks Tom for the suggestion!

The code was able to run after I removed the categorical variable that has 5032 unique values. I will think about how to aggregate it into a smaller number of levels.  
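
One possible way to aggregate, sketched in base R: keep the most frequent levels and lump everything else into a single catch-all level before calling as.h2o(). The function name lump_rare_levels, the "Other" label, and the cutoff are all illustrative choices, not something suggested in this thread or provided by H2O.

```r
# Hypothetical helper: keep the `max_levels` most frequent values of x
# and replace every other non-missing value with `other_label`.
# The result has at most max_levels + 1 levels.
lump_rare_levels <- function(x, max_levels = 100, other_label = "Other") {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)  # table() ignores NA by default
  keep <- names(counts)[seq_len(min(max_levels, length(counts)))]
  x[!is.na(x) & !(x %in% keep)] <- other_label
  factor(x)
}

# Toy example: four levels, two of them rare
toy <- rep(c("a", "b", "c", "d"), times = c(50, 30, 3, 2))
binned <- lump_rare_levels(toy, max_levels = 2)
print(table(binned))
```

On the real data this would be applied to the 5032-level column of the R data frame before the as.h2o(data) call. Whether frequency-based lumping is appropriate depends on what the variable means; grouping by a domain hierarchy may preserve more signal.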