Support Request - FMR


Max Schloemer

Aug 19, 2015, 1:31:41 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ashwan...@fmr.com, Jeff Cramer

Support,


Please review this message from Ashwani Gupta at FMR and reply to all.

 

Thanks,


Max

 

 

Begin Paste of Chat Message:

 

Hi, we are trying to build a logistic model on a data set of 400,000 rows and 24,000 columns on a cluster of 40 nodes with 40 GB each. We get the following error:

java.lang.IllegalArgumentException: ERROR on field: _train: Gram matrices (one per thread) won't fit in the driver node's memory (327.54 GB > 38.33 GB) - try reducing the number of columns and/or the number of categorical factors (or switch to the L-BFGS solver).
    at hex.Model$Output.<init>(Model.java:263)
    at hex.glm.GLMModel$GLMOutput.<init>(GLMModel.java:449)
    at hex.glm.GLMModel.<init>(GLMModel.java:27)
    at hex.glm.GLM.init(GLM.java:380)
    at hex.glm.GLM$GLMDriver.compute2(GLM.java:740)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:953)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

 

End Paste:

 

Max Schloemer

Customer Engagement Manager

H2O.ai

O:  619.467.7016

C:  619.850.0578

m...@h2o.ai

www.h2o.ai

 


Avni Wadhwa

Aug 19, 2015, 7:08:11 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ashwan...@fmr.com, cra...@h2o.ai, m...@0xdata.com
Hi Ashwani,

There are two ways to approach this.
The first is to set lambda_search to true, cap the number of active predictors (max_active_predictors, for example 500), and set strong_rules = T.
Example code:
date()
train.glm <- h2o.glm(y="TAG", x=nm[2:length(nm)], data=train, family="binomial",
                     nfolds=10, alpha=0.5, lambda_search=T, higher_accuracy=T,
                     strong_rules=T, max_predictors=150)
date()

You can run that and see if the results are good enough for you.

The second way would be to use the L_BFGS solver. 
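
A minimal sketch of that second option, assuming the H2O-3 R API, where the frame is passed as training_frame and the solver is chosen with the solver argument. The names train, nm, and "TAG" are reused from the example above as placeholders for your own frame, column names, and response column; regularization settings are left at their defaults here and may need tuning for your data.

```r
library(h2o)

# Connect to (or start) an H2O cluster. On a multi-node cluster you would
# normally pass the ip and port of one of the nodes; the defaults here are
# an assumption for illustration.
h2o.init()

# `train` is assumed to be an H2OFrame already loaded into the cluster,
# `nm` its column names, and "TAG" the binary response column.
train.glm <- h2o.glm(
  y = "TAG",
  x = nm[2:length(nm)],
  training_frame = train,
  family = "binomial",
  solver = "L_BFGS"  # gradient-based solver suggested by the error message
)

# Inspect the fitted model.
print(train.glm)
```

The error message itself recommends this solver because it does not need the per-thread Gram matrices, so memory use should no longer grow with the square of the number of columns.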

Could you please tell me how you got the 24,000 columns? Did you expand the categorical columns into granular columns? Also, could you tell me a little bit about your use case?

Avni 