Support Request

Max Schloemer

Aug 19, 2015, 1:31:41 PM

to H2O Open Source Scalable Machine Learning - h2ostream, ashwan...@fmr.com, Jeff Cramer

Support,

Please review this message from Ashwani Gupta at FMR and reply to everyone on this thread.

Thanks,

Max

Begin Paste of Chat Message:

Hi, we are trying to build a logistic model on a dataset of 400,000 rows and 24,000 columns, on a cluster of 40 nodes with 40 GB each. We get the following error:

Gram matrices (one per thread) won't fit in the driver node's memory (327.54 GB > 38.33 GB) - try reducing the number of columns and/or the number of categorical factors (or switch to the L-BFGS solver).

java.lang.IllegalArgumentException: ERROR on field: _train: Gram matrices (one per thread) won't fit in the driver node's memory (327.54 GB > 38.33 GB) - try reducing the number of columns and/or the number of categorical factors (or switch to the L-BFGS solver).
    at hex.Model$Output.<init>(Model.java:263)
    at hex.glm.GLMModel$GLMOutput.<init>(GLMModel.java:449)
    at hex.glm.GLMModel.<init>(GLMModel.java:27)
    at hex.glm.GLM.init(GLM.java:380)
    at hex.glm.GLM$GLMDriver.compute2(GLM.java:740)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:953)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

End Paste:

Max Schloemer

Customer Engagement Manager

H2O.ai

Avni Wadhwa

Aug 19, 2015, 7:08:11 PM

to H2O Open Source Scalable Machine Learning - h2ostream, ashwan...@fmr.com, cra...@h2o.ai, m...@0xdata.com

Hi Ashwani,

There are two ways to approach this.

The first is to set lambda_search to true, max_active_predictors to 500, and strong_rules = T.

Example code:

date()
train.glm <- h2o.glm(y = "TAG", x = nm[2:length(nm)], data = train,
                     family = "binomial", nfolds = 10, alpha = 0.5,
                     lambda_search = T, higher_accuracy = T,
                     strong_rules = T, max_predictors = 150)
date()

You can run that and see if the results are good enough for you.

The second way would be to use the L_BFGS solver.
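For a rough sense of why the Gram matrices are the bottleneck and why a gradient-based solver avoids it, here is a small back-of-the-envelope sketch in Python. It assumes each thread holds one dense p x p matrix of 8-byte doubles, which is only an upper-bound approximation and may not match H2O's actual storage layout. The helper names are hypothetical, and p = 24,000 is the raw column count from the report; the real count after categorical expansion is not known here, so this does not try to reproduce the 327.54 GB figure.

```python
# Rough upper bound on solver memory, assuming dense storage of 8-byte doubles.
# These helpers are illustrative only and are not part of H2O.

BYTES_PER_DOUBLE = 8


def gram_bytes(num_predictors: int, threads: int = 1) -> int:
    """Bytes for `threads` dense Gram matrices, each p x p doubles."""
    return threads * num_predictors * num_predictors * BYTES_PER_DOUBLE


def gradient_bytes(num_predictors: int, threads: int = 1) -> int:
    """Bytes for `threads` gradient-sized vectors, each of length p."""
    return threads * num_predictors * BYTES_PER_DOUBLE


if __name__ == "__main__":
    p = 24_000  # raw column count from the report; expanded count is unknown
    print(f"one Gram matrix:     {gram_bytes(p) / 1e9:.2f} GB")
    print(f"one gradient vector: {gradient_bytes(p) / 1e6:.2f} MB")
```

The Gram cost grows with the square of the number of predictors and is multiplied by the thread count, while L-BFGS only keeps a handful of vectors of length p, which is why reducing columns or switching solvers are the two remedies the error message suggests.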

Could you please tell me how you got the 24,000 columns? Did you expand the categorical columns into granular columns? Also, could you tell me a little about your use case?

Avni
