According to H2O's guidance, the cluster only needs about 4 times the data size in memory, but we went as far as four 128GB worker nodes plus a 128GB master node, and it still raises issues.
Please help us choose the Spark configuration needed to run H2O on our current data set. The same code runs fine on 50,000 records.
We have 300 columns for X and 2 pairs of interaction terms, plus an offset column and a weights column.
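To put the "4 times the data size" rule next to the column counts above, here is a rough back-of-the-envelope sketch in R. It assumes every cell is held as an 8-byte double, which is likely an overestimate since H2O compresses columns, and it ignores any extra columns created when the interaction terms are expanded. The function name `estimate_cluster_memory_gb` and the row count are hypothetical; substitute the real number of records.

```r
# Rough sizing sketch: uncompressed data size and the "4x" cluster memory target.
# Assumption: 8 bytes per cell (numeric doubles), no compression.
estimate_cluster_memory_gb <- function(n_rows, n_cols,
                                       bytes_per_cell = 8, factor = 4) {
  data_gb <- n_rows * n_cols * bytes_per_cell / 1024^3
  list(data_gb = data_gb, recommended_gb = data_gb * factor)
}

# 300 predictors + 2 interaction terms + offset + weights + event + stop columns
n_cols <- 300 + 2 + 1 + 1 + 2

# Hypothetical row count: 50,000 is only the size that is known to work
est <- estimate_cluster_memory_gb(n_rows = 50000, n_cols = n_cols)
print(est)
```

If the estimate for the full data set comes out far below the memory already provisioned, the problem is more likely in how Spark and YARN hand that memory to the executors than in the raw cluster size.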
You can find the sample code here, but it doesn't have 300 columns. I don't know how to provide a complete input file and the full code to replicate the issue. Please let me know if you would prefer to see the actual code with 300 columns.
`# Load the libraries used to analyze the data
library(survival)
library(MASS)
library(h2o)
# Create H2O-based model
predictors <- c("HasPartner", "HasSingleLine", "HasMultipleLines",
                "HasPaperlessBilling", "HasAutomaticBilling",
                "MonthlyCharges",
                "HasOnlineSecurity", "HasOnlineBackup", "HasDeviceProtection",
                "HasTechSupport", "HasStreamingTV", "HasStreamingMovies")
h2o_model <- h2o.coxph(x = predictors,
                       event_column = "HasChurned",
                       stop_column = "tenure",
                       stratify_by = "Contract",
                       training_frame = churn_hex)
print(summary(h2o_model))`
We tried multiple configurations; some of them are below:
conf$spark.executor.memory <- "192g"
conf$spark.executor.cores <- 5
conf$spark.executor.instances <- 9
conf$'sparklyr.shell.executor-memory' <- "32g"
conf$'sparklyr.shell.driver-memory' <- "32g"
conf$spark.yarn.am.memory <- "32g"
conf$spark.dynamicAllocation.enabled <- "false"
conf$spark.driver.memory <- "57.6g"
sc <- spark_connect(master = "yarn-client", version = "2.4.3", config = conf)
We have also tried this:
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
conf <- spark_config()
conf$spark.executor.memory <- "44g"
conf$spark.executor.cores <- 8
conf$spark.executor.instances <- 5
conf$spark.dynamicAllocation.enabled <- "false"
sc <- spark_connect(master = "yarn-client", version = "2.4.3", config = conf)
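One thing that may be worth checking in the configurations above is whether each executor container can fit on a 128GB worker at all. The sketch below is only a rough check: it assumes Spark's default off-heap overhead of 10% of executor memory with a 384MB floor, and it treats the full 128GB as available to YARN, which is optimistic because the usable amount is set by the cluster's YARN settings. The function name `executor_fits` is hypothetical.

```r
# Rough check: does executor memory plus overhead fit on a single worker node?
# Assumptions: overhead = max(384MB, 10% of executor memory); all node memory
# is available to YARN (in practice it is usually less).
executor_fits <- function(executor_gb, node_gb,
                          overhead_fraction = 0.10, min_overhead_gb = 0.375) {
  overhead_gb <- max(min_overhead_gb, executor_gb * overhead_fraction)
  (executor_gb + overhead_gb) <= node_gb
}

fits_192 <- executor_fits(192, 128)  # first configuration
fits_44  <- executor_fits(44, 128)   # second configuration
print(c(fits_192 = fits_192, fits_44 = fits_44))
```

Under these assumptions a 192g executor cannot be placed on a 128GB node, while a 44g executor can, so the two attempts may be failing for different reasons.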
On Dec 2, 2019, at 6:15 AM, divya....@gmail.com wrote:
--
You received this message because you are subscribed to the Google Groups "H2O Open Source Scalable Machine Learning - h2ostream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2ostream+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/h2ostream/a49fa53d-a533-4284-b18e-757404a940ac%40googlegroups.com.
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
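A minimal sketch of how these GC logging flags could be passed through the sparklyr configuration, assuming a Java 8 runtime (where the flags are written with a `+` after `-XX:`) and assuming the logging is wanted on the executors, where the H2O nodes run. `spark.executor.extraJavaOptions` is a standard Spark property; the other settings are copied from the second configuration in this thread.

```r
library(sparklyr)

conf <- spark_config()
conf$spark.executor.memory <- "44g"
conf$spark.executor.cores <- 8
conf$spark.executor.instances <- 5
conf$spark.dynamicAllocation.enabled <- "false"

# Assumption: Java 8 executors. GC details are written to each executor's
# stdout log, which YARN collects per container.
conf$spark.executor.extraJavaOptions <- "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

sc <- spark_connect(master = "yarn-client", version = "2.4.3", config = conf)
```

The resulting executor logs should show whether the job is spending most of its time in garbage collection before it fails.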