Spark running q05 issue

Jantz Tran

Mar 17, 2016, 3:00:19 PM
to Big Data Benchmark for BigBench
I've set up the BigBench kit on a BigInsights 4.1 cluster (running Spark 1.5.1) and have gotten all the queries to run under the Spark engine except for query 5. The query seems to get stuck when it reaches Stage 3 of the spark-mllib part and does not progress any further. Any ideas?

Thanks,
-Jantz

=========================
q05 Step 2/3: logistic regression with spark-mllib with direct metastore access
=========================
spark-submit --class io.bigdatabenchmark.v1.queries.q05.LogisticRegression /home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/Resources/bigbench-ml-spark.jar --fromHiveMetastore true -i bigbenchORC100s.q05_spark_sql_run_query_0_temp -o /user/biadmin/benchmarks/bigbench/queryResults/q05_spark_sql_run_query_0_result// --type LBFGS --step-size 1 --iterations 20 --lambda 0 --numClasses 2 --convergenceTol 1e-5 --numCorrections 10 --saveClassificationResult false --saveMetaInfo true --verbose false
Run LogisticRegression with options: Map('csvInputDelimiter -> ,, 'fromHiveMetastore -> true, 'verbose -> false, 'saveMetaInfo -> true, 'lambda -> 0, 'numCorrections -> 10, 'stepsize -> 1, 'convergenceTol -> 1e-5, 'iter -> 20, 'output -> /user/biadmin/benchmarks/bigbench/queryResults/q05_spark_sql_run_query_0_result//, 'type -> LBFGS, 'saveClassificationResult -> false, 'input -> bigbenchORC100s.q05_spark_sql_run_query_0_temp, 'numClasses -> 2)
16/03/14 14:41:29 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/03/14 14:41:29 INFO Remoting: Starting remoting
16/03/14 14:41:29 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark...@9.30.4.94:62546]
loading data from metastore table: "bigbenchORC100s.q05_spark_sql_run_query_0_temp" ...
16/03/14 14:41:31 INFO hive.metastore: Trying to connect to metastore with URI thrift://luwperf5.svl.ibm.com:9083
16/03/14 14:41:31 INFO hive.metastore: Connected to metastore.
16/03/14 14:41:31 INFO hive.metastore: Trying to connect to metastore with URI thrift://luwperf5.svl.ibm.com:9083
16/03/14 14:41:31 INFO hive.metastore: Connected to metastore.
[Stage 0:>                                                       (0 + 88) / 200]
[Stage 0:>                                                       (1 + 99) / 200]
[Stage 0:=========================================>            (152 + 48) / 200]
[Stage 0:==============================================>       (174 + 26) / 200]
average: [188.6796294836899]
Training Model
[Stage 3:>                                                        (1 + 0) / 200]
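
For readers following the log: Spark's console progress bar reports tasks as (completed + running) / total, which matches the Stage 0 lines above where the two numbers grow toward 200. A minimal sketch that pulls the three counters out of the last line, assuming that format; the variable names are illustrative only:

```shell
# Last progress line from the log above
line='[Stage 3:>                                                        (1 + 0) / 200]'

# Extract "(completed + running) / total" as three space-separated numbers
counts=$(printf '%s\n' "$line" | sed -E 's#.*[(]([0-9]+) [+] ([0-9]+)[)] / ([0-9]+)[]].*#\1 \2 \3#')

# Split the three numbers into named variables
read -r completed running total <<EOF
$counts
EOF

echo "completed=$completed running=$running total=$total"
```

Read this way, the stage has 200 tasks, one has finished, and none are currently running, which would be consistent with tasks waiting to be scheduled rather than with slow computation.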

Bhaskar Gowda

Mar 17, 2016, 3:38:53 PM
to Big Data Benchmark for BigBench
Jantz -

In engines/hive/queries/q05/engineLocalSettings.conf, can you try setting the resource settings for the ML part, replacing each "%" placeholder with values for your cluster?

export BIG_BENCH_ENGINE_HIVE_ML_FRAMEWORK_SPARK_BINARY=spark-submit --deploy-mode cluster --master yarn --executor-memory "%g" --executor-cores "%" --num-executors "%"

Jantz Tran

Mar 21, 2016, 4:23:47 PM
to Big Data Benchmark for BigBench

I set

export BIG_BENCH_ENGINE_HIVE_ML_FRAMEWORK_SPARK_BINARY=spark-submit --deploy-mode cluster --master yarn --executor-memory 1g --executor-cores 1 --num-executors 1

in engineLocalSettings.conf as a test but get errors:

Additional local hive settings found. Adding /home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.sql to hive init.
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `--deploy-mode': not a valid identifier
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `--master': not a valid identifier
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `--executor-memory': not a valid identifier
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `1g': not a valid identifier
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `--executor-cores': not a valid identifier
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `1': not a valid identifier
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `--num-executors': not a valid identifier
/home/biadmin/TPCx-BB_v1.0.1/engines/hive/queries/q05/engineLocalSettings.conf: line 31: export: `1': not a valid identifier
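
These "not a valid identifier" messages are what bash prints when `export` is given an unquoted value containing spaces: only the first word is assigned, and every following word is treated as another variable name to export. A minimal sketch of the quoted form, using the same test values as above. Whether the kit then consumes this variable as a single command string is an assumption not confirmed in this thread:

```shell
# Quote the whole value so bash assigns one string instead of
# parsing each flag as a separate identifier to export.
export BIG_BENCH_ENGINE_HIVE_ML_FRAMEWORK_SPARK_BINARY="spark-submit --deploy-mode cluster --master yarn --executor-memory 1g --executor-cores 1 --num-executors 1"

# Confirm the variable now holds the full command line
echo "$BIG_BENCH_ENGINE_HIVE_ML_FRAMEWORK_SPARK_BINARY"
```

Quoting only removes the shell parse errors; it says nothing about whether these resource values are adequate for the query.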

Yan Tang

Mar 21, 2016, 10:19:38 PM
to Big Data Benchmark for BigBench
Hi Jantz,
You can modify engines/hive/queries/q05/engineLocalSettings.conf based on engines/spark/conf/engineSettings.conf, which holds the global Spark settings.
For example, in engines/hive/queries/q05/engineLocalSettings.conf:
      BINARY_PARAMS=(-v --driver-memory 4g --executor-memory 52g --executor-cores 9 --master yarn-client)

Best Regards,
Yan
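
BINARY_PARAMS in the post above is a bash array, so each flag and each value is stored as its own element and no quoting of the whole line is needed. A small sketch of how such an array expands into separate arguments; that the kit appends it to the spark-submit call in this way is an assumption, and the values are simply the ones from the post:

```shell
#!/usr/bin/env bash
# Values copied from the post; tune them for your own cluster.
BINARY_PARAMS=(-v --driver-memory 4g --executor-memory 52g --executor-cores 9 --master yarn-client)

# Assumption: the array is expanded after the spark-submit binary.
# "${BINARY_PARAMS[@]}" yields exactly one argument per element.
printf 'arg: %s\n' spark-submit "${BINARY_PARAMS[@]}"

# Number of elements stored in the array
echo "count: ${#BINARY_PARAMS[@]}"
```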

On Tuesday, March 22, 2016 at 4:23:47 AM UTC+8, Jantz Tran wrote:

Jantz Tran

Mar 22, 2016, 2:40:38 PM
to Big Data Benchmark for BigBench
I've tried different settings for the different resource params and it always hangs at Stage 3.

Yan Tang

Mar 22, 2016, 10:47:24 PM
to Big Data Benchmark for BigBench
Hi,
Since it hangs in Stage 3 of Spark ML, could you please open your ApplicationMaster UI and check the resource usage and the number of executors in real time?

Best Regards,
Yan
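
If the web UI is not easy to reach, the YARN command line can give a first look at the same job. A rough sketch, assuming the `yarn` CLI is on the PATH of a cluster node; the application ID passed as the first argument is a placeholder you would take from the list output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pass a YARN application ID as the first argument.
APP_ID="${1:-}"

if ! command -v yarn >/dev/null 2>&1; then
  # Not on a cluster node, so there is nothing to query
  echo "yarn CLI not found; run this on a cluster node" >&2
  YARN_AVAILABLE=no
else
  YARN_AVAILABLE=yes
  # List the applications that are currently running
  yarn application -list -appStates RUNNING
  if [ -n "$APP_ID" ]; then
    # Print the status report for the given application
    yarn application -status "$APP_ID"
  fi
fi
```

The tracking URL shown in the output normally leads to the ApplicationMaster UI, where the executor list and per-executor task counts are visible.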

On Wednesday, March 23, 2016 at 2:40:38 AM UTC+8, Jantz Tran wrote:

Colin Cunningham

Feb 24, 2017, 2:26:24 PM
to Big Data Benchmark for BigBench
Jantz, I'm facing the same issue today. I'm on HDP 2.5 with Spark 1.6.2.

Did you ever get a resolution that you can share?  Thanks. - Colin