Spark connection with Hive in Oozie workflow from Hue


Nikos Mih

Dec 11, 2017, 5:37:59 AM
to Hue-Users
I want to run a Spark SQL query against my Hive tables through Oozie from Hue.

So I created a workflow that runs this PySpark script:

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql import HiveContext
    from pyspark.sql.functions import *

    # Build the SparkContext: 1 GB driver memory, local master
    sconf = SparkConf().setAppName("MySpark").set("spark.driver.memory", "1g").setMaster("local")
    sc = SparkContext(conf=sconf)

    print "\n\nSpark is "
    print sc.version

    # HiveContext needs the Hive client configuration (hive-site.xml) to reach the metastore
    sqlContext = HiveContext(sc)
    sqlContext.sql("show databases").show()


and with properties:

Spark Master: yarn
Mode: Client
App name: MySpark

When I run this, the job does not finish, and I get the following in the stdout log:

...
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:82)
 at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3399)
 at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3418)
 at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3643)
 at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:231)
 at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:215)
 at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:338)
...

What am I missing? When I don't use Spark SQL, everything is fine. Do I need to add hive-site.xml, and if so, where should it be placed?
Do I need to pass any Spark arguments?

Thanks in advance for any suggestion.

Rotem Gabay

Dec 11, 2017, 5:58:46 AM
to Nikos Mih, Hue-Users
Hi Nikos, 
Add the hive-site.xml (place it in the same folder as your other files) and add a reference to it in the workflow:
<file>/path/hive-conf.xml</file>
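
In context, the spark action in workflow.xml might then end up looking roughly like the sketch below. The action name, script name, transitions, and schema version are placeholders, and the exact set of supported elements depends on the spark-action schema your Oozie version uses; adjust to whatever the Hue editor generated for your workflow:

    <action name="spark-sql">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>client</mode>
            <name>MySpark</name>
            <!-- for PySpark the jar element points at the .py script -->
            <jar>myspark.py</jar>
            <!-- ships the Hive client configuration with the action -->
            <file>/path/hive-conf.xml</file>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>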

Rotem 

On Mon, Dec 11, 2017 at 12:38, Nikos Mih <nmich...@gmail.com> wrote:

Nikos Mih

Dec 11, 2017, 7:17:56 AM
to Hue-Users, nmich...@gmail.com
I tried this and I got the same error.

Also, in your reply did you mean hive-site.xml rather than hive-conf.xml? (You say hive-site, but in your XML you mention hive-conf.) Or should I try this with hive-conf.xml?

Romain Rigaux

Dec 11, 2017, 1:21:27 PM
to Nikos Mih, Hue-Users
Create a 'lib' dir in the workspace of the workflow and add the 'hive-site.xml' there?
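
For reference, once hive-site.xml sits under the workflow's lib/ directory, the reference in the spark action can also use a path relative to the workflow application directory instead of an absolute one, roughly (a sketch, assuming the file was uploaded as lib/hive-site.xml and that Oozie resolves relative paths against the application path):

<file>lib/hive-site.xml</file>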


Nikos Mih

Dec 12, 2017, 4:46:17 AM
to Hue-Users, nmich...@gmail.com
I added both my script and hive-site.xml to the lib subfolder of my workflow directory.
My code has not changed.

The error in the stdout log is now:


Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1512750023956_0235 finished with failed status
org.apache.spark.SparkException: Application application_1512750023956_0235 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1025)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1072)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:178)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:90)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:81)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


I am working on Cloudera CDH 5.13 and Hue 4; Java is at 1.8.0_131 (Oracle Corporation) and Scala is at version 2.10.5.
