Can't run Spark job on YARN


Eric DONG
Jun 15, 2016, 11:12:22 AM
to alluxi...@googlegroups.com
Environment:
Hadoop 2.5.2, Alluxio 1.0.1, Spark 1.5.2, OS: CentOS

I want to run a Spark job on a YARN cluster, so I put the related Alluxio jars in
$hadoop/share/hadoop/yarn/

First, put a file into Alluxio:
$ bin/alluxio fs copyFromLocal LICENSE /LICENSE
Second, submit a Spark job to the YARN cluster. It fails with
error: java.lang.NoSuchFieldError: mProtocol
and cannot read the file from Alluxio. Writing a file from Spark failed as well. On some machines I could run the Spark job successfully, while on others it failed...
Then I used another method, passing the jars explicitly to spark-submit, and the job could succeed:
$ spark-submit --class com.test --master yarn-cluster --num-executors 4 \
    --executor-memory 1g --executor-cores 4 --jars \
    alluxio-core-client-1.0.1.jar,alluxio-examples-1.0.1.jar,alluxio-underfs-hdfs-1.0.1.jar,alluxio-core-client-internal-1.0.1.jar,alluxio-keyvalue-client-internal-1.0.1.jar,alluxio-underfs-local-1.0.1.jar,alluxio-core-common-1.0.1.jar,alluxio-keyvalue-common-1.0.1.jar,alluxio-underfs-s3-1.0.1.jar \
    /test.jar $LocalIP


My Spark test demo:

import org.apache.log4j.Logger
import org.apache.spark.{SparkConf, SparkContext}

object test {

  private val LOG = Logger.getLogger(this.getClass)

  def main(args: Array[String]) {
    LOG.error("======================================")
    val ip = args(0)
    val alluxioPath = s"alluxio://$ip:19998/LICENSE"

    val conf = new SparkConf().setAppName("test")
    val sc = new SparkContext(conf)
    // Map the alluxio:// scheme to the Alluxio Hadoop-compatible FileSystem client
    sc.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")

    // READ a file from Alluxio
    val s = sc.textFile(alluxioPath)
    LOG.error(s.first())

    // WRITE a DataFrame into Alluxio
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    // Create a simple DataFrame, stored into a partition directory
    val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
    df1.write.parquet(s"alluxio://$ip:19998/test")

    LOG.error("SUCCESS!")
  }
}


And here is the error message:
16/06/15 13:39:47 ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: path alluxio://208.208.102.230:19998/test already exists.;
org.apache.spark.sql.AnalysisException: path alluxio://208.208.102.230:19998/test already exists.;
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:76)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
    at com.uniview.salut.spark.traffic.test$.main(test.scala:46)
    at com.uniview.salut.spark.traffic.test.main(test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)
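
Note that this AnalysisException is a separate issue from the NoSuchFieldError: Spark's DataFrameWriter defaults to SaveMode.ErrorIfExists and refuses to write to a path left over from a previous run. A minimal sketch of one way to make the demo re-runnable, replacing the write line in the demo above (using overwrite mode here is an assumption about the desired behavior, not part of the original demo):

import org.apache.spark.sql.SaveMode

// Assumption: overwrite any output left by a previous run instead of failing
df1.write.mode(SaveMode.Overwrite).parquet(s"alluxio://$ip:19998/test")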





Bin Fan
Jun 15, 2016, 12:50:32 PM
to Eric DONG, Alluxio Users
Hi Eric,

A few caveats here:

(1) For Spark jobs, how do you compile or download Alluxio?
If you compile Alluxio yourself, use
mvn clean package -Pspark
to build the jars for Spark.

If you download from http://www.alluxio.org/download/, remember to pick the client jar prebuilt for Spark.

(2) This is probably incorrect:
"--jars
alluxio-core-client-1.0.1.jar,alluxio-examples-1.0.1.jar,alluxio-underfs-hdfs-1.0.1.jar,alluxio-core-client-internal-1.0.1.jar,alluxio-keyvalue-client-internal-1.0.1.jar,alluxio-underfs-local-1.0.1.jar,alluxio-core-common-1.0.1.jar,alluxio-keyvalue-common-1.0.1.jar,alluxio-underfs-s3-1.0.1.jar"
You should give Spark the single alluxio-core-client-1.0.1-jar-with-dependencies.jar
instead of the above list of jars, as in the sketch below.
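
For example, a sketch of the corrected submit command (the jar location is an assumption; adjust it to wherever your build or download put the jar-with-dependencies):

# assumes alluxio-core-client-1.0.1-jar-with-dependencies.jar is in the current directory
$ spark-submit --class com.test --master yarn-cluster --num-executors 4 \
    --executor-memory 1g --executor-cores 4 \
    --jars alluxio-core-client-1.0.1-jar-with-dependencies.jar \
    /test.jar $LocalIP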

(3) You may also try Alluxio 1.1 instead of 1.0.

A more detailed tutorial on using Spark with Alluxio: http://www.alluxio.org/documentation/en/Running-Spark-on-Alluxio.html
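
One alternative to --jars is to put the client jar on both the driver and executor classpaths in conf/spark-defaults.conf; a sketch, assuming a hypothetical install path /opt/alluxio/clients (substitute your own location):

# paths below are assumptions; point them at your actual client jar
spark.driver.extraClassPath   /opt/alluxio/clients/alluxio-core-client-1.0.1-jar-with-dependencies.jar
spark.executor.extraClassPath /opt/alluxio/clients/alluxio-core-client-1.0.1-jar-with-dependencies.jar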


