How to make Spark load an existing Hadoop cluster's configuration


wingma...@gmail.com

Aug 21, 2013, 5:02:23 AM8/21/13
to spark...@googlegroups.com
When I have an RDD, I want to use the API to write it to HDFS. The code:
RDD.saveAsHadoopFile(path, classOf[Text], classOf[IntWritable], classOf[TextOutputFormat[Text, IntWritable]], jobConf)
The write to HDFS succeeds, but it uses a 64 MB block size, a replication factor of 3, and a default number of reduce tasks; none of these come from my Hadoop cluster's configuration.
How can I make Spark use my Hadoop config?

Jerry Shao

Aug 21, 2013, 5:49:08 AM8/21/13
to spark...@googlegroups.com
Hi, you can set HADOOP_CONF_DIR in spark-env.sh so that Spark adds the Hadoop conf directory to its classpath.

Thanks
Jerry
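Jerry's suggestion can be sketched as a line in conf/spark-env.sh; the path below is an assumed example, not taken from the thread:

```shell
# conf/spark-env.sh -- sketch; /etc/hadoop/conf is an assumed example path.
# Point HADOOP_CONF_DIR at the directory holding core-site.xml,
# hdfs-site.xml and mapred-site.xml for your cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```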

On Wednesday, August 21, 2013 at 5:02:23 PM UTC+8, wingma...@gmail.com wrote:

wingma...@gmail.com

Aug 21, 2013, 6:04:21 AM8/21/13
to spark...@googlegroups.com

Thanks for your help!
I've tried it, but it has no effect.

On Wednesday, August 21, 2013 at 5:49:08 PM UTC+8, Jerry Shao wrote:

wingma...@gmail.com

Aug 21, 2013, 6:27:47 AM8/21/13
to spark...@googlegroups.com
I tried copying the Hadoop hdfs-site.xml, core-site.xml, and mapred-site.xml into $SPARK_HOME/conf, and also adding HADOOP_CONF_DIR to SPARK_CLASSPATH, but I get this error:
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at spark.rdd.HadoopRDD.createInputFormat(HadoopRDD.scala:61)
    at spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:51)
    at spark.RDD.partitions(RDD.scala:168)
    at spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:9)
    at spark.RDD.partitions(RDD.scala:168)
    at spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:9)
    at spark.RDD.partitions(RDD.scala:168)
    at spark.Partitioner$.defaultPartitioner(Partitioner.scala:36)
    at spark.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:267)
    at com.qvod.kuaiwan.stats.spark.PlayCount$.getUserGamePlayCountHDFS(PlayCount.scala:40)
    at com.qvod.kuaiwan.stats.spark.PlayCount$.statsPlayCount(PlayCount.scala:104)
    at com.qvod.kuaiwan.stats.spark.StartUp$.main(StartUp.scala:39)
    at com.qvod.kuaiwan.stats.spark.StartUp.main(StartUp.scala)


Please help!



On Wednesday, August 21, 2013 at 6:04:21 PM UTC+8, wingma...@gmail.com wrote:

wingma...@gmail.com

Aug 21, 2013, 6:35:19 AM8/21/13
to spark...@googlegroups.com

Oh, I got it working!
Adding HADOOP_CONF_DIR to SPARK_CLASSPATH did the trick!

Thanks, Jerry!


On Wednesday, August 21, 2013 at 6:27:47 PM UTC+8, wingma...@gmail.com wrote:
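For the record, the fix that worked here, putting the Hadoop conf directory on Spark's classpath, can be sketched in conf/spark-env.sh. The conf path is an assumed example:

```shell
# conf/spark-env.sh -- sketch; /etc/hadoop/conf is an assumed example path.
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Prepending the conf dir to SPARK_CLASSPATH makes core-site.xml,
# hdfs-site.xml and mapred-site.xml visible to Spark's Hadoop client,
# so block size, replication factor, etc. come from the cluster config.
export SPARK_CLASSPATH=$HADOOP_CONF_DIR:$SPARK_CLASSPATH
```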