Error when integrating Tachyon with Spark


salmon0...@gmail.com

Sep 23, 2015, 11:18:48 AM
to Tachyon Users
I am using Spark 1.3.0 and Tachyon 0.5.0, with HDFS as the under storage. As soon as I add rdd.persist(StorageLevel.OFF_HEAP), the job fails with the error below; with that statement removed, reads and writes work normally.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Enclosing object added only so the snippet compiles as posted.
object TachyonOffHeapTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
    sparkConf.set("spark.tachyonStore.url", "tachyon://hadoop-manager:19998")
    sparkConf.set("spark.tachyonStore.baseDir", "/tmp_rdd")
    val sc = new SparkContext(sparkConf)
    sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")

    val rdd = sc.textFile("tachyon://hadoop-manager:19998/tmp1/")
    rdd.persist(StorageLevel.OFF_HEAP) // removing this line makes the job succeed
    rdd.saveAsTextFile("tachyon://hadoop-manager:19998/test_tach3/")
    sc.stop()
  }
}
The error is as follows:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4, hadoop-store2): java.io.IOException: BlockIndex 0 is out of the bound in file ClientFileInfo(id:29, name:rdd_1_1, path:/tmp_rdd/spark-c220e135-34cb-43ec-81fa-bac305ef4a46/2/spark-tachyon-20150923205718-fae3/16/rdd_1_1, ufsPath:, length:0, blockSizeByte:1073741824, creationTimeMs:1443013038744, isComplete:true, isFolder:false, isPinned:false, isCache:true, blockIds:[], dependencyId:-1, inMemoryPercentage:100)
at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:785)
at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:172)
at org.apache.spark.storage.TachyonStore.getBytes(TachyonStore.scala:105)
at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:499)
at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:431)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:617)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

苗海泉

Jul 31, 2016, 10:24:44 PM
to Alluxio Users, tachyo...@googlegroups.com
Spark 1.6.2 with Alluxio 1.2 also has problems. I suspect this is no longer supported: Spark 2.0 has removed the built-in Tachyon integration, and I could not find anything about it in the official Alluxio documentation either.

Pei Sun

Aug 1, 2016, 12:04:25 PM
to 苗海泉, Alluxio Users, tachyo...@googlegroups.com
Hi,
    Yeah, Spark 2.0 removed the built-in support for Tachyon. We suggest using saveAsTextFile (or Parquet, etc.) to save data to Alluxio directly.

Pei
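
A minimal sketch of that suggestion in Scala, assuming an Alluxio master at hadoop-manager:19998 (host and paths reused from the original post) and the Alluxio client jar on the driver and executor classpath; depending on the Alluxio client version, setting fs.alluxio.impl may not be necessary:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative object name, not from the original post.
object AlluxioSaveSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AlluxioSaveSketch"))
    // Register the Alluxio Hadoop-compatible filesystem for alluxio:// paths
    // (may be optional, depending on the Alluxio client version on the classpath).
    sc.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")

    // Read from and write to Alluxio directly; no StorageLevel.OFF_HEAP persist involved.
    val rdd = sc.textFile("alluxio://hadoop-manager:19998/tmp1/")
    rdd.saveAsTextFile("alluxio://hadoop-manager:19998/test_tach3/")
    sc.stop()
  }
}

The data then lives in Alluxio as ordinary files, so later jobs can read it back with sc.textFile on the same path.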



