Error when integrating Tachyon with Spark

salmon0...@gmail.com

Sep 23, 2015, 11:17:56 AM
to Tachyon Users
I am using Spark 1.3.0 and Tachyon 0.5.0, with HDFS as the underlying storage. As soon as rdd.persist(StorageLevel.OFF_HEAP) is added, the job fails with the error below; with that single line removed, reads and writes work normally.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object TachyonOffHeapTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("TachyonOffHeapTest")
    // Point Spark's off-heap (Tachyon) store at the Tachyon master and a base directory.
    sparkConf.set("spark.tachyonStore.url", "tachyon://hadoop-manager:19998")
    sparkConf.set("spark.tachyonStore.baseDir", "/tmp_rdd")
    val sc = new SparkContext(sparkConf)
    // Register Tachyon's Hadoop-compatible filesystem so tachyon:// paths resolve.
    sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")

    val rdd = sc.textFile("tachyon://hadoop-manager:19998/tmp1/")
    rdd.persist(StorageLevel.OFF_HEAP)  // removing this line makes the job succeed
    rdd.saveAsTextFile("tachyon://hadoop-manager:19998/test_tach3/")
    sc.stop()
  }
}
The error is as follows:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4, hadoop-store2): java.io.IOException: BlockIndex 0 is out of the bound in file ClientFileInfo(id:29, name:rdd_1_1, path:/tmp_rdd/spark-c220e135-34cb-43ec-81fa-bac305ef4a46/2/spark-tachyon-20150923205718-fae3/16/rdd_1_1, ufsPath:, length:0, blockSizeByte:1073741824, creationTimeMs:1443013038744, isComplete:true, isFolder:false, isPinned:false, isCache:true, blockIds:[], dependencyId:-1, inMemoryPercentage:100)
at tachyon.client.TachyonFS.getClientBlockInfo(TachyonFS.java:785)
at tachyon.client.TachyonFile.getLocationHosts(TachyonFile.java:172)
at org.apache.spark.storage.TachyonStore.getBytes(TachyonStore.scala:105)
at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:499)
at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:431)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:617)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)