Hi everyone,
I'm new to Tachyon, so apologies for the newbie question.
I am using Spark 1.5.0 and Tachyon 0.7.1 on a 17-node cluster, with HDFS 2.6 underneath. The Spark master and the Tachyon master run on the same machine. I am able to read files from HDFS through Tachyon in Spark:
val rdd = sc.textFile("tachyon://<<ip>>:19998/<<hdfs path>>")
This reads the RDD successfully. But when I try to save an RDD as an object file,
rdd.saveAsObjectFile("tachyon://<<ip>>:19998/<<tachyon path>>")
it throws the following error (a self-contained sketch of the whole job follows the stack trace):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 1.0 failed 4 times, most recent failure: Lost task 7.3 in stage 1.0 (TID 14, 10.10.5.23): java.io.IOException: FailedToCheckpointException(message:Failed to rename hdfs://10.10.3.16:8020/workers/1450942000003/77/139 to hdfs://10.10.3.16:8020/data/139)
at tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:130)
at tachyon.client.TachyonFS.addCheckpoint(TachyonFS.java:228)
at tachyon.client.FileOutStream.close(FileOutStream.java:105)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1280)
at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:103)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1117)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1215)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1116)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: FailedToCheckpointException(message:Failed to rename hdfs://10.10.3.16:8020/workers/1450942000003/77/139 to hdfs://10.10.3.16:8020/data/139)
at tachyon.thrift.WorkerService$addCheckpoint_result$addCheckpoint_resultStandardScheme.read(WorkerService.java:3509)
at tachyon.thrift.WorkerService$addCheckpoint_result$addCheckpoint_resultStandardScheme.read(WorkerService.java:3477)
at tachyon.thrift.WorkerService$addCheckpoint_result.read(WorkerService.java:3403)
at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at tachyon.thrift.WorkerService$Client.recv_addCheckpoint(WorkerService.java:221)
at tachyon.thrift.WorkerService$Client.addCheckpoint(WorkerService.java:207)
at tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:124)
... 17 more
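In case it helps to reproduce, here is a self-contained sketch of the whole job (the object name is just illustrative, and <<ip>> and the paths are placeholders for my real values):

import org.apache.spark.{SparkConf, SparkContext}

object TachyonSaveRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TachyonSaveRepro"))

    // Reading from HDFS through Tachyon works fine.
    val rdd = sc.textFile("tachyon://<<ip>>:19998/<<hdfs path>>")

    // Writing back through Tachyon is what fails with FailedToCheckpointException.
    rdd.saveAsObjectFile("tachyon://<<ip>>:19998/<<tachyon path>>")

    sc.stop()
  }
}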
I don't think permissions are the issue, because I've already changed the permissions on both the data and workers folders using chmod 777.
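For reference, here is an equivalent check via the Hadoop FileSystem API (just a sketch, e.g. pasted into spark-shell; the URI and the /workers and /data paths are taken from the stack trace above, not anything Tachyon-specific):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

// Inspect and re-apply permissions on the directories named in the rename failure.
val fs = FileSystem.get(new URI("hdfs://10.10.3.16:8020"), new Configuration())
for (dir <- Seq("/workers", "/data")) {
  val path = new Path(dir)
  println(s"$dir -> ${fs.getFileStatus(path).getPermission}")
  // 777: read/write/execute for user, group, and other (note: not recursive).
  fs.setPermission(path, new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL))
}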
Tachyon itself was built against Hadoop 2.6 (by passing -Dhadoop.version=2.6 to the build).
I also reformatted Tachyon, but that didn't change anything.
Any help would be much appreciated!
PS: I saw a ticket raised about this same issue, but it says the fix landed in version 0.9. Is there any way to fix or work around it in 0.7.1?
Thank you.