I'm new to Tachyon (so sorry for the newbie question). I'm trying to run Spark on top of Tachyon backed by HDFS, following the instructions on the official sites.
I have set up Tachyon on a 3-node cluster with HDFS and successfully ran the following test (reading a file through Tachyon and counting its lines):
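For reference, the spark-shell input that produced the log below was roughly the following (reconstructed from the log; the variable name `s` comes from the later `saveAsTextFile` call, and the count result matches `res2` below):

```scala
// spark-shell session against the running Spark + Tachyon cluster
// (environment-dependent; not runnable standalone)
val s = sc.textFile("tachyon://xxx.xxx.xxx.xxx:19998/README.md") // read via the Tachyon FS client
s.count() // triggers the count job logged below; returned 45
```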
15/05/22 15:28:26 INFO SparkContext: Starting job: count at <console>:24
15/05/22 15:28:26 INFO DAGScheduler: Got job 1 (count at <console>:24) with 2 output partitions (allowLocal=false)
15/05/22 15:28:26 INFO DAGScheduler: Final stage: ResultStage 1(count at <console>:24)
15/05/22 15:28:26 INFO DAGScheduler: Parents of final stage: List()
15/05/22 15:28:26 INFO DAGScheduler: Missing parents: List()
15/05/22 15:28:26 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[1] at textFile at <console>:21), which has no missing parents
15/05/22 15:28:26 INFO MemoryStore: ensureFreeSpace(3000) called with curMem=121375, maxMem=278302556
15/05/22 15:28:26 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 265.3 MB)
15/05/22 15:28:26 INFO MemoryStore: ensureFreeSpace(1792) called with curMem=124375, maxMem=278302556
15/05/22 15:28:26 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1792.0 B, free 265.3 MB)
15/05/22 15:28:26 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:37746 (size: 1792.0 B, free: 265.4 MB)
15/05/22 15:28:26 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/05/22 15:28:26 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[1] at textFile at <console>:21)
15/05/22 15:28:26 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/05/22 15:28:26 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, ANY, 1421 bytes)
15/05/22 15:28:26 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, ANY, 1421 bytes)
15/05/22 15:28:26 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
15/05/22 15:28:26 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
15/05/22 15:28:26 INFO HadoopRDD: Input split: tachyon://xxx.xxx.xxx.xxx:19998/README.md:0+614
15/05/22 15:28:26 INFO HadoopRDD: Input split: tachyon://xxx.xxx.xxx.xxx:19998/README.md:614+614
15/05/22 15:28:26 INFO : open(tachyon://xxx.xxx.xxx.xxx:19998/README.md, 65536)
15/05/22 15:28:26 INFO : open(tachyon://xxx.xxx.xxx.xxx:19998/README.md, 65536)
15/05/22 15:28:26 WARN : Recache attempt failed.
java.io.IOException: The machine does not have any local worker.
at tachyon.client.BlockOutStream.<init>(BlockOutStream.java:94)
at tachyon.client.BlockOutStream.<init>(BlockOutStream.java:65)
at tachyon.client.RemoteBlockInStream.read(RemoteBlockInStream.java:204)
at tachyon.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:142)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1618)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/05/22 15:28:26 INFO : Try to find remote worker and read block 209379655680 from 0, with len 1228
15/05/22 15:28:26 INFO : Block locations:[NetAddress(mHost:xxx.xxx.xxx.xxx, mPort:29998, mSecondaryPort:29999)]
15/05/22 15:28:26 INFO : Try to find remote worker and read block 209379655680 from 614, with len 614
15/05/22 15:28:26 INFO : Block locations:[NetAddress(mHost:xxx.xxx.xxx.xxx, mPort:29998, mSecondaryPort:29999)]
15/05/22 15:28:26 INFO : xxx.xxx.xxx.xxx:29999 current host is
tianyin.ucsd.edu 132.239.17.127
15/05/22 15:28:26 INFO : xxx.xxx.xxx.xxx:29999 current host is
tianyin.ucsd.edu 132.239.17.127
15/05/22 15:28:26 INFO : Connected to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : Connected to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : Data 209379655680 to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : Data 209379655680 to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : data java.nio.HeapByteBuffer[pos=0 lim=614 cap=614], blockId:209379655680 offset:614 dataLength:614
15/05/22 15:28:26 INFO : data java.nio.HeapByteBuffer[pos=0 lim=1228 cap=1228], blockId:209379655680 offset:0 dataLength:1228
15/05/22 15:28:26 INFO : Data 209379655680 from remote machine /xxx.xxx.xxx.xxx:29999 received
15/05/22 15:28:26 INFO : Data 209379655680 from remote machine /xxx.xxx.xxx.xxx:29999 received
15/05/22 15:28:26 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1830 bytes result sent to driver
15/05/22 15:28:26 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1830 bytes result sent to driver
15/05/22 15:28:26 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 142 ms on localhost (1/2)
15/05/22 15:28:26 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 143 ms on localhost (2/2)
15/05/22 15:28:26 INFO DAGScheduler: ResultStage 1 (count at <console>:24) finished in 0.143 s
15/05/22 15:28:26 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/05/22 15:28:26 INFO DAGScheduler: Job 1 finished: count at <console>:24, took 0.150284 s
res2: Long = 45
I thought it was just a warning-like exception, so I went ahead and executed the next command (saveAsTextFile), and I get:
scala> s.saveAsTextFile("tachyon://xxx.xxx.xxx.xxx:19998/Y")
15/05/22 15:29:26 INFO : getWorkingDirectory: /
15/05/22 15:29:26 INFO : getWorkingDirectory: /
15/05/22 15:29:26 INFO : getFileStatus(tachyon://xxx.xxx.xxx.xxx:19998/Y): HDFS Path: hdfs://xxx.xxx.xxx.xxx:9000/Y TPath: tachyon://xxx.xxx.xxx.xxx:19998/Y
java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "tianyin-h8-1160t.ucsd.edu/127.0.1.1"; destination host is: "xxx.xxx.xxx.xxx":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at tachyon.hadoop.AbstractTFS.fromHdfsToTachyon(AbstractTFS.java:236)
at tachyon.hadoop.AbstractTFS.getFileStatus(AbstractTFS.java:302)
at tachyon.hadoop.TFS.getFileStatus(TFS.java:25)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1089)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1400)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC.<init>(<console>:39)
at $iwC.<init>(<console>:41)
at <init>(<console>:43)
at .<init>(<console>:47)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
Thanks for looking into the problem. Any help is highly appreciated!