I'm new to Tachyon (so sorry for the newbie question). I'm trying to run Spark on top of Tachyon backed by HDFS, following the instructions on the official sites.
I have set up Tachyon on a 3-node cluster with HDFS and successfully ran the following test (reading a file through Tachyon and counting its lines):
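For reference, the spark-shell input that produced the log below was roughly the following (reconstructed from the log; the variable name `s` comes from the later `saveAsTextFile` call, and the count result matches `res2` below):

```scala
// spark-shell session against the running Spark + Tachyon cluster
// (environment-dependent; not runnable standalone)
val s = sc.textFile("tachyon://xxx.xxx.xxx.xxx:19998/README.md") // read via the Tachyon FS client
s.count() // triggers the count job logged below; returned 45
```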
15/05/22 15:28:26 INFO SparkContext: Starting job: count at <console>:24
15/05/22 15:28:26 INFO DAGScheduler: Got job 1 (count at <console>:24) with 2 output partitions (allowLocal=false)
15/05/22 15:28:26 INFO DAGScheduler: Final stage: ResultStage 1(count at <console>:24)
15/05/22 15:28:26 INFO DAGScheduler: Parents of final stage: List()
15/05/22 15:28:26 INFO DAGScheduler: Missing parents: List()
15/05/22 15:28:26 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[1] at textFile at <console>:21), which has no missing parents
15/05/22 15:28:26 INFO MemoryStore: ensureFreeSpace(3000) called with curMem=121375, maxMem=278302556
15/05/22 15:28:26 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 265.3 MB)
15/05/22 15:28:26 INFO MemoryStore: ensureFreeSpace(1792) called with curMem=124375, maxMem=278302556
15/05/22 15:28:26 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1792.0 B, free 265.3 MB)
15/05/22 15:28:26 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:37746 (size: 1792.0 B, free: 265.4 MB)
15/05/22 15:28:26 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/05/22 15:28:26 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[1] at textFile at <console>:21)
15/05/22 15:28:26 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/05/22 15:28:26 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, ANY, 1421 bytes)
15/05/22 15:28:26 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, ANY, 1421 bytes)
15/05/22 15:28:26 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
15/05/22 15:28:26 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
15/05/22 15:28:26 INFO HadoopRDD: Input split: tachyon://xxx.xxx.xxx.xxx:19998/README.md:0+614
15/05/22 15:28:26 INFO HadoopRDD: Input split: tachyon://xxx.xxx.xxx.xxx:19998/README.md:614+614
15/05/22 15:28:26 INFO : open(tachyon://xxx.xxx.xxx.xxx:19998/README.md, 65536)
15/05/22 15:28:26 INFO : open(tachyon://xxx.xxx.xxx.xxx:19998/README.md, 65536)
15/05/22 15:28:26 WARN : Recache attempt failed.
java.io.IOException: The machine does not have any local worker.
at tachyon.client.BlockOutStream.<init>(BlockOutStream.java:94)
at tachyon.client.BlockOutStream.<init>(BlockOutStream.java:65)
at tachyon.client.RemoteBlockInStream.read(RemoteBlockInStream.java:204)
at tachyon.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:142)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1618)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/05/22 15:28:26 INFO : Try to find remote worker and read block 209379655680 from 0, with len 1228
15/05/22 15:28:26 INFO : Block locations:[NetAddress(mHost:xxx.xxx.xxx.xxx, mPort:29998, mSecondaryPort:29999)]
15/05/22 15:28:26 INFO : Try to find remote worker and read block 209379655680 from 614, with len 614
15/05/22 15:28:26 INFO : Block locations:[NetAddress(mHost:xxx.xxx.xxx.xxx, mPort:29998, mSecondaryPort:29999)]
15/05/22 15:28:26 INFO : xxx.xxx.xxx.xxx:29999 current host is
tianyin.ucsd.edu 132.239.17.127
15/05/22 15:28:26 INFO : xxx.xxx.xxx.xxx:29999 current host is
tianyin.ucsd.edu 132.239.17.127
15/05/22 15:28:26 INFO : Connected to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : Connected to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : Data 209379655680 to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : Data 209379655680 to remote machine /xxx.xxx.xxx.xxx:29999 sent
15/05/22 15:28:26 INFO : data java.nio.HeapByteBuffer[pos=0 lim=614 cap=614], blockId:209379655680 offset:614 dataLength:614
15/05/22 15:28:26 INFO : data java.nio.HeapByteBuffer[pos=0 lim=1228 cap=1228], blockId:209379655680 offset:0 dataLength:1228
15/05/22 15:28:26 INFO : Data 209379655680 from remote machine /xxx.xxx.xxx.xxx:29999 received
15/05/22 15:28:26 INFO : Data 209379655680 from remote machine /xxx.xxx.xxx.xxx:29999 received
15/05/22 15:28:26 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1830 bytes result sent to driver
15/05/22 15:28:26 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1830 bytes result sent to driver
15/05/22 15:28:26 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 142 ms on localhost (1/2)
15/05/22 15:28:26 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 143 ms on localhost (2/2)
15/05/22 15:28:26 INFO DAGScheduler: ResultStage 1 (count at <console>:24) finished in 0.143 s
15/05/22 15:28:26 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/05/22 15:28:26 INFO DAGScheduler: Job 1 finished: count at <console>:24, took 0.150284 s
res2: Long = 45
I thought it was just a warning-like exception, so I went ahead and executed the next command (saveAsTextFile), and I get:
scala> s.saveAsTextFile("tachyon://xxx.xxx.xxx.xxx:19998/Y")
15/05/22 15:29:26 INFO : getWorkingDirectory: /
15/05/22 15:29:26 INFO : getWorkingDirectory: /
15/05/22 15:29:26 INFO : getFileStatus(tachyon://xxx.xxx.xxx.xxx:19998/Y): HDFS Path: hdfs://xxx.xxx.xxx.xxx:9000/Y TPath: tachyon://xxx.xxx.xxx.xxx:19998/Y
java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "tianyin-h8-1160t.ucsd.edu/127.0.1.1"; destination host is: "xxx.xxx.xxx.xxx":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at tachyon.hadoop.AbstractTFS.fromHdfsToTachyon(AbstractTFS.java:236)
at tachyon.hadoop.AbstractTFS.getFileStatus(AbstractTFS.java:302)
at tachyon.hadoop.TFS.getFileStatus(TFS.java:25)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1089)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1400)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC.<init>(<console>:39)
at $iwC.<init>(<console>:41)
at <init>(<console>:43)
at .<init>(<console>:47)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
Thanks for looking into the problem. Any help is highly appreciated!