"Put a file X into HDFS and run the Spark shell:
$ ./spark-shell
scala> val s = sc.textFile("tachyon://localhost:19998/X")
scala> s.count()
scala> s.saveAsTextFile("tachyon://localhost:19998/Y")
"
I am confused by "Put a file X into HDFS" — what exactly does "HDFS" refer to here?
Does Tachyon assume I am running Hadoop HDFS?
If so, how do I connect Tachyon to HDFS? I did not see any instructions about this.
If not, "HDFS" may actually refer to TFS, so I put a file "README.md" into TFS through the command-line interface.
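For context, the only HDFS-related setting I could find was the under filesystem address in conf/tachyon-env.sh — I assume this is how Tachyon is pointed at an HDFS deployment (the NameNode address below is just my local guess):

```shell
# conf/tachyon-env.sh
# Assumed setting: use HDFS as Tachyon's under filesystem.
# hdfs://localhost:9000 is a typical local NameNode address; yours may differ.
export TACHYON_UNDERFS_ADDRESS=hdfs://localhost:9000
```

Is setting this variable (and restarting Tachyon) all that is needed, or is there more to it?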
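Concretely, this is how I copied the file into TFS (run from my Tachyon checkout; assuming a single-node local master):

```shell
# Copy a local file into Tachyon's file system (TFS)
./bin/tachyon tfs copyFromLocal README.md /README.md
# Verify it is there
./bin/tachyon tfs ls /
```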
Then I tried running s.count(), and the Spark shell kept showing me the error below.
(I am pretty sure I have missed a very basic concept.)
BTW, I understand the documentation cannot babysit everything, but it would be very helpful for beginners if it were more intuitive and made fewer assumptions.
I will dig into this tomorrow; any hint would be highly appreciated.
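In case it matters, this is how I started Tachyon locally before launching the Spark shell (assuming the default local setup — I mention it because the TTransportException below looks like the client cannot reach the master on port 19998):

```shell
# Format and start a local Tachyon master + worker
./bin/tachyon format
./bin/tachyon-start.sh local
# Sanity check: the master and worker JVMs should be running
jps | grep -i tachyon
```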
--------------------------------------------------------------------------------------------------------------------------------------
15/09/24 00:07:35 ERROR :
tachyon.org.apache.thrift.transport.TTransportException
at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at tachyon.org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at tachyon.org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at tachyon.thrift.MasterService$Client.recv_user_getUserId(MasterService.java:715)
at tachyon.thrift.MasterService$Client.user_getUserId(MasterService.java:703)
at tachyon.master.MasterClient.connect(MasterClient.java:210)
at tachyon.master.MasterClient.user_getUfsAddress(MasterClient.java:641)
at tachyon.client.TachyonFS.getUfsAddress(TachyonFS.java:681)
at tachyon.hadoop.AbstractTFS.initialize(AbstractTFS.java:402)
at tachyon.hadoop.TFS.initialize(TFS.java:26)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $line19.$read$$iwC$$iwC$$iwC.<init>(<console>:37)
at $line19.$read$$iwC$$iwC.<init>(<console>:39)
at $line19.$read$$iwC.<init>(<console>:41)
at $line19.$read.<init>(<console>:43)
at $line19.$read$.<init>(<console>:47)
at $line19.$read$.<clinit>(<console>)
at $line19.$eval$.<init>(<console>:7)
at $line19.$eval$.<clinit>(<console>)
at $line19.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/09/24 00:07:36 ERROR :
tachyon.org.apache.thrift.transport.TTransportException
at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)