fail to run simple test on Spark

Charlie Chen

unread,

Sep 24, 2015, 4:00:11 AM9/24/15

to Tachyon Users

I am completely new to Tachyon and Spark, could be a silly question here:

So after following all the instructions to install Tachyon and passing the selftest.

Then I tried this: http://tachyon-project.org/documentation/Running-Spark-on-Tachyon.html:

"Put a file X into HDFS and run the Spark shell:

$ ./spark-shell
$ val s = sc.textFile("tachyon://localhost:19998/X")
$ s.count()
$ s.saveAsTextFile("tachyon://localhost:19998/Y")

"

I am confused about "Put a file X into HDFS", what is the exact definition of "HDFS" here, what does it refer to?

Does Tachyon assume I am using Hadoop HDFS?

If yes, how can I connect Tachyon to HDFS, I did not see instructions about this.

If not, say "HDFS" may actually refer to TFS. So I put a file "README.md" to TFS through command line interface.

Then I tried running the s.count(), Spark shell just continued showing me Error:

(I am pretty sure that I have missed a very basic concept

BTW, I understand the document will not babysitting everything, but it would be very helpful for beginners if it is more intuitive without any assumption)

Will dig this tomorrow, any hint would be highly appreciated.

--------------------------------------------------------------------------------------------------------------------------------------

5/09/24 00:07:35 ERROR :

tachyon.org.apache.thrift.transport.TTransportException

at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)

at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)

at tachyon.org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)

at tachyon.org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)

at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)

at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)

at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)

at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)

at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)

at tachyon.thrift.MasterService$Client.recv_user_getUserId(MasterService.java:715)

at tachyon.thrift.MasterService$Client.user_getUserId(MasterService.java:703)

at tachyon.master.MasterClient.connect(MasterClient.java:210)

at tachyon.master.MasterClient.user_getUfsAddress(MasterClient.java:641)

at tachyon.client.TachyonFS.getUfsAddress(TachyonFS.java:681)

at tachyon.hadoop.AbstractTFS.initialize(AbstractTFS.java:402)

at tachyon.hadoop.TFS.initialize(TFS.java:26)

at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)

at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)

at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)

at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)

at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)

at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)

at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)

at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)

at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)

at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)

at scala.Option.getOrElse(Option.scala:120)

at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)

at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)

at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)

at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)

at scala.Option.getOrElse(Option.scala:120)

at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)

at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)

at org.apache.spark.rdd.RDD.count(RDD.scala:1121)

at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)

at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)

at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)

at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)

at $line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)

at $line19.$read$$iwC$$iwC$$iwC.<init>(<console>:37)

at $line19.$read$$iwC$$iwC.<init>(<console>:39)

at $line19.$read$$iwC.<init>(<console>:41)

at $line19.$read.<init>(<console>:43)

at $line19.$read$.<init>(<console>:47)

at $line19.$read$.<clinit>(<console>)

at $line19.$eval$.<init>(<console>:7)

at $line19.$eval$.<clinit>(<console>)

at $line19.$eval.$print(<console>)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)

at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)

at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)

at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)

at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)

at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)

at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)

at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)

at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)

at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)

at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)

at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)

at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)

at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)

at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)

at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)

at org.apache.spark.repl.Main$.main(Main.scala:31)

at org.apache.spark.repl.Main.main(Main.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)

at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)

at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

15/09/24 00:07:36 ERROR :

tachyon.org.apache.thrift.transport.TTransportException

at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)

Gene Pang

unread,

Sep 24, 2015, 10:21:24 AM9/24/15

to Tachyon Users

Hi Charlie,

Unfortunately, the documentation is not complete. Yes, Tachyon is usually deployed on top of an "under file system" (UFS). There are several options for UFS that work with Tachyon, but HDFS is a common choice. In order to configure HDFS as your UFS for Tachyon, you can edit your tachyon-env.sh file, and change:

export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underfs

to point to your HDFS cluster (an example: export TACHYON_UNDERFS_ADDRESS=hdfs://localhost:9000)

This would "connect" Tachyon with HDFS as the UFS.

I hope this helps,

Gene

Charlie Chen

unread,

Sep 24, 2015, 1:40:49 PM9/24/15

to Tachyon Users

Hi, Gene:

Still get the error after the modification as you suggest.

I suspect this is because of incompatible Hadoop Version? I am using HDFS 2.7.1 and Tachyon 0.8.0-SNAPSHOT

Shall I use Hadoop 2.6.0, seems Hadoop 1.0.4 and Hadoop 2.6.0 has been testes.

Or there is a switch between Hadoop 2.x and Hadoop 1.x?

Error info in master.log:

2015-09-24 10:14:08,039 ERROR MASTER_LOGGER (HdfsUnderFileSystem.java:<init>) - Exception thrown when trying to get FileSystem for hdfs://localhost:9000

org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

at org.apache.hadoop.ipc.Client.call(Client.java:1070)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)

at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)

at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)

at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)

at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)

at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)

at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)

at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)

at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)

at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)

at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)

at tachyon.underfs.hdfs.HdfsUnderFileSystem.<init>(HdfsUnderFileSystem.java:74)

at tachyon.underfs.hdfs.HdfsUnderFileSystemFactory.create(HdfsUnderFileSystemFactory.java:30)

at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:116)

at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)

at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)

at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:382)

at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:269)

at tachyon.master.TachyonMaster.start(TachyonMaster.java:250)

at tachyon.master.TachyonMaster.main(TachyonMaster.java:62)

2015-09-24 10:14:08,043 ERROR MASTER_LOGGER (TachyonMaster.java:main) - Uncaught exception terminating Master

java.lang.IllegalArgumentException: All eligible Under File Systems were unable to create an instance for the given path: hdfs://localhost:9000

java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

Thanks.

Charlie

在 2015年9月24日星期四 UTC-7上午7:21:24，Gene Pang写道：

Gene Pang

unread,

Sep 24, 2015, 3:06:19 PM9/24/15

to Tachyon Users

Hi Charlie,

I think you have to re-compile Tachyon with the correct version of hadoop you are using.

http://tachyon-project.org/documentation/Building-Tachyon-Master-Branch.html

If you take a look in the "Distro Support" section, you can find information about how to specify the hadoop version.

Thanks,

Gene

Charlie Chen

unread,

Sep 24, 2015, 3:35:46 PM9/24/15

to Tachyon Users

Hi, Gene:

I really appreciate your help and patience.

I will continue posting problems I encounter, so that other novices can refer to this post.

So after rebuilding Tachyon(I did not clean anything, just "mvn install -Dhadoop.version=2.7.1" ), then run Spark shell, I get error in master.log:

2015-09-24 12:34:05,440 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run) - Thrift error occurred during processing of message.

org.apache.thrift.TException: Service name not found in message name: user_getUserId. Did you forget to use a TMultiplexProtocol in your client?

at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:103)

at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Thanks.

Charlie

在 2015年9月24日星期四 UTC-7下午12:06:19，Gene Pang写道：

Gene Pang

unread,

Sep 24, 2015, 3:43:08 PM9/24/15

to Tachyon Users

Hi Charlie,

If you want to continue to use Tachyon 0.8.0-SNAPSHOT, you will have to re-compile spark with the dependency changed to 0.8.0-SNAPSHOT in the pom file. (In spark's core/pom.xml)

However, the latest version of Spark already works with 0.7.1, so you could get the Tachyon 0.7.1, recompile for your hadoop version and use that.

Thanks,

Gene

Charlie Chen

unread,

Sep 24, 2015, 4:41:43 PM9/24/15

to Tachyon Users

Hi, Gene:

It works for me now.

Configuration:

Tachyon 0.7.1

Spark 1.5.0

Hadoop 2.7.1.

Thanks.

Charlie

在 2015年9月24日星期四 UTC-7下午12:43:08，Gene Pang写道：

Gene Pang

unread,

Oct 5, 2015, 2:32:19 PM10/5/15

to Tachyon Users

Thanks for the update!

-Gene

Reply all

Reply to author

Forward