TestDFSIO with Tachyon


Kerkinos Ioannis

Mar 17, 2015, 11:47:32 AM
to tachyo...@googlegroups.com
Hi all,

Is there an easy way to run TestDFSIO with Tachyon as the filesystem?
I tried changing the fs.defaultFS property in core-site.xml to tachyon://tachyon.master.hostname:19998/, which sort of worked, but TestDFSIO and grep threw IOExceptions.
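For reference, the change described above corresponds to a core-site.xml entry like this (the master hostname is a placeholder):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>tachyon://tachyon.master.hostname:19998/</value>
</property>
```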
Any ideas?

Kind regards,
Ioannis

Calvin Jia

Mar 20, 2015, 6:10:31 PM
to tachyo...@googlegroups.com
Hi Ioannis,

I haven't tried running TestDFSIO, but it should be possible. Could you provide the logs from when you encountered the errors?

Thanks,
Calvin

Pengfei Xuan

Mar 20, 2015, 11:29:10 PM
to tachyo...@googlegroups.com
Tachyon supports MapReduce jobs, so TestDFSIO should work:

In fact, we integrated TestDFSIO into Tachyon's unit tests a long time ago:


Best,
Pengfei

Kerkinos Ioannis

Mar 22, 2015, 9:09:09 AM
to tachyo...@googlegroups.com
Hi guys, 

Thanks for replying! Unless I am mistaken, which is possible, there is no way to say "no matter what, always use Tachyon", right?
Wordcount and grep take input/output arguments, so it is easy to point them at Tachyon, but TestDFSIO does not and falls back to the defaultFS defined in core-site.xml.

Also, the TestDFSIO used in the unit tests didn't do what I wanted, since it wouldn't use my already-deployed Tachyon cluster, right? (Again, I could be mistaken.)

Setting Tachyon as the defaultFS and then trying to run TestDFSIO produced the following:

15/03/22 13:33:58 DEBUG security.UserGroupInformation: PrivilegedActionException as:kerkinos (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: tachyon
15/03/22 13:33:58 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: Error in instantiating YarnClient
15/03/22 13:33:58 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
15/03/22 13:33:58 DEBUG mapreduce.Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:832)
at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:443)
at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:425)
at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:755)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:118)
at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:126)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
15/03/22 13:33:58 DEBUG : Disconnecting from the master localhost/127.0.0.1:19998
15/03/22 13:33:58 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@519d5d83

So my problem here was YARN: changing the value of the following property to "local" made it run with no errors.

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>
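Changing it to the local runner, as described above, would look like:

```xml
<property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
</property>
```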

So I looked a bit at the Tachyon code and at similar issues from other filesystems, and found that adding the following property to core-site.xml should work.

<property>
  <name>fs.AbstractFileSystem.tachyon.impl</name>
  <value>tachyon.hadoop.AbstractTFS</value>
</property>

At that point I got an error that there was no constructor with the arguments (URI uri, Configuration conf), so I tried adding one.
Then the error changed to say that the class was abstract, so I added the constructor to the TFS class instead and changed the property value to that, only to get an error that it does not extend AbstractFileSystem.
Which it does not; it extends FileSystem, something I should have checked earlier :)

So I implemented a class that extends AbstractFileSystem, and it worked. I think I am forgetting something else that I had an issue with and changed, but I will edit my post if I remember it.

I tried this with Hadoop 2.6.0 and Tachyon 0.6.1.

Not sure if there was an easier way to get it to work, but let me know if there was something I completely missed!

Calvin Jia

Mar 23, 2015, 3:23:54 PM
to tachyo...@googlegroups.com
Hi Ioannis,

Thanks for the update! I think the class you were looking for might be tachyon.hadoop.TFS?


Kerkinos Ioannis

Mar 24, 2015, 8:18:48 AM
to tachyo...@googlegroups.com
I just realized this was not clear in my previous message, but I did try tachyon.hadoop.TFS. However, TFS extends AbstractTFS, which extends FileSystem, so I still got org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: tachyon.

So, as a quick fix, I implemented a tachyon.hadoop.AbstractFileSystemTFS class that extends AbstractFileSystem, and it worked.
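One common way to bridge an existing FileSystem implementation to the AbstractFileSystem API is Hadoop's DelegateToFileSystem helper. A minimal sketch of such a bridge might look like this (the class name matches the one described above; it assumes tachyon.hadoop.TFS has a no-arg constructor, as FileSystem implementations usually do):

```java
package tachyon.hadoop;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;

// Sketch: expose the tachyon:// scheme through the AbstractFileSystem API
// by delegating every call to the existing tachyon.hadoop.TFS.
public class AbstractFileSystemTFS extends DelegateToFileSystem {
  AbstractFileSystemTFS(URI uri, Configuration conf)
      throws IOException, URISyntaxException {
    // Last argument: an authority (host:port) is not strictly required.
    super(uri, new TFS(), conf, "tachyon", false);
  }
}
```

With a class like this registered via fs.AbstractFileSystem.tachyon.impl, FileContext-based clients such as the YARN job submitter should be able to resolve the tachyon scheme.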

I also had to change the getFileStatus method in AbstractTFS to return a specific user (me), because it returned null, and when I ran TestDFSIO -read I would get:

java.io.IOException: The ownership on the staging directory /tmp/hadoop-yarn/staging/kerkinos/.staging is not as expected. It is owned by . The directory must be owned by the submitter kerkinos or by kerkinos.
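The workaround described above would amount to something like the following inside getFileStatus. The fileInfo field names are illustrative placeholders (the real Tachyon 0.6.x accessors may differ); the FileStatus constructor itself is the standard Hadoop one:

```java
// Hypothetical patch inside AbstractTFS.getFileStatus(Path path):
// report the submitting user as owner/group instead of leaving them empty,
// so YARN's staging-directory ownership check passes.
String user = System.getProperty("user.name");
return new FileStatus(fileInfo.length, fileInfo.isFolder, 1,
    getDefaultBlockSize(path), fileInfo.lastModificationTimeMs, 0L,
    null /* default permission */, user, user, path);
```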

Not a pretty solution, but I really wanted to get TestDFSIO to work :)