Accessing Files from UnderFS - MapRFS

John Omernik

Jul 22, 2014, 3:13:31 PM
to tachyo...@googlegroups.com
Greetings all. 

I am using tachyon-0.5.0 with MapRFS (it's mostly working; I'm just working through some bugs now).

I have a few quick questions:

1. I had to hack around UnderFileSystem.java just to get MapRFS to work. I understand that "we all" should be able to edit the code, but I feel uncomfortable with this (I'm not a Java programmer), and I was curious whether we could get a pluggable way to indicate which filesystem prefixes are HDFS compatible. All I did was add a "|| startswith maprfs" to the list of filesystems that use the HDFS methods. If we could add this as a variable at run time or something, it would keep us from having to hack the code to get things working. It would also allow any other HDFS-compatible filesystem to just work.
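The hack described above boils down to one extra clause in a hardcoded scheme check. A minimal sketch of the idea (the class and method names here are hypothetical, not Tachyon's actual code):

```java
public class UfsSchemeCheck {
    // Illustrative hardcoded check: only these prefixes route to the HDFS code path.
    static boolean isHadoopScheme(String path) {
        return path.startsWith("hdfs://")
                || path.startsWith("maprfs://"); // the one-line addition described above
    }

    public static void main(String[] args) {
        System.out.println(isHadoopScheme("maprfs:///path/to/file.txt")); // true
        System.out.println(isHadoopScheme("file:///tmp/data.txt"));       // false
    }
}
```

Every new HDFS-compatible filesystem means another hardcoded clause, which is the maintenance problem being raised here.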

2. I am trying to access files in maprfs:// using the tachyon:// path, with the understanding that if the file is not in Tachyon, Tachyon will reach out to the under filesystem and grab it. I think there is an issue in how the files are accessed; I am getting the errors below. My Tachyon setup should be good (I am using maprfs:/// as my under filesystem address), but it looks like Tachyon is changing that at some point and trying to access maprfs:/path/to/file.txt instead of maprfs:///path/to/file.txt. Could that be causing the NPE I am getting? I am not enough of a programmer to trace this through the code to determine what the issue may be. Any help would be appreciated.

export TACHYON_UNDERFS_HDFS_IMPL=com.mapr.fs.MapRFileSystem
export TACHYON_UNDERFS_ADDRESS=maprfs:///

mapr path:
maprfs:///path/to/file.txt

tachyon path:

Error:

14/07/22 13:31:48 INFO : tachyon://192.168.0.100:19998 tachyon://192.168.0.100:19998 maprfs:///
14/07/22 13:31:48 INFO : getFileStatus(/path/to/file.txt): HDFS Path: maprfs:/path/to/file.txt TPath: tachyon://192.168.0.100:19998/path/to/file.txt
14/07/22 13:31:48 WARN ClusterConf: Could not resolve CLDB hostname secure=false, for cluster: my.cluster.com
14/07/22 13:31:48 INFO : tachyon://hadoopmapr1/192.168.0.100:19998/path/to/file.txt maprfs:/path/to/file.txt
java.lang.NullPointerException
	at tachyon.util.UnderfsUtils.loadUnderFs(UnderfsUtils.java:113)
	at tachyon.hadoop.TFS.fromHdfsToTachyon(TFS.java:187)
	at tachyon.hadoop.TFS.getFileStatus(TFS.java:243)
	at org.apache.hadoop.fs.FileSystem.getFileStatus(FileSystem.java:1419)
	at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1092)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1031)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:231)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:277)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
	...

John Omernik

Jul 22, 2014, 4:42:38 PM
to tachyo...@googlegroups.com
I should point out that when I do hadoop fs -ls /path/to/file.txt, it works fine. No issues.

Henry Saputra

Jul 22, 2014, 5:24:23 PM
to John Omernik, tachyo...@googlegroups.com
Hi John,

The NPE stack is because UnderFileSystem#parse could not return the
right pair of host and path from maprfs://

Seems like Tachyon may need to delegate to the Hadoop FS code to check
for a valid FS scheme.
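The failure Henry describes can be reproduced with plain java.net.URI: a maprfs:/// address has no authority component, so any parse code expecting a host/path pair gets null for the host. A minimal sketch (UnderFileSystem#parse itself is not shown; the CLDB port 7222 is just illustrative):

```java
import java.net.URI;

public class MaprfsUriDemo {
    public static void main(String[] args) throws Exception {
        // The triple-slash form has an empty authority: there is no host to extract,
        // so code that dereferences the host without a null check throws an NPE.
        URI noHost = new URI("maprfs:///path/to/file.txt");
        System.out.println(noHost.getHost()); // null

        // Spelling out the CLDB host:port gives parse code what it expects.
        URI withHost = new URI("maprfs://cldbhost:7222/path/to/file.txt");
        System.out.println(withHost.getHost()); // cldbhost
        System.out.println(withHost.getPath()); // /path/to/file.txt
    }
}
```

This matches John's later workaround of switching the under filesystem address to maprfs://cldb:port.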

Thanks,

Henry

John Omernik

Jul 22, 2014, 5:44:31 PM
to Henry Saputra, tachyo...@googlegroups.com
With maprfs there isn't a host; that's probably the issue. Is that something that's easily fixed in the code?

John Omernik

Jul 22, 2014, 6:20:12 PM
to Henry Saputra, tachyo...@googlegroups.com

So I was able to modify my configs to use maprfs://cldb:port as my under filesystem. That fixed the problem for me, but it may be worth looking into how this is handled going forward.

Henry Saputra

Jul 22, 2014, 9:48:58 PM
to John Omernik, tachyo...@googlegroups.com
I was thinking of fixing it by giving the right error message. But how
can a maprfs URI not need a hostname in the path?

- Henry

John Omernik

Jul 23, 2014, 8:47:31 AM
to Henry Saputra, tachyo...@googlegroups.com
So MapR, when installed on a node, can be configured to have a default cluster. That is how it's set up, and for many MapR users with only one cluster, maprfs:/// is quicker and easier than maprfs://cldbhostname:cldbport/. Now, I can specify the host and port if I want (that's what I did, and it made Tachyon work), so either we can adjust the documentation to specifically require it (note the glusterfs:/// URL is the same way, no host:port combo there either), or we can handle it (I think that would be the recommended approach).

Whichever way we decide to go, I do think we need to get away from the hardcoded "valid hdfs" prefixes in UnderFileSystem.java. It's just not good practice to hard code something that could change. We should be able to allow any prefix, but then I realized there is a difference between HDFS-compatible prefixes and, say, file://. So perhaps we have a variable (env or conf variable) that specifies comma-separated HDFS-compatible prefixes? Or perhaps we treat file:// as one type of prefix and then "try" all others as hdfs:// compatible, printing good exceptions if they don't return properly? That could get ugly from an "unknown response" standpoint if we try to access foo:// and it returns something that causes a segfault or other major event. Perhaps whitelisting HDFS-compatible entries would be best?
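The whitelist idea above could be sketched as a small helper driven by a comma-separated property value (the property format and class name here are hypothetical, not the actual patch):

```java
import java.util.ArrayList;
import java.util.List;

public class HadoopPrefixWhitelist {
    private final List<String> prefixes = new ArrayList<>();

    // e.g. a conf value of "hdfs://,maprfs://,glusterfs://"
    public HadoopPrefixWhitelist(String csv) {
        for (String p : csv.split(",")) {
            prefixes.add(p.trim());
        }
    }

    // True if the path should be routed to the Hadoop-compatible UFS code.
    public boolean isHadoopCompatible(String path) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        HadoopPrefixWhitelist w =
                new HadoopPrefixWhitelist("hdfs://,maprfs://,glusterfs://");
        System.out.println(w.isHadoopCompatible("maprfs:///path/to/file.txt")); // true
        System.out.println(w.isHadoopCompatible("file:///tmp/data.txt"));       // false
    }
}
```

Compared with "trying" every unknown scheme, a whitelist fails fast with a clear error instead of handing an arbitrary foo:// URI to the Hadoop client code.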

(Just thinking out loud)

Thanks! 
John


David Capwell

Jul 24, 2014, 9:46:33 AM
to tachyo...@googlegroups.com
I can send a patch today to make the schemes configurable.

Also, HDFS is the same way in HA mode (hdfs:///), so this type of URI needs to be supported.

Henry Saputra

Jul 24, 2014, 1:40:01 PM
to tachyo...@googlegroups.com

David Capwell

Jul 24, 2014, 2:16:14 PM
to tachyo...@googlegroups.com
Here is the pull request to make the hadoop schemes configurable: https://github.com/amplab/tachyon/pull/297

David Capwell

Jul 25, 2014, 12:41:08 AM
to tachyo...@googlegroups.com
Ok, so getting there. We now let you build against MapR, and the schemes that go to the hadoop UFS are now configurable.


Added a tachyon.underfs.hadoop.prefixes option that will let you set up maprfs.
To build against the different distros out there, take a look at https://github.com/amplab/tachyon/blob/master/docs/Building-Tachyon-Master-Branch.md#distro-support.

We still don't support Hadoop HA or a hostless maprfs:/// setup like yours.
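Assuming the new option takes a comma-separated list of scheme prefixes (an assumption here; the exact value format is defined by the pull request above), configuring it might look like:

```
tachyon.underfs.hadoop.prefixes=hdfs://,maprfs://,glusterfs://
```

With maprfs:// in the list, Tachyon would route maprfs paths through the Hadoop-compatible UFS code without the source edit described earlier in the thread.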