Use the Hfs class if the 'kind' of resource is unknown at design time. To use it, prefix a scheme to the 'stringPath', where hdfs://... will denote Dfs and file://... will denote Lfs.
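The scheme-to-kind mapping described above can be sketched with the standard library alone. This is only an illustration, not Cascading's implementation: the class TapKindSketch, the method kindFor, the host name and the paths are all hypothetical, and the "default" result for a path with no scheme is an assumption of this sketch.

```java
import java.net.URI;

class TapKindSketch {
    // Hypothetical helper, not part of Cascading: reports which Tap kind
    // the text associates with the URI scheme prefixed to a path.
    static String kindFor(String stringPath) {
        String scheme = URI.create(stringPath).getScheme();
        if ("hdfs".equals(scheme)) {
            return "Dfs"; // hdfs:// denotes the distributed filesystem
        }
        if ("file".equals(scheme)) {
            return "Lfs"; // file:// denotes the local filesystem
        }
        // Assumption of this sketch: anything else is left to configuration.
        return "default";
    }

    public static void main(String[] args) {
        // Host name and paths below are made up for illustration.
        System.out.println(kindFor("hdfs://namenode:8020/data/input"));
        System.out.println(kindFor("file:///tmp/input.txt"));
        System.out.println(kindFor("/data/input"));
    }
}
```

In a real application the prefixed path would be passed to the Hfs constructor rather than inspected by hand; the sketch only makes the naming convention concrete.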
By default, Cascading on Hadoop will assume that any source or sink Tap using the file:// URI scheme intends to read files from the local client filesystem (for example, when using the Lfs Tap) where the Hadoop job jar is started. It will therefore force any MapReduce jobs reading or writing to file:// resources to run in Hadoop "standalone mode" so that the file can be read.
Reading local files, processing them, and writing them to HDFS
Note that using an Lfs Tap instance in a Flow will force a portion of, if not the whole, Flow to be executed in "local" mode, forcing the Flow to execute in the current JVM. Mixing with Dfs and other Tap types is possible, providing a means to implement complex file/data management functions.