Scalding Input Filepaths


cvk

Apr 18, 2014, 6:59:48 PM
to cascadi...@googlegroups.com
Hi,

A really basic question from a newbie: I have a huge hierarchy of folders in HDFS, and I want to process all the files below a certain level. Can someone advise me on how I should provide the input?
I don't want to use MultipleWritableSequenceFiles and give comma-separated names for all of those hundreds of directories.

Thanks in advance. 

Regards,
cvk


cvk

Apr 18, 2014, 7:20:58 PM
to cascadi...@googlegroups.com
Update: It does work when I use the * wildcard character. However, it fails upon encountering _logs files. Any workarounds?

Miguel Ping

Apr 21, 2014, 10:04:16 AM
to cascadi...@googlegroups.com
I had a similar problem. Try using wildcards/globbing to match the layout. If your files have an extension, matching on that is easier than matching the exact directory structure.
Just make sure you put the path inside double quotes.

cvk

Apr 29, 2014, 3:28:55 PM
to cascadi...@googlegroups.com
For others' reference: yes, wildcard (glob) patterns in the paths did the trick. Thanks, Miguel.
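
In case it helps anyone landing here, below is a minimal sketch of what such a job could look like. It is not the exact job from this thread: the class name GlobInputJob, the /data/events layout, the two-level depth, and the part-* file names are all hypothetical, and TextLine stands in for whichever source you actually read. The one thing it relies on is that Hadoop path globs accept a negated character class such as [^_], which is one common way to keep a pattern from descending into directories like _logs.

```scala
import com.twitter.scalding._

// Sketch only: the paths and the part-* naming below are assumptions,
// not taken from the original poster's cluster.
class GlobInputJob(args: Args) extends Job(args) {

  // Each "[^_]*" matches one directory level whose name does not start
  // with an underscore, so directories such as _logs are not matched.
  // "part-*" assumes the data files follow the usual Hadoop output naming.
  val inputGlob = "/data/events/[^_]*/[^_]*/part-*"

  TextLine(inputGlob)
    .read
    .project('line)              // TextLine also emits 'offset; keep only the text
    .write(Tsv(args("output")))
}
```

If the glob is passed on the command line instead of being hard-coded (for example as --input), wrapping it in double quotes keeps the shell from trying to expand the * itself before the job sees it.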