[Scalding] Using MultipleDelimitedFiles

Jeremy Calbreath

Apr 16, 2015, 12:05:38 PM
to cascadi...@googlegroups.com
I'm trying to read multiple delimited files into a single pipe using the MultipleDelimitedFiles case class from FileSource.scala. However, I am getting the error "Data is missing from one or more paths".

Here is the code I am using:

val schema = List('var1, 'var2, 'var3)
val files = "/path1/part-00000, /path2/part-00000"

MultipleDelimitedFiles(schema, separator = "\11", quote = "", skipHeader = true, writeHeader = true, files)
.addTrap(Tsv("~/errors.txt"))
.groupAll.write(Tsv(args("output"), writeHeader = true))

Both files noted above exist in HDFS, both have data, and both have the same schema. They are actually the exact same file, just copied so I can experiment a little. Am I using MultipleDelimitedFiles correctly? Any ideas on the source of the error?

I did this successfully using MultipleTsvFiles, but moving forward I need to read multiple files with other delimiters. If there is a better way, please let me know.
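
One thing that may be worth ruling out: the `files` value above is a single string containing a comma, so it may be treated as one path rather than two. The sketch below is plain Scala with no Scalding dependency and only shows how that string could be split into separate paths. `PathSplitter` and `splitPaths` are hypothetical names used for illustration, and the commented-out call assumes that the last parameter of MultipleDelimitedFiles is a varargs `String*`, which should be checked against FileSource.scala.

```scala
// Sketch only: PathSplitter and splitPaths are hypothetical helper names.
object PathSplitter {
  // Split a comma-separated string into individual, trimmed, non-empty paths.
  def splitPaths(files: String): List[String] =
    files.split(",").map(_.trim).filter(_.nonEmpty).toList
}

object PathSplitterDemo {
  def main(args: Array[String]): Unit = {
    val files = "/path1/part-00000, /path2/part-00000"
    val paths = PathSplitter.splitPaths(files)
    paths.foreach(println)

    // ASSUMPTION: if the final parameter is varargs (String*), each path
    // would be passed as its own argument, for example:
    // MultipleDelimitedFiles(schema, "\t", "", true, true, paths: _*)
  }
}
```

If the paths do need to be passed separately, this would also account for the error message, since the combined string would not name an existing file.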

Thanks.