I'm trying to read multiple delimited files into a single pipe using the MultipleDelimitedFiles case class from FileSource.scala. However, I get the error "Data is missing from one or more paths".
Here is the code I am using:
```scala
val schema = List('var1, 'var2, 'var3)
val files = "/path1/part-00000, /path2/part-00000"
MultipleDelimitedFiles(schema, separator = "\11", quote = "", skipHeader = true, writeHeader = true, files)
  .addTrap(Tsv("~/errors.txt"))
  .groupAll.write(Tsv(args("output"), writeHeader = true))
```
Both files above exist in HDFS, both contain data, and both have the same schema. In fact, they are the exact same file, just copied so I can experiment with this a bit. Am I using MultipleDelimitedFiles correctly? Any ideas on the source of the error?
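One thing worth checking is how the paths reach the source. Assuming MultipleDelimitedFiles declares its last parameter as varargs (p: String*), which should be verified against the FileSource.scala of the Scalding version in use, passing files as one String gives it a single path: the literal text "/path1/part-00000, /path2/part-00000", comma and space included, which would not exist in HDFS. The self-contained sketch below does not call Scalding. It uses a hypothetical stand-in, pathsOf, only to show how a String* parameter treats a comma-joined string compared with separate strings or an expanded Seq.

```scala
// Hypothetical stand-in for a varargs path parameter such as `p: String*`.
// `pathsOf` is illustrative only; it is NOT part of Scalding.
object VarargsPaths {
  def pathsOf(p: String*): Seq[String] = p.toSeq

  def main(args: Array[String]): Unit = {
    // A single String containing a comma is still ONE path.
    val joined = "/path1/part-00000, /path2/part-00000"
    println(pathsOf(joined).size) // 1

    // Passing each path as its own argument yields two paths.
    println(pathsOf("/path1/part-00000", "/path2/part-00000").size) // 2

    // An existing Seq[String] can be expanded into the varargs with `: _*`.
    val files = Seq("/path1/part-00000", "/path2/part-00000")
    println(pathsOf(files: _*).size) // 2

    // Splitting the joined string and trimming the stray space also works.
    val split = joined.split(",").map(_.trim).toList
    println(pathsOf(split: _*).mkString("|")) // /path1/part-00000|/path2/part-00000
  }
}
```

If the signature does match, the corresponding call would declare files as a Seq[String] and expand it, for example MultipleDelimitedFiles(schema, separator = "\11", quote = "", skipHeader = true, writeHeader = true, files: _*). That would also explain the difference from MultipleTsvFiles, which (again assuming the same FileSource.scala) takes its paths as a Seq[String] rather than as varargs.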
I did this successfully using MultipleTsvFiles, but going forward I need to read multiple files with other delimiters. If there is a better way, please let me know.
Thanks.