CDAP - 'File' sink plugin


Leona Chen

Sep 23, 2021, 1:18:56 PM
to CDAP User
Hi,

I want to build a pipeline that uses the Wrangler plugin to transform a CSV file and then writes the result to a CSV file using the File sink. However, the output I got is split into two files, one named 'SUCCESS_' and the other named 'part-r-0000'. This looks like the file structure produced by the ErrorCollector plugin I used, where error records are written to a 'part-t-...' file.

But I want everything that was processed through Wrangler to be output as a single CSV file. What did I do wrong?

Thanks,
Leona

Baptiste Benet

Sep 24, 2021, 3:38:40 AM
to cdap...@googlegroups.com
Hey Leona,

Could you send a screenshot of your pipeline, please? Specifically how you configure the File sink plugin. 
It seems to me that the parts are just there because it's the natural output format from Spark's parallel processing. If you look at the data inside the 'part-r-0000' file, is it CSV data?

At LiveRamp (the company I work at), we created and use a GCS merge file plugin, that lets us combine all of those part files into one file. If there is some interest in it, we could make sure we open-source it.
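
To illustrate what such a merge step does, here is a minimal sketch in Python. It is not the LiveRamp plugin and it does not use any CDAP API: it assumes the sink's output directory has been copied to a local file system, that the data files are named 'part-*', and that every row in them ends with a newline. The function name and the paths are hypothetical.

```python
import shutil
from pathlib import Path


def merge_part_files(output_dir: str, merged_path: str) -> int:
    """Concatenate every part-* file in output_dir into merged_path.

    Assumption: each part file holds complete, newline-terminated rows and
    no header line, so plain byte concatenation yields a valid CSV.
    Returns the number of part files that were merged.
    """
    # Sorting keeps the parts in name order (zero-padded numbers sort correctly).
    # Marker files that do not start with "part-" are skipped by the glob.
    parts = sorted(p for p in Path(output_dir).glob("part-*") if p.is_file())
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, merged)
    return len(parts)
```

Calling `merge_part_files("/tmp/sink-output", "/tmp/result.csv")` (both paths are placeholders) would then produce one CSV file with the chosen name from all of the part files.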

Best,
Baptiste Benet


Leona Chen

Sep 24, 2021, 6:12:31 AM
to CDAP User
Hi Baptiste,

Here are some screenshots. I meant to attach them to my original post but forgot!

The 'part..' file seems to contain the data rows that I filtered out using a Wrangler condition, and the success file has no data at all. I am looking for a simple, single file with the Wrangler-transformed output from this plugin.

Once again, thank you so much for your help.
cdap q.pdf

Leona Chen

Sep 24, 2021, 6:59:04 AM
to CDAP User
Hi Baptiste,

Actually, I probably have only one question about this plugin now: how do I use the properties setup page to define the output file name? I want to give the file a specific name.