GCS sink node has no option for single-file output

Apurav Mahajan (xWF)

Jul 16, 2024, 5:14:22 AM
to CDAP Developer
Hi team,

We are working on a task that requires reading a CSV file from a GCS bucket, applying some transformations, and writing the resulting CSV file back to the GCS bucket.
The input file is quite small, at most around 20 MB.
After applying the transformations with Wrangler, we use the GCS sink plugin to write the output CSV file to the bucket, but it produces multiple part files.
We need a single output file, and there seems to be no direct way to achieve this. The plugin also offers no way to choose the output filename, for example to drop the -r-00000 suffix.
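
To make the desired result concrete, here is a minimal sketch of the merge done as a separate post-processing step outside the pipeline. It is not a feature of the GCS sink plugin. It assumes the part files have already been copied to a local directory, that their names match the pattern part-r-* (a guess based on the -r-00000 suffix), that each part file starts with the same single-line header, and that the function name merge_part_files is purely illustrative.

```python
import glob
import os


def merge_part_files(part_dir, output_path, pattern="part-r-*", has_header=True):
    """Concatenate part files (in name order) into one CSV file.

    Assumptions: the files are local, `pattern` matches only part files,
    and, when has_header is True, every part file begins with the same
    one-line header, which is kept only once in the merged output.
    Returns the number of part files merged.
    """
    part_paths = sorted(glob.glob(os.path.join(part_dir, pattern)))
    wrote_header = False

    # newline="" keeps line endings exactly as they appear in the parts.
    with open(output_path, "w", newline="") as out:
        for path in part_paths:
            with open(path, newline="") as part:
                for index, line in enumerate(part):
                    if has_header and index == 0:
                        if wrote_header:
                            continue  # skip repeated header lines
                        wrote_header = True
                    # Guard against a part file without a trailing newline,
                    # which would otherwise glue two rows together.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)

    return len(part_paths)
```

Calling merge_part_files("downloaded_parts", "output.csv") would then write one file with a name of our choosing. Because the input is at most around 20 MB, a single-process merge like this would be acceptable for us, but a built-in option in the sink would be much cleaner.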

I understand that the pipeline runs on the MapReduce/Spark framework, which is why part files are produced, but I believe there should be a basic option that lets users get a single merged file as output.
Is there any way to achieve this?

Thanks!