Hi TD,
Thank you so much for your kind response. May I draw your attention to another detail?
We do end up getting around 100 GB of compressed Parquet data in each partition, and compressing it requires a larger cluster than we need for a continuously running streaming job. So my streaming job runs on a c.xlarge cluster and my compression job on a c5d.4xlarge.
While I could certainly run the compression job on the same cluster as the streaming job, doing so may not be cost-optimal.
But let me give your suggestion a try and see how things go.
Regards,
Gourav Sengupta