Do you know of use cases where it is better to run the aggregation pipeline from Spark rather than using Spark alone?
Hi,
As you have mentioned, using a MongoDB aggregation pipeline can minimise the amount of data loaded into your Spark workers. This is especially useful if you have large documents but only need to process certain fields, since it reduces the amount of data transferred over the network.
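As a rough illustration, here is a minimal sketch of pushing a filter and a projection down to MongoDB before the data reaches Spark. It assumes MongoDB Spark Connector 10.x, where the source format is `mongodb` and the read option is `aggregation.pipeline` (older connector versions use different names). The helper names `build_pipeline` and `load_reduced`, and the example filter and field names, are hypothetical.

```python
import json


def build_pipeline(match_filter, fields):
    """Build an aggregation pipeline (as a JSON string) that keeps only
    the matching documents and only the listed fields."""
    projection = {name: 1 for name in fields}
    pipeline = [
        {"$match": match_filter},   # drop unwanted documents on the server
        {"$project": projection},   # drop unwanted fields on the server
    ]
    return json.dumps(pipeline)


def load_reduced(spark, uri, database, collection, match_filter, fields):
    """Read a collection into a DataFrame, applying the pipeline in MongoDB.

    `spark` is an existing SparkSession. The option names below are an
    assumption based on connector 10.x; check them against your version.
    """
    return (
        spark.read.format("mongodb")
        .option("connection.uri", uri)
        .option("database", database)
        .option("collection", collection)
        .option("aggregation.pipeline", build_pipeline(match_filter, fields))
        .load()
    )
```

For example, `load_reduced(spark, uri, "shop", "orders", {"status": "active"}, ["customer", "total"])` would ask MongoDB to return only the `customer` and `total` fields (plus `_id`) of active orders, so the large remainder of each document never crosses the network.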
Beyond that, Apache Spark is a framework designed for data processing, so if you already have a Spark deployment, it would make sense to use it for the processing itself.
Kind regards,
Wan.