The Future approach submits all jobs to Spark's scheduler at the same time and lets the scheduler handle the scheduling of all those jobs. The parallel collection approach is essentially similar to running them in a single-threaded program, except that N threads now do the submission (each thread submits a job, blocks until the job is done, then submits the next job).
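To make the two submission styles concrete, here is a minimal sketch. It assumes Scala 2.12 (where `.par` is built in; on 2.13 you would need the separate scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`), a local-mode `SparkContext`, and a made-up `runJob` helper standing in for whatever action you actually run. The cached thread pool is also my choice for illustration: with the default global `ExecutionContext`, the number of Futures running at once is capped at roughly the number of cores, so not every job would be submitted immediately.

```scala
import java.util.concurrent.Executors
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object SubmissionStyles {
  // Hypothetical job: count the even numbers in 1..n*1000.
  // count() is an action, so the calling thread blocks until the job finishes.
  def runJob(sc: SparkContext, n: Int): Long =
    sc.parallelize(1 to n * 1000).filter(_ % 2 == 0).count()

  // Future style: one Future per job, all handed to the scheduler up front.
  def withFutures(sc: SparkContext, inputs: Vector[Int]): Vector[Long] = {
    // Unbounded pool (an assumption for this sketch) so every job is submitted at once.
    val pool = Executors.newCachedThreadPool()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = inputs.map(n => Future(runJob(sc, n)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }

  // Parallel collection style: a small set of worker threads, each of which
  // submits a job, blocks until it is done, then moves on to the next element.
  def withParallelCollection(sc: SparkContext, inputs: Vector[Int]): Vector[Long] =
    inputs.par.map(n => runJob(sc, n)).seq

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("submission-styles").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      val inputs = (1 to 8).toVector
      println(withFutures(sc, inputs))
      println(withParallelCollection(sc, inputs))
    } finally sc.stop()
  }
}
```

Both functions return the same results in input order; the difference is only in how many jobs are outstanding in the scheduler at any moment.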
In terms of trade-offs:
- Utilization-wise, the Future approach probably achieves higher cluster utilization in some cases, but I suspect you won't see a big difference in most cases.
- You get finer control over job progress using Futures. For example, you can work with partial results.
- If you have too many jobs (e.g. 10000), submitting all of them to Spark might overload the Spark scheduler. I have never tested running 10000 jobs concurrently.
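If the number of jobs is a concern, one possible middle ground is to keep using Futures but run them on a fixed-size thread pool, so that only a bounded number of jobs is outstanding at a time while you can still react to each result as it arrives. The sketch below uses only the Scala standard library; `runJob` is a hypothetical stand-in for a blocking Spark action such as `rdd.count()`, and the cap of 4 is an arbitrary value I picked for illustration, not a recommendation.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.{Failure, Success}

object BoundedSubmission {
  // Hypothetical stand-in for a blocking Spark action (e.g. rdd.count()).
  def runJob(id: Int): Int = {
    Thread.sleep(10)
    id * 2
  }

  // Runs jobCount jobs with at most maxConcurrentJobs in flight at once.
  def runAll(jobCount: Int, maxConcurrentJobs: Int): Seq[Int] = {
    val pool = Executors.newFixedThreadPool(maxConcurrentJobs)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = (1 to jobCount).map { id =>
        // andThen runs the side effect as soon as this job completes,
        // which is where partial results can be consumed.
        Future(runJob(id)).andThen {
          case Success(v) => println(s"job $id finished with $v")
          case Failure(e) => println(s"job $id failed: ${e.getMessage}")
        }
      }
      // Block until every job is done; results come back in submission order.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(jobCount = 20, maxConcurrentJobs = 4)
    println(s"collected ${results.size} results")
  }
}
```

With a pool of size N this behaves much like the parallel collection approach in terms of load on the scheduler, while keeping the per-job callbacks that Futures give you.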