Hi,
When we run about 20 pipelines in parallel, each needing a lot of resources, our YARN cluster cannot provide enough CPU and memory for all of them. As a result, some pipelines go directly from “Starting” to “Stopped”. Worse, some pipelines first go from “Starting” to “Running” and then to “Failed”, because there are not enough resources for their intermediate steps.
Is there a way to solve this? We expected CDAP to have some mechanism to queue requests when resources are insufficient, or to retry a pipeline when it fails.
Thank you.