Hi
> #1. In case pipeline B or C run by mistake before A is started, it should not invoke A because A is specified in the require() of B and C. We could not use ExternalTask because it seems like duplicating worker code for finding Task status.
Can you expand on why you can't use ExternalTask? This seems like the ideal solution for your problem.
> #2. If pipelineA is not complete, we want to make the pipeline B and C wait for some time and retry for some times. Since worker-max-reschedules could not be specified at Task level, it would be helpful if you can guide us to solve this.
From what I know, the Luigi way of doing this is simply to schedule the pipeline every X minutes for a period after 10:00am. If the pipeline has already finished, the reruns are very "cheap" (they only check for the output of the last task). If pipeline A is not done yet, the attempt just ends without running B or C, and the next scheduled attempt tries again.
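To make the "cheap rerun" idea concrete, here is a rough sketch of what one scheduled attempt boils down to. It is plain Python, not Luigi's API: the `attempt` function, the marker files and the return strings are all made up for illustration, and in Luigi itself the same checks are done through each task's output.

```python
from pathlib import Path
from typing import Callable


def attempt(upstream_marker: Path, final_output: Path,
            run_pipeline: Callable[[], None]) -> str:
    """One scheduled attempt; safe to trigger every X minutes.

    upstream_marker: hypothetical file that pipeline A writes when it finishes.
    final_output:    hypothetical output of the last task of pipeline B.
    """
    if final_output.exists():
        # Already finished: the rerun costs a single existence check.
        return "already done"
    if not upstream_marker.exists():
        # Pipeline A has not finished: do nothing and let the next
        # scheduled attempt retry.
        return "waiting for upstream"
    run_pipeline()
    return "ran"
```

With this shape the retry policy lives in the scheduler (cron or similar) rather than in the task, so there is no per-task retry setting to configure.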
> #3. If pipelineA is rerun, we want to rerun its dependent pipelines B and C. It is like running dependent or downstream jobs from base job. Is there any utility like deps.py or RESTful service to find both (upstream and downstream) dependencies configured in different modules?
Luigi isn't very good at reruns (as is). The solution I currently use is a script that takes a task as input. It removes that task's output and the outputs of all of the task's "children", so that when scheduled next, Luigi will rerun that task and every task down the tree that depends on it. Being able to do this is important enough to me that every target we use MUST have a remove() method. This allows us to easily automate reruns from certain stages.
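A minimal sketch of that kind of script, under some assumptions: it is not my actual script, `FileTarget` and `SimpleTask` are hypothetical stand-ins so the example runs without Luigi, and `requires()` is assumed to return a plain list. Tasks only know their upstream dependencies, so the script needs the full list of tasks to invert the graph. With real Luigi tasks you would pass the task instances in and, I believe, normalise `requires()` with `luigi.task.flatten`.

```python
import os
from collections import defaultdict, deque


class FileTarget:
    """Hypothetical stand-in for a target with exists() and remove()."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)

    def remove(self):
        os.remove(self.path)


class SimpleTask:
    """Hypothetical stand-in for a task with requires() and output()."""

    def __init__(self, name, path, deps=()):
        self.name = name
        self._target = FileTarget(path)
        self._deps = list(deps)

    def requires(self):
        return self._deps

    def output(self):
        return self._target


def downstream_closure(all_tasks, start):
    """Return start plus every task that depends on it, directly or not."""
    # Invert the requires() edges: parent -> tasks that require it.
    children = defaultdict(list)
    for task in all_tasks:
        for parent in task.requires():
            children[parent].append(task)

    # Breadth-first walk down the inverted graph.
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def invalidate_from(all_tasks, start):
    """Remove the outputs of start and its downstream tasks.

    Returns the tasks whose output was actually removed.
    """
    removed = []
    for task in downstream_closure(all_tasks, start):
        target = task.output()
        if target.exists():
            target.remove()
            removed.append(task)
    return removed
```

After `invalidate_from(tasks, some_task)`, the next scheduled run sees the missing outputs and reruns from `some_task` downwards, while anything upstream or unrelated is left alone.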
In addition, I'm toying with the idea of requiring any task that is not idempotent to include a rollback() method or similar, essentially enabling the "transaction" to be reversed for a rerun.