I'm having a hard time figuring out the architecture of the Luigi central scheduler. I'm new to all this and it just isn't making sense to me.
My goal is to have a central scheduler that multiple clients can connect to and then run custom functions. The instructions are a little unclear on what is doing the work...
Let's say I have three tasks: A, B, and C. They need to be run in order: inputA.csv > TaskA > TaskB > TaskC > outputC.txt, and inputA.csv is a large file. I have the central scheduler running inside a Docker container on ServerX, while the input file and the Python code for TaskA/B/C are on ServerY.
So the question is, when I run something like this:
> luigi --scheduler-host serverX.com --scheduler-port 8082 --module my_module TaskA --input-file inputA.csv
do the large files get pushed to ServerX? Do all the different Python files get pushed to ServerX? Or does the scheduler just coordinate things, with all the actual work being done on ServerY? If the tasks are particularly compute-intensive, should I size Y to be bigger than X?
Any insight you could provide would be appreciated.