Hi everyone,
First, thank you for all the great work; I really enjoy using luigi.
I have a very specific problem regarding remote execution over SSH.
I built a fairly complex pipeline that runs some of the heavy tasks on a remote cluster by submitting jobs to an sbatch system. All the necessary data files are handled by RemoteTargets and copied to the cluster in a separate Task before the heavy task runs.
The luigi scheduler and the workers are running on one machine that connects through SSH to the remote cluster and starts the job, using the RemoteContext.
The subsequent luigi Task waits for the remote job to finish by checking for the creation of the output files using RemoteTargets.
This all worked well with SSHPASS, but since the latest update to the security guidelines of the compute cluster, two-factor authentication is required for SSH connections.
To solve this, I added SSH multiplexing: I establish an external ControlMaster connection, providing the 2FA manually, which should then be used by luigi.
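For reference, the multiplexing setup looks roughly like the sketch below. It only builds the `ssh` argument lists and does not connect anywhere. The host name `cluster.example.org`, the socket path, and the helper names `master_command` and `client_command` are placeholders for illustration and not part of luigi; `ControlMaster`, `ControlPath`, `ControlPersist`, and `BatchMode` are standard OpenSSH client options.

```python
import os

# Placeholder values (assumptions); adjust to the actual cluster and socket location.
HOST = "cluster.example.org"
CONTROL_PATH = os.path.expanduser("~/.ssh/cm-%r@%h:%p")


def master_command(host=HOST, control_path=CONTROL_PATH):
    """Command started once by hand; the 2FA prompt is answered interactively."""
    return [
        "ssh",
        "-o", "ControlMaster=yes",
        "-o", f"ControlPath={control_path}",
        "-o", "ControlPersist=yes",
        "-N",  # no remote command, just keep the connection open
        host,
    ]


def client_command(remote_cmd, host=HOST, control_path=CONTROL_PATH):
    """Command for later sessions that reuse the master socket without 2FA."""
    return [
        "ssh",
        "-o", "ControlMaster=no",
        "-o", f"ControlPath={control_path}",
        "-o", "BatchMode=yes",  # fail instead of prompting if the master is gone
        host,
        remote_cmd,
    ]
```

Every session opened through `client_command` counts against the session limit of the single master connection, which is where the trouble described next comes from.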
The problem is that when the pipeline is started and the dependency tree is built, many RemoteTargets are checked for existence, resulting in a lot of SSH connections going through the ControlMaster connection.
Unfortunately, the maximum number of allowed concurrent connections is 10.
When luigi tries to add one additional connection, the ControlMaster connection breaks.
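To make the constraint concrete: what is needed is a hard cap on the number of sessions open at the same time. In plain Python, such a cap could be sketched with a bounded semaphore, as below. This is not luigi API; `MAX_SESSIONS`, `session_slot`, and `run_limited` are hypothetical names, and the cap only holds for threads inside a single process, so it would not cover several worker processes.

```python
import subprocess
import threading
from contextlib import contextmanager

MAX_SESSIONS = 10  # limit imposed by the cluster; the constant name is hypothetical

_slots = threading.BoundedSemaphore(MAX_SESSIONS)


@contextmanager
def session_slot():
    """Block until one of the MAX_SESSIONS slots is free, then hold it."""
    _slots.acquire()
    try:
        yield
    finally:
        _slots.release()


def run_limited(cmd):
    """Run a command (for example an ssh call) while holding a session slot."""
    with session_slot():
        return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Something equivalent would have to wrap every place where luigi opens an SSH session, including the existence checks, and I do not see an obvious hook for that.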
I tried defining the maximum number of connections as a resource and setting that resource in the tasks that have RemoteTargets.
As far as I understand, this does not work, because the luigi scheduler checks the whole dependency tree up front and not only when a task is executed, so the resource limit never applies to these existence checks.
I would be very grateful for any ideas on how to deal with this problem.
Best,
Felix