As far as I know, not yet. It would need to be possible to define networking between containers the way Docker does -->
https://docs.docker.com/engine/userguide/networking/ and then have an orchestrator to map between the host and other containers. What you could try is starting container instances that use a shared network (the host's localhost) and have them operate on different ports?
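A rough sketch of that "shared network, different ports" idea, assuming a hypothetical image name, instance names, and a PORT environment variable the app inside would read. The commands are only assembled here, not run (and older Singularity versions spell it `instance.start` rather than `instance start`):

```python
import socket

def free_ports(n):
    """Ask the OS for n distinct unused TCP ports on localhost.
    Keeping all sockets open until every one is bound guarantees
    the ports are distinct."""
    socks = [socket.socket() for _ in range(n)]
    try:
        for s in socks:
            s.bind(("127.0.0.1", 0))
        return [s.getsockname()[1] for s in socks]
    finally:
        for s in socks:
            s.close()

def instance_commands(image, n):
    """Build one `singularity instance start` command per instance,
    handing each its own port via an environment variable. The image
    name and the PORT variable are hypothetical examples."""
    return [
        "SINGULARITYENV_PORT=%d singularity instance start %s web%d"
        % (port, image, i)
        for i, port in enumerate(free_ports(n))
    ]
```

Since they all share the host network, whatever sits in front (the web app, a reverse proxy) would just route to localhost:port per instance.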
Just curious - what is your strategy for the layer between the web application and the GPU servers? These kinds of connections (to a shared resource with sensitive data and whatnot) have historically made some people nervous :P
I think what you are saying is:
1. user requests task on front end
2. front end handles authentication and authorization
3. front end submits job to celery
4. the celery task is a Singularity container running on the GPU, and since worker == container, it requires networking
and what I'm suggesting is to separate the front / back servers a bit more:
1. user requests task on front end
2. front end handles authentication and authorization
3. front end submits job to celery (still frontend)
4. the celery worker (still frontend) is a process that finishes configuring the job (memory? time? cluster? queue type?) and sends it to a back end queue (slurm, sge)
5. the slurm/sge queues are optimized for this sort of thing, so you just load the singularity module, run the job, and send a signal back to the front end queue.
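Steps 4 and 5 above could be sketched roughly like this, as a function the front-end celery worker would call to finish configuring the job and hand it to slurm. All the particulars (queue/partition names, image path, the callback URL, the default resources) are hypothetical examples, not a real setup:

```python
import subprocess

def build_sbatch_script(image, command, mem="8G", time="01:00:00",
                        partition="gpu", callback_url=None):
    """Step 4: finish configuring the job (memory, time, queue) and
    wrap the Singularity run in a slurm batch script. All defaults
    here are made-up examples."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --mem=%s" % mem,
        "#SBATCH --time=%s" % time,
        "#SBATCH --partition=%s" % partition,
        "#SBATCH --gres=gpu:1",
        "module load singularity",
        "singularity exec --nv %s %s" % (image, command),
    ]
    if callback_url is not None:
        # Step 5: signal the front end queue when the job finishes
        lines.append('curl -X POST -d "status=done" %s' % callback_url)
    return "\n".join(lines) + "\n"

def submit(script_path):
    """Hand the script to slurm; sbatch prints the job id on stdout."""
    return subprocess.check_output(["sbatch", script_path]).decode()
```

In practice `build_sbatch_script` would live inside the front-end celery task, so the worker stays in charge of where the job goes (one of the flexibility points below).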
So a few notes on the above:
- there is flexibility (step 4) to have different kinds of queues in different locations. The front end worker is the manager of where things go.
- the singularity container is not required to have a network, or be constantly running.
- the main server (the same celery workers) also needs a means to receive a message when something finishes so it can update the server; this could be a POST or similar.
- ideally you could manage slurm or sge with an API.
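A minimal sketch of that "receive a POST when something finishes" note, using only the standard library. A real app would use whatever framework already serves the front end and would authenticate the callback; the payload shape here is a made-up example:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

JOB_STATUS = {}  # job id -> latest status reported by the cluster

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect JSON like {"job_id": "42", "status": "done"}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        JOB_STATUS[payload["job_id"]] = payload["status"]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the sketch quiet

def serve(port=0):
    """Create the callback listener; port 0 lets the OS pick one."""
    return HTTPServer(("127.0.0.1", port), CallbackHandler)
```

The batch job's last step (a curl POST, say) would hit this endpoint, and the server updates its record of the job.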
I think there are two general recipes you can follow here. If you go the "old school HPC" route, you would do something like the above: you don't have a container cluster, you have standard job managers and tools that communicate with them. If you go the "container cluster" route, what would be needed is orchestration defined for Singularity, and even better, integration with Kubernetes. Several people have talked about this but I don't see any code yet :) It's a bit of a catch-22: you could just use Docker for the second, but wait, you can't, because Docker + HPC = death. You could just use HPC technology for the first, but wait, you can't, because that's not going to plug in easily in environments outside of HPC. So your choices are to help make Singularity more enterprise/cloudy so it integrates into container clusters, or to imagine how a more HPC-centric method could be adopted in other environments. I think connecting protected clusters to the outside world is generally a hard problem; take a look at Agave -->
https://github.com/agaveapi for inspiration.
Best,
Vanessa