Celery RabbitMQ worker packaged in a Singularity container


Dejan Štepec

Nov 25, 2017, 8:10:04 PM
to singularity
Hi all!

As a new user of Singularity I have a question about a use case. I intend to package an image processing app in a Singularity container, but have a question about communication between separate nodes, for example using RabbitMQ or accessing some remote queue. I plan to package these worker nodes as Celery workers, so I need to know whether a Singularity container can connect to a remote RabbitMQ queue to fetch new batches of data to process. I intend to use Singularity because these nodes are computationally intensive and will be placed on some HPC cluster with the SLURM workload manager, where of course I have no sudo access.

Best regards,

Dejan

v

Nov 25, 2017, 9:06:40 PM
to singu...@lbl.gov
This is a cool use case!

So I'm guessing the image processing app is probably in Python? And you have Python tasks defined for it? The broker is the remote queue (RabbitMQ) that the workers consume from.

Is there a reason to have multiple instances for many workers? I've tried both, and usually the solution I prefer is to start a single container with concurrency (meaning multiple worker processes). By default celery starts one worker process per available core, but you can also set it to something different (e.g. celery -A myproject worker --concurrency=10 ...). For Singularity you would want to have your image, give it a startscript, and have that startscript be this celery command to start up the (concurrent) workers.

I would start out doing the following:

 - create a singularity image that has a startscript with your command to start the workers
 - start an instance
 - set up your RabbitMQ and test sending jobs to it.
 - report back!
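To make the startscript idea concrete, here is a minimal sketch of a definition file. The project name `myproject`, the broker host, and the credentials are all placeholders; `-A` and `--broker` are standard celery CLI options:

```
Bootstrap: docker
From: python:3.6

%post
    pip install celery

%startscript
    # runs when the instance starts; launches concurrent workers that
    # connect out to the remote RabbitMQ broker (placeholder URL)
    celery -A myproject worker --broker=amqp://user:pass@rabbit.example.org:5672// --concurrency=10
```

Then something like `singularity instance.start myproject.img worker1` (the 2.4-era sub-command) would bring the workers up as a background instance.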

Best,

Vanessa

--
You received this message because you are subscribed to the Google Groups "singularity" group.
To unsubscribe from this group and stop receiving emails from it, send an email to singularity+unsubscribe@lbl.gov.



--
Vanessa Villamia Sochat
Stanford University '16

Dejan Štepec

Nov 26, 2017, 3:50:43 AM
to singu...@lbl.gov
I intended to use multiple instances on separate nodes because of heavy GPU use. I guess that for such usage it would be better to make the workers completely independent, as separate container instances. The only thing I wasn't sure about is the networking part in Singularity, meaning whether we can connect to a remote queue to get a job.

On 26 Nov 2017 at 3:07 AM, "v" <vso...@gmail.com> wrote:

John Hearns

Nov 26, 2017, 5:11:49 AM
to singu...@lbl.gov
The classic way to do this sort of thing is with a batch scheduler. I have set up a render farm using Slurm.

Interesting though that you are using Celery for this task. I was discussing Celery over on the Julia language discussion boards as a method for dispatching thousands of tasks per day. It probably is a very good fit for tasks like this.
I thought rabbits ate carrots though - maybe celery is good for them too.


v

Nov 26, 2017, 10:17:28 AM
to singu...@lbl.gov
If managing resources (nodes) is what's needed, I think a proper job manager might be a better fit for this job. Celery (from how I've used it) is more appropriate for a situation like a web application, where you want the server to have a job queue and to put tasks in it that run when there is free compute time (and run async). They are similar in that a "master" node maps to the broker and the "slaves" map to the workers. Also, in my experience the workers are always on and ready, which would translate to having multiple container instances running and listening for tasks. Is there a reason you prefer celery over traditional SLURM / SGE / other? I think if you really wanted to do this fully, it would be most logical to add an actual supporting plugin for singularity workers to rabbitmq:


Or something similar with celery. Keep us in the loop!


Dejan Štepec

Nov 26, 2017, 10:26:17 AM
to singu...@lbl.gov
The whole thing will be part of a bigger web service, so this image processing service will get requests from some backend infrastructure. Besides the compute nodes, the service will also consist of a server (master) that accepts remote requests (e.g. over web sockets) and pushes work to a queue for the Celery workers. Celery is probably not absolutely needed, but we want the service to be portable, so it can be used not only on HPC but also on other clusters using e.g. Kubernetes and Docker.

On 26 Nov 2017 at 4:17 PM, "v" <vso...@gmail.com> wrote:

v

Nov 26, 2017, 10:32:21 AM
to singu...@lbl.gov
oh that is super cool! If the celery workers are part of the web service, this is a reasonable approach I think. I would use a standard front-server setup with some application (e.g., django or flask driven if you are using Python) controlling / set up with a queue. Instead of having the job queue live with the frontend server, I would have one async task whose job is to assess metadata (e.g., is this a user of cluster X? how much memory and time does it need?) and send it on to your infrastructure queue (a different queue!) after it first passes through authentication, authorization, etc. Actually, you could probably also do this with an internal API. I've never tried, but it seems like it would be challenging to have the celery workers connected to the web application actually control the nodes. If you got into a situation of wanting to add another set of (kind of different, or external) nodes, it would be hard to do. If each part is modular (meaning the celery workers have an authenticated general message they can send anywhere, to multiple kinds of nodes), that seems like a good approach.


Dejan Štepec

Nov 26, 2017, 11:48:42 AM
to singu...@lbl.gov
Frontend server probably using Flask, yes. It will authenticate the user via calls to the existing backend, so that part is taken care of. After successful authentication I would like to send the task to the task queue (celery), and one of the predefined, already-started GPU workers running on separate nodes under Singularity would take the job, execute it, and send the results back to the frontend server, which would deliver the result back to the backend. So my only concern was whether networking in Singularity supports this kind of idea (running a container as a Celery worker, with a TCP connection to some remote queue that maybe will also run as a Singularity container). Basically, is this possible?

On 26 Nov 2017 at 4:32 PM, "v" <vso...@gmail.com> wrote:

v

Nov 26, 2017, 12:26:05 PM
to singu...@lbl.gov
As far as I know, not yet: it would need to be possible to define networking between containers like Docker does --> https://docs.docker.com/engine/userguide/networking/ and then have an orchestrator to map between the host and the other containers. What you could try is starting container instances that share the host network (localhost) and have them operate on different ports.

Just curious - what is your strategy for the layer between the web application and the GPU servers? These kinds of connections (to a shared resource with sensitive data and whatnot) historically make some people nervous :P

I think what you are saying is:

 1. user requests task on front end
 2. front end handles authentication and authorization
 3. front end submits job to celery
 4. celery task is a Singularity container running on GPU, and since worker == container, it requires networking

and what I'm suggesting is to separate the front / back servers a bit more:

 1. user requests task on front end
 2. front end handles authentication and authorization
 3. front end submits job to celery (still frontend)
 4. celery worker (still frontend) is a process that finishes configuring the job (memory? time? cluster? queue type?) and sends it to a back end queue (slurm, sge)
 5. the slurm/sge queues are optimized for this sort of thing, so you just load the singularity module, run the job, and send a signal back to the front end queue.
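Step 4 of the second pipeline could be sketched like this. Everything here is hypothetical for illustration (the function name `build_sbatch_command`, the image name, the flag values); the only real pieces are the standard `sbatch` options and the `singularity run` invocation:

```python
import shlex

def build_sbatch_command(job_name, image, task_args,
                         mem_gb=8, hours=2, partition="gpu"):
    """Assemble an sbatch command that runs a task inside a Singularity container.

    A frontend celery worker would call this after deciding on memory,
    time, and partition, then hand the resulting string to the back end queue.
    """
    # the command the batch job will actually execute on the compute node
    inner = "singularity run {} {}".format(
        image, " ".join(shlex.quote(a) for a in task_args))
    return ("sbatch --job-name={} --partition={} "
            "--mem={}G --time={}:00:00 --wrap {}").format(
                job_name, partition, mem_gb, hours, shlex.quote(inner))

cmd = build_sbatch_command("segment-42", "worker.img", ["--input", "batch42.tar"])
```

The point of the sketch is the separation of concerns: the frontend worker only builds and submits the job description; SLURM owns scheduling, and the container never needs its own network.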

So a few notes on the above:
 - there is flexibility (step 4) to have different kinds of queues in different locations. The front end worker is the manager of where things go.
 - the singularity container is not required to have a network, or to be constantly running.
 - the main server (the same celery workers) also needs a means to receive a message when something finishes, to update the server; this could be a POST or similar.
 - ideally you could manage slurm or sge with an API.

I think there are two general recipes you can follow here. If you go the "old school HPC" route, you would do something like the above: you don't have a container cluster, you have standard job managers and the tools that communicate with them. If you go the "container cluster" route, what would be needed is orchestration defined for Singularity, and even better, integration with Kubernetes. Several have talked about this but I don't see any code yet :)

It's a bit of a catch-22. You could just use Docker for the second, but wait, you can't, because Docker + HPC = death. You could just use HPC technology for the first, but wait, you can't, because that's not going to plug in easily in environments outside of HPC. So your choices are to help make Singularity more enterprise/cloudy so it integrates into container clusters, or to imagine how a more HPC-centric method could be adopted in different environments. I think generally connecting protected clusters to the outside world is a hard problem; take a look at Agave --> https://github.com/agaveapi for inspiration.

Best,

Vanessa





Dejan Štepec

Nov 26, 2017, 2:16:41 PM
to singu...@lbl.gov
Thanks for your input. So there is no way to get "internet access" inside a Singularity container to access a remote queue, which makes it quite unusable for our task. We don't have the time/resources to develop Singularity ourselves :(.
Another problem is that the nodes would have to run constantly, listening for new jobs in the queue. Restarting a node for each request would be inefficient: we are using deep learning models that are heavy by design, and loading the architecture onto the GPU again and again would be time-consuming. Docker + Kubernetes would be ideal, but currently we only have access to multiple GPUs on an HPC cluster.

Best,

Dejan

v

Nov 26, 2017, 3:09:59 PM
to singu...@lbl.gov
On Sun, Nov 26, 2017 at 11:16 AM, Dejan Štepec <stepec....@gmail.com> wrote:
Thanks for your input. So there is no way to get "internet access" inside a Singularity container to access a remote queue, which makes it quite unusable for our task.

No, you definitely can! The container itself is pretty seamless to the host. I think what would make it unusable is requiring a (separate) container-only network namespace, but I can imagine ways to do it that don't need that.
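One quick way to convince yourself of this: because the container shares the host's network stack, an outbound TCP check run inside `singularity exec` should behave exactly as it does on the host. A stdlib-only sketch (the broker hostname and 5672, RabbitMQ's default AMQP port, are just example values):

```python
import socket

def broker_reachable(host, port, timeout=3.0):
    """Return True if a plain TCP connection to the broker can be opened.

    Run this both on the host and inside the container; since Singularity
    does not isolate the network by default, the results should match.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. broker_reachable("rabbit.example.org", 5672)
```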
 
We don't have the time/resources to develop Singularity ourselves :(.

Haha, tell me about it :)  This is how open source works.

Another problem is that the nodes would have to run constantly, listening for new jobs in the queue. Restarting a node for each request would be inefficient: we are using deep learning models that are heavy by design, and loading the architecture onto the GPU again and again would be time-consuming. Docker + Kubernetes would be ideal, but currently we only have access to multiple GPUs on an HPC cluster.

I can think of ways I'd do it, but I can't give you a full instruction manual / tutorial, which sounds like what you are looking for. I think this will probably exist eventually, but it's not done yet.
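On the model-reloading concern specifically: with long-lived workers, the usual pattern is to load the model once per worker process and cache it, so only the first task pays the GPU load cost. A sketch with a stand-in `load_model` (not a real framework API, just the caching shape):

```python
import time

_MODEL = None  # one copy per worker process, loaded on the first task


def load_model():
    # stand-in for an expensive call that builds the network and
    # loads its weights onto the GPU
    time.sleep(0.01)
    return {"name": "placeholder-net"}


def get_model():
    """Lazily load the model once, then serve the cached copy."""
    global _MODEL
    if _MODEL is None:
        _MODEL = load_model()
    return _MODEL


def process(batch_id):
    model = get_model()  # cheap after the first call
    return "processed batch {} with {}".format(batch_id, model["name"])
```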

Best of luck! Please share anyway if you do something cool; I (and other users on the list) would be highly interested.
 



