How many workers do you run on one machine using django celery?


sparky

Apr 20, 2013, 7:28:55 PM
to django...@googlegroups.com
Quick question, just so I can compare, I would really like to hear other devs experience.

How many workers do you run on one machine using django celery?

I'm running 3 workers on an EC2 small instance. It takes around 5 seconds to complete one task with all 3 workers running. Does that sound right to you?

My issue is that I could have 100,000 tasks very soon... scale-wise I'm unsure what I'll need to handle that. A bigger CPU, more RAM, X workers? 5 seconds per task is far too long for me. All the task does is send an SMS message over HTTP, that's it.

Michael Hernandez

Apr 20, 2013, 11:01:40 PM
to django...@googlegroups.com
RabbitMQ with Celery is a distributed asynchronous code execution service. Keyword: distributed. The best thing to do is run several instances of djcelery per machine under supervisord. When you start encountering performance hits, add more worker instances on the machine via supervisord. You will hit a limit at some point where you won't see any further gain from this. That is the point at which I suggest you scale down the number of instances on that machine and add another machine to the Rabbit stack with the same configuration. RabbitMQ should handle the load balancing itself. However, at that point I believe logging will become more difficult, so be sure to have a solution in place for handling that.

You should also look into batching your work across tasks. If you are going to be running so many tasks, can't a single task hold the last 10 SMS messages to be sent out and send all 10 of them? This will improve scalability as well.
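Something along these lines, as a rough sketch (send_one_sms() is a hypothetical wrapper around your SMS provider's HTTP API, and the batch size of 10 is arbitrary):

    from celery import shared_task

    @shared_task
    def send_sms_batch(messages):
        # messages is a list of (phone_number, body) tuples
        for phone_number, body in messages:
            send_one_sms(phone_number, body)

    # Enqueue one task per 10 messages instead of one task per message:
    batch = []
    for contact in contacts:
        batch.append((contact.phone, text))
        if len(batch) == 10:
            send_sms_batch.delay(batch)
            batch = []
    if batch:
        send_sms_batch.delay(batch)  # flush the remainder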

Shawn Milochik

Apr 20, 2013, 11:40:33 PM
to django...@googlegroups.com
In addition to Michael's good comments:

I suspect you won't have 100,000 tasks coming in every second of every day. If you have to send out SMS messages and some of them take a few minutes to go out, that should be fine for most purposes. In addition, some SMS services have some limit per minute/hour for the number of messages you can send. If so, you'll be forced to queue them regardless and trickle out the sending.
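In Celery you can enforce that kind of trickle on the task itself with the rate_limit option, e.g. (the value here is arbitrary, and note the limit applies per worker instance, not globally):

    from celery import shared_task

    @shared_task(rate_limit="100/m")  # at most 100 sends per minute, per worker
    def send_sms(phone_number, body):
        send_one_sms(phone_number, body)  # hypothetical HTTP wrapper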

Kurtis Mullins

Apr 21, 2013, 12:05:31 AM
to django...@googlegroups.com
Both Michael and Shawn are spot-on in terms of scaling and using the queuing.

However, I'd like to add that 5 seconds to complete a single task like this seems way too slow to me. I don't have much experience with sending SMS, but if you're using an external SMS API it should be extremely quick. I imagine it's just a simple HTTP request plus a check of the response to make sure the request was accepted. Manually test this to see how long it takes from your own computer. If it is very quick, check for bottlenecks elsewhere. Without knowing the implementation details of your system, there's not much more I can suggest checking.
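For example, something like this times the raw API call from your machine using the requests library (the URL and payload are placeholders for your provider's API):

    import time
    import requests

    url = "https://sms-provider.example.com/send"    # placeholder endpoint
    payload = {"to": "+15551234567", "body": "test"}

    start = time.time()
    response = requests.post(url, data=payload, timeout=10)
    print("HTTP %s in %.2f seconds" % (response.status_code, time.time() - start))

If that prints well under a second, the delay is in the queueing pipeline, not the send itself.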



sparky

Apr 21, 2013, 5:42:07 AM
to django...@googlegroups.com
Thanks for the responses, they're very helpful.

You are right, I won't have 100,000 tasks every second; it's just a huge batch I have to send, which at the moment would be 100,000 tasks.

But just to be clear, the dispatch looks like this, one task per contact:

    # send_sms_task is my Celery task that sends one SMS for a contact
    for contact in Contact.objects.all():
        send_sms_task.delay(contact.id)

I'm using Amazon SQS as the broker.

Each task currently takes around 5 seconds to complete. If I remove the task and send directly, it's less than a second per SMS. That means it would take 100+ hours to send all my messages, far, far too long!




sparky

Apr 21, 2013, 5:47:40 AM
to django...@googlegroups.com
One last thing to add: the task itself does not seem to be the issue. The 3-4 second wait I can see is in 'got message from broker'.
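One thing I'm going to try is tuning the SQS transport's polling interval in my Celery settings, in case the workers are just polling slowly (the value below is a guess; defaults vary by version):

    # settings.py, Celery with the SQS transport; polling_interval is in seconds
    BROKER_TRANSPORT_OPTIONS = {"polling_interval": 1}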


Michael Hernandez

Apr 21, 2013, 9:39:46 AM
to django...@googlegroups.com

I have been working with heavy reporting and analytics lately. I would suggest using thread queuing with many daemon threads. I use Python threads, but greenlets or something else might be better.



Michael Hernandez

Apr 21, 2013, 10:24:37 AM
to django...@googlegroups.com
You should try not to use task dispatching for this. Rather, if you are doing this once a day, have a periodic task run every 24 hours. Within that task, try doing something like the example at the bottom of this page: http://docs.python.org/2/library/queue.html . It is a queue/worker-thread daemon architecture. You will process 100k in no time at all. Make sure to have proper exception handling and recovery, however. And as always, use the docs to learn from, but know there are better ways to go about it and to improve the threading model.
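Roughly that pattern, adapted as a sketch for SMS sending (Python 2 to match the linked docs; send_one_sms(), contacts, and text are hypothetical stand-ins for your own code):

    import threading
    import Queue  # the stdlib module is `queue` on Python 3

    NUM_WORKER_THREADS = 50
    q = Queue.Queue()

    def worker():
        while True:
            phone_number, body = q.get()
            try:
                send_one_sms(phone_number, body)  # hypothetical HTTP wrapper
            except Exception:
                pass  # real code should log and retry here
            q.task_done()

    for _ in range(NUM_WORKER_THREADS):
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()

    for contact in contacts:
        q.put((contact.phone, text))

    q.join()  # block until every queued message has been processed

Because the work is I/O-bound (HTTP requests), plain threads parallelize it fine despite the GIL.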


Scott Anderson

Apr 21, 2013, 11:07:23 AM
to django...@googlegroups.com
You can't test a system like this by sending one message: you're just testing the latency, not throughput. Latency is the end-to-end time it takes for a single message to make its way through the system. Throughput is the number of total messages per second that can make their way through. As long as your tasks are not sensitive to delays (SMS messages are not, generally), a queueing system can help greatly increase the overall throughput.

Queueing systems are for spreading work around so that it can be completed *in aggregate* more quickly and reliably. They're not for reducing the latency of a single message.

SQS in particular is architected for massive scale and reliability. To achieve this, the latency for a single message is very high, but it can handle an enormous aggregate message rate overall.

If you test with a single thread feeding and a single thread reading (as in the amazon-sqs-vs-rabbitmq blog) you're strictly testing queue latency, not throughput.

Time taken to process all of the messages will look something like this, where:

Nm = number of messages
Ts = SQS latency, or 3 to 4s from your tests
Te = time to process a message for enqueuing
Td = time to process a dequeued task
Ne = number of enqueue workers
Nd = number of dequeue workers

As long as Ne / Te >= Nd / Td (i.e. the enqueue workers can feed messages at least as fast as the dequeue workers consume them), the total time to process Nm messages will look like this:

Te + Ts + (ceil(Nm / Nd) * Td)

Or: 
<enqueue processing for one message><SQS><Nd tasks being processed in parallel>

You can starve a queueing system on the front as well as the back (which is what that blog post does).

So here's a more appropriate test:
Nm = 100,000 messages
Ts = 4s
Te = 20ms, time to ready a message to send
Td = 200ms, time for the task to process a message
Ne = 1 thread putting messages on the queue
Nd = 10 threads pulling messages from the queue

You'll probably find that the entire thing will take this much time:

20ms + 4s + (ceil(100,000 / 10) * 200ms), or just over 2004s.

Up the enqueue threads to 10, and dequeue workers to 100:

20ms + 4s + (ceil(100,000 / 100) * 200ms), or just over 204s.

Note that the SQS latency is a constant, however.
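As a quick sanity check of that arithmetic in code:

    import math

    def total_time(nm, ts, te, td, nd):
        # Te + Ts + ceil(Nm / Nd) * Td, all values in seconds
        return te + ts + math.ceil(float(nm) / nd) * td

    print(total_time(100000, 4.0, 0.02, 0.2, 10))   # ~2004.02s with 10 workers
    print(total_time(100000, 4.0, 0.02, 0.2, 100))  # ~204.02s with 100 workers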

In other words, any individual message will take 3-4 seconds to get through the queue, plus whatever your task execution time is. But you'll be processing 10 at a time through this pipeline. Increase the number of enqueuers and dequeuers and your throughput will scale linearly, assuming you spread the workers across enough EC2 instances to handle the load of the tasks themselves. You're trading end-to-end latency for higher throughput.

If you only send 1 message though, it looks like this with 1, 10, and 100 dequeue workers:

20ms + 4s + (ceil(1 / 1) * 200ms) == 4020ms + (1 * 200ms) == 4.22s
20ms + 4s + (ceil(1 / 10) * 200ms) == 4020ms + (1 * 200ms) == 4.22s
20ms + 4s + (ceil(1 / 100) * 200ms) == 4020ms + (1 * 200ms) == 4.22s

So, at a single message you're testing latency only, not throughput.

For the visual folk out there, here's an amazingly well-rendered ASCII representation of a parallel communication system: each line is a message, the distance between Start and End is the latency, the height of the stack is the throughput, and the distance from the first Start to the last End is the time it takes to process all of the messages.

What you tested:

(Start ========== End)
<-------- 4s -------->

What you would test with 5 workers enqueuing and dequeuing in parallel:

(Start ========== End)
 (Start ========== End)
  (Start ========== End)
   (Start ========== End)
    (Start ========== End)
<--------- 4s + N -------->

Where N is based on the parallel execution time of individual tasks by the dequeue workers.

A single RabbitMQ system will have much lower latency but won't be able to handle the high aggregate throughput of SQS, and at higher message rates will fall behind.

(Start = End)(Start = End)(Start = End)
 (Start = End)(Start = End)(Start = End)
<-------------------------------------->

Obviously this is neither to scale nor truly representative, but hopefully it helps to illustrate the point. The takeaway is that the more dequeue workers you have, the more overall throughput a system like SQS can give you (modulo EC2 time for RabbitMQ vs. SQS costs, which is a completely different discussion).

That said, if you're willing to run your own RabbitMQ cluster, with all the maintenance that entails, RabbitMQ may be cheaper for the same throughput at lower message volumes.

Regards,
-scott

sparky

Apr 21, 2013, 11:07:58 AM
to django...@googlegroups.com
Wow, some good advice here, thanks. I tested with RabbitMQ and it's fast; all I can say is that the bottleneck seems to be SQS. My advice: don't use it!

Venkatraman S

May 6, 2013, 6:00:40 AM
to django...@googlegroups.com
Do also check out the uWSGI spooler. I haven't tested it with heavy loads yet, but it looks like it does the job for a reasonably small site.