how priority is handled


python-django-user

Nov 3, 2009, 10:49:42 AM
to celery-users
I see in the online module reference that celery.task.base.Task uses a
"priority" attribute.

Does priority 0 mean the highest priority? Or is it 9?

Suppose I only have one worker process, and a large number of tasks in the
queue with various different priorities, including multiple tasks with
the same priority.

Does that mean all the tasks with the highest priority will always be
processed first, before any task with a lower priority ever gets
processed? Or does celery distribute processing time according to the
priorities, ensuring lower-priority tasks still get processed,
just at a slower rate than higher-priority tasks?

Thanks.

Ask Solem

Nov 3, 2009, 11:06:43 AM
to celery...@googlegroups.com

Priority is an AMQP feature where 0 is the highest priority.
Sadly, priorities have not been implemented in RabbitMQ yet.
See discussion here:
http://www.trapexit.org/forum/viewtopic.php?p=48220

They say it might be better to create different queues instead, e.g. one
high-priority queue and one low-priority queue, then let e.g. 2 machines
consume from the high-priority queue, and 1 machine from the low-priority
queue. This is what we do at Opera.
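
For reference, a minimal sketch of that kind of setup (this uses
present-day Celery routing syntax rather than the settings available at
the time of this thread, and the module, task and queue names are just
placeholders):

    # Hypothetical sketch -- module, task and queue names are placeholders.
    from celery import Celery

    app = Celery("proj", broker="amqp://guest@localhost//")

    # Route urgent tasks to a dedicated "high" queue, the rest to "low".
    app.conf.task_routes = {
        "proj.tasks.urgent_task": {"queue": "high"},
        "proj.tasks.background_task": {"queue": "low"},
    }

    # Then dedicate more workers (or machines) to the high queue, e.g.:
    #   celery -A proj worker -Q high --concurrency=4
    #   celery -A proj worker -Q low  --concurrency=1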

Also, there is a rate limiting feature coming in 1.0 (the unit tests pass
already, and the current code can be found in the tokenbucket branch:
http://github.com/ask/celery/tree/tokenbucket/).
With rate limits you can throttle the lower-priority tasks, thus
getting better throughput for the important tasks.

python-django-user

Nov 4, 2009, 9:01:38 AM
to celery-users
Thanks for the response.

If I only have one machine, is there any way to configure celery to
assign certain workers to perform only specific tasks or process
specific queues?


Could you briefly explain how rate limiting works, or point me to some
documentation? Will rate limiting work if I only have one worker? My
impression is that each worker takes one task from the queue, and
stays with it until it is completed, before taking the next one from
the queue.

Ask Solem

Nov 4, 2009, 11:51:51 AM
to celery...@googlegroups.com

On Nov 4, 2009, at 3:01 PM, python-django-user wrote:

>
> Thanks for the response.
>
> If I only have one machine, is there any way to configure celery to
> assign certain workers to perform only specific tasks or process
> specific queues?
>
>
> Could you briefly explain how rate limiting works, or point me to some
> documentation?

The only documentation so far is the code, as it's an experimental feature.

You set a rate limit for a task type with the Task.rate_limit attribute:

    from celery.task.base import Task

    class MyTask(Task):
        rate_limit = "10/s"

This gives it a rate limit of 10 tasks a second, while taking bursts of
activity into account, so it doesn't mean it takes 1 second to process
10 tasks.
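
That burst behaviour is essentially what a token bucket gives you. A toy
sketch of the idea (an illustration only, not the actual code in the
tokenbucket branch):

    import time

    class TokenBucket(object):
        """Toy token bucket: allows short bursts up to `capacity`, but
        sustains at most `rate` operations per second on average."""

        def __init__(self, rate, capacity):
            self.rate = float(rate)          # tokens added per second
            self.capacity = float(capacity)  # maximum burst size
            self.tokens = float(capacity)
            self.last = time.time()

        def consume(self, n=1):
            now = time.time()
            # Refill tokens for the time elapsed, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True   # under the limit; OK to execute a task now
            return False      # over the limit; wait and retry later

    # e.g. "10/s" corresponds roughly to TokenBucket(rate=10, capacity=10)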

> Will rate limiting work if I only have one worker?

Indeed!

> My
> impression is that each worker takes one task from the queue, and
> stays with it until it is completed, before taking the next one from
> the queue.

That's not right. That would normally be the case with a polling
architecture. With AMQP QoS, the worker tries to prefetch as many
messages as there are pool workers (the concurrency setting). Think
about the countdown/ETA options, which let you specify a date and time
in the future when the task should be executed; these ETA messages will
stay in the worker's memory until the ETA is met. Due to AMQP's
acknowledgement workflow it doesn't matter if the worker crashes with
these in memory; the broker will resend the messages in that case.
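
For example, with the MyTask class above (just a quick illustration;
countdown and eta are arguments to apply_async):

    from datetime import datetime, timedelta

    # Execute roughly 60 seconds from now.
    MyTask.apply_async(countdown=60)

    # Or give an explicit date and time.
    MyTask.apply_async(eta=datetime.now() + timedelta(hours=1))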

There are two internal queues in celery; by internal I mean they are
python Queue.Queue objects.

* bucket_queue

    This is the rate limited queue; tasks that should be
    executed immediately are put on this queue.

* hold_queue

    Tasks that are paused (because they have an ETA) are put
    into this queue.


There are three threads:

* Mediator

    Moves tasks from the bucket_queue to the worker pool for
    execution.

* PeriodicTaskController

    Wakes up every second to schedule periodic tasks, and moves
    tasks that are ready from the hold_queue onto the bucket_queue.

* AMQPListener

    Consumes messages from AMQP and puts them into the
    hold_queue.


Maybe TMI (but good to have it in writing); as you can see there can be
lots of messages flowing through these components, and it's not scary,
all thanks to AMQP's reliable messaging.
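
To make the picture concrete, here is a toy sketch of that flow using
plain Queue objects (an illustration only, not the actual worker code):

    import time
    import Queue  # this is the stdlib "queue" module in Python 3

    bucket_queue = Queue.Queue()   # tasks that are ready to execute now
    hold_queue = Queue.Queue()     # (eta, task) pairs waiting for their ETA

    def amqp_listener(receive_message):
        # Consumes messages from the broker and parks them until ready.
        while True:
            task = receive_message()
            hold_queue.put((task["eta"], task))

    def periodic_task_controller():
        # Wakes up every second and moves tasks whose ETA has passed
        # from the hold_queue onto the bucket_queue.
        while True:
            not_ready = []
            while not hold_queue.empty():
                eta, task = hold_queue.get()
                if eta <= time.time():
                    bucket_queue.put(task)
                else:
                    not_ready.append((eta, task))
            for item in not_ready:
                hold_queue.put(item)
            time.sleep(1)

    def mediator(pool_execute):
        # Moves ready tasks from the bucket_queue into the worker pool.
        while True:
            pool_execute(bucket_queue.get())

    # Each of these loops would run in its own thread inside the worker.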


python-django-user

Nov 5, 2009, 9:07:25 AM
to celery-users
Thank you for your explanation of how rate limiting works.

The problem we are trying to solve is this: we have two (or more)
types of tasks: urgent and non-urgent. We just want urgent tasks to be
processed at a higher frequency than non-urgent tasks, WHILE
processing all tasks as quickly as possible.

I can see that using rate limiting, we can set a higher rate say 100/s
for urgent tasks, and a lower rate 10/s for non-urgent tasks.

However, what happens when there are currently no urgent tasks in the
queue? Would the non-urgent tasks still get processed at a limit of 10/s,
even though the available resources might allow a rate of 100/s?

Thanks.

Ask Solem

Nov 5, 2009, 12:44:34 PM
to celery-users


On Nov 5, 3:07 pm, python-django-user <python.django.u...@gmail.com>
wrote:
> Thank you for your explanation of how rate limiting works.
>
> The problem we are trying to solve is this: we have two (or more)
> types of tasks: urgent and non-urgent. We just want urgent tasks to be
> processed at a higher frequency than non-urgent tasks, WHILE
> processing all tasks as quickly as possible.
>
> I can see that using rate limiting, we can set a higher rate say 100/s
> for urgent tasks, and a lower rate 10/s for non-urgent tasks.
>

> However, what happens when there are currently no urgent tasks in the
> queue? Would the non-urgent tasks still get processed at a limit of 10/s,
> even though the available resources might allow a rate of 100/s?
>

Good question. The limits are individual to each task type, so the
limiter doesn't know anything about the state of the system as a whole.
Each task type has its own internal queue with its own rate limit, so if
you have 4 types of tasks (4 Task classes) you have 4 queues.
It will go through these queues until it finds a queue that has an item
available. However, it also makes sure that no queue gets too greedy
(i.e. too active), and that all the queues have an opportunity to have
their items processed. This system is complicated, so I'm not sure I'm
able to predict the outcome for what you're trying to do with it.

What I'd do is have no rate limit for the high priority tasks, and only
rate limit the low priority tasks.
I'm pretty sure that would have the effect you're after.
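
In code, using the rate_limit attribute shown earlier in this thread,
that could look something like this (the class names and the "10/s"
figure are placeholders):

    from celery.task.base import Task

    class UrgentTask(Task):
        # No rate_limit set: processed as fast as the worker can take them.
        def run(self, **kwargs):
            pass  # do the urgent work here

    class NonUrgentTask(Task):
        # Throttled, so urgent work always has spare capacity.
        rate_limit = "10/s"

        def run(self, **kwargs):
            pass  # do the background work here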