Handling Celery Connection Lost Problem


Mukul Mantosh

Jan 1, 2018, 10:46:33 PM
to Django users
How do you handle it when Celery is unable to connect to the broker (Redis or RabbitMQ)? While the connection is lost, the entire code gets stuck because Celery keeps trying to reconnect to the host.

How can I simply bypass this problem, without getting stuck, so that my code keeps running smoothly?


Jason

Jan 2, 2018, 7:54:22 AM
to Django users
This is probably better suited for the Celery GitHub issue tracker or https://groups.google.com/forum/#!forum/celery-users

How come you're losing connections so often?

Mukul Mantosh

Jan 2, 2018, 1:24:17 PM
to Django users
I am not losing the connection; I am preparing for failure: if the broker connection is lost, how will Celery handle the task? I don't want the request to be stuck for an infinite amount of time.

Jason

Jan 2, 2018, 2:15:10 PM
to Django users
With a broker connection loss, the only thing that will happen is your workers won't pick up new tasks.

If you're posting to a result backend like redis and lose the connection, then an exception will be raised in the logs and the task will shut down.

Remember, tasks run in independent processes, and you can tell each worker how many tasks to execute before its process is killed.
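The "how many tasks before the process is killed" knob Jason mentions maps, if I have the Celery 4.x setting name right, to a one-line config entry; a minimal sketch, to be verified against the docs for your installed version:

```python
# celeryconfig.py sketch -- setting name assumes Celery 4.x lowercase settings
# (the pre-4.0 equivalent was CELERYD_MAX_TASKS_PER_CHILD).
worker_max_tasks_per_child = 100  # recycle each worker process after 100 tasks
```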

Mukul Mantosh

Jan 2, 2018, 9:14:14 PM
to Django users
I am not using a result backend. My question is that when the broker connection is lost, it throws a connection refused exception, which I can normally catch with the code given below.

try:
    add.delay(2, 2)
except add.OperationalError as exc:
    print('error')



This try/except block works only once; the next time I call add.delay(2, 2), the code blocks, because Celery is retrying to establish the connection with the broker.

Here is what I simply don't want. For example: there is a website where a user signs up as a new user, and we have to send a verification email through Celery. If the connection suddenly gets lost, the code will sit in a waiting state because Celery is again retrying to establish a connection with the lost broker.

How can we solve this problem?

Matemática A3K

Jan 3, 2018, 3:38:24 AM
to django...@googlegroups.com
On Tue, Jan 2, 2018 at 11:14 PM, Mukul Mantosh <mukulma...@gmail.com> wrote:
I am not using result backend my question is that when the broker connection is lost it throws a connection refused exception which i could normally catch through the following given below code.

try:
  add.delay(2, 2)
except add.OperationalError as exc:
  print('error');



This try except block works only one time and next time when i try to call again add.delay(2,2)....the code is waiting to execute because celery is re-trying to establish the connection with the broker.


I'm not sure I understand you. Where are you trying to call it again? From IPython?
 
I just simply don't want to do, for example: There is a website where a user signup as a new user and we have to send an verification email through celery and suddenly the connection gets lost then the code will be in waiting state because celery is again retrying to establish a connection with the lost broker.


I agree that this may be better for a Celery mailing list, as "low level" Celery is probably not the main expertise of most of the people reading here, as Jason said. From what I understand, http://docs.celeryproject.org/en/latest/userguide/calling.html#calling-retry should be sufficient for the "normal" cases.

Keep in mind that the exception will be raised at the Celery level, not in Django, because it's async: http://docs.celeryproject.org/en/latest/userguide/calling.html#linking-callbacks-errbacks. Django delegates the task to Celery, and Celery takes care of executing it. If you need a perfect guarantee of execution, do it synchronously: send the verification mail inside the Django view, where you can show the user that the confirmation mail hasn't been sent.
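The synchronous fallback described above can be sketched as a small helper. This is a hedged sketch, not code from the thread: the `enqueue` and `fallback` callables are stand-ins for something like `send_mail_task.delay` and a plain synchronous email function.

```python
# Sketch of the synchronous fallback: try to hand the work to Celery, and if
# enqueueing raises (e.g. the broker is unreachable), run the work inline so
# the user still gets their verification mail.
def enqueue_or_run_inline(enqueue, fallback, *args):
    """enqueue: e.g. a task's .delay; fallback: a plain synchronous function."""
    try:
        enqueue(*args)           # hand off to Celery
        return "queued"
    except Exception:            # e.g. kombu.exceptions.OperationalError
        fallback(*args)          # broker is down: do the work synchronously
        return "ran inline"
```

In a view this would wrap the `add.delay(...)` call, with the synchronous email send as the fallback, so the request never blocks on a dead broker.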
 
How can we solve this problem ?


From what I understood, your problem is assuming that if a Celery task fails, it won't be retried :)
 



--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/41026c20-3ca0-46bd-8bab-361eba75aadc%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Mukul Mantosh

Jan 3, 2018, 4:25:27 AM
to Django users
I'm not sure about understanding you, where are you trying to call it again? from ipython?

I am calling it from Django; the code is inside the view.

def test(request):
    try:
        add.delay(2, 2)
    except add.OperationalError as exc:
        print('error')
    return HttpResponse('working')

Point 1 - Stop the RabbitMQ server manually from the terminal (sudo service rabbitmq-server stop).
Point 2 - Reload the view in the browser; it immediately throws a connection refused error, which I can easily catch with the try/except block in the example above.
Point 3 - If you reload the page again, it hangs there and never sends an HTTP response, because it is still waiting for the broker.
Point 4 - You can see Celery in the background trying to reconnect to the broker every 5-10 seconds.


For what I understood, your problem is assuming that if a celery task fails, it won't be retried :)

It should be retried, but I can't make a user wait for it, because the signup process is very quick and we cannot hold a user up just because our connection is lost. I simply want this: if the connection is lost, try for 10 seconds; if it reconnects, good, otherwise just move on without getting stuck.

Even when I tried add.apply_async((2, 2), retry=False), it is still not working and the page does not give back the HTTP response.

I think I have covered everything you wanted to know, @Matemática A3K.
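What is being asked for here (try briefly, then move on) sounds like Celery's documented publish retry policy, from the "Message Sending Retry" section of the calling guide. A hedged sketch, with the worst-case wait worked out; exact behaviour may differ across Celery/kombu versions:

```python
# Bound how long Celery retries *publishing* a task when the broker is down.
# Keys follow the Celery calling guide ("Message Sending Retry").
retry_policy = {
    'max_retries': 3,      # give up after 3 retries
    'interval_start': 0,   # wait 0s before the first retry
    'interval_step': 2,    # add 2s to the wait after each retry
    'interval_max': 4,     # but never wait more than 4s between retries
}

# Worst case, the waits between attempts are 0s, 2s, 4s: about 6 seconds in
# total before an OperationalError is raised instead of blocking forever.
waits = [min(retry_policy['interval_start'] + i * retry_policy['interval_step'],
             retry_policy['interval_max'])
         for i in range(retry_policy['max_retries'])]

# With a running Celery app and broker, the call would look like:
# add.apply_async((2, 2), retry=True, retry_policy=retry_policy)
```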



Mukul Mantosh

Jan 3, 2018, 4:26:55 AM
to Django users

Jani Tiainen

Jan 3, 2018, 6:43:35 AM
to django...@googlegroups.com
Hi.

Trying to add a task without a running messaging backend (RabbitMQ) should consistently throw errors.

Since this is not a Django problem, I suggest you reach out to the Celery support channels for better answers.


Matemática A3K

Jan 3, 2018, 2:34:06 PM
to django...@googlegroups.com
On Wed, Jan 3, 2018 at 6:25 AM, Mukul Mantosh <mukulma...@gmail.com> wrote:
I'm not sure about understanding you, where are you trying to call it again? from ipython?

I am calling from Django, the code is inside the view.

def test(request):
  try:
    add.delay(2, 2)
  except add.OperationalError as exc:
    print('error')
return HttpResponse('working')

Point 1 - Stop RabbitMQ Server from terminal manually. (sudo service rabbitmq-server stop).
Point 2 - Reload the view from the browser, it will immediately throw connection refused error which i could catch easily using try except block as provided in above example.
Point 3 - If you try again reloading the page it hangs up over there. and it won't send any http response because it is still waiting and waiting to get response.

I think this might be because Celery can't enqueue the task, so it does not "return" to Django.
 
Point 4 - You can see celery in background trying to reconnect to the broker every 5-10 seconds.

If you restart RabbitMQ at this point, does it work?
 


For what I understood, your problem is assuming that if a celery task fails, it won't be retried :)

It should be retried but i can't make a user wait for it because signup process is very quick and we cannot halt any user because our connection is lost. I just simply want if connection lost try for 10 seconds if connected its good otherwise just move on don't get stuck.

Even if i tried add.apply_async((2, 2), retry=Falsebut it is still  not working and the page is not giving back the HTTP Response. 

I think i have cleared everything what you wanted to know @Matemática A3K.


It seems the call is retried by Celery, but with a missing RabbitMQ broker it won't work at all. If you take away the broker, you are taking away how it works; there is no way of delivering messages. The problem is how to ensure the broker is always available.

Again, this seems more appropriate for the celery community :)


Mukul Mantosh

Jan 5, 2018, 6:34:55 AM
to Django users

@Matemática A3K  

If you restart rabbitmq at this point, does it works? 

After restarting RabbitMQ, everything works fine. This example simply shows that whenever RabbitMQ is down, your code is going to get stuck, so we need to design a highly available RabbitMQ server that stays up during heavy workloads.

Is there any solution to this problem? A broker failing is a common scenario.

Jani Tiainen

Jan 5, 2018, 7:03:41 AM
to django...@googlegroups.com
Hi,

This is not really a Django issue at all.

You should contact the Celery/RabbitMQ support channels to get help with building a high-availability messaging backend.


Jason

Jan 5, 2018, 7:01:26 PM
to Django users
To reinforce what Jani Tiainen said, this is not a Django or Python issue, nor really a Celery issue. What you should research and investigate is high-availability RabbitMQ clusters, if this is such a concern for you.

Matemática A3K

Jan 6, 2018, 1:43:19 AM
to django...@googlegroups.com

Indeed, Mukul, you should google first :)
 

Mukul Mantosh

Jan 6, 2018, 3:39:31 AM
to Django users
I tried googling it, Matemática A3K, and did not find a proper answer at first, but I finally learned that I need to run highly available Redis or RabbitMQ clusters. If Celery offered an option to simply skip the task when the broker fails, that would have solved my problem.

Thanks for helping me out, @Matemática A3K, @Jani Tiainen, @Jason.

