Worker stops responding to messages after several days of inactivity


Moolicious

Mar 2, 2011, 3:37:39 AM
to celery-users
Hi,

I am having an issue with an Internet-connected worker where it simply
stops responding to task requests or remote control commands after
about a week of inactivity. Tasks are properly queued but are never
picked up, even though it is the only worker connected. As long as
there is consistent activity, the worker doesn't seem to get stuck in
this unresponsive state and everything works perfectly. The worker is
running Celery 2.2.3 on Debian Lenny and is connected to RabbitMQ
2.2.0 running on Ubuntu Maverick via SSL and the default amqplib
backend.

Running netstat on both the worker box and the RabbitMQ server shows
established connections, as one would expect. What's interesting is
that restarting RabbitMQ on the server does absolutely nothing to the
worker when it's in this state. Nothing whatsoever is emitted via the
worker's logging and as far as I can tell, it thinks it's still
happily connected to the broker. The worker isn't frozen as a Ctrl+C
initiates a warm shutdown and everything closes cleanly. RabbitMQ's
logs show nothing during this period of inactivity that would indicate
the connection dying.

As soon as I restart the worker, everything works perfectly again
until several days of inactivity cause the worker to return to this
state. My hunch is that an exception of some sort is getting swallowed
inside Kombu or perhaps amqplib and the connection dies but is never
reported and never recovered. Again, this worker is connected to the
broker over the Internet, which is likely why the issue surfaces here
when it might never show up over a LAN connection.

Any ideas? Any help would be greatly appreciated.

Thanks all.

Ask Solem

Mar 2, 2011, 9:45:57 AM
to celery-users
What are the worker processes doing when this is happening?

Usually the reason for such symptoms is hanging tasks, e.g. all worker
processes busy with a task that never returns because it's waiting for some
resource (I/O, task result wait/taskset join deadlocks), but if it
shuts down cleanly, this may not be the case.

Still, it would be interesting to know if the processes are all active at
that point, so maybe you could install the setproctitle module? That way
you can see the task a worker process is currently working on in ps listings.
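
For illustration, setproctitle just lets a process rename itself so the
current task shows up in ps; a minimal sketch of the idea (not Celery's
actual integration, and the task name shown is made up):

from setproctitle import setproctitle  # pip install setproctitle

# A pool process could advertise what it is working on, e.g.:
setproctitle("celeryd: processing tasks.add(2, 2)")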

Moolicious

Mar 2, 2011, 4:32:43 PM
to celery-users
Hi Ask,

The tasks are dead simple -- basically just local "Hello World" Python
code at the moment. I don't see any possible way for them to hang. The
tasks are not dependent on any resources and the parameters to the
task are always the same.
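
To give a concrete idea, the task is roughly of this shape (a hypothetical
reconstruction for illustration, not the actual code):

from celery.task import task

@task
def say_hello(name):
    # Purely local work, no external resources -- nothing that could block.
    return "Hello %s" % name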

The app I'm building is in the early development phase, and I'm the
only one sending tasks to it. I'll do some testing, send a few tasks
and everything works until I stop sending tasks to it for several
days. I then come back to start testing it again and I send the
identical task with the identical parameters and the worker just
doesn't pick it up until I shut it down and restart it.

Cal Leeming [Simplicity Media Ltd]

Mar 2, 2011, 4:39:43 PM
to celery...@googlegroups.com, rath...@gmail.com
Can I just add to this.

I also experienced this same problem where the workers would just randomly freeze for no reason.

Upon inspection, it was due to prefetch/queue length problems between RabbitMQ and Celery (one of the techs over at rabbitmq logged into our servers to take a look whilst it was happening, found the problem, but didn't know why celery was causing it to happen, and it was deemed a bug with celery, as it could not be replicated when using AMQP alone).

FYI, we never got round to fixing this lol.

Cal Leeming [Simplicity Media Ltd]

Mar 2, 2011, 4:49:57 PM
to Brian Bouterse, celery...@googlegroups.com, rath...@gmail.com
Yeah lemme go pull up the historic emails, give me 10.

On Wed, Mar 2, 2011 at 9:48 PM, Brian Bouterse <bmbo...@gmail.com> wrote:
Could you elaborate more with some technical details please?

Brian
--
Brian Bouterse
ITng Services

Brian Bouterse

Mar 2, 2011, 4:48:49 PM
to celery...@googlegroups.com, Cal Leeming [Simplicity Media Ltd], rath...@gmail.com
Could you elaborate more with some technical details please?

Brian

On Wed, Mar 2, 2011 at 4:39 PM, Cal Leeming [Simplicity Media Ltd] <cal.l...@simplicitymedialtd.co.uk> wrote:

Cal Leeming [Simplicity Media Ltd]

Mar 2, 2011, 5:01:41 PM
to Brian Bouterse, celery...@googlegroups.com, rath...@gmail.com
Here is a dump of the email from one of the dev guys of RabbitMQ (Matthew), and also a response from As...@opera.com.


Matthew goes into quite extensive detail to confirm this bug was related to Celery, but nothing was resolved once this was submitted to the mailing list here (although I didn't ever lodge an official bug report).

Hope this helps.

Cal

Ask Solem Hoel

Mar 2, 2011, 5:07:44 PM
to celery...@googlegroups.com

On Mar 2, 2011, at 10:39 PM, Cal Leeming [Simplicity Media Ltd] wrote:

> Can I just add to this.
>
> I also experienced this same problem where the workers would just randomly freeze for no reason.
>
> Upon inspection, it was due to prefetch/queue length problems between RabbitMQ and Celery (one of the techs over at rabbitmq logged into our servers to take a look whilst it was happening, found the problem, but didn't know why celery was causing it to happen, and it was deemed a bug with celery, as it could not be replicated when using AMQP alone).
>
> FYI, we never got round to fixing this lol.
>

Actually, this is to be expected...

The prefetch_count is used to limit the number of messages received at a time;
if you have 10 worker processes you don't want to receive a million tasks at
once, but rather let other celeryd instances have a chance at processing them.

If the prefetch count has been reached, it means that all the worker processes
are busy and Celery doesn't want to accept any more tasks.

So if the active tasks never return, the worker won't receive any more messages
either, and this is usually the case for people experiencing symptoms like these.
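
For illustration, in Celery 2.x the effective prefetch is roughly the pool
size times the prefetch multiplier (a sketch with illustrative numbers,
assuming the default multiplier):

# celeryconfig.py (illustrative values only)
CELERYD_CONCURRENCY = 8          # pool worker processes
CELERYD_PREFETCH_MULTIPLIER = 4  # the Celery 2.x default
# => basic.qos prefetch_count is set to 8 * 4 = 32, so at most 32
#    unacknowledged messages are held by this worker at a time.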

--
{Ask Solem,
+47 98435213 | twitter.com/asksol }.

Cal Leeming [Simplicity Media Ltd]

Mar 2, 2011, 5:12:11 PM
to celery...@googlegroups.com, a...@celeryproject.org
But the active task *DOES* return though. It returns absolutely fine. To prove this, I replaced the functionality with print "hello world", and the same thing happens. How could "print hello world" ever cause a deadlock or cause a task not to return? I also added print-statement debugging at points in the code, and it was blocking on one of the internal amqp functions (I can't remember which one).


Ask Solem Hoel

Mar 2, 2011, 5:47:21 PM
to celery...@googlegroups.com

On Mar 2, 2011, at 10:32 PM, Moolicious wrote:

> Hi Ask,
>
> The tasks are dead simple -- basically just local "Hello World" Python
> code at the moment. I don't see any possible way for them to hang. The
> tasks are not dependent on any resources and the parameters to the
> task are always the same.
>
> The app I'm building is in the early development phase, and I'm the
> only one sending tasks to it. I'll do some testing, send a few tasks
> and everything works until I stop sending tasks to it for several
> days. I then come back to start testing it again and I send the
> identical task with the identical parameters and the worker just
> doesn't pick it up until I shut it down and restart it.


Ok, thanks.

What version of Celery is this, and what version of amqplib?
Are you using the multiprocessing pool?
What Celery-related configuration settings are you running with?

Could you run with --loglevel=DEBUG and then send me the logs
when the problem reappears?

Also, could you send the SIGUSR1 signal to the main process?

kill -USR1 <pid of MainProcess>

This will force it to log the traceback of all active threads.
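
For the curious, the handler is roughly equivalent to this simplified sketch
(not the exact Celery code):

import signal
import sys
import traceback

def cry_handler(signum, frame):
    # Dump the current stack of every live thread, so you can see where
    # each one is blocked.
    for thread_id, thread_frame in sys._current_frames().items():
        print("Thread %r" % (thread_id,))
        traceback.print_stack(thread_frame)

signal.signal(signal.SIGUSR1, cry_handler)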


Usually workers run for weeks and months (restarted only for upgrades),
but there may still be some corner cases left (there seems to be an endless
supply of them... a new environment, different versions, particular
configurations).

Btw, there was a bug in RabbitMQ 2.2.x that could cause similar symptoms,
so maybe you could also try downgrading to RabbitMQ 2.1.1.

Moolicious

Mar 3, 2011, 1:51:07 AM
to celery-users
Celery 2.2.3
amqplib 0.6.1
RabbitMQ 2.2.0
Python 2.5

I am using multiprocessing.

Here's the worker config:

BROKER_HOST = "host"
BROKER_PORT = 5671
BROKER_USER = "user"
BROKER_PASSWORD = "password"
BROKER_VHOST = "vhost"
BROKER_USE_SSL = True
CELERY_AMQP_TASK_RESULT_EXPIRES = 300
CELERY_RESULT_BACKEND = "amqp"

I've since upgraded Celery to 2.2.4 and RabbitMQ to 2.3.1 to see if it
fixes the issue. The worker is currently running with DEBUG log level
and I'll let you know if it shows anything when/if it stops responding
again.

Thanks for your help, I appreciate it.

Moolicious

Mar 13, 2011, 10:41:13 PM
to celery-users
Hi,

I was able to duplicate this again and I sent a USR1 signal to the
main process; the output is below. You will notice from the log that I
fired up celeryd on March 2nd and successfully sent a task that
returned a result. I then waited until today to try again with the
identical task and it did not respond and nothing was logged.
Immediately below the last March 2nd entry is the dump from the USR1
signal I sent to it today. After sending the signal to the main
process, it reconnected to the broker and tasks are working again.

[2011-03-02 22:36:22,893: DEBUG/PoolWorker-3] Start from server, version: 8.0, properties: {u'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': 'RabbitMQ', u'version': '2.3.1', u'copyright': 'Copyright (C) 2007-2011 VMware, Inc.', u'platform': 'Erlang/OTP'}, mechanisms: ['PLAIN', 'AMQPLAIN'], locales: ['en_US']
[2011-03-02 22:36:23,207: DEBUG/PoolWorker-3] Open OK! known_hosts []
[2011-03-02 22:36:23,207: DEBUG/PoolWorker-3] using channel_id: 1
[2011-03-02 22:36:23,299: DEBUG/PoolWorker-3] Channel open
[2011-03-02 22:36:23,668: INFO/MainProcess] Task test_task[5131ec95-4b0b-4387-8725-dd31bd6cc8c2] succeeded in 1.40606212616s: {'fields': [{'type': 'TestField',…
[2011-03-13 19:22:10,697: ERROR/MainProcess]
Heart
=================================================
  File "/usr/bin/celeryd", line 8, in <module>
    load_entry_point('celery==2.2.4', 'console_scripts', 'celeryd')()
  File "/home/mark/rla/celery/celery/bin/celeryd.py", line 186, in main
    worker.execute_from_commandline()
  File "/home/mark/rla/celery/celery/bin/base.py", line 54, in execute_from_commandline
    return self.handle_argv(prog_name, argv[1:])
  File "/home/mark/rla/celery/celery/bin/base.py", line 44, in handle_argv
    return self.run(*args, **vars(options))
  File "/home/mark/rla/celery/celery/bin/celeryd.py", line 95, in run
    return self.app.Worker(**kwargs).run()
  File "/home/mark/rla/celery/celery/apps/worker.py", line 135, in run
    self.run_worker()
  File "/home/mark/rla/celery/celery/apps/worker.py", line 235, in run_worker
    worker.start()
  File "/home/mark/rla/celery/celery/worker/__init__.py", line 247, in start
    blocking(component.start)
  File "/usr/lib/python2.5/site-packages/kombu/syn.py", line 14, in blocking
    return __sync_current(fun, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/kombu/syn.py", line 30, in __blocking__
    return fun(*args, **kwargs)
  File "/home/mark/rla/celery/celery/worker/consumer.py", line 264, in start
    self.consume_messages()
  File "/home/mark/rla/celery/celery/worker/consumer.py", line 281, in consume_messages
    self.connection.drain_events()
  File "/usr/lib/python2.5/site-packages/kombu/connection.py", line 110, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/lib/python2.5/site-packages/kombu/transport/pyamqplib.py", line 200, in drain_events
    return connection.drain_events(**kwargs)
  File "/usr/lib/python2.5/site-packages/kombu/transport/pyamqplib.py", line 50, in drain_events
    return self.wait_multi(self.channels.values(), timeout=timeout)
  File "/usr/lib/python2.5/site-packages/kombu/transport/pyamqplib.py", line 56, in wait_multi
    chanmap.keys(), allowed_methods, timeout=timeout)
  File "/usr/lib/python2.5/site-packages/kombu/transport/pyamqplib.py", line 104, in _wait_multiple
    channel, method_sig, args, content = self.read_timeout(timeout)
  File "/usr/lib/python2.5/site-packages/kombu/transport/pyamqplib.py", line 81, in read_timeout
    return self.method_reader.read_method()
  File "/usr/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 212, in read_method
    self._next_method()
  File "/usr/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 127, in _next_method
    frame_type, channel, payload = self.source.read_frame()
  File "/usr/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 109, in read_frame
    frame_type, channel, size = unpack('>BHI', self._read(7))
  File "/usr/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 155, in _read
    result = self.sslobj.read(n)
  File "/home/mark/rla/celery/celery/apps/worker.py", line 343, in cry_handler
    logger.error("\n" + cry())
  File "/home/mark/rla/celery/celery/utils/__init__.py", line 404, in cry
    traceback.print_stack(frame, file=out)
=================================================
LOCAL VARIABLES
=================================================
{'frame': <frame object at 0x2a91380>,
 'main_thread': <Heart(Heart, started daemon)>,
 'out': <cStringIO.StringO object at 0x2a1aa40>,
 't': <Heart(Heart, started daemon)>,
 'thread': <Heart(Heart, started daemon)>,
 'tid': 140204646479584,
 'tmap': {}}


Heart
=================================================
  File "/usr/lib/python2.5/threading.py", line 462, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/home/mark/rla/celery/celery/utils/timer2.py", line 179, in run
    delay = self.next()
  File "/home/mark/rla/celery/celery/utils/timer2.py", line 168, in next
    self.not_empty.wait(1.0)
  File "/usr/lib/python2.5/threading.py", line 235, in wait
    _sleep(delay)
=================================================
LOCAL VARIABLES
=================================================
{'delay': 0.050000000000000003,
 'endtime': 1300069330.8644979,
 'gotit': False,
 'remaining': 0.23581600189208984,
 'saved_state': None,
 'self': <Condition(<thread.lock object at 0x7f83f01f7420>, 1)>,
 'timeout': 1.0,
 'waiter': <thread.lock object at 0x7f83f01f74c8>}


Heart
=================================================
  File "/usr/lib/python2.5/threading.py", line 462, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/home/mark/rla/celery/celery/concurrency/processes/pool.py", line 169, in run
    time.sleep(0.1)
=================================================
LOCAL VARIABLES
=================================================
{'self': <Supervisor(Thread-4, started daemon)>}


Heart
=================================================
  File "/usr/lib/python2.5/threading.py", line 462, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/home/mark/rla/celery/celery/concurrency/processes/pool.py", line 188, in run
    for taskseq, set_length in iter(taskqueue.get, None):
  File "/usr/lib/python2.5/Queue.py", line 165, in get
    self.not_empty.wait()
  File "/usr/lib/python2.5/threading.py", line 216, in wait
    waiter.acquire()
=================================================
LOCAL VARIABLES
=================================================
{'saved_state': None,
 'self': <Condition(<thread.lock object at 0x7f83f01f7450>, 1)>,
 'timeout': None,
 'waiter': <thread.lock object at 0x7f83f01f7648>}


Heart
=================================================
  File "/usr/lib/python2.5/threading.py", line 462, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/home/mark/rla/celery/celery/concurrency/processes/pool.py", line 368, in run
    ready, task = poll(0.2)
  File "/home/mark/rla/celery/celery/concurrency/processes/pool.py", line 606, in _poll_result
    return False, None
=================================================
LOCAL VARIABLES
=================================================
{'self': <celery.concurrency.processes.pool.Pool object at 0x29ae4d0>,
 'timeout': 0.20000000000000001}


Heart
=================================================
  File "/usr/lib/python2.5/threading.py", line 462, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/home/mark/rla/celery/celery/worker/controllers.py", line 64, in run
    self.move()
  File "/home/mark/rla/celery/celery/worker/controllers.py", line 39, in move
    task = self.ready_queue.get(timeout=1.0)
  File "/home/mark/rla/celery/celery/worker/buckets.py", line 124, in get
    self.not_empty.wait(timeout)
  File "/usr/lib/python2.5/threading.py", line 235, in wait
    _sleep(delay)
=================================================
LOCAL VARIABLES
=================================================
{'delay': 0.050000000000000003,
 'endtime': 1300069330.8484631,
 'gotit': False,
 'remaining': 0.18570709228515625,
 'saved_state': None,
 'self': <Condition(<thread.lock object at 0x7f83f01f7300>, 1)>,
 'timeout': 1.0,
 'waiter': <thread.lock object at 0x7f83f01f74e0>}


Heart
=================================================
  File "/usr/lib/python2.5/threading.py", line 462, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/home/mark/rla/celery/celery/worker/heartbeat.py", line 54, in run
    sleep(1)
=================================================
LOCAL VARIABLES
=================================================
{'bpm': 0.5,
 'dispatch': <bound method EventDispatcher.send of <celery.events.EventDispatcher object at 0x2a0cbd0>>,
 'last_beat': 1300069297.56338,
 'now': 1300069330.5961239,
 'self': <Heart(Heart, started daemon)>}



[2011-03-13 19:22:10,699: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish connection...
[2011-03-13 19:22:10,699: DEBUG/MainProcess] Consumer: Re-establishing connection to the broker...
[2011-03-13 19:22:10,699: DEBUG/MainProcess] Heart: Going into cardiac arrest...
[2011-03-13 19:22:11,614: DEBUG/MainProcess] TaskConsumer: Cancelling consumers...
[2011-03-13 19:22:11,704: DEBUG/MainProcess] EventDispatcher: Shutting down...
[2011-03-13 19:22:11,704: DEBUG/MainProcess] BroadcastConsumer: Cancelling consumer...
[2011-03-13 19:22:11,705: DEBUG/MainProcess] Consumer: Closing consumer channel...
[2011-03-13 19:22:11,705: DEBUG/MainProcess] Consumer: Closing connection to broker...
[2011-03-13 19:22:11,705: DEBUG/MainProcess] CarrotListener: Closing broadcast channel...
[2011-03-13 19:22:12,230: DEBUG/MainProcess] Start from server, version: 8.0, properties: {u'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': 'RabbitMQ', u'version': '2.3.1', u'copyright': 'Copyright (C) 2007-2011 VMware, Inc.', u'platform': 'Erlang/OTP'}, mechanisms: ['PLAIN', 'AMQPLAIN'], locales: ['en_US']
[2011-03-13 19:22:12,538: DEBUG/MainProcess] Open OK! known_hosts []
[2011-03-13 19:22:12,538: DEBUG/MainProcess] Consumer: Connection Established.
[2011-03-13 19:22:12,538: DEBUG/MainProcess] using channel_id: 1
[2011-03-13 19:22:12,628: DEBUG/MainProcess] Channel open
[2011-03-13 19:22:12,898: DEBUG/MainProcess] basic.qos: prefetch_count->32
[2011-03-13 19:22:12,988: DEBUG/MainProcess] using channel_id: 2
[2011-03-13 19:22:13,078: DEBUG/MainProcess] Channel open
[2011-03-13 19:22:13,460: DEBUG/MainProcess] Consumer: Starting message consumer...
[2011-03-13 19:22:13,550: DEBUG/MainProcess] Consumer: Ready to accept tasks!

Ask Solem Hoel

Mar 15, 2011, 3:35:50 PM
to celery...@googlegroups.com

On Mar 14, 2011, at 3:41 AM, Moolicious wrote:

> Hi,
>
> I was able to duplicate this again and I sent a USR1 signal to the
> main process; the output is below. You will notice from the log that I
> fired up celeryd on March 2nd and successfully sent a task that
> returned a result. I then waited until today to try again with the
> identical task and it did not respond and nothing was logged.
> Immediately below the last March 2nd entry is the dump from the USR1
> signal I sent to it today. After sending the signal to the main
> process, it reconnected to the broker and tasks are working again.
>
>

> result = self.sslobj.read(n)

Interesting... Could you try running again with SSL disabled?

Moolicious

Apr 7, 2011, 2:22:39 AM
to celery-users
Disabled SSL and it still hangs in exactly the same way. I noticed you
made some commits in 2.2.5 that change some of the socket error/
timeout handling. I'm going to upgrade to 2.2.5 and see if that fixes
it.

Ask Solem

Apr 8, 2011, 10:27:16 AM
to celery...@googlegroups.com


On Thursday, April 7, 2011 at 8:22 AM, Moolicious wrote:

Disabled SSL and it still hangs in exactly the same way. I noticed you
made some commits in 2.2.5 that change some of the socket error/
timeout handling. I'm going to upgrade to 2.2.5 and see if that fixes
it.

That's not in 2.2.5, but in the release22-maint branch.
It will be part of 2.2.6 which I will release asap.

Moolicious

Apr 18, 2011, 12:41:41 PM
to celery-users
2.2.6 seems to have broken SSL for me. When starting up celeryd, the
following exception is raised every second:

[2011-04-18 09:33:39,739: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish connection...
Traceback (most recent call last):
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/celery/worker/consumer.py", line 276, in start
    self.consume_messages()
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/celery/worker/consumer.py", line 292, in consume_messages
    self.connection.drain_events(timeout=1)
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/kombu/connection.py", line 139, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/kombu/transport/pyamqplib.py", line 200, in drain_events
    return connection.drain_events(**kwargs)
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/kombu/transport/pyamqplib.py", line 50, in drain_events
    return self.wait_multi(self.channels.values(), timeout=timeout)
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/kombu/transport/pyamqplib.py", line 56, in wait_multi
    chanmap.keys(), allowed_methods, timeout=timeout)
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/kombu/transport/pyamqplib.py", line 104, in _wait_multiple
    channel, method_sig, args, content = self.read_timeout(timeout)
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/kombu/transport/pyamqplib.py", line 86, in read_timeout
    return self.method_reader.read_method()
  File "/Users/mark/code/relife_env/lib/python2.7/site-packages/amqplib/client_0_8/method_framing.py", line 215, in read_method
    raise m
SSLError: The read operation timed out

Looking at the code in consumer.py, this is happening because an SSL read
timeout raises SSLError, which is caught by the socket.error handler rather
than the socket.timeout handler. If I hack up the code to explicitly catch
SSLError and swallow it, everything works perfectly.

The annoying thing is that the errno attribute on SSLError is None, so I
don't see a clean way to tell that it is a timeout rather than another
error, other than doing a string match on the error message itself.
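
The hack is roughly along these lines, translating the SSL timeout into the
socket.timeout that the consumer loop already handles (a sketch of the idea,
not the actual Kombu/Celery code; the helper name is made up):

import socket
from ssl import SSLError

def read_ignoring_ssl_timeout(read_method):
    # Treat an SSL read timeout like a plain socket.timeout ("no data yet")
    # instead of letting it bubble up as a lost connection.
    try:
        return read_method()
    except SSLError as exc:
        # SSLError.errno is None here, so the message text is the only way
        # to tell a timeout apart from a real error.
        if 'timed out' in str(exc):
            raise socket.timeout(str(exc))
        raise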

Ask Solem

Apr 18, 2011, 5:41:46 PM
to celery...@googlegroups.com


On Monday, April 18, 2011 at 6:41 PM, Moolicious wrote:

2.2.6 seems to have broken SSL for me. When starting up celeryd, the
following exception is raised every second:

[2011-04-18 09:33:39,739: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish connection...
[... traceback snipped, see the previous message ...]

Ahg! That is very annoying :(
read_timeout is a method added by Kombu, so we can at least catch it there
and re-raise it as socket.timeout to keep things consistent.

Weirdly, SSLError is actually a subclass of socket.error, which is also true
of socket.timeout...

Thanks! It's always good to map another exception inconsistency.

Moolicious

Apr 18, 2011, 10:09:20 PM
to celery-users
Funny. That was fixed in Python 3.2 just a few months ago:

http://bugs.python.org/issue10272

Thanks again for all the help!
