ConnectionError: Too many heartbeats missed

Pierre Mailhot

Dec 4, 2014, 10:38:28 AM
to opene...@googlegroups.com
We are currently planning to launch our Open edX implementation in the next few days or weeks.

I see occasional "error" messages like these in /edx/var/log/lms/edx.log:

Dec  3 21:52:40 ip-10-0-0-233 [service_variant=lms][celery.worker.consumer][env:sandbox] INFO [ip-10-0-0-233  10811] [consumer.py:742] - consumer: Connected to amqp://cel...@127.0.0.1:5672//.

Dec  3 21:58:00 ip-10-0-0-233 [service_variant=lms][celery.worker.consumer][env:sandbox] ERROR [ip-10-0-0-233  10811] [consumer.py:397] - consumer: Connection to broker lost. Trying to re-establish the connection...

Traceback (most recent call last):
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 395, in start
    self.consume_messages()
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 458, in consume_messages
    poll_timeout = (fire_timers(propagate=errors) if scheduled
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/worker/hub.py", line 157, in fire_timers
    entry()
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/utils/timer2.py", line 59, in __call__
    return self.fun(*self.args, **self.kwargs)
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/celery/utils/timer2.py", line 165, in _reschedules
    return fun(*args, **kwargs)
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/kombu/connection.py", line 270, in heartbeat_check
    return self.transport.heartbeat_check(self.connection, rate=rate)
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 134, in heartbeat_check
    return connection.heartbeat_tick(rate=rate)
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/amqp/connection.py", line 844, in heartbeat_tick
    raise ConnectionError('Too many heartbeats missed')
ConnectionError: Too many heartbeats missed

Dec  3 21:58:00 ip-10-0-0-233 [service_variant=lms][celery.worker.consumer][env:sandbox] INFO [ip-10-0-0-233  10811] [consumer.py:742] - consumer: Connected to amqp://cel...@127.0.0.1:5672//.

I have no traffic at the moment, so this may be perfectly normal. Is there a way to adjust the heartbeat settings? If so, which configuration file holds them? I am definitely not a Celery or RabbitMQ expert yet :(

Thanks for any information.

Have a nice day.

Nate Aune

Dec 4, 2014, 5:45:45 PM
to opene...@googlegroups.com
I saw the same errors in my logs, and after a bit of Googling, it appears that setting BROKER_HEARTBEAT=0 makes the problem go away.

For Open edX, you would either add that to aws.py or, if you're deploying with Ansible and a server-vars.yml file, add this to your server-vars.yml:

EDXAPP_BROKER_HEARTBEAT: 0

and then re-run the deploy with "/edx/bin/update edx-platform release" (or whichever branch you're on).
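To see why a value of 0 silences the error, here is a simplified, hypothetical Python 3 model of the client-side check that raises in the traceback above. The class names HeartbeatMonitor and FakeClock, and the exact rule (give up after two full heartbeat intervals with no inbound traffic; skip the check entirely when the interval is 0), are assumptions loosely based on how py-amqp's heartbeat_tick() behaved in that era. This is not the library's actual code.

```python
import time


class HeartbeatMonitor:
    """Hypothetical, simplified model of a client-side AMQP heartbeat check.

    Not py-amqp's code: it only mimics the assumed rule described above.
    `heartbeat` is the interval in seconds; 0 (or None) disables the check.
    """

    def __init__(self, heartbeat, clock=time.monotonic):
        self.heartbeat = heartbeat
        self.clock = clock
        self.last_received = clock()

    def frame_received(self):
        # Any inbound frame (data or heartbeat) proves the broker is alive.
        self.last_received = self.clock()

    def tick(self):
        if not self.heartbeat:
            # Heartbeats disabled (BROKER_HEARTBEAT = 0): nothing can be "missed".
            return
        # Allow one late heartbeat: fail only after two intervals of silence.
        if self.clock() > self.last_received + 2 * self.heartbeat:
            raise ConnectionError("Too many heartbeats missed")


class FakeClock:
    """Manually advanced clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
monitor = HeartbeatMonitor(heartbeat=10.0, clock=clock)

clock.now = 15.0
monitor.tick()  # 15 s of silence is under the 20 s limit: no error

clock.now = 21.0
try:
    monitor.tick()
except ConnectionError as exc:
    print("connection dropped:", exc)  # prints: connection dropped: Too many heartbeats missed

disabled = HeartbeatMonitor(heartbeat=0, clock=clock)
clock.now = 10000.0
disabled.tick()  # never raises, however long the silence
```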

Pierre Mailhot

Dec 4, 2014, 7:18:44 PM
to opene...@googlegroups.com
Thanks a lot. I had found the same article, but I didn't know where to change the value of BROKER_HEARTBEAT.
I didn't know there was an EDXAPP_BROKER_HEARTBEAT variable either... Thanks, Nate, now I know where to look.

Nilesh Londhe

May 12, 2015, 10:43:07 AM
to opene...@googlegroups.com
Hi Pierre:

Did you happen to find where to set BROKER_HEARTBEAT = 0?

Nilesh Londhe

May 12, 2015, 10:58:16 AM
to opene...@googlegroups.com
I see "Too many heartbeats missed" only in AWS VPC deployments, not in other deployments. I am wondering whether setting BROKER_HEARTBEAT = 0 is the right fix.

Pierre Mailhot

May 12, 2015, 11:29:47 AM
to opene...@googlegroups.com
Check 

/edx/app/edxapp/edx-platform/lms/envs/aws.py 

and

/edx/app/edxapp/edx-platform/cms/envs/aws.py

Nilesh Londhe

May 12, 2015, 11:49:47 AM
to opene...@googlegroups.com
Thanks. Is setting it to zero the right fix?

Arun V

Jan 3, 2018, 7:26:45 AM
to Open edX operations
Hi all,

I am running ficus.1 and have an issue: the celeryev queue is piling up, causing high load on the server and the instance being removed from the AWS ELB.

{"sw_sys": "Linux", "clock": 147021454, "timestamp": 1514870496.924326, "hostname": "cel...@edx.lms.core.high.edxapp2", "pid": 2826, "sw_ver": "3.1.18", "utcoffset": -6, "loadavg": [0.08, 0.06, 0.01], "processed": 0, "active": 0, "freq": 2.0, "type": "worker-heartbeat", "sw_ident": "py-celery"}

I see the following in /edx/app/edxapp/edx-platform/(lms|cms)/envs/aws.py:

BROKER_HEARTBEAT = 10.0
BROKER_HEARTBEAT_CHECKRATE = 2

Is it OK to set BROKER_HEARTBEAT = 0? Will it cause any issues for the RabbitMQ cluster or edX in general? I am running a production stack.
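With the values quoted above, a quick calculation shows how often the heartbeat check runs and how long a silent connection survives. This is a sketch under two assumptions: that Celery invokes the connection's heartbeat check every BROKER_HEARTBEAT / BROKER_HEARTBEAT_CHECKRATE seconds, and that the client gives up after 2 * BROKER_HEARTBEAT seconds without inbound traffic. The function names are hypothetical.

```python
def heartbeat_timings(broker_heartbeat, checkrate=2):
    """Return (check_interval_s, give_up_after_s), or None if heartbeats are off.

    Assumed rules: the check runs every broker_heartbeat / checkrate seconds,
    and the connection is declared dead after 2 * broker_heartbeat seconds
    with no inbound traffic.
    """
    if not broker_heartbeat:
        return None  # BROKER_HEARTBEAT = 0: no heartbeat checks at all
    return broker_heartbeat / checkrate, 2 * broker_heartbeat


def silence_would_drop(silence_s, broker_heartbeat, checkrate=2):
    """True if a gap of silence_s seconds exceeds the assumed give-up threshold."""
    timings = heartbeat_timings(broker_heartbeat, checkrate)
    return timings is not None and silence_s > timings[1]


# Values from aws.py: BROKER_HEARTBEAT = 10.0, BROKER_HEARTBEAT_CHECKRATE = 2
print(heartbeat_timings(10.0, 2))       # (5.0, 20.0)
print(silence_would_drop(30, 10.0, 2))  # True
print(silence_would_drop(30, 0, 2))     # False: with heartbeats off, nothing trips
```

The trade-off behind the question: with heartbeats off, the client no longer notices a half-open TCP connection (for example, one silently dropped by a NAT device or load balancer) until TCP itself reports an error, which can take much longer.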

Thank you,


Arun V

Jan 18, 2018, 12:20:35 AM
to Open edX operations
Hello, 

I have set BROKER_HEARTBEAT = 0. However, my RabbitMQ cluster is still being bombarded with worker-heartbeat queues. Could someone please help?
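One likely explanation, offered as an assumption rather than a diagnosis: the JSON message quoted earlier in this thread has "type": "worker-heartbeat", which is a Celery monitoring event published to the celeryev exchange, not an AMQP connection heartbeat. BROKER_HEARTBEAT governs only the latter, so setting it to 0 would not be expected to stop these events. The sketch below estimates the event volume from the event's own "freq" field (seconds between heartbeat events from one worker). The function name events_per_hour and the worker count of 8 are purely illustrative.

```python
import json

# The event body quoted earlier in the thread (hostname elided as in the original).
EVENT = ('{"sw_sys": "Linux", "clock": 147021454, "timestamp": 1514870496.924326, '
         '"hostname": "cel...@edx.lms.core.high.edxapp2", "pid": 2826, '
         '"sw_ver": "3.1.18", "utcoffset": -6, "loadavg": [0.08, 0.06, 0.01], '
         '"processed": 0, "active": 0, "freq": 2.0, "type": "worker-heartbeat", '
         '"sw_ident": "py-celery"}')


def events_per_hour(event_json, workers=1):
    """Estimate worker-heartbeat events published per hour (hypothetical helper)."""
    event = json.loads(event_json)
    if event["type"] != "worker-heartbeat":
        raise ValueError("not a Celery worker-heartbeat event")
    # 'freq' is the interval in seconds between heartbeat events from one worker.
    return workers * 3600 / event["freq"]


print(events_per_hour(EVENT))             # 1800.0
print(events_per_hour(EVENT, workers=8))  # 14400.0
```

These events only accumulate if some queue is bound to the celeryev exchange (typically declared by a monitoring consumer) and nothing drains it, so checking which consumers declared the celeryev.* queues may be more productive than changing BROKER_HEARTBEAT. Celery 3.1 workers also accept a --without-heartbeat option that turns these events off, at the cost of monitors no longer seeing the workers' heartbeats.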

Thank you,