Celery errors after moving an AWS AMI server to a new machine


Pierre Mailhot

Jan 8, 2015, 8:37:11 PM
to opene...@googlegroups.com
Hi,

On my test machine today I suddenly saw the following errors:

Jan  8 20:30:53 ip-10-0-0-183 [service_variant=lms][celery.worker.consumer][env:sandbox] ERROR [ip-10-0-0-183  1601] [consumer.py:796] - consumer: Cannot connect to amqp://cel...@127.0.0.1:5672//: [Errno 104] Connection reset by peer.
Trying again in 2.00 seconds...
Jan  8 20:30:53 ip-10-0-0-183 [service_variant=lms][celery.worker.consumer][env:sandbox] ERROR [ip-10-0-0-183  1774] [consumer.py:796] - consumer: Cannot connect to amqp://cel...@127.0.0.1:5672//: [Errno 104] Connection reset by peer.
Trying again in 2.00 seconds...
Jan  8 20:30:53 ip-10-0-0-183 [service_variant=lms][celery.worker.consumer][env:sandbox] ERROR [ip-10-0-0-183  1599] [consumer.py:796] - consumer: Cannot connect to amqp://cel...@127.0.0.1:5672//: [Errno 104] Connection reset by peer.

Of course, there are tons of them... The only thing I've done is create an image from one machine and launch a new server with a different IP address. I am almost certain this is because the original IP address changed.

Any suggestions on how to fix it? Thanks.



Pierre Mailhot

Jan 8, 2015, 9:28:05 PM
to opene...@googlegroups.com
Just discovered a feature in AWS I wasn't aware of: I can create an image and specify my own private IP address.
Guess what? I launched a new instance with the same image but using my previous private IP address and everything is fine again.

I am still interested in knowing what to do to fix my problem without going back to the previous private IP address. Any suggestions?

Pierre Mailhot

Jan 9, 2015, 3:03:37 PM
to opene...@googlegroups.com
After a few hours investigating the problem and trying a few proposed solutions on the web, I can confirm it is a RabbitMQ / Mnesia issue.


The database RabbitMQ uses is bound to the machine's hostname, so if you copied the database dir to another machine, it won't work. If this is the case, you have to set up a machine with the same hostname as before and transfer any outstanding messages to the new machine. If there's nothing important in rabbit, you could just clear everything by removing the RabbitMQ files in /var/lib/rabbitmq.

I tried removing the files and directories in /var/lib/rabbitmq/mnesia and then restarting RabbitMQ or the server, but that did not work for me.

I guess there are still some remnants of the previous hostname or IP address somewhere.

Any edX RabbitMQ expert out there?

Ed Zarecor

Jan 9, 2015, 9:54:53 PM
to opene...@googlegroups.com
Pierre,

Rabbit can be messy in this circumstance.  What's the output of:

> sudo rabbitmqctl status

Let's try this and see if it gets rabbitmq happy again.

sudo bash
. /edx/app/edx_ansible/venvs/edx_ansible/bin/activate
cd /edx/app/edx_ansible/edx_ansible/playbooks/
ansible-playbook -c local -i 'localhost,' ./run_role.yml -e "role=rabbitmq" -e@/edx/app/edx_ansible/server-vars.yml

Best,

Ed.

Pierre Mailhot

Jan 12, 2015, 9:36:22 AM
to opene...@googlegroups.com
Ed,

Here is the output of the "sudo rabbitmqctl status" command:

ubuntu@ip-10-0-0-183:/edx/var/log/lms$ sudo rabbitmqctl status
Status of node 'rabbit@ip-10-0-0-183' ...
[{pid,1381},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.2.3"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.2.3"},
      {webmachine,"webmachine","1.10.3-rmq3.2.3-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.2.3-git680dba8"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.2.3"},
      {rabbit,"RabbitMQ","3.2.3"},
      {os_mon,"CPO  CXC 138 46","2.2.7"},
      {inets,"INETS  CXC 138 49","5.7.1"},
      {xmerl,"XML parser","1.2.10"},
      {mnesia,"MNESIA  CXC 138 12","4.5"},
      {amqp_client,"RabbitMQ AMQP Client","3.2.3"},
      {sasl,"SASL  CXC 138 11","2.1.10"},
      {stdlib,"ERTS  CXC 138 10","1.17.5"},
      {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:30] [kernel-poll:true]\n"},
 {memory,
     [{total,30832720},
      {connection_procs,5296},
      {queue_procs,5296},
      {plugins,346744},
      {other_proc,9247168},
      {mnesia,57952},
      {mgmt_db,47032},
      {msg_index,23904},
      {other_ets,1041840},
      {binary,3152},
      {code,17193313},
      {atom,1553681},
      {other_system,1307342}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,1579615846},
 {disk_free_limit,50000000},
 {disk_free,14761238528},
 {file_descriptors,
     [{total_limit,924},{total_used,3},{sockets_limit,829},{sockets_used,1}]},
 {processes,[{limit,1048576},{used,191}]},
 {run_queue,0},
 {uptime,77}]
...done.


No success with the ansible-playbook, but it is giving me some ideas. Thanks.


--
You received this message because you are subscribed to a topic in the Google Groups "Open edX operations" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openedx-ops/1SsdJ39IQRc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openedx-ops...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Salutations / Regards ,
Pierre Mailhot, M.Sc., CISSP, CEH

Pierre Mailhot

Jan 15, 2015, 8:17:39 PM
to opene...@googlegroups.com
I was finally able to make it work.

The problem was that when RabbitMQ was reinstalled with the ansible-playbook, it kept the IP address of the server in /etc/rabbitmq/rabbitmq-env.conf.

I simply had to change the value of RABBITMQ_NODE_IP_ADDRESS from 10.0.0.35 to 127.0.0.1.

I was wondering why I couldn't connect on localhost and why my error messages in /edx/var/log/lms were of this type:

Jan 15 20:09:10 ip-10-0-0-35 [service_variant=lms][celery.worker.consumer][env:sandbox] ERROR [ip-10-0-0-35  1508] [consumer.py:796] - consumer: Cannot connect to amqp://cel...@127.0.0.1:5672//: [Errno 111] Connection refused.

Thanks for your help Ed, it allowed me to learn a little bit more about ansible and rabbitmq.
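
For anyone repeating this fix, the edit to /etc/rabbitmq/rabbitmq-env.conf can be scripted. The sketch below is only an illustration and is not from the original posts: the function name `set_rabbitmq_node_ip` is made up, and it assumes the file contains a plain `RABBITMQ_NODE_IP_ADDRESS=...` line like the one described above. "Connection refused" generally means nothing accepted the TCP connection on that address and port, which fits a broker that was still bound to the old private address.

```shell
# Hypothetical helper: rewrite RABBITMQ_NODE_IP_ADDRESS in an env file.
# $1 = path to the env file, $2 = new address (defaults to 127.0.0.1).
set_rabbitmq_node_ip() {
    conf_file="$1"
    new_ip="${2:-127.0.0.1}"
    # Keep a backup next to the original before touching it.
    cp "$conf_file" "$conf_file.bak" || return 1
    # Replace the whole assignment line; every other line is copied as-is.
    sed "s/^RABBITMQ_NODE_IP_ADDRESS=.*/RABBITMQ_NODE_IP_ADDRESS=$new_ip/" \
        "$conf_file.bak" > "$conf_file"
}

# Usage on a real server (as root), followed by a broker restart:
#   set_rabbitmq_node_ip /etc/rabbitmq/rabbitmq-env.conf 127.0.0.1
#   service rabbitmq-server restart
```

The backup file makes it easy to compare the old and new values if the broker still does not come up.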

Rohan Rawat

Jul 6, 2015, 8:15:24 AM
to opene...@googlegroups.com
Hi, we are trying the same thing, but the error is not resolved even after running the ansible commands and changing 'RABBITMQ_NODE_IP_ADDRESS'. I am still getting the connection refused error.

Andrej Skoric

Jul 9, 2015, 6:41:23 AM
to opene...@googlegroups.com
Hi everyone!

In my experience, "connection reset by peer" suggests that the RabbitMQ server is not configured correctly. Specifically, it is missing the celery user.
The "connection refused" error may be due to the user's permissions.
Please try this:
sudo rabbitmqctl list_users
If you see the celery user, then check its permissions:
sudo rabbitmqctl list_permissions -p /celery

If you have no "celery" user, do the following:
sudo rabbitmqctl add_user celery celery
Keep in mind that the second "celery" in the command is the default password for the celery user. If you changed it in the edX auth files (you should have), then edit the command accordingly.
sudo rabbitmqctl set_permissions -p /celery ".*" ".*" ".*"
Keep in mind that the command above gives full permissions to the celery user. Edit the permissions to make this more secure.
sudo service rabbitmq-server restart

Hope it helps!
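
The "is the celery user there?" check can be automated by reading the output of `rabbitmqctl list_users`. This is a rough sketch under assumptions: `has_rabbit_user` is a hypothetical name, and it assumes the user name is the first whitespace-separated field on each line, as in the list_users output posted in this thread.

```shell
# Hypothetical helper: succeed if the given user appears in the output of
# `rabbitmqctl list_users`, which is read from standard input.
has_rabbit_user() {
    # Compare the first field of every line with the requested name.
    # The "Listing users ..." banner only matches a user literally
    # called "Listing".
    awk -v user="$1" '$1 == user { found = 1 } END { exit !found }'
}

# Usage on a real server:
#   if sudo rabbitmqctl list_users | has_rabbit_user celery; then
#       echo "celery user exists"
#   else
#       echo "celery user is missing"
#   fi
```

Reading from standard input keeps the function free of sudo and easy to try against saved output.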




 

Mario Torrisi

Aug 27, 2015, 11:24:33 AM
to Open edX operations
Maybe I found the answer by myself... 

In /edx/app/edx_ansible/edx_ansible/playbooks/edx-west/roles/edxapp/defaults/main.yml I have:
EDXAPP_CELERY_USER: 'celery'
EDXAPP_CELERY_PASSWORD: '*************'

So I think that I can create the celery user as you suggest above, using 
If you have no "celery" user, do the following:
sudo rabbitmqctl add_user celery celery
Keep in mind that the second "celery" in the command is the default password for celery user. If you changed it in edx auth files (you should have) then edit the command appropriately
sudo rabbitmqctl set_permissions -p /celery ".*" ".*" ".*"
Keep in mind that the command above gave full permissions to the celery user. Edit permissions to make this more secure.
sudo service rabbitmq-server restart
Can you confirm?

Thanks a lot. 

Andrej Skoric

Aug 28, 2015, 8:42:32 AM
to Open edX operations
I am not sure; I haven't looked into the playbook. It is my belief, and seems to be everyone's experience, that Ansible doesn't, or fails to, create the matching user in RabbitMQ. So the manual creation of the user in RabbitMQ is probably necessary.
Take a look at /edx/app/edxapp/lms.auth.json and /edx/app/edxapp/cms.auth.json. Look for CELERY_BROKER_USER and CELERY_BROKER_PASSWORD keys. If you need to change the username or password, you can do it here (restart edX afterward).
Of course, you'd have to change the username/password in RabbitMQ to match.
It's probable the Ansible playbook just sets up those keys in the two auth files and not much more than that.
Please let us know how it went!
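
To see which broker user the two auth files expect, a quick text match is usually enough. The helper below is a crude sketch, not a JSON parser: `json_string_value` is a hypothetical name, and it assumes each `"KEY": "value"` pair sits on a single line with a string value.

```shell
# Hypothetical helper: print the string value of a key from a JSON file,
# assuming the '"KEY": "value"' pair is on one line.
json_string_value() {
    key="$1"
    file="$2"
    # Capture whatever sits between the quotes after "KEY": and print
    # only the first match.
    sed -n "s/.*\"$key\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$file" | head -n 1
}

# Usage on a real server:
#   json_string_value CELERY_BROKER_USER /edx/app/edxapp/lms.auth.json
#   json_string_value CELERY_BROKER_USER /edx/app/edxapp/cms.auth.json
```

The printed name can then be compared with the output of `sudo rabbitmqctl list_users`.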

Natasha Wainwright

Sep 19, 2015, 7:53:59 AM
to Open edX operations
I was having the same error and took Andrej's advice about celery permissions. However, from the above instructions, I modified this line:

sudo rabbitmqctl set_permissions -p /celery ".*" ".*" ".*"

to be instead:

sudo rabbitmqctl set_permissions celery ".*" ".*" ".*"

and that worked! 

Thanks Andrej :D

zwcum...@gmail.com

Mar 11, 2016, 5:38:41 PM
to Open edX operations
I'm having a similar problem with a similar setup. I am also on AWS and get the following error in /lms/edx.log

[service_variant=lms][celery.worker.consumer][env:sandbox] ERROR [box-dev  21289] [consumer.py:364] - consumer: Cannot connect to amqp://celery:**@127.0.0.1:5672//: [Errno 104] Connection reset by peer.


rabbitmq-conf:

RABBITMQ_NODE_PORT=5672
RABBITMQ_NODE_IP_ADDRESS=127.0.0.1

output from 'sudo rabbitmqctl status'

Status of node 'rabbit@box-dev' ...
[{pid,1213},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.2.3"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.2.3"},
      {rabbit,"RabbitMQ","3.2.3"},
      {mnesia,"MNESIA  CXC 138 12","4.5"},
      {os_mon,"CPO  CXC 138 46","2.2.7"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.2.3"},
      {webmachine,"webmachine","1.10.3-rmq3.2.3-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.2.3-git680dba8"},
      {xmerl,"XML parser","1.2.10"},
      {inets,"INETS  CXC 138 49","5.7.1"},
      {amqp_client,"RabbitMQ AMQP Client","3.2.3"},
      {sasl,"SASL  CXC 138 11","2.1.10"},
      {stdlib,"ERTS  CXC 138 10","1.17.5"},
      {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,
     [{total,32391312},
      {connection_procs,65176},
      {queue_procs,5408},
      {plugins,328424},
      {other_proc,9496320},
      {mnesia,60144},
      {mgmt_db,48480},
      {msg_index,34160},
      {other_ets,1060896},
      {binary,697608},
      {code,17202873},
      {atom,1598585},
      {other_system,1793238}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,3030636953},
 {disk_free_limit,50000000},
 {disk_free,24899944448},
 {file_descriptors,
     [{total_limit,924},{total_used,5},{sockets_limit,829},{sockets_used,3}]},
 {processes,[{limit,1048576},{used,198}]},
 {run_queue,0},
 {uptime,8775}]



I'm at a bit of a loss here ...


Any help would be greatly appreciated. 




Rizky Ariestiyansyah

Mar 13, 2016, 12:34:42 AM
to Open edX operations
I have a blog post about solving this problem; see http://oonlab.com/edx/code/2015/10/21/solve-celery-error-saat-migrasi-open-edx/

Hope it helps.

and...@extensionengine.com

Feb 7, 2017, 11:09:18 AM
to Open edX operations, nwain...@gmail.com
Yes, the editor seems to have removed a blank space from the command. Or I typed it incorrectly. Apologies.
The "-p /" sets the vhost path. The "/" was incorrectly attached to the username (celery).

Corrected:
sudo rabbitmqctl set_permissions -p / celery ".*" ".*" ".*"

Andrej

Wasim Sargiro

Aug 9, 2018, 2:55:48 AM
to Open edX operations

Hi Pierre, I am also facing the same issue. May I know how you fixed it?

Aug  9 09:17:03 ip-172-31-22-219 [service_variant=lms][celery.worker.consumer][env:sandbox] ERROR [ip-172-31-22-219  2310] [consumer.py:364] - consumer: Cannot connect to amqp://celery:**@127.0.0.1:5672//: [Errno 104] Connection reset by peer.
Trying again in 32.00 seconds...

dominic Anyanna

Aug 9, 2018, 3:01:41 AM
to opene...@googlegroups.com
Hello Wasim

sudo rabbitmqctl list_users
If you see the celery user, then check its permissions:
sudo rabbitmqctl list_permissions -p /celery

If you have no "celery" user, do the following:
sudo rabbitmqctl add_user celery celery
Keep in mind that the second "celery" in the command is the default
password for celery user. If you changed it in edx auth files (you
should have) then edit the command appropriately
sudo rabbitmqctl set_permissions -p / celery ".*" ".*" ".*"
Keep in mind that the command above gave full permissions to the
celery user. Edit permissions to make this more secure.
sudo service rabbitmq-server restart




--
Dominic Anyanna
+2348034550968

Wasim Sargiro

Aug 9, 2018, 3:29:05 AM
to Open edX operations
Hello Dominic Anyanna, thank you for your reply.

I have the celery user, but when I run the second command it shows the following:

ubuntu@ip-***-**-**-**:/$ sudo rabbitmqctl list_users
Listing users ...
celery  [administrator]
admin   [administrator]
edx     [administrator]
ubuntu@ip-***-**-**-**:/$ sudo rabbitmqctl list_permissions -p /celery
Listing permissions in vhost "/celery" ...
Error: no_such_vhost: /celery
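
The `no_such_vhost` error is what one would expect here: `-p` takes a vhost name, so `-p /celery` asks for a vhost called "/celery". The connection URL in the logs (`amqp://celery:**@127.0.0.1:5672//`) ends in a vhost of "/", so the permissions that matter are probably those in "/", which `sudo rabbitmqctl list_permissions -p /` would list. As a small sketch, the hypothetical helper `vhost_exists` below checks the output of `rabbitmqctl list_vhosts`, assuming one vhost name per line.

```shell
# Hypothetical helper: succeed if the named vhost appears in the output of
# `rabbitmqctl list_vhosts`, which is read from standard input.
vhost_exists() {
    # -x matches whole lines only; -F treats the name as a literal string.
    grep -qxF -e "$1"
}

# Usage on a real server:
#   sudo rabbitmqctl list_vhosts | vhost_exists / \
#       && sudo rabbitmqctl list_permissions -p /
```

Matching whole lines avoids treating "/" as present just because another vhost name contains a slash.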

Pierre Mailhot

Aug 9, 2018, 10:13:36 AM
to Open edX operations
Have you looked at the previous answers in this thread? Did you try what was said in these answers?

Especially the answer from Ed Zarecor about rerunning the rabbitmq role.

sudo bash
. /edx/app/edx_ansible/venvs/edx_ansible/bin/activate
cd /edx/app/edx_ansible/edx_ansible/playbooks/
ansible-playbook -c local -i 'localhost,' ./run_role.yml -e "role=rabbitmq" -e@/edx/app/edx_ansible/server-vars.yml

And then editing the /etc/rabbitmq/rabbitmq-env.conf file to change RABBITMQ_NODE_IP_ADDRESS to 127.0.0.1?

You may also need to reboot or restart rabbitmq.

Wasim Sargiro

Aug 10, 2018, 12:13:45 AM
to opene...@googlegroups.com
Thank you for your reply.

Yes, I tried, but it didn't work:

ubuntu@ip-172-31-22-219:~$ sudo bash
root@ip-172-31-22-219:~# . /edx/app/edx_ansible/venvs/edx_ansible/bin/activate
(edx_ansible) root@ip-172-31-22-219:~# cd /edx/app/edx_ansible/edx_ansible/playbooks/
(edx_ansible) root@ip-172-31-22-219:/edx/app/edx_ansible/edx_ansible/playbooks# ansible-playbook -c local -i 'localhost,' ./run_role.yml -e "role=rabbitmq" -e@/edx/app/edx_ansible/server-vars.yml
ERROR! the file_name '/edx/app/edx_ansible/server-vars.yml' does not exist, or is not readable





Pierre Mailhot

Aug 10, 2018, 7:34:27 AM
to Open edX operations
Just remove the last part... -e@/edx/app/edx_ansible/server-vars.yml if you do not have a server-vars file.
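
The "drop the last part if the file is missing" rule can be written down as a small helper. This is a sketch only: `rabbitmq_role_command` is a hypothetical name, and it merely prints the command from this thread so it can be reviewed before being run from the playbooks directory with the edx_ansible virtualenv active.

```shell
# Hypothetical helper: print the ansible-playbook command from this thread,
# adding the server-vars file only when that file is readable.
# $1 = path to the vars file (defaults to the path used in the thread).
rabbitmq_role_command() {
    vars_file="${1:-/edx/app/edx_ansible/server-vars.yml}"
    cmd="ansible-playbook -c local -i 'localhost,' ./run_role.yml -e \"role=rabbitmq\""
    if [ -r "$vars_file" ]; then
        cmd="$cmd -e@$vars_file"
    fi
    printf '%s\n' "$cmd"
}

# Usage:
#   rabbitmq_role_command
```

Printing instead of executing keeps the helper harmless on machines where the playbook should not be rerun.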


Kris Bhojwani

Jun 28, 2019, 4:31:42 AM
to Open edX operations
I had this problem with Ironwood. 

I believe it occurred as a result of attaching an old [AWS EC2] instance snapshot volume to a new instance. I'm not 100% sure, but I believe there must be some config files that are configured for a specific instance.

The way I solved this issue was to just reinstall Open edX Ironwood on a new instance and migrate all themes, databases and other settings files.
It seems to be resolved at the moment [i.e. I can export and import courses without getting the "[Errno 104] Connection reset by peer." error message].

