My interpretation of the following is that it's trying to communicate with something (via AMQP?) to get the rerun processed, but that something isn't responding. All daemons seem to be running, although "certs" seems to restart on a regular basis.
This was indeed how I had my /etc/hosts configured, but changing it doesn’t seem to have helped… the certs server is restarting every minute or so as well, which I suspect should not be happening. I haven’t configured SSL at all, if that matters.
cms/edx.log:
[… same leadup as last time …]
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/amqp/connection.py", line 204, in _wait_method
self.method_reader.read_method()
File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
raise m
supervisor/certstderr.log has:
Traceback (most recent call last):
File "/edx/app/certs/certificates/certificate_agent.py", line 197, in <module>
main()
File "/edx/app/certs/certificates/certificate_agent.py", line 64, in main
if manager.get_length() == 0:
File "/edx/app/certs/certificates/openedx_certificates/queue_xqueue.py", line 57, in get_length
response = json.loads(request.text)
File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
It looks like the certs server is crashing and restarting…
supervisor/supervisord.log:
2015-03-04 01:01:30,355 INFO spawned: 'certs' with pid 3982
2015-03-04 01:01:32,030 INFO success: certs entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-03-04 01:01:48,140 INFO exited: certs (exit status 1; not expected)
xqueue/edx.log:
Mar 4 01:18:09 online [service_variant=xqueue][pika.adapters.base_connection][env:sandbox] ERROR [online 1461] [base_connection.py:260] - Socket Error on fd 13: 104
Mar 4 01:18:13 online [service_variant=xqueue][queue.management.commands.run_consumer][env:sandbox] INFO [online 1302] [run_consumer.py:118] - [182] Worker failed
Mar 4 01:18:13 online [service_variant=xqueue][queue.management.commands.run_consumer][env:sandbox] INFO [online 1302] [run_consumer.py:124] - [183] Starting worker
Mar 4 01:18:13 online [service_variant=xqueue][queue.consumer][env:sandbox] INFO [online 6130] [consumer.py:270] - [183] Starting consumer for queue edX-Open_DemoX
Mar 4 01:18:13 online [service_variant=xqueue][pika.adapters.base_connection][env:sandbox] ERROR [online 1461] [base_connection.py:260] - Socket Error on fd 13: 104
supervisor/xqueue_consumertderr.log:
Process Worker-233:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/edx/app/xqueue/xqueue/queue/consumer.py", line 275, in run
self.connection.ioloop.start()
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/select_connection.py", line 101, in start
self.poller.start()
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/select_connection.py", line 385, in start
self.poll()
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/select_connection.py", line 440, in poll
self._handler(fileno, event, write_only=write_only)
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/base_connection.py", line 283, in _handle_events
self._handle_read()
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/base_connection.py", line 299, in _handle_read
return self._handle_error(error)
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/base_connection.py", line 263, in _handle_error
self._handle_disconnect()
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/base_connection.py", line 212, in _handle_disconnect
self._adapter_disconnect()
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/base_connection.py", line 126, in _adapter_disconnect
self._check_state_on_disconnect()
File "/edx/app/xqueue/venvs/xqueue/src/pika/pika/adapters/base_connection.py", line 141, in _check_state_on_disconnect
raise exceptions.ProbableAuthenticationError
ProbableAuthenticationError
nginx/access.log seems to show an error when querying the queue:
- - 127.0.0.1 - edx [04/Mar/2015:01:21:47 +0000] "POST /xqueue/login/ HTTP/1.1" 200 72 0.063 "-" "python-requests/2.3.0 CPython/2.7.3 Linux/3.2.0-69-virtual"
- - 127.0.0.1 - edx [04/Mar/2015:01:22:04 +0000] "GET /xqueue/get_queuelen/?queue_name=certificates HTTP/1.1" 500 28 17.018 "-" "python-requests/2.3.0 CPython/2.7.3 Linux/3.2.0-69-virtual"
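The 500 on get_queuelen lines up with the ValueError in the certs log: the agent hands the error body straight to json.loads. Below is a minimal sketch of a more defensive parse. It is not the actual certificate_agent code; `parse_queue_length` is a hypothetical helper, and the `return_code`/`content` reply shape is an assumption about what xqueue sends back.

```python
import json


def parse_queue_length(status_code, body):
    """Return the queue length, or None if the reply is unusable.

    Hypothetical helper; the {"return_code": 0, "content": N} reply
    shape is an assumption, not confirmed from the xqueue source.
    """
    if status_code != 200:
        # e.g. the 500 above: the body is an error page, not JSON
        return None
    try:
        reply = json.loads(body)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError on Python 3
        return None
    if not isinstance(reply, dict) or reply.get("return_code") != 0:
        return None
    try:
        return int(reply.get("content"))
    except (TypeError, ValueError):
        return None
```

With a check like this the agent could log the failure and retry instead of exiting, though the underlying xqueue error would still need fixing.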
For completeness, the first two lines of /etc/hosts:
127.0.0.1 localhost
127.0.0.1 online online.it.mydomain
rabbitmqctl does return results, but to my untrained eye they don’t look like good ones. The RabbitMQ logs show that “edx” and “celery” have invalid credentials.
How and where are those set? I couldn’t find anything in the Ansible playbook that sets any usernames or passwords.
ubuntu@online:~$ sudo rabbitmqctl cluster_status
Cluster status of node rabbit@online ...
[{nodes,[{disc,[rabbit@online]}]},
{running_nodes,[rabbit@online]},
{partitions,[]}]
...done.
ubuntu@online:~$ sudo rabbitmqctl list_queues
Listing queues ...
...done.
The latter should return at least one queue, correct?
Extracts from /var/log/rabbitmq/rab...@online.log:
_____________________________________________________________________________________
=ERROR REPORT==== 4-Mar-2015::06:41:42 ===
closing AMQP connection <0.9173.1> (127.0.0.1:33251 -> 127.0.0.1:5672):
{handshake_error,starting,0,
{amqp_error,access_refused,
"PLAIN login refused: user 'edx' - invalid credentials",
'connection.start_ok'}}
=INFO REPORT==== 4-Mar-2015::06:41:43 ===
accepting AMQP connection <0.9177.1> (127.0.0.1:33252 -> 127.0.0.1:5672)
=ERROR REPORT==== 4-Mar-2015::06:42:01 ===
closing AMQP connection <0.9200.1> (127.0.0.1:33262 -> 127.0.0.1:5672):
{handshake_error,starting,0,
{amqp_error,access_refused,
"AMQPLAIN login refused: user 'celery' - invalid credentials",
'connection.start_ok'}}
=ERROR REPORT==== 4-Mar-2015::06:42:01 ===
closing AMQP connection <0.9203.1> (127.0.0.1:33264 -> 127.0.0.1:5672):
{handshake_error,starting,0,
{amqp_error,access_refused,
"AMQPLAIN login refused: user 'celery' - invalid credentials",
'connection.start_ok'}}
=INFO REPORT==== 4-Mar-2015::06:42:01 ===
accepting AMQP connection <0.9224.1> (127.0.0.1:33270 -> 127.0.0.1:5672)
_____________________________________________________________________________________
So it looks like access is being refused due to faulty credentials – but where are those credentials set?
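For a longer log, the refused logins can be tallied rather than read by eye. This is a rough sketch only: the regular expression is written against the lines quoted above, and `refused_logins` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
#   "PLAIN login refused: user 'edx' - invalid credentials",
REFUSED = re.compile(r"(\w+) login refused: user '([^']+)' - invalid credentials")


def refused_logins(log_text):
    """Count refused logins per (mechanism, user) pair in a RabbitMQ log."""
    return Counter(REFUSED.findall(log_text))
```

Passing it the contents of the log file should give one count per (mechanism, user) pair, which makes it easier to see which accounts need attention.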
…Ronny
sudo rabbitmqctl add_user edx edx
sudo rabbitmqctl set_permissions edx ".*" ".*" ".*"
sudo service rabbitmq-server restart
sudo reboot
sudo rabbitmqctl list_permissions
RABBIT_USERS:
  - name: 'admin'
    password: "{{ RABBIT_ADMIN_PASSWORD }}"
  - name: 'edx'
    password: "{{ XQUEUE_RABBITMQ_PASS }}"
  - name: 'celery'
    password: "{{ EDXAPP_CELERY_PASSWORD }}"

sudo rabbitmqctl add_user admin <password>
sudo rabbitmqctl set_permissions admin ".*" ".*" ".*"
sudo rabbitmqctl add_user edx <password>
sudo rabbitmqctl set_permissions edx ".*" ".*" ".*"
sudo rabbitmqctl add_user celery <password>
sudo rabbitmqctl set_permissions celery ".*" ".*" ".*"
sudo service rabbitmq-server restart
sudo rabbitmqctl list_permissions
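Since the same two commands repeat for each user, they could be generated from a list instead of typed by hand. This is only a sketch: `rabbitmq_user_commands` is a hypothetical helper, the password in the usage line is a placeholder, and the function builds command strings without running anything.

```python
import shlex


def rabbitmq_user_commands(users, permissions=(".*", ".*", ".*")):
    """Build rabbitmqctl commands for each (name, password) pair.

    Hypothetical helper: it only returns strings, nothing is executed.
    `permissions` is the (configure, write, read) pattern triple.
    """
    perms = " ".join(shlex.quote(p) for p in permissions)
    commands = []
    for name, password in users:
        user = shlex.quote(name)
        commands.append(
            "sudo rabbitmqctl add_user %s %s" % (user, shlex.quote(password))
        )
        commands.append("sudo rabbitmqctl set_permissions %s %s" % (user, perms))
    return commands


# Placeholder password, for illustration only
for line in rabbitmq_user_commands([("edx", "changeme")]):
    print(line)
```

The passwords would need to match whatever the applications are configured to use (the variables named in RABBIT_USERS above), otherwise the "invalid credentials" errors will persist.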