Websockets not working after a little while

William Easton

Jul 3, 2018, 9:55:19 PM
to AWX Project
Hi,

I'm having an issue where websockets stop working ~20-60 minutes after the application has been deployed. This breaks the task container's ability to post job stdout and also makes it impossible to view job details in the UI.

I saw that an issue was opened here: https://github.com/ansible/awx/issues/1861 and have added comments with my findings as I go.

I have enabled verbose logging on Daphne but I'm a little perplexed.

Daphne is showing that the websocket is opened:
2018-06-27 03:19:21,372 DEBUG    Upgraded connection daphne.response.XbupPxYRcS!aPmLgJGDZd to WebSocket daphne.response.XbupPxYRcS!hTzJudfDoM

Then suddenly nginx reports that the client closed the connection (status 499):
10.255.0.2 - - [27/Jun/2018:03:19:24 +0000] "GET /websocket/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17682"

And then Daphne reports that the websocket has closed:
2018-06-27 03:19:25,571 DEBUG    WebSocket closed for daphne.response.XbupPxYRcS!hTzJudfDoM

The browser itself reports:
WebSocket connection to 'wss://........./websocket/' failed: WebSocket is closed before the connection is established.
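
To see how long each websocket stays open before it is dropped, the Daphne debug lines can be paired up by channel name. This is only a minimal sketch: it assumes the log format matches the two Daphne lines quoted above, and the function name `websocket_lifetimes` is made up for illustration.

```python
from datetime import datetime
import re

# Timestamp prefix used by the Daphne debug lines quoted above.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+DEBUG\s+(.*)$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
CLOSED_PREFIX = "WebSocket closed for "


def websocket_lifetimes(lines):
    """Return {channel_name: seconds_open} for each upgrade/close pair."""
    opened = {}
    lifetimes = {}
    for line in lines:
        match = TS_RE.match(line.strip())
        if not match:
            continue  # not a Daphne DEBUG line in the assumed format
        stamp = datetime.strptime(match.group(1), TS_FORMAT)
        message = match.group(2)
        if message.startswith("Upgraded connection") and " to WebSocket " in message:
            channel = message.rsplit(" to WebSocket ", 1)[1].strip()
            opened[channel] = stamp
        elif message.startswith(CLOSED_PREFIX):
            channel = message[len(CLOSED_PREFIX):].strip()
            if channel in opened:
                delta = stamp - opened.pop(channel)
                lifetimes[channel] = delta.total_seconds()
    return lifetimes


sample = [
    "2018-06-27 03:19:21,372 DEBUG    Upgraded connection "
    "daphne.response.XbupPxYRcS!aPmLgJGDZd to WebSocket "
    "daphne.response.XbupPxYRcS!hTzJudfDoM",
    "2018-06-27 03:19:25,571 DEBUG    WebSocket closed for "
    "daphne.response.XbupPxYRcS!hTzJudfDoM",
]
for channel, seconds in websocket_lifetimes(sample).items():
    print("%s open for %.3f s" % (channel, seconds))
```

For the two lines above this gives a lifetime of about 4.2 seconds, with the nginx 499 entry falling inside that window.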

And the Task container reports (when running the job):
[2018-07-02 19:03:47,717: DEBUG/Worker-4] using channel_id: 2
2018-07-02 19:03:47,718 ERROR    awx.main.models.unified_jobs job 15 (running) failed to emit channel msg about status change
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/awx/main/models/unified_jobs.py", line 1169, in _websocket_emit_status
    emit_channel_notification('jobs-status_changed', status_data)
  File "/usr/lib/python2.7/site-packages/awx/main/consumers.py", line 70, in emit_channel_notification
    Group(group).send({"text": json.dumps(payload, cls=DjangoJSONEncoder)})
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/channels/channel.py", line 88, in send
    self.channel_layer.send_group(self.name, content)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 190, in send_group
    self.send(channel, message)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 95, in send
    self.recover()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 77, in recover
    self.tdata.consumer.revive(self.tdata.connection.channel())
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/kombu/connection.py", line 255, in channel
    chan = self.transport.create_channel(self.connection)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 92, in create_channel
    return connection.channel()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/connection.py", line 282, in channel
    return self.Channel(self, channel_id)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/channel.py", line 101, in __init__
    self._x_open()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/channel.py", line 427, in _x_open
    self._send_method((20, 10), args)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer 
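
The traceback ends in asgi_amqp's recover() trying to open a new channel on a connection that the peer has already reset. One thing that might be worth testing is whether dropping the whole connection and reconnecting gets the message through. The following is a rough sketch of that idea only: `send_with_reconnect`, `FakeConnection`, and the `connect`/`send` callables are hypothetical stand-ins, not part of asgi_amqp, kombu, or AWX.

```python
import errno
import socket


def send_with_reconnect(connect, send, retries=1):
    """Call send(conn); on ECONNRESET, open a fresh connection and retry.

    connect and send are hypothetical callables supplied by the caller.
    Any other socket error, or a reset after the last retry, is re-raised.
    """
    conn = connect()
    attempt = 0
    while True:
        try:
            return send(conn)
        except socket.error as exc:
            if exc.errno != errno.ECONNRESET or attempt >= retries:
                raise
            attempt += 1
            conn = connect()  # discard the stale connection


class FakeConnection(object):
    """Stand-in for a broker connection; a stale one fails like Errno 104."""

    def __init__(self, stale):
        self.stale = stale

    def publish(self, message):
        if self.stale:
            raise socket.error(errno.ECONNRESET, "Connection reset by peer")
        return "sent: " + message


# First connection is stale, second one is healthy.
connections = [FakeConnection(stale=True), FakeConnection(stale=False)]
result = send_with_reconnect(
    lambda: connections.pop(0),
    lambda conn: conn.publish("status_changed"),
)
print(result)
```

If the reset comes back on every fresh connection as well, the error is still raised, which would point at something between the task container and the broker rather than a single stale socket.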

Can anyone suggest what the next troubleshooting steps might be, or what additional logging could be enabled?

Christer Hemgren

Jul 6, 2018, 5:03:11 PM
to AWX Project
We see similar behavior on AWX 1.0.6.x.

We use HAProxy in a separate container in front for SSL.
Reloading HAProxy and the AWX web and task containers usually solves our issue.

William Easton

Jul 6, 2018, 6:45:32 PM
to AWX Project
Does it fix it permanently or does it come back as an issue shortly after?

I'm having trouble figuring out what to test or try next. After a restart of the containers it works for a short period (normally less than an hour) and then stops, but I can't find any pattern in when or why it stops working.

Christer Hemgren

Jul 7, 2018, 4:24:02 PM
to AWX Project
No, it comes back; we usually need to reload every week.