I am using WebSockets to communicate between the front end and the backend APIs of an application.
The backend is running Daphne on port 8000 as follows:
daphne application:channel_layer -b 127.0.0.1 -p 8000 --websocket_timeout 120 --access-log /var/log/daphne-access-logs
The Channels workers are started with 4 threads as follows:
python manage.py runworker --threads 4
Nginx is configured to reverse-proxy WebSocket connection requests to Daphne as follows:
location /wsEndpoint/ {
proxy_pass http://127.0.0.1:8000;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-for $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
}
This configuration works well for some time after Daphne and the workers are started.
However, it slows down over time (after roughly 5-6 hours) and eventually stops working altogether.
I then have to restart the processes for it to function normally again.
At that point, Nginx logs the following error:
2019/06/21 14:57:46 [error] 19301#0: *36490 recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: IP Address, server: <SERVER DOMAIN NAME>, request: "GET /wsEndpoint/endpoint/ HTTP/1.1", upstream: "http://127.0.0.1:8000/wsEndpoint/endpoint/", host: "SERVER DOMAIN NAME"
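To see whether these resets build up gradually over the 5-6 hours or start abruptly, the Nginx error log can be bucketed by hour. A small sketch; resets_per_hour is a hypothetical helper, and the default log path in the main block is an assumption (a common default), not taken from this server.

```python
import re
import sys
from collections import Counter

# Captures "YYYY/MM/DD HH" from the timestamp that starts each Nginx error-log line.
HOUR_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}):\d{2}:\d{2} \[error\]")


def resets_per_hour(lines, needle="Connection reset by peer"):
    """Count [error] lines containing `needle`, bucketed by hour."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = HOUR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed default location; pass the real error-log path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/nginx/error.log"
    with open(path) as log:
        for hour, count in sorted(resets_per_hour(log).items()):
            print("%s:00  %d" % (hour, count))
```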
This problem never occurs on our staging or dev servers; it happens only on our production server.
My initial hunch was that server resources were being exhausted, preventing the Daphne workers from spawning.
However, it turns out that there is plenty of free RAM and CPU.
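RAM and CPU are not the only limits a long-running server can hit: every open WebSocket also holds a file descriptor, and a process that reaches its open-file limit cannot accept new connections even with memory to spare. A minimal Linux-only sketch for comparing a process's open descriptors with its soft limit; it reads /proc (inspecting another user's process usually needs root), and open_fd_count and max_open_files are hypothetical helper names.

```python
import os


def open_fd_count(pid):
    """Number of file descriptors process `pid` currently has open (Linux /proc)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def max_open_files(pid):
    """Soft 'Max open files' limit of process `pid`, or None if unlimited."""
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units.
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1])  # e.g. the Daphne process ID
    print("open fds: %d, soft limit: %s" % (open_fd_count(pid), max_open_files(pid)))
```

Sampling this for the Daphne PID every few minutes would show whether the descriptor count keeps climbing toward the limit as the slowdown sets in.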
I checked the Daphne access logs, but they contain little beyond the CONNECT and DISCONNECT entries.
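Even those CONNECT and DISCONNECT entries can be useful: if connects steadily outpace disconnects over a log that starts at the last restart, connections may not be getting closed. A rough sketch; the line shape in the comment and the EVENT_RE pattern are assumptions about the log format and may need adjusting, and connection_balance is a hypothetical helper.

```python
import re

# ASSUMPTION: Daphne writes WebSocket events as lines shaped roughly like
#   <client> - - [21/Jun/2019:14:57:46] "WSCONNECT /wsEndpoint/endpoint/" - -
# Adjust EVENT_RE if the real log lines look different.
EVENT_RE = re.compile(r'"(WSCONNECT|WSDISCONNECT) ')


def connection_balance(lines):
    """Return (connects, disconnects, connects - disconnects) for the given log lines."""
    connects = disconnects = 0
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a WebSocket connect/disconnect entry
        if match.group(1) == "WSCONNECT":
            connects += 1
        else:
            disconnects += 1
    return connects, disconnects, connects - disconnects
```

Running it over successive chunks of the log (e.g. hour by hour) would show whether the gap grows in step with the slowdown.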
I am on Python 2.7 and my environment consists of:
asgi-redis==1.4.3
asgiref==1.1.2
channels==1.1.8
chardet==3.0.4
configparser==3.5.0
constantly==15.1.0
coreapi==2.3.1
coreschema==0.0.4
daphne==1.4.2
Django==1.9.5
gunicorn==19.7.1
hiredis==1.0.0
honcho==1.0.1
incremental==17.5.0
itypes==1.1.0
multiprocessing==2.6.2.1
redis==2.10.6
requests==2.21.0
requests-cache==0.4.13
six==1.12.0
Twisted==19.2.0
txaio==18.8.1
uritemplate==3.0.0
urllib3==1.24.1
vine==1.1.3
zope.interface==4.6.0
The channel capacity has been set to 200 as follows:
CHANNEL_LAYERS = {
"default": {
"BACKEND": "asgi_redis.RedisChannelLayer",
"CONFIG": {
"hosts": [("localhost", 6341)],
"capacity": 200
},
"ROUTING": "app.routing.channel_routing",
},
}
However, there are never more than 100 messages on the channel at any given time.
Also, if the number of messages on a channel exceeds its capacity, shouldn't it simply wait for space to free up in the queue instead of stopping altogether?
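As far as I understand the ASGI 1.0 channel-layer spec that Channels 1.x builds on, send() does not wait for space when a channel is at capacity; it raises the layer's ChannelFull exception, so any waiting has to happen in the caller. A hedged sketch of a bounded retry-with-backoff wrapper; send_with_backoff is a hypothetical helper, and both the send callable and the exception class are passed in rather than assuming a particular backend's API.

```python
import time


def send_with_backoff(send, channel, message, full_exc, retries=5, base_delay=0.05):
    """Call send(channel, message), retrying with exponential backoff while it raises full_exc.

    Waits base_delay * 2**attempt seconds between attempts and re-raises
    full_exc if the channel is still full after `retries` retries.
    """
    for attempt in range(retries + 1):
        try:
            return send(channel, message)
        except full_exc:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Usage would look like send_with_backoff(layer.send, "some-channel", {"text": "hello"}, layer.ChannelFull), where layer is a channel layer object and "some-channel" is a placeholder. The retry count is bounded on purpose: blocking forever on a full channel would just move the stall somewhere else.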
Could someone also explain how to configure error logging for Daphne? The command only seems to offer an --access-log flag, and there is hardly any information in that log.
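For the worker side at least, manage.py runworker loads the Django settings, so the standard LOGGING setting can send warnings and errors to a dedicated file. A minimal settings.py sketch, assuming a hypothetical log path that can be overridden through a hypothetical WORKER_ERROR_LOG environment variable. It does not configure the separate daphne process, whose console output would have to be captured by whatever starts it.

```python
import os
import tempfile

# Hypothetical path: on the server this would be a file under /var/log that the
# worker's user can write to. A temp dir is the fallback so the sketch runs anywhere.
WORKER_ERROR_LOG = os.environ.get(
    "WORKER_ERROR_LOG",
    os.path.join(tempfile.gettempdir(), "channels-worker-errors.log"),
)

# Django passes this dict to logging.config.dictConfig at startup.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {
            "format": "%(asctime)s %(levelname)s %(name)s pid=%(process)d %(message)s",
        },
    },
    "handlers": {
        "error_file": {
            "class": "logging.FileHandler",
            "filename": WORKER_ERROR_LOG,
            "level": "WARNING",
            "formatter": "verbose",
        },
    },
    # A root handler receives records from every logger that propagates to the
    # root, which is the default for loggers not configured otherwise.
    "root": {"handlers": ["error_file"], "level": "WARNING"},
}
```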
Can anyone please help?
Thanks,
Vaibhav