[bug] netbox-rqworker uses excessive CPU when idle on AWS


John Wang

Nov 4, 2019, 11:55:10 PM
to NetBox
Hi,

I think I've found a bug, or at the very least a usability issue. I'm running NetBox v2.6.7 (9f7313e). I'm not sure whether it's AWS-specific or more widespread. Let me know if you need any more info, or if you want me to file this on the GitHub issue tracker.

---

NetBox makes an AWS t3.micro instance extremely unresponsive, and increasing the instance size seems to increase idle CPU usage. By unresponsive, I mean that ssh takes a long time to connect, and even typing in a shell is laggy.

Initially I thought I was not meeting the system requirements, so I started to increase the instance size. I confirmed that there was no excessive swapping (`kswapd0` activity) happening. Although I couldn't find any minimum system requirements, I also expected that a webserver should be able to idle without rendering a system unresponsive. Even with no clients accessing the website, I observed that the instance CPU credits were constantly depleted. This is steady state with a t3.small:

Screenshot from 2019-11-04 23-20-08.png


Also, the server averaged 10% CPU usage on a t3.micro, which counterintuitively doubled to 20% on a t3.small, which has twice the RAM:

Screenshot from 2019-11-04 23-30-38.png



`htop` showed that this CPU usage was from `python manage.py rqworker` running every ~10 seconds, using 75% CPU. The PID changed each time, indicating that a new process was being spawned.

From what I understand, rqworker is used to [enable webhooks](https://github.com/netbox-community/netbox/issues/3113#issuecomment-487060964). However, it doesn't seem to be critical to the operation of the rest of the website. I'm now running with `netbox-rqworker` disabled in the supervisor configs. After stopping `netbox-rqworker` the idle CPU usage drops to <1%:

Screenshot from 2019-11-04 23-28-24.png



Hope this is helpful,
John

Brian Candler

Nov 5, 2019, 2:07:48 AM
to NetBox
> `htop` showed that this CPU usage was from `python manage.py rqworker` running every ~10 seconds, using 75% CPU. The PID changed each time, indicating that a new process was being spawned.

That sounds like you are constantly respawning rqworker.  I am guessing you have it running under systemd or supervisord, it starts and crashes, and it gets respawned.

To find what's going wrong, look at your supervisor logs (e.g. for systemd: "journalctl -xe")

Or: stop it, and run rqworker from the command line under the exact same conditions as the supervisor would - especially as the same user id.  Something like this:

sudo su - www-data -s /bin/bash
cd /opt/netbox/netbox
python3 manage.py rqworker

And see what error is reported. 
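
The "PID changes every time" symptom can also be checked mechanically. Below is a minimal sketch, assuming you have already collected the worker's PID at regular intervals (for example by polling the process list); the function name and the sample values are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Optional


def count_respawns(pid_samples: Iterable[Optional[int]]) -> int:
    """Count how often a different PID shows up across successive samples.

    Each sample is the PID seen for the worker at one polling instant,
    or None if no matching process was running at that moment.
    """
    respawns = 0
    last_pid = None
    for pid in pid_samples:
        if pid is None:
            continue  # worker not running at this instant
        if last_pid is not None and pid != last_pid:
            respawns += 1  # a new process has replaced the previous one
        last_pid = pid
    return respawns


# Hypothetical samples: the worker keeps coming back under a new PID.
samples = [4101, None, 4117, 4117, None, 4130, None, 4142]
print(count_respawns(samples))  # prints 3
```

A healthy long-running worker would keep the same PID and give a count of 0; a count that keeps growing suggests a start, crash, respawn loop.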

Brian Candler

Nov 5, 2019, 2:42:24 AM
to NetBox
As regards "ssh takes long time to connect, and even typing in a shell is laggy", it seems unlikely that Netbox is at fault, if CPU load is only 10-20%.

There's a simple way to prove it one way or the other though: simply stop the Netbox processes (gunicorn and rqworker) and see if it improves.

John Wang

Nov 5, 2019, 2:50:16 AM
to Brian Candler, NetBox
Thanks for your quick reply. This is all I get while running as `www-data`. Something obvious must be wrong, but I'm not sure what.

$ python3 manage.py rqworker
Unknown command: 'rqworker'
Type 'manage.py help' for usage.

By the way, stopping netbox-rqworker from supervisorctl does fix the unresponsiveness. The unresponsiveness occurs because average CPU usage exceeds the rate at which the AWS instance earns CPU credits, so the credit balance runs out.
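
The credit arithmetic can be sketched roughly as follows. One CPU credit corresponds to one vCPU running at 100% for one minute. The per-instance figures below (2 vCPUs each, 12 credits per hour for t3.micro and 24 for t3.small) are quoted from memory of the AWS documentation and should be treated as assumptions to verify; the function names are made up for this example.

```python
def breakeven_utilization(credits_per_hour: float, vcpus: int) -> float:
    """Average per-vCPU utilization (0..1) at which spending equals earning.

    An instance at average utilization u spends u * vcpus * 60 credits
    per hour, because one credit is one vCPU-minute at 100%.
    """
    return credits_per_hour / (vcpus * 60)


def net_credits_per_hour(avg_utilization: float,
                         credits_per_hour: float,
                         vcpus: int) -> float:
    """Credits gained (positive) or lost (negative) per hour."""
    return credits_per_hour - avg_utilization * vcpus * 60


# Assumed figures; check them against the current AWS documentation.
T3_MICRO = {"credits_per_hour": 12, "vcpus": 2}
T3_SMALL = {"credits_per_hour": 24, "vcpus": 2}

for name, spec in (("t3.micro", T3_MICRO), ("t3.small", T3_SMALL)):
    print(name, "break-even:", round(breakeven_utilization(**spec) * 100), "%")
```

Under these assumptions the break-even points are about 10% and 20%, so any steady load above that drains the balance. Those values happen to match the averages in the graphs above, which may not be a coincidence, but this thread does not establish that.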

Best,

John Wang
Robotics Engineer
May Mobility


--
You received this message because you are subscribed to a topic in the Google Groups "NetBox" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/netbox-discuss/SrhbmNG6fvw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to netbox-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/netbox-discuss/671be74c-a8ed-44ac-affc-56c259dd16ca%40googlegroups.com.

Brian Candler

Nov 5, 2019, 3:04:36 AM
to NetBox
What version of Netbox did you install?

What does "pip3 list | grep django" (as root) show?  Do you have django-rq?

You're right that something serious is wrong.  Maybe you missed the step for installing requirements:

requirements.txt says it would have installed django-rq==2.1.0

Brian Candler

Nov 5, 2019, 3:09:10 AM
to NetBox
Sorry, you did say what you're running - v2.6.7.

John Wang

Nov 5, 2019, 3:11:38 AM
to Brian Candler, NetBox
Yes, django-rq is installed. I did install the dependency packages as directed.

$ pip3 list | grep django
django-cacheops (4.1)
django-cors-headers (3.0.2)
django-debug-toolbar (2.0)
django-filter (2.1.0)
django-js-asset (1.2.2)
django-mptt (0.9.1)
django-prometheus (1.0.15)
django-rq (2.1.0)
django-tables2 (2.0.6)
django-taggit (1.1.0)
django-taggit-serializer (0.1.7)
django-timezone-field (3.0)
djangorestframework (3.9.4)


Brian Candler

Nov 5, 2019, 3:17:28 AM
to NetBox
Now try it as the www-data user:

sudo su - www-data -s /bin/bash
pip3 list | grep django
echo "import django_rq" | python3
exit

If the import is successful, no output is generated. If it gives an error, then almost certainly the packages have been installed in the wrong place.  If you log in to the box as a regular user, in some environments "sudo pip3 install..." can install the packages for the local user instead of globally.

In that case, this might fix it:

cd /opt/netbox
sudo -H pip3 install -r requirements.txt

Or it could be a permissions problem with the libraries you've installed.  Either way, this is a problem with pip, not Netbox.
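
To see where a package would actually be loaded from for a given user, a small script like the one below can be run as that user (e.g. as www-data). This is only a sketch: `module_location` is a helper name invented here, and it relies on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
import sys


def module_location(name: str):
    """Return the file a top-level module would be imported from.

    Returns None if the module cannot be found by this interpreter for
    the current user (the origin can also be None for namespace packages).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Which interpreter is in use, and where django_rq resolves from.
    print("interpreter:", sys.executable)
    print("django_rq:", module_location("django_rq"))
```

If the path printed for root differs from the one printed for www-data, or www-data gets None, the packages were probably installed somewhere the service user cannot see.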

John Wang

Nov 5, 2019, 3:22:56 AM
to Brian Candler, NetBox
It does sound like a problem with the setup.

There's no problem accessing the django_rq package as www-data

www-data@netbox:~$ pip3 list | grep django
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.

django-cacheops (4.1)
django-cors-headers (3.0.2)
django-debug-toolbar (2.0)
django-filter (2.1.0)
django-js-asset (1.2.2)
django-mptt (0.9.1)
django-prometheus (1.0.15)
django-rq (2.1.0)
django-tables2 (2.0.6)
django-taggit (1.1.0)
django-taggit-serializer (0.1.7)
django-timezone-field (3.0)
djangorestframework (3.9.4)
www-data@netbox:~$ python3
Python 3.6.8 (default, Oct  7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import django_rq
>>> 


Brian Candler

Nov 5, 2019, 4:39:47 AM
to NetBox
I am not exactly sure how manage.py picks up external commands; the info should be here somewhere:

Aha... I notice in /opt/netbox/netbox/netbox/settings.py:

# Only load django-rq if the webhook backend is enabled
if WEBHOOKS_ENABLED:
    INSTALLED_APPS.append('django_rq')

Maybe you have WEBHOOKS_ENABLED set to False in netbox/configuration.py?  In which case, you shouldn't be running rqworker at all.  Or you can set it to True and run rqworker.
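
To spell out why that setting would produce "Unknown command: 'rqworker'": Django only offers management commands from apps listed in INSTALLED_APPS, and rqworker is supplied by django_rq. The sketch below mimics the conditional from settings.py with a shortened stand-in app list; the function names are hypothetical and this is not NetBox's actual code.

```python
def build_installed_apps(webhooks_enabled: bool) -> list:
    # Stand-in for the base app list; the real list in settings.py is longer.
    apps = ["django.contrib.auth", "django.contrib.contenttypes"]
    # Same shape as the conditional quoted above.
    if webhooks_enabled:
        apps.append("django_rq")
    return apps


def rqworker_available(webhooks_enabled: bool) -> bool:
    # The rqworker command ships with django_rq, so manage.py can only
    # find it when that app is present in INSTALLED_APPS.
    return "django_rq" in build_installed_apps(webhooks_enabled)


print(rqworker_available(False))  # prints False
print(rqworker_available(True))   # prints True
```

With the setting at False, a supervisor that keeps launching "manage.py rqworker" would see it exit immediately every time, which would fit the respawn loop described earlier.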

If that fixes the problem, please open a ticket on github to clarify the documentation at https://netbox.readthedocs.io/en/latest/installation/3-http-daemon/#supervisord-installation

John Wang

Nov 5, 2019, 2:16:52 PM
to Brian Candler, NetBox
That seems to be it. WEBHOOKS_ENABLED is set to False in my config, and in the default config. Changing it to True also resolves the CPU usage issue. I have filed a ticket:
