Debugging Django app in production for high CPU usage


Web Architect

Feb 23, 2016, 11:59:28 PM2/23/16
to Django users
Hi,

We have an ecommerce platform based on Django, and we use uwsgi to run the app. The issue is that the CPU usage is hitting the roof (sometimes going beyond 100%) in some scenarios. I would like to debug the platform in production to see where the CPU is being consumed. We have used caching all over the place (including templates), so the DB queries should be quite limited.

I would refrain from using django-debug-toolbar, as it slows the platform down further, increases the CPU usage, and also requires turning DEBUG on. Is there any other tool or way to debug the platform? I would appreciate any recommendations/suggestions.

Also, does the Django ORM increase CPU usage? Does it block the CPU? I would appreciate it if anyone could shed some light on this.

Thanks.

Asif Saifuddin

Feb 24, 2016, 7:18:17 AM2/24/16
to Django users
What is your server configuration and system usage statistics?

Avraham Serour

Feb 24, 2016, 8:19:29 AM2/24/16
to django-users
> sometimes going beyond 100%

how??

You can use django-debug-toolbar on your development machine: check the logs for the pages that take the longest to process and the ones that are requested the most, and start with those. Of course your CPU usage won't be as high in development, but you can check and compare whether your changes bring improvements.
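As a sketch of the log-checking step: assuming an nginx access log whose log_format ends with $request_time (the field layout here is an assumption, not anyone's actual config in this thread), you could rank paths by average response time with a few lines of Python:

```python
from collections import defaultdict

# Hypothetical access-log lines: method, path, status, $request_time in seconds.
SAMPLE = [
    "GET /products/ 200 0.912",
    "GET /products/ 200 1.204",
    "GET /about/ 200 0.031",
]

def slowest_paths(lines, top=5):
    """Return paths sorted by average request time, slowest first."""
    totals, counts = defaultdict(float), defaultdict(int)
    for line in lines:
        _method, path, _status, seconds = line.split()
        totals[path] += float(seconds)
        counts[path] += 1
    averages = {path: totals[path] / counts[path] for path in totals}
    return sorted(averages, key=averages.get, reverse=True)[:top]

print(slowest_paths(SAMPLE))  # ['/products/', '/about/']
```

The same idea works on uwsgi request logs; only the split/parse line changes.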

Do you already know whether the CPU spike is caused by Django and not by another piece of software? Do you have other things on the same server? A database? Elasticsearch? MongoDB? Celery workers?

Where specifically are you using the cache? I'm not familiar with the technical term "all over the place"

> does the Django ORM increase the CPU usage?

Using the ORM increases CPU usage compared to not using it - the fastest code is the code that doesn't run.

Avraham


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/69176a56-5604-4b3e-9887-9ebedf55dbb4%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Javier Guerra Giraldez

Feb 24, 2016, 8:36:34 AM2/24/16
to django...@googlegroups.com
On 24 February 2016 at 13:18, Avraham Serour <tov...@gmail.com> wrote:
>> sometimes going beyond 100%
>
> how??


If it's what top reports, 100% refers to a whole core. A multi-forking
server (like uWSGI) can easily go well beyond that.

And that's not a bad thing.

--
Javier

Will Harris

Feb 24, 2016, 8:40:22 AM2/24/16
to Django users
Hey Web Architect, I guess you never got that DB dump running in development? ;-)

Why don't you run some profiling middleware to see if you can get some traces from the production system? Or how about New Relic or some such? It's pretty good at helping to identify problem spots in your stack.
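A profiling middleware along these lines can be sketched in a few lines. This is a generic illustration using the stdlib's cProfile, not a specific package anyone in the thread used, and the class name is made up; Django-style middleware needs nothing Django-specific, just a callable wrapping get_response:

```python
import cProfile
import io
import pstats

class ProfilingMiddleware:
    """Illustrative middleware: profile each request with cProfile and keep
    a report of the top functions by cumulative time (log it in practice)."""

    def __init__(self, get_response, top=10):
        self.get_response = get_response
        self.top = top
        self.last_report = ""

    def __call__(self, request):
        profiler = cProfile.Profile()
        # Profile only the downstream view/middleware work for this request.
        response = profiler.runcall(self.get_response, request)
        buffer = io.StringIO()
        pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(self.top)
        self.last_report = buffer.getvalue()  # send to a logger in production
        return response
```

You would enable something like this behind a setting or a sampling condition, since profiling every request adds its own CPU cost.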

Finally, you will really need to get your production setup running in a development environment if you are ever to have a hope of experimenting with different solutions. You need to understand what user actions are causing the load to spike, and reproduce that on similar infrastructure in a controlled environment where you can instrument your code to see exactly what's going on.

Nikolas Stevenson-Molnar

Feb 24, 2016, 3:17:43 PM2/24/16
to Django users
Just to be clear: is it the uwsgi process(es) consuming the CPU? I ask because you mention DB queries, which wouldn't impact the CPU of uwsgi (you'd see that reflected in the database process).

Web Architect

Feb 25, 2016, 12:58:22 AM2/25/16
to Django users
Hi Asif,

The OS is CentOS 6 Linux - the hardware is a dual-core processor. We are running Django with uwsgi, configured with 4 processes and 2 threads (no logic behind the numbers, just trying to find the optimal combination). I just ran top and checked the CPU usage. Mostly two instances of uwsgi are running, and one is spiking beyond 100% CPU usage.

Thanks,
Pinakee

Web Architect

Feb 25, 2016, 1:05:55 AM2/25/16
to Django users
Hi Avraham,

Please find my comments inline.


On Wednesday, February 24, 2016 at 6:49:29 PM UTC+5:30, Avraham Serour wrote:
> sometimes going beyond 100%
>
> how??

That's what I am trying to figure out :)
 
> You can use django-debug-toolbar on your development machine, check the logs for the pages that take the longest to process and the one that are the most requested and start with those, of course your CPU won't be high but you should check and compare if there were improvements after changes.

I have done that on our development machine. There isn't much heavy calculation being done in any of the views; the only real work is MySQL access. But we have template-based caching, which obviously reduces the DB queries for the cached period.

> Do you know already if the CPU usage spike is caused by django and not another software? do you have other stuff in the same server? database? elasticsearch? mongodb? celery workers?
We are using Solr, but in top wouldn't that show up as CPU consumption by java? We do not have Celery. Since it's an ecommerce platform, it's image-intensive; we use sorl-thumbnail to generate thumbnails dynamically. Image processing could have been CPU-intensive, but sorl caches the thumbnails.


> were specifically are you using cache? I'm not familiar with the technical term "all over the place"

Mostly in templates. I think that should help (it won't be needed in the views, I presume). We also cache the DB results.

Web Architect

Feb 25, 2016, 1:08:38 AM2/25/16
to Django users
Hi Javier, 

I am new to uwsgi. The CPU usage is what top is reporting. Is there a way to optimise uwsgi?

Thanks.

Web Architect

Feb 25, 2016, 1:11:09 AM2/25/16
to Django users
Hi Will,

In fact, that's what I am doing currently. I am also trying to replay the production load (similar RPS etc., based on reports from ngxtop), but unfortunately I have not been able to reproduce the CPU spike in development.

Thanks.

Web Architect

Feb 25, 2016, 1:17:24 AM2/25/16
to Django users
Hi Nikolas,

I am new to uwsgi. Top is showing CPU consumption by uwsgi. Following is my uwsgi configuration:

master = True
socket = :7090
max-requests = 5000
processes = 4
threads = 2
enable-threads = true
# harakiri = 30 (not sure if using this would be a good idea)
stats = 127.0.0.1:9191


The hardware is a dual-core processor with CentOS 6 Linux. I am not sure if there is a better way to configure uwsgi. uwsgitop shows only one worker process being heavily used, and that is the one spiking beyond 100% CPU usage.

Thanks.
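For reference, one possible variant of the configuration above for a dual-core box; this is only a sketch, not advice given in the thread, and the values are guesses to be validated under load (cheaper and harakiri are standard uWSGI options):

```ini
master = true
socket = :7090
# match workers to cores before tuning upward
processes = 2
threads = 2
enable-threads = true
# recycle workers periodically to bound leaks
max-requests = 5000
# keep one worker idle, spawn more only under load
cheaper = 1
# kill any request stuck longer than 30 seconds
harakiri = 30
stats = 127.0.0.1:9191
```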

Nikolas Stevenson-Molnar

Feb 25, 2016, 5:02:23 PM2/25/16
to Django users
Which cache backend are you using? Also, how's your memory usage? Do the spikes in CPU correlate with load? I.e., does the CPU use increase/decrease consistently with the number of users?

Web Architect

Feb 26, 2016, 12:29:22 AM2/26/16
to Django users
Hi Nikolas,

Cache backend is Redis. The CPU usage is directly proportional to the load (increases with the increase in load). Memory usage seems to be fine.

Thanks.

Nikolas Stevenson-Molnar

Feb 26, 2016, 11:27:42 AM2/26/16
to Django users
What sort of load are you experiencing in production? Is it possible that you're simply running into a hardware limitation and need to scale?

Web Architect

Feb 29, 2016, 3:28:16 AM2/29/16
to Django users
The load is low - around 4-5 rps. I don't think that should affect the CPU usage so much.

James Schneider

Feb 29, 2016, 4:15:41 AM2/29/16
to django...@googlegroups.com
On Tue, Feb 23, 2016 at 8:59 PM, Web Architect <pina...@gmail.com> wrote:
> Hi,
>
> We have an ecommerce platform based on Django. We are using uwsgi to run the app. The issue the CPU usage is hitting the roof (sometimes going beyond 100%) for some scenarios. I would like to debug the platform on Production to see where the CPU consumption is happening. We have used Cache all over the place (including templates) as well - hence, the DB queries would be quite limited.

Have you validated that your cache is actually being used, and not just populated? I've seen that before.

 
> I would refrain from using Django-debug toolbar as it slows down the platform further, increases the CPU usage and also need to turn the DEBUG on. Is there any other tool or way to debug the platform? Would appreciate any recommendations/suggestions.


Have you looked into profiling the code, or adding logging statements throughout your code to determine when/where particular segments are being run? I would definitely start with logging. I'm assuming you have suspicions about where your pain points might be.


I would put them in places that may be part of large loops (in terms of the number of objects queried or the depth of relationships traversed), or sprinkle them within complex views. You have to start narrowing down which page or pages are causing your angst.


> Also, does the Django ORM increase the CPU usage? Does it block the CPU? Would appreciate if anyone could throw some light on this.

I'm not sure about blocking, but if deployed correctly, the ORM should have a negligible (and acceptable) CPU hit in most cases, if you notice one at all. I've seen spikes from bad M2M relationships where prefetch_related() was needed (>200 queries down to 3 with prefetch_related, and ~1-2s total response down to <80ms, if I recall correctly). The most common case I run into is nested {% for %} loops within a template that dig down through relationships.
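The query-count blowup described here is the classic N+1 pattern. A framework-free sketch (a fake in-memory "database" with made-up names, not Django's actual implementation) of why batching the second lookup, as prefetch_related() does, collapses the count:

```python
class FakeDB:
    """Stand-in database that counts queries (illustrative only)."""
    def __init__(self):
        self.queries = 0
        self.orders = {1: [10, 11], 2: [12], 3: []}  # order id -> item ids

    def fetch_orders(self):
        self.queries += 1
        return list(self.orders)

    def fetch_items(self, order_id):
        self.queries += 1
        return self.orders[order_id]

    def fetch_items_bulk(self, order_ids):
        self.queries += 1
        return {oid: self.orders[oid] for oid in order_ids}

def render_naive(db):
    # One query for the orders, then one more per order: the N+1 pattern.
    return {oid: db.fetch_items(oid) for oid in db.fetch_orders()}

def render_prefetched(db):
    # Two queries total, regardless of N - what prefetch_related() buys you.
    return db.fetch_items_bulk(db.fetch_orders())
```

With three orders the naive version issues four queries and the prefetched one issues two; the gap grows linearly with the number of rows.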

I would also consider increasing the logging levels of your cache and DB to see if you are getting repetitive queries. The ORM does cause those from time to time, since it has non-intuitive behavior in some edge cases. You can try that during low-activity periods to keep the extra logging from overwhelming the system. You can sometimes catch an issue like repetitive/multiple queries even with a single end-user, and such issues are much easier to diagnose on a low-usage server.
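Turning up DB logging in Django goes through the standard LOGGING setting; the fragment below is a sketch. One caveat for this thread: the django.db.backends logger only emits SQL when DEBUG is True, so this suits a staging box rather than the production host being debugged.

```python
# Sketch of a Django settings fragment that prints every SQL statement.
# Note: django.db.backends only logs queries when DEBUG is True.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```

Watching the console while clicking through a suspect page makes repeated identical queries easy to spot.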

Do you have any other jobs that run against the system (session cleanup, expired inventory removal, mass mailing, etc.)? Would it be possible for those to be the culprit?

Have you figured out any reproducible trigger?

-James

Sithembewena Lloyd Dube

Feb 29, 2016, 4:50:43 AM2/29/16
to django...@googlegroups.com
New Relic.




--
Regards,
Sithembewena

Gabriel - Iulian Dumbrava

Feb 29, 2016, 5:33:53 AM2/29/16
to Django users
Hi!

I have seen such behavior on a couple of sites running an older version of Zinnia - it simply hit 100% CPU usage on some queries.
I would also suggest integrating New Relic. It gives you pretty detailed information on where the CPU is spending most of its time.

Web Architect

Mar 3, 2016, 1:02:15 AM3/3/16
to Django users
Hi James,

Thanks for the detailed explanation. It certainly helps, and I will embed logging to debug the CPU usage.

Please find my comments inline:


On Monday, February 29, 2016 at 2:45:41 PM UTC+5:30, James Schneider wrote:

> On Tue, Feb 23, 2016 at 8:59 PM, Web Architect <pina...@gmail.com> wrote:
>> Hi,
>>
>> We have an ecommerce platform based on Django. We are using uwsgi to run the app. The issue the CPU usage is hitting the roof (sometimes going beyond 100%) for some scenarios. I would like to debug the platform on Production to see where the CPU consumption is happening. We have used Cache all over the place (including templates) as well - hence, the DB queries would be quite limited.
>
> Have you validated that your cache is actually being used, and not just populated? I've seen that before.

The cache is in use and has been validated. But one thing I have observed: when I was storing ORM objects (DB results) in the cache to avoid DB queries, it proved to be expensive due to the object-related operations (manipulating and copying the objects in the cache). We are using Redis, and Redis only handles strings, I think. Hence, I reverted to hitting the DB.
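A common workaround for the object-copying cost described above is to cache plain rows (e.g. the output of .values()) as JSON strings rather than model instances; Redis is happy with strings either way. A minimal sketch with a dict standing in for the Redis client (names and keys are illustrative):

```python
import json

cache = {}  # stands in for a Redis client in this sketch

def cache_product_rows(key, rows, cache=cache):
    # Store plain dicts serialized as JSON instead of pickled model
    # instances; JSON avoids pickle's per-object construction cost.
    cache[key] = json.dumps(rows)

def get_product_rows(key, cache=cache):
    raw = cache.get(key)
    return json.loads(raw) if raw is not None else None

rows = [{"id": 1, "name": "shirt"}, {"id": 2, "name": "shoe"}]
cache_product_rows("products:featured", rows)
```

The trade-off: you get dicts back, not model instances, so this fits read-only display paths like product listings.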

I am personally in favour of async frameworks like Tornado - in fact, I have used it for a high-capacity Pinterest-like platform where the performance was excellent. But Tornado is quite lightweight and a lot of services would need to be built by us - hence we chose Django for ecommerce. This is our first experience building a complex service (ecommerce) on a platform like Django. Since Django is synchronous, I was wondering if the threads are getting stuck waiting for DB responses.

 
>> I would refrain from using Django-debug toolbar as it slows down the platform further, increases the CPU usage and also need to turn the DEBUG on. Is there any other tool or way to debug the platform? Would appreciate any recommendations/suggestions.


> Have you looked into profiling the code or adding logging statements throughout your code to determine when/where particular segments are being run? I would definitely start with logging. I'm assuming you have suspicions on where your pain points might be.


> I would put them in places that may be part of large loops (in terms of number of objects queried or depth of relationships traversed), or sprinkled within complex views. You have to start narrowing down which page/pages are causing your angst.


>> Also, does the Django ORM increase the CPU usage? Does it block the CPU? Would appreciate if anyone could throw some light on this.

> I'm not sure about blocking, but if deployed correctly, the ORM should have a negligible (and acceptable) hit to the CPU in most cases, if you notice one at all. I've seen spikes from bad M2M relationships where prefetch_related() was needed (>200 queries down to 3 with prefetch_related, and ~1-2s total response down to <80ms if I recall correctly). The most common case I run into is as part of nested {% for %} loops within a template that dig down through relationships.

We have for loops in templates where DB queries are being made. I will look into those.

> I would also consider increasing the logging levels of your cache and DB to see if you are getting repetitive queries. The ORM does cause those from time to time since it has non-intuitive behavior in some edge cases. You can try that during low activity periods to keep the extra logging from overwhelming the system. Sometimes you can still catch the issue with a single end-user for something like repetitive/multiple queries, and are actually much easier to diagnose on a low usage server.

> Do you have any other jobs that run against the system (session cleanup, expired inventory removal, mass mailing, etc.)? Would it be possible for those to be the culprit?
We do not have any other bulk tasks right now. Where possible, we run those separately with crons.

> Have you figured out any reproducible trigger?

I have done some load testing with locust.io and know there are a few views which are the culprits - specifically the ones where we show a bunch of products. But I wanted to make sure Django itself is not the bottleneck.

 

-James

Web Architect

Mar 3, 2016, 1:02:53 AM3/3/16
to Django users
Integrated New Relic and it seems to be good. Thanks for the suggestion.