Django queryset High CPU Usage


Web Architect

Mar 10, 2017, 12:25:51 AM
to Django users
Hi,

I am a bit perplexed by this and not sure what the solution is. Following is the scenario:

There is a Model A with 10000 records. Just a simple queryset - A.objects.all() - is resulting in the CPU hitting almost 100%. Is there a way to optimize this? And why would such a query result in high CPU usage?

I would appreciate it if anyone could throw some light on the above.

Thanks.

Web Architect

Mar 10, 2017, 12:37:46 AM
to Django users
Would like to further add - the Python CPU usage is hitting almost 100%. When I run a SELECT * query on MySQL, it's quite fast and the CPU is normal. I am not sure if anything more needs to be done in Django.

Melvyn Sopacua

Mar 10, 2017, 3:41:21 AM
to django...@googlegroups.com

On Thursday 09 March 2017 21:25:51 Web Architect wrote:

> I am a bit perplexed by this and not sure what the solution is.
> Following is the scenario:
>
> There is a Model A with 10000 records. Just a simple queryset -
> A.objects.all() is resulting in CPU hitting almost 100%.

What's the problem? You have a fast DB and a fast network, and you are requesting 10k records into memory. Why would you want CPU usage to be lower if it means the request takes longer?

The question you need to ask yourself is why you need 10k records. Nobody is going to read them all.

--

Melvyn Sopacua

James Schneider

Mar 10, 2017, 4:22:50 AM
to django...@googlegroups.com


On Mar 9, 2017 9:37 PM, "Web Architect" <pina...@gmail.com> wrote:
Would like to further add - the python CPU Usage is hitting almost 100 %. When I run  a Select * query on Mysql, its quite fast and CPU is normal. I am not sure if anything more needs to be done in Django. 

Ironically, the things being done in Django are the reason for your CPU utilization issue in the first place.

Calling qs.all() is NOT the same as a SELECT * statement, especially at the scale of query that you mention.

Your SQL query is simply listing data in a table. A very easy thing to do, hence the reason it runs quickly.

The qs.all() call is probably running the same query. However, in addition to pulling all of the data, it is transforming that data into Django model objects. If you are pulling 10K items, then Django is creating 10K objects, which is easily more intensive than a raw SQL query, even for simple model objects.

In general, there's usually no practical reason to ever pull that many objects from a DB for display on a page. Filter down to a reasonable number (<100 for almost all sane cases) or implement a paging system to limit the returned results. It's also probably using a ton of RAM, only for it all to be thrown away at the end of the request. Browsers will disintegrate trying to render that many HTML elements simultaneously.

Look at implementing a paging system, possibly through Django's built-in mechanism, or something like Datatables and the infinite scroll plugin.
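Under the hood, slicing a queryset (which is what Django's Paginator does) compiles to LIMIT/OFFSET in SQL. A stdlib sketch of that idea, with sqlite3 standing in for MySQL and a made-up table and page size:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)",
                 [(f"item-{i}",) for i in range(250)])

PAGE_SIZE = 50  # hypothetical page size

def fetch_page(page_number):
    """One page of rows; queryset slicing translates to LIMIT/OFFSET like this."""
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()

page2 = fetch_page(2)  # rows with ids 51-100; nothing else leaves the database
```

Only one page of rows is ever fetched per request, which keeps both CPU and RAM flat regardless of table size.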



-James

Web Architect

Mar 10, 2017, 6:06:13 AM
to Django users
Hi James,

Thanks for your response. Melvyn also made a similar point about not loading all the records.

But all the records are needed for reporting purposes - the data is read from the DB and a CSV report is created. I am not an expert on Django, so I am not sure if there is a better way to do it.

The scenario is as follows, to make it clearer:

Ours is an ecommerce site built on Django. Our admin/accounting team needs to download reports now and then. We have a Django model for the line items purchased. There could be 10k line items sold, and each line item is associated with other models like payments, shipments, etc., which is a complex set of relations.

We do not yet have a sophisticated reporting mechanism, but I was working on building a simplistic reporting system on Django. I am finding issues with scaling up - as reported, with CPU usage and the amount of time taken. If there is a way to optimise this, that would be great; otherwise we might have to look at standard reporting tools.

Would appreciate suggestions/advice on the above.

Thanks,

James Bennett

Mar 10, 2017, 6:25:35 AM
to django...@googlegroups.com
If all you need is to export data from your database (with or without transforming it along the way) to a CSV, using the normal QuerySet methods is probably the wrong approach; you don't need model objects to do that. Some options include:

* Use raw SQL to query for the data and push it to CSV (also, some databases natively understand how to export query results to CSV)
* Use the values() or values_list() methods to use lighter-weight basic Python data structures (dictionaries and lists) instead of model objects
* If you *must* instantiate model objects, use the iterator() method to avoid keeping them around in-memory, and look at server-side cursors as an option
* If you're fetching related data, make sure you're eager-loading the relations to avoid N+1 problems.
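A sketch of the export loop those options describe, using only the stdlib (sqlite3 stands in for the real database, and the table/column names are invented). In Django the tuples would come from something like A.objects.values_list(...).iterator() rather than a raw cursor:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
conn.executemany("INSERT INTO lineitem (sku, qty) VALUES (?, ?)",
                 [(f"SKU-{i}", i % 5 + 1) for i in range(10000)])

buf = io.StringIO()  # stands in for the report file on disk
writer = csv.writer(buf)
writer.writerow(["id", "sku", "qty"])

# Stream rows straight from the cursor instead of materializing 10k
# model objects first - the moral equivalent of values_list().iterator().
for row in conn.execute("SELECT id, sku, qty FROM lineitem ORDER BY id"):
    writer.writerow(row)

lines = buf.getvalue().splitlines()  # header + 10000 data rows
```

Because nothing heavier than a tuple is ever built per row, this stays cheap in both CPU and memory even as the table grows.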


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Melvyn Sopacua

Mar 10, 2017, 6:55:39 AM
to django...@googlegroups.com

On Friday 10 March 2017 03:06:12 Web Architect wrote:

> Hi James,
>
> Thanks for your response. Melvyn also posed a similar point of not
> loading the whole records.
>
> But all the records are needed for reporting purposes - where the data
> is read from the DB and a csv report is created. I am not quite an
> expert on Django but I am not sure if there is a better way to do it.
>
> The scenario is as follows to make it clearer:
>
> Ours is an ecommerce site built on Django. Our admin/accounting team
> needs to download reports now and then. We have a Django model for
> the line items purchased. Now there could be 10k line items sold and
> each line items are associated with other models like payments,
> shipments etc which is a complex set of relations.

The most scalable solution is to not send the CSV to the browser and to not generate it on the webserver.

Use a tasking system like Celery to generate the report on a different server, via a management command. Then mail the report, or make it available as a static file via rsync/ssh/what-have-you.

You get bonus points for setting up the report-generating server with a read-only replica of the database.

This scales much better and doesn't tie up webserver resources.

--

Melvyn Sopacua

Web Architect

Mar 10, 2017, 6:56:56 AM
to Django users
Hi James,

Thanks for the clarification. Much appreciated. Will follow your points for the reporting part, considering the overheads in the ORM.

Thanks,
Vijay Khemlani

Mar 10, 2017, 7:01:14 AM
to django...@googlegroups.com
There is something wrong in your setup.

I can query a 400,000-item table in less than a second with the
typical "Model.objects.all()". Django does not convert all of the
entries into objects just by that query.

You don't have any custom managers, or anything that could cause a
side effect beyond the query itself?

Web Architect

Mar 10, 2017, 7:40:28 AM
to Django users
Hi Melvyn,

Thanks for your response.

We are using Celery for all tasks which can be done offline or separately; report generation is one of them. We trigger the task from the browser, but the report creation runs via Celery, which on completion sends the CSV file path to the browser to be downloaded.

But the CPU usage and time taken are high, though right now we are running on a single multicore machine. Since I am not quite adept in Django, I raised the query to ask if there is a way to optimise. I have used defer(), only(), etc. to lighten the query, but it's still taking time and CPU. It seems like the ORM is the bottleneck.

I agree it's prudent to have the reporting on separate hardware. Would consider it.

Thanks.

Web Architect

Mar 10, 2017, 7:51:42 AM
to Django users
Hi,

Thanks for your response. 

Well, it's not just the queryset, but the query being triggered by an if condition or a for loop - that's what I meant. I was doing some basic performance checks while working on the report generation, and that's when I observed the issue. There's no manager other than the default one. The CPU usage spikes, and it takes some time before the objects are created.

But when you say not all objects are created in one go, do you mean they can be deferred? Do you mean the query results are stored in memory until each record is accessed via the objects? Could you please clarify?

Thanks. 

Vijay Khemlani

Mar 10, 2017, 9:29:56 AM
to django...@googlegroups.com
This is what you said

"There is a Model A with 10000 records. Just a simple queryset -
A.objects.all() is resulting in CPU hitting almost 100%. Is there a
way to optimize this? But why would such a query result in high CPU
Usage?"

By that, I assume that you opened a shell and executed

"A.objects.all()"

If you have other code that is being executed (your for loop or
something), you'd better post it to clarify your question.

Camilo Torres

Mar 11, 2017, 7:38:11 AM
to Django users
Hi.

You can probably use another web framework for that report, or even rethink your architecture and move the report creation outside Django and the web server, without the Django ORM.

You may be interested in evaluating Celery and django-celery. You can create a separate task outside Django to build such a report, call the task from within Django via Celery, create the report, and email it to the user.

Regards.

Daniel Hepper

Mar 11, 2017, 7:59:57 AM
to Django users
In addition to the suggestions you already received from others, have a look at django-import-export. It allows you to easily export data in various formats.

Hope that helps,
Daniel Hepper

Vijay Khemlani

Mar 11, 2017, 3:01:28 PM
to django...@googlegroups.com
Am I the only one who thinks that generating a report over a set of
just 10,000 records should be doable in 10-20 secs, unless there is
some serious computation going on with that data?

For a report I have to query around 200,000 records, with
aggregations, and it takes less than a minute using the ORM.

James Schneider

Mar 11, 2017, 6:23:25 PM
to django...@googlegroups.com


On Mar 11, 2017 12:01 PM, "Vijay Khemlani" <vkhe...@gmail.com> wrote:
Am I the only one who thinks that generating a report over a set of
just 10.000 records could be done in 10 - 20 secs unless there are
some serious computations going on with that data?

For a report I have to query around 200.000 records, with
aggregations, and it takes less than a minute using the ORM.

The OP never mentioned a time interval that I can find in this thread, only CPU utilization. I can only imagine that the query is taking long enough to notice the CPU utilization, which would be at least a few seconds.

Querying and aggregating 200K records within the DB is not comparable to pulling 10K individual records and processing each one individually. An ORM call with aggregation performs the large majority of the work in the DB, and the ORM simply wraps the response accordingly.
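The difference can be sketched with the stdlib (sqlite3, with invented table and column names): the GROUP BY below is roughly what an ORM aggregate()/annotate() call compiles to, and only the summary rows ever leave the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sale (region, amount) VALUES (?, ?)",
                 [("north" if i % 2 else "south", float(i)) for i in range(200000)])

# The database does the heavy lifting; only two summary rows cross the
# wire, instead of 200k rows being turned into Python objects.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sale GROUP BY region"))
```

Pushing the aggregation into the database this way is why 200K records with aggregations can finish in well under a minute, while instantiating each row as an object cannot.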

-James

Vijay Khemlani

Mar 11, 2017, 7:30:00 PM
to django...@googlegroups.com
"But the CPU usage and time taken are high" <- I'm assuming high
enough to be problematic for the OP.

I'm seriously not following. Why are people suggesting reporting and
export software when the OP hasn't even described the problem in detail?
It's not even clear whether the high CPU and time taken are due to the
basic query ("Model.objects.all()") or the further processing of the
report.

It could easily be a missing "select_related", which causes thousands
of extra queries inside a for loop.
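For illustration, here is that N+1 pattern in stdlib form (sqlite3, invented table names): one query per loop iteration, versus the single JOIN that select_related would have Django emit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (id INTEGER PRIMARY KEY, method TEXT);
CREATE TABLE lineitem (id INTEGER PRIMARY KEY, payment_id INTEGER REFERENCES payment(id));
""")
conn.executemany("INSERT INTO payment (method) VALUES (?)", [("card",), ("cash",)])
conn.executemany("INSERT INTO lineitem (payment_id) VALUES (?)",
                 [(i % 2 + 1,) for i in range(1000)])

# N+1: one query for the items, then one more query per item in the loop.
queries = 1
methods_slow = []
for item_id, payment_id in conn.execute("SELECT id, payment_id FROM lineitem ORDER BY id"):
    method, = conn.execute("SELECT method FROM payment WHERE id = ?",
                           (payment_id,)).fetchone()
    queries += 1
    methods_slow.append(method)

# With select_related, Django issues a single JOIN instead:
methods_fast = [m for (m,) in conn.execute(
    "SELECT p.method FROM lineitem li JOIN payment p ON p.id = li.payment_id "
    "ORDER BY li.id")]
```

Both approaches produce identical data, but the first costs 1001 round trips to the database while the second costs one.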

Melvyn Sopacua

Mar 11, 2017, 7:55:14 PM
to django...@googlegroups.com

On Saturday 11 March 2017 21:29:10 Vijay Khemlani wrote:

> "But the CPU usage and time taken are high" <- I'm assuming high
> enough to be problematic for OP.
>
> I'm seriously not following. Why are people suggesting reporting and
> export software when OP hasn't even described the problem in detail.

Several reasons. Some chime in without reading the entire thread (the OP already stated he's using Celery and displaying a notice when the report is done). Another is that, as you've said, there's not enough detail to get to the root cause, but you have to start somewhere.

Another is that the question is two-fold:

1) What causes the CPU to spike?

2) How can I scale this better?

From personal experience, I find that displaying a notice like "the job is scheduled; it's estimated to be done at h:m and you will be notified by email" instills a calmness in project owners that would otherwise have them adjusting project requirements while biting their nails looking at a progress bar.

> It's not even clear whether the high cpu and time taken are due to the
> basic query ("Model.objects.all()") or the further processing of the
> report.

Agreed. But nonetheless, offloading to other hardware frees up webserver resources and scales better.

In fact, I would stop investigating if there are other things to finish, get the budget for the second machine, and pick this up again in the final stages to see if there's something more that can be done.

> It could easily be a missing "select_related" which causes thousands
> of joins inside a for loop.

Good one. Takes a few seconds to check; if not, move on.

--

Melvyn Sopacua

Web Architect

Mar 14, 2017, 1:25:25 AM
to Django users
Hi Vjiay,

My apologies if the scenario is not clear. Following are the details:

Let's say there is a Model A (with fields, foreign keys, and ManyToMany relationships with other models). There are 10k records for A. Let's say the following is the pseudocode for the report:

As = A.objects.all()

for a in As:
    # retrieve other related data from associated models
    # write the data to a CSV report file

There's no CPU intensive work above - its just fetching data. I had used select_related and prefetch_related to reduce DB queries - otherwise MySQL CPU usage was going up.

The above was run in Celery as a separate task. The Python CPU was hitting almost 100% and it was taking time to generate the report - more than 300s.

To debug the issue, I broke the above code down and made it simpler to narrow down the problem.

So, I just did the following:

As = A.objects.all()

if As:
    print "hello"

In the above, the CPU was hitting almost 100% and it was taking almost a second or more before "hello" was printed. I also tried select_related and prefetch_related to check further.

Hence, the conclusion that the query was creating the CPU spike. 
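One likely culprit worth noting here: in Django, using a queryset in a boolean context like `if As:` evaluates the whole queryset, so all 10k rows are fetched and turned into model objects just to answer yes/no, while `As.exists()` asks the database a one-row question instead. A stdlib sketch of the two SQL shapes (sqlite3, table name invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_a (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO app_a (name) VALUES (?)",
                 [(f"record-{i}",) for i in range(10000)])

# What `if As:` does: evaluate the whole queryset, then test truthiness.
# Django fetches every row and builds a model instance for each one.
all_rows = conn.execute("SELECT * FROM app_a").fetchall()
non_empty = bool(all_rows)  # 10000 rows pulled to answer a yes/no question

# What `As.exists()` does instead: ask the database a one-row question.
has_rows = conn.execute("SELECT 1 FROM app_a LIMIT 1").fetchone() is not None
```

This would explain a roughly one-second CPU spike from `if As:` alone, with no other code involved.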

Hope I am clear.

Thanks,

Web Architect

Mar 14, 2017, 1:40:53 AM
to Django users
I agree. The task was mostly to pull data from the DB and generate the CSV. The model structure was as follows:

Main Model A

A has a foreign key to model D
A has a foreign key to model E
F has a foreign key to model D

Model B has a ManyToMany relationship with A through B1
Model C has a ManyToMany relationship with A through C1

It was mostly the DB fetches that were the complex part. Something like: fetch the latest of B related to each record of A; fetch the latest of F related to the record of D which is related to each record of A.

There were no CPU-intensive calculations done. Though the overall DB fetches were complex, as mentioned in my response to Vijay, I broke down the issue and tried the single queryset with select_related and prefetch_related, and also "defer" and "only" to make the query lighter. But they weren't helping. In fact, the prefetch_related on B and C was making the overall query heavier and hence increasing the response time further.

Please note the above example is just a subset of the overall DB relations and models.



Web Architect

Mar 14, 2017, 1:43:41 AM
to Django users
Hi Camilo,

Thanks for your suggestion. Would certainly look for solutions outside Django if Django cannot suffice. But I was trying to find something within Django, since the web part is in Django, and for ease of development.

We are already using Celery extensively, but high resource usage is not desirable. If there is no alternative, then we would have to live with it and probably add more hardware or do some distributed processing.

Thanks.

Web Architect

Mar 14, 2017, 1:44:18 AM
to Django users
Hi Daniel,

Thanks for the suggestion. Would look into django-import-export.

Thanks.

Vijay Khemlani

Mar 14, 2017, 10:14:53 AM
to django...@googlegroups.com
"In the above, the CPU was hitting almost 100% and was taking almost a second"

So, your query was resolved in a second; that's normal, and so is the spike in CPU usage, because... the CPU is working...

Your problem seems to be in your for loop:

for a in As:
    # retrieve other related data from associated models
    # write the data to a CSV report file

If A has 10,000 records then it is reasonable that this part explodes, especially if you are querying the database in each iteration. Also, are you writing multiple CSV files?

Can't you find a way to prefetch all the records of the associated models that you are going to need, in a single query before the for loop, and store them in a dictionary or something?
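A stdlib sketch of that dictionary approach (sqlite3, with all table and column names invented): one query for the related rows, indexed by foreign key, then plain lookups inside the report loop:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shipment (id INTEGER PRIMARY KEY, a_id INTEGER, status TEXT);
""")
conn.executemany("INSERT INTO a (name) VALUES (?)",
                 [(f"order-{i}",) for i in range(100)])
conn.executemany("INSERT INTO shipment (a_id, status) VALUES (?, ?)",
                 [(i % 100 + 1, "shipped") for i in range(300)])

# One query up front, indexed by the foreign key...
shipments_by_a = defaultdict(list)
for a_id, status in conn.execute("SELECT a_id, status FROM shipment"):
    shipments_by_a[a_id].append(status)

# ...then the report loop does dictionary lookups, not database queries.
report_rows = []
for a_id, name in conn.execute("SELECT id, name FROM a ORDER BY id"):
    report_rows.append((name, len(shipments_by_a[a_id])))
```

Two queries total, regardless of how many records are in the report loop; this is essentially what Django's prefetch_related does for you under the hood.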