On Thursday 09 March 2017 21:25:51 Web Architect wrote:
> I am a bit perplexed by this and not sure what the solution is.
> Following is the scenario:
>
> There is a Model A with 10000 records. Just a simple queryset -
> A.objects.all() is resulting in CPU hitting almost 100%.
What's the problem? You have a fast DB and a fast network, and you're pulling 10k records into memory. Why would you want CPU usage to be lower so that it takes longer?
The question you need to ask yourself is why you need 10k records. Nobody's gonna read them all.
--
Melvyn Sopacua
Would like to further add - the Python CPU usage is hitting almost 100%. When I run a SELECT * query in MySQL, it's quite fast and CPU is normal. I'm not sure if anything more needs to be done in Django.
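A likely source of the Python-side CPU spike is that `A.objects.all()` instantiates a full model object for every one of the 10k rows, while the raw SELECT * in MySQL does not. A hedged sketch of the usual mitigation, faked here with a plain generator standing in for what Django's `values_list(...).iterator()` would yield (the model name and fields are made up):

```python
import csv
import io

# Stand-in for A.objects.values_list("id", "name").iterator().  In Django,
# values_list() skips model instantiation entirely (it yields plain tuples)
# and iterator() avoids caching the whole result set in memory.  Here we
# simulate that with a generator of tuples.
def fetch_rows(n=10_000):
    for i in range(n):
        yield (i, f"item-{i}")

def write_csv(rows, out):
    writer = csv.writer(out)
    writer.writerow(["id", "name"])
    for row in rows:          # one row at a time; peak memory stays flat
        writer.writerow(row)

buf = io.StringIO()
write_csv(fetch_rows(3), buf)
print(buf.getvalue())
```

Streaming tuples straight into `csv.writer` this way sidesteps both the per-row object construction cost and the memory blow-up of materialising the whole queryset.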
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/566cf05e-babf-456c-91fa-a698f7c7537d%40googlegroups.com.
On Friday 10 March 2017 03:06:12 Web Architect wrote:
> Hi James,
>
> Thanks for your response. Melvyn also posed a similar point of not
> loading the whole records.
>
> But all the records are needed for reporting purposes - where the data
> is read from the DB and a csv report is created. I am not quite an
> expert on Django but I am not sure if there is a better way to do it.
>
> The scenario is as follows to make it clearer:
>
> Ours is an ecommerce site built on Django. Our admin/accounting team
> needs to download reports now and then. We have a Django model for
> the line items purchased. Now there could be 10k line items sold and
> each line items are associated with other models like payments,
> shipments etc which is a complex set of relations.
The most scalable solution is to not generate the CSV at the webserver and not send it to the browser.
Use a tasking system like Celery to generate the report on a different server, or use a management command. Then mail the report or make it available as a static file via rsync/ssh/what have you.
You get bonus points for setting up the report generating server with a read-only slave of the database.
This scales much better and doesn't tie up webserver resources.
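The offloaded job described above might look like the following sketch. The function name, columns, and file name are all hypothetical; in a real project this body would sit inside a Celery `@shared_task` or a Django management command running on a separate box against a read-only replica, but it is written as plain Python here so the shape is clear:

```python
import csv
import os
import tempfile

# Hypothetical report job.  In production this would be a Celery task or a
# management command; the webserver only enqueues it and is never tied up.
def generate_report(rows, out_dir):
    """Write rows to a CSV and return its path for mailing or rsync."""
    path = os.path.join(out_dir, "line_items.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sku", "qty", "paid"])
        writer.writerows(rows)
    # ...then email the file, or publish it as a static download...
    return path

with tempfile.TemporaryDirectory() as d:
    report = generate_report([("A1", 2, True), ("B7", 1, False)], d)
    print(os.path.basename(report))
```

The key design point is the return value: the task hands back a file path (or URL), so delivery by email, rsync, or static hosting is decoupled from generation.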
--
Melvyn Sopacua
You can probably use another web framework for that report, or even rethink your architecture and move the report creation outside Django and the web server, without the Django ORM.
You may be interested in evaluating Celery and django-celery. You can create a separate task outside Django to build such a report, call the task from within Django via Celery, create the report and email it to the user.
Regards.
Am I the only one who thinks that generating a report over a set of just 10,000 records could be done in 10-20 seconds, unless there is some serious computation going on with that data?
For a report I have to query around 200,000 records, with aggregations, and it takes less than a minute using the ORM.
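An aggregated report like that pushes the grouping work into the database. In Django this would be something like `LineItem.objects.values("status").annotate(total=Sum("amount"))` (model and field names are assumptions); the dict-based simulation below shows what that one grouped query computes:

```python
from collections import defaultdict

# Plain-Python simulation of a GROUP BY ... SUM() aggregation.  In the ORM
# the database does this server-side in a single query, so Python never
# touches the individual rows.
rows = [
    {"status": "paid", "amount": 10},
    {"status": "paid", "amount": 5},
    {"status": "refunded", "amount": 3},
]

totals = defaultdict(int)
for r in rows:                      # the DB does this loop in SQL
    totals[r["status"]] += r["amount"]

print(dict(totals))
```

Letting SQL do the grouping is why 200,000 rows can aggregate in well under a minute: only the handful of summary rows ever cross the wire.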
On Saturday 11 March 2017 21:29:10 Vijay Khemlani wrote:
> "But the CPU usage and time taken are high" <- I'm assuming high
> enough to be problematic for OP.
>
> I'm seriously not following. Why are people suggesting reporting and
> export software when OP hasn't even described the problem in detail.
Several reasons. Some chime in without reading the entire thread (the OP already stated he's using Celery and displaying a notice when the report is done). Another is that, as you've said, there isn't enough detail to get to the root cause, but you have to start somewhere.
Another is that the question is two-fold:
1) What causes CPU to spike
2) How can I scale this better
From personal experience, I find that displaying a notice - "the job is scheduled; it's estimated to be done at h:m and you will be notified by email" - instills a calmness in project owners who would otherwise start adjusting project requirements because they're biting their nails watching a progress bar.
> It's not even clear whether the high cpu and time taken are due to the
> basic query ("Model.objects.all()") or the further processing of the
> report.
Agreed. But nonetheless, offloading to other hardware frees up webserver resources and scales better.
In fact, I would stop investigating if there are other things to finish, get the budget for the second machine, and pick this up again in the final stages to see if there's anything more that can be done.
> It could easily be a missing "select_related" which causes thousands
> of joins inside a for loop.
Good one. It takes only a few seconds to check; if it's not that, move on.
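For anyone following along, the N+1 pattern that `select_related` fixes looks like this. With `Item.objects.all()`, touching `item.payment` inside a loop issues one extra query per row; `select_related("payment")` pulls the relation in with a single JOIN. The model names are made up, and fake "queries" are simply counted here so the difference is visible without a database:

```python
# Toy data: 100 items, each pointing at a payment row.
payments = {i: f"payment-{i}" for i in range(100)}
items = [{"id": i, "payment_id": i} for i in range(100)]

queries = 0

def fetch_payment(pid):
    """Each lazy attribute access is one database round trip."""
    global queries
    queries += 1
    return payments[pid]

# Naive loop (no select_related): N extra queries inside the loop.
for item in items:
    fetch_payment(item["payment_id"])
naive = queries

# select_related-style: one JOINed query fetches everything up front.
queries = 1
joined = {i["id"]: payments[i["payment_id"]] for i in items}

print(naive, queries)
```

With 10k line items joined to payments and shipments, the difference between 1 query and 10k+ queries would easily explain a pegged CPU and a slow report.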
--
Melvyn Sopacua
-James