Hi guys,

As it is right now, Django has the tendency to kill either your browser (if you're lucky) or the entire application server when confronted with a large database. For example, the admin always does counts for pagination, and a count over a table with many rows (say, in the order of 100M) can really stall your app and database servers.
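One workaround, sketched here under the assumption of PostgreSQL (EstimatedCountPaginator and ArticleAdmin are made-up names), is to swap the admin's exact COUNT(*) for the planner's estimate from pg_class. The estimate can lag behind reality until the next ANALYZE and is only meaningful for simple, unfiltered querysets, so this is a trade-off rather than a fix:

from django.contrib import admin
from django.core.paginator import Paginator
from django.db import connection


class EstimatedCountPaginator(Paginator):
    @property
    def count(self):
        # reltuples is the planner's row estimate for the table; reading it is
        # effectively instant, at the cost of it being slightly stale.
        with connection.cursor() as cursor:
            cursor.execute(
                "SELECT reltuples::bigint FROM pg_class WHERE relname = %s",
                [self.object_list.model._meta.db_table],
            )
            return cursor.fetchone()[0]


class ArticleAdmin(admin.ModelAdmin):
    # ModelAdmin lets you plug in a different paginator class.
    paginator = EstimatedCountPaginator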
To me, "sane default" means django should not silently alter the query to provide a LIMIT when it is not asked for.

I have also run into situations where doing a .count() or iterating a full table has broken the application, or put too much pressure on the database, specifically with django bindings to javascript datatables. But I still wouldn't want artificial limiting on such queries.

What *may* be useful is to be able to apply queryset methods onto an already sliced queryset. That would allow users to implement queryset/manager methods that provide pre-sliced querysets to the rest of the code. The problem would be: what should happen in this case?

Model.objects.all()[0:10].filter(field=True)

Should the filter be logically/internally moved to before the limit? Or should the filter be applied to the result of the limit in an outer query? Traditionally, django applies mutations in succession, but this wouldn't be very useful for the majority of operations that would occur "after" the slicing. We *could* say all slicing is "saved" and applied at the end, but we'd definitely run into issues with users reporting that filtering isn't working as they expect - after the slice.
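For reference, this is roughly what Django does today with that example (Model and field are the placeholder names from above): filtering after a slice is rejected outright rather than silently rewritten, so the question of which semantics to apply never arises at runtime.

qs = Model.objects.all()[0:10]

try:
    qs.filter(field=True)
except (AssertionError, TypeError) as exc:
    # The exact exception class has varied between Django versions.
    print(exc)  # "Cannot filter a query once a slice has been taken."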
Definitely agree on this, silently altering a query's limit is probably not the way to go. Raising an exception in the case of no limit and lots of results could be useful.

For the sake of keeping the discussion useful:
- Let's say you have a table with 50,000 items, not an insanely large amount imho.
- Now someone does a non-restricted loop through "Model.objects.all()" which would return 50,000 items.
- In what case would this be desirable, as opposed to limiting to 10,000 items and raising an error when the database actually returns 10,000 items?

Naturally this case would only apply if no slice is given, but if you're really processing over 10,000 items in a single loop you probably know how to slice the queryset when needed. Perhaps something like this: raise QuerysetTooLarge('Your query returned over {LIMIT} results, if this is intentional please slice the queryset')

Not saying this should be a default straight away, but having a default of 10,000 or even 100,000 should not hurt anyone and protects against killing a server, which is always a positive result in my book.
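Very roughly, something like the sketch below. QuerysetTooLarge, MAX_UNSLICED_ROWS and LimitedQuerySet are all made-up names, and the sketch leans on QuerySet internals (query.high_mark, _result_cache, _fetch_all) that differ between Django versions, so treat it as an illustration of the idea, not a concrete patch:

from django.db import models

MAX_UNSLICED_ROWS = 10000  # the configurable ceiling being discussed


class QuerysetTooLarge(Exception):
    pass


class LimitedQuerySet(models.QuerySet):
    def _fetch_all(self):
        # Only intervene for unsliced querysets that have not been evaluated yet.
        if self._result_cache is None and self.query.high_mark is None:
            # Fetch one row more than the ceiling: if that extra row comes
            # back, the unsliced query would have exceeded the limit.
            rows = list(self[:MAX_UNSLICED_ROWS + 1])
            if len(rows) > MAX_UNSLICED_ROWS:
                raise QuerysetTooLarge(
                    "Your query returned over %d results, if this is "
                    "intentional please slice the queryset" % MAX_UNSLICED_ROWS
                )
            self._result_cache = rows
        super(LimitedQuerySet, self)._fetch_all()


class Article(models.Model):  # hypothetical model
    title = models.CharField(max_length=200)

    objects = LimitedQuerySet.as_manager()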
On Wednesday, November 19, 2014 1:42:59 AM UTC+1, Josh Smeaton wrote:
> To me, "sane default" means django should not silently alter the query to provide a LIMIT when it is not asked for. [snip]
A sequential scan will only be made if you query non-indexed values.
So if you add a simple ORDER BY you will get an index scan, which is very fast.
The problem lies more with the database than with the ORM.

As already said: if you need to deal with that many queries, you need to log your SQL statements and optimize them. Nothing to do with django at all. We ran into several performance issues in our applications as well, even at something like 10,000 or 100,000 entries per table, and it was quite obvious that the problem lay with the database/queries.

Also we learned that using the cache gives you a big performance win (see the small sketch below). "The best query you could have is the query you never need to make."

I don't think it's good to make django safer when dealing with high performance systems, since it will make the system more complex, and django is already really complex inside the core ORM.
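As a small illustration of "the query you never need to make", using Django's low-level cache API. The key name, the 5-minute timeout and the Article model are illustrative assumptions:

from django.core.cache import cache


def cached_article_count():
    count = cache.get("article_count")
    if count is None:
        count = Article.objects.count()  # hypothetical model
        cache.set("article_count", count, timeout=300)  # cache for 5 minutes
    return count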
Nope, a large OFFSET of N will read through N rows, regardless of index coverage. See http://www.postgresql.org/docs/9.1/static/queries-limit.html

The problem lies with the django paginator, which builds OFFSET queries that are inefficient by design. There are alternatives, such as ordering by primary key and doing SELECT * FROM auth_user WHERE id > latest_id_returned. That "latest_id_returned" has to be passed back and forth to the client to keep the pagination happening. Too bad you can't do "jump to page 15" like the django admin does.
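In ORM terms the seek/keyset approach looks roughly like this. PAGE_SIZE and the convention of the client echoing back last_id are illustrative assumptions; this is not what Django's Paginator does today:

from django.contrib.auth.models import User

PAGE_SIZE = 25


def next_page(last_id=None):
    qs = User.objects.order_by("pk")
    if last_id is not None:
        # WHERE id > last_id instead of OFFSET: the database can seek straight
        # to the right spot in the primary-key index.
        qs = qs.filter(pk__gt=last_id)
    page = list(qs[:PAGE_SIZE])
    new_last_id = page[-1].pk if page else None  # hand this back to the client
    return page, new_last_id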
Read the relevant thread I posted before. There are huge memory allocation issues even if you just iterate the first two items of a query that potentially returns a million rows: all the rows will be fetched and stored in memory by psycopg and the django ORM.

...except when the cache would be bigger than your entire VM. In this case, if your dataset is, say, 10 TB, you will need roughly 10 TB of memory and 10 TB of data transferred between your django app and your postgres server. It is not a matter of optimization: it is mere survival.

Let's first understand what's needed, then we can decide if it has a place inside the django core.
> Nope, a large OFFSET of N will read through N rows, regardless of index coverage. See http://www.postgresql.org/docs/9.1/static/queries-limit.html

That's simply not true. If you define an ORDER BY with a well-indexed query, the database will only do a bitmap scan. That wiki page isn't well explained. Take this:

> The problem lies with the django paginator, which builds OFFSET queries that are inefficient by design. There are alternatives, such as ordering by primary key and doing SELECT * FROM auth_user WHERE id > latest_id_returned. That "latest_id_returned" has to be passed back and forth to the client to keep the pagination happening. Too bad you can't do "jump to page 15" like the django admin does.

The Paginator needs to use OFFSET, which is slower than your seek method, but SEEK has certain limitations, as you already said. Index and ORDER BY to the rescue, which makes OFFSET as fast as possible. With an index, ORDER BY, and maybe some read slaves, or even if you shard your data, the result is as good as the SEEK method.

> Read the relevant thread I posted before. There are huge memory allocation issues even if you just iterate the first two items of a query that potentially returns a million rows: all the rows will be fetched and stored in memory by psycopg and the django ORM.

That's only happening because django will build something like SELECT "bla", "bla", "bla" FROM table LIMIT 10 OFFSET 100; there is no ORDER BY clause, which is of course really slow, even with only 1,000 entries inside a table.

> ...except when the cache would be bigger than your entire VM. In this case, if your dataset is, say, 10 TB, you will need roughly 10 TB of memory and 10 TB of data transferred between your django app and your postgres server. It is not a matter of optimization: it is mere survival.

Okay, just as I said: if you deal with that amount of data you will certainly use read slaves and/or partition your data.

> Let's first understand what's needed, then we can decide if it has a place inside the django core.

The only thing that could be improved is that we should force an ORDER BY id clause here and there (sketched below). The index on the automatic id is already there.

I don't know why we should enforce something just because somebody has a huge amount of data and can't deal with it. That wouldn't follow a clean design pattern. Most people rely on the "jump to page" part of the Paginator; if we patch django to use a seek method as the default it wouldn't be that good for a lot of developers.

Most people that have a performance problem could easily "fix" it themselves or by throwing more hardware at it.
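For what it's worth, a default ordering like that can already be declared per model; a minimal sketch, with Article as a made-up model:

from django.db import models


class Article(models.Model):
    title = models.CharField(max_length=200)

    class Meta:
        # Guarantees an ORDER BY on the indexed primary key for every queryset
        # that doesn't specify its own ordering.
        ordering = ["id"]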
Hi Florian,
On 23 Nov 2014 16:22, "Florian Apolloner" <f.apo...@gmail.com> wrote:
>
> Hi Rick,
>
>
> On Sunday, November 23, 2014 1:11:13 PM UTC+1, Rick van Hattem wrote:
>>
>> If/when an unsliced queryset were to reach a certain limit (say, 10,000, but configurable) the system would raise an error.
>
>
> Django can't know if that would be the case without issuing an extra query -- and even then another transaction might commit a batch operation adding 10k items…
>
Actually, it can. If Django limits unsliced queries to 10,001 items, it can simply raise an error if 10,001 items are returned.
>> - Protects servers from going down due to memory constraints.
>
>
> Not really, cause psycopg already fetched everything.
Not if Django limits it by default :)
>> So please, can anyone give a good argument as to why any sane person would have a problem with a huge default limit which will kill the performance of your site anyhow but isn't enough to kill the entire system?
>
>
> There are way easier ways to kill the performance of your site than fetching a few rows; an arbitrary limit on queries is not going to fix anything. Code your applications defensively and don't use 3rd party stuff without introspecting it for problems first. While you are right that there are issues in theory, I think in practice they can be worked around easily and don't cause any real problems. Killing a VPS due to out-of-memory conditions can happen all the time -- be it the database or a service gone rogue or whatever -- but in the end I've rarely seen Django be the cause of that.

In that case you are lucky. I've seen problems like these in over half a dozen different projects (which I didn't write myself, so couldn't set up properly).

Even the batch job argument has flaws imho; in the case of batch jobs it's still not a good idea to fetch everything in one big go. But even if that were the case, within Django a view is easily distinguishable from a batch job, so that doesn't have to be a problem.

But the hostility against the ideas alone makes me lose hope that anything will ever be merged in. It's too bad that people always assume you have full control over projects. If you're hired to fix problems you can't fix design flaws that easily.
Very true, that's a fair point. That's why I'm opting for a configurable option. Patching this within Django has saved me in quite a few cases but it can have drawbacks.
If that is an option then it's definitely a better location to set limits to prevent the server from going down.

It doesn't help with debugging though, which is the primary reason for patching the ORM.

And in addition to that, quite a few customers won't let you change the hosting setup just like that, especially since uwsgi is one of the least used deployment options in my experience.
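For completeness, a rough illustration of limiting memory at the process level rather than in the ORM, using Python's stdlib resource module (POSIX only). The 512 MB cap is an arbitrary example; uwsgi and other process managers expose equivalent address-space limits in their own configuration:

import resource


def cap_address_space(max_bytes=512 * 1024 * 1024):
    # Make runaway allocations fail with MemoryError instead of taking the
    # whole host down.
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))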
In your particular case, where you have the relatively unusual situation that:
1. You have this problem, and,
2. You can't fix the code to solve this problem.
... the right answer is probably having a local patch for Django.
My goal was simply to move the Django project forward, but it seems the problems I've encountered in the field are too uncommon for most other developers to care about or understand.