Web2py adding additional and not wanted orderby

105 views
Skip to first unread message

Johann Spies

unread,
May 28, 2015, 6:28:33 AM5/28/15
to web...@googlegroups.com
The following code:

query = ((db.rresearch.id == db.isi_alt_names.rsc_id)&
                  (db.rresearch.nu.upper().like(request.vars.country + '%')))
        
data = SQLFORM.grid(query, searchable = False, links = links, fields = fields,
                         editable = False, deletable = False, details = False,
                         maxtextlength = 60,
                         create = False,
                         cache_count = count,
                         orderby = ~db.isi_alt_names.code)

produces the following query:

SELECT  rresearch.nf, rresearch.nu, isi_alt_names.code, 
isi_alt_names.institution, isi_alt_names.id, rresearch.id FROM rresearch, isi_alt_names WHERE (rresearch.id = isi_alt_names.rsc_id) ORDER BY isi_alt_names.code DESC, rresearch.id LIMIT 20 OFFSET 0;

--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)

Johann Spies

unread,
May 28, 2015, 6:33:25 AM5/28/15
to web...@googlegroups.com
Apologies for the unfinished message.  The Gmail interface in the browser and I am not really friends.

The problem with the abovementioned query is that it adds a lot of load to the server as there are 130+ million records in the rresearch table and about as much in the isi_alt_names table making the query very slow.  Without the 'rresearch.id' in the orderby-section the query runs for about 200 ms and with it the sorting and merging of the sort results of the two tables takes about 7 minutes.

How can I prevent web2py from adding the additional orderby field?

Regards.
Johann


Niphlod

unread,
May 28, 2015, 6:42:07 AM5/28/15
to web...@googlegroups.com
auto_pagination = False.

That being said, mind that without ordering, pagination is inherently incorrect 

Johann Spies

unread,
May 28, 2015, 6:53:58 AM5/28/15
to web...@googlegroups.com
On 28 May 2015 at 12:42, Niphlod <nip...@gmail.com> wrote:

auto_pagination = False.

That being said, mind that without ordering, pagination is inherently incorrect 


Thanks.  Will that not respect my orderby = ~db.isi_alt_names.code?  I just do not want the extra ordering added.

Maybe one can patch the code to only add ordering if there is no other specified by the user.

Regards

Johann
 

黄祥

unread,
May 28, 2015, 6:56:27 AM5/28/15
to web...@googlegroups.com
perhaps you can turn of the sortable in grid and just use orderby
e.g.
sortable=False

best regards,
stifan

Johann Spies

unread,
May 28, 2015, 7:17:29 AM5/28/15
to web...@googlegroups.com
On 28 May 2015 at 12:42, Niphlod <nip...@gmail.com> wrote:

auto_pagination = False.

When did this become part of Web2py?  I do not find it in the Changelog.  It works in the latest version but no in 2.9.5.

Regards
Johann

Anthony

unread,
May 28, 2015, 9:18:25 AM5/28/15
to web...@googlegroups.com, johann...@gmail.com

For pagination to work, there must be a unique ordering. If the user specifies an orderby, we have no way of knowing whether it will result in a unique ordering (well, unless the field has unique=True set). The auto_pagination argument was introduced to allow the user to control the behavior. See the discussion here.

Anthony

Anthony

unread,
May 28, 2015, 9:24:44 AM5/28/15
to web...@googlegroups.com, johann...@gmail.com

Was originally discussed over a year ago, but only recently implemented: https://github.com/web2py/web2py/commit/4f1606fcf2d64457b2645ea1af6821076d067a22

Anthony

Anthony

unread,
May 28, 2015, 9:28:55 AM5/28/15
to web...@googlegroups.com, johann...@gmail.com
On Thursday, May 28, 2015 at 6:33:25 AM UTC-4, Johann Spies wrote:

The problem with the abovementioned query is that it adds a lot of load to the server as there are 130+ million records in the rresearch table and about as much in the isi_alt_names table making the query very slow.  Without the 'rresearch.id' in the orderby-section the query runs for about 200 ms and with it the sorting and merging of the sort results of the two tables takes about 7 minutes.

Does the db.isi_alt_names.code field contain unique values? If not, I don't think your pagination is guaranteed to work (i.e., you could get back different orderings within matching values of "code" on each select).

Anthony

Johann Spies

unread,
Jun 1, 2015, 3:16:20 AM6/1/15
to web...@googlegroups.com
On 28 May 2015 at 15:28, Anthony <abas...@gmail.com> wrote:
 
On Thursday, May 28, 2015 at 6:33:25 AM UTC-4, Johann Spies wrote:
Does the db.isi_alt_names.code field contain unique values? If not, I don't think your pagination is guaranteed to work (i.e., you could get back different orderings within matching values of "code" on each select).


No. That field does not have a unique constraint.  The pagination is not that important in this case.  The speed of the query and the first few pages would be the most important factor.  And that is handled well enough by the setup without web2py adding another ordering element.

Thanks for your replies.  I will go and read the discussion you referred to.

Regards
Johann

Johann Spies

unread,
Jun 1, 2015, 3:33:11 AM6/1/15
to web...@googlegroups.com
After reading the discussion Anthony referred to, I would like to state that I prefer the behaviour of the DAL/Grid query to result in the same outcome as if I would have done with a raw query in PostgreSQL. I test my queries in PG before I implement them in web2py because I work with fairly large datasets. PostgreSQL's  pgsql does not use the id-field for pagination if I specify an 'ORDER BY' and that is the way I prefer Web2py's grid to work and as I understand it that is what 'autopagination = False' does.

Thanks for all the hard work and thinking that goes into the coding.

Regards
Johann

Niphlod

unread,
Jun 1, 2015, 4:15:29 AM6/1/15
to web...@googlegroups.com
just to clarify: the fact that a query in pgadmin returns a somewhat ordered resultset "that you're good with", that you "test queries in raw postgresql before you implement" and that web2py shows the same result with auto_pagination=False is not in ANY scenario an assurance of the fact that your results will be paginated correctly.

auto_pagination=False is somewhat to be considered like "please do it wrongly, hope for the best".

If you don't use a correct clause to order by with, your mileage may vary but an error will definitely pop up (wrong records in next page, or duplicate records in several pages). "prepare for the worst".

That's database 101 (and Set Theory, the math on which relational databases are founded on).... repeat after me: a Query defines a Set, a Set is by design unordered, a Set is by design unordered, a Set is by design unordered ....... the result is the ordered Set.

If you like your users to paginate wrongly instead of fixing the database, it's your choice to make, but its strongly advised against by "sane defaults" (read Set Theory, every best practice out there, etc etc etc)

Johann Spies

unread,
Jun 1, 2015, 4:20:44 AM6/1/15
to web...@googlegroups.com
Thank you for the "sermon"  :)


Regards
Johann

Anthony

unread,
Jun 1, 2015, 7:38:28 AM6/1/15
to web...@googlegroups.com, johann...@gmail.com
Just to be clear, the old behavior was a bug (and a particular problem when the user sorted on a column within the grid, which overrides the original orderby). Pagination is a fundamental feature of the grid, so it ought to "just work." Note, we cannot leave it up to the developer to explicitly specify the proper set of orderby fields because of the issue with user-initiated sorting in the UI (which overrides the default ordering specified by the developer).

You have a special case in which you are willing to risk pagination errors in order to boost speed. While that might be a reasonable trade-off in your particular case, it is not necessarily appropriate default behavior. However, we should certainly document the new behavior and the auto_pagination option.

If you'd like, you can do something like the following in a model:

import functools
mygrid
= functools.partial(SQLFORM.grid, auto_pagination=False)

and then use mygrid in place of SQLFORM.grid in your code (though I suspect in most cases you would still prefer the default behavior in order to guarantee proper pagination when speed is not an issue).

Anthony

Johann Spies

unread,
Jun 1, 2015, 8:07:29 AM6/1/15
to web...@googlegroups.com
On 1 June 2015 at 13:38, Anthony <abas...@gmail.com> wrote:
Just to be clear, the old behavior was a bug (and a particular problem when the user sorted on a column within the grid, which overrides the original orderby). Pagination is a fundamental feature of the grid, so it ought to "just work." Note, we cannot leave it up to the developer to explicitly specify the proper set of orderby fields because of the issue with user-initiated sorting in the UI (which overrides the default ordering specified by the developer).

You have a special case in which you are willing to risk pagination errors in order to boost speed. While that might be a reasonable trade-off in your particular case, it is not necessarily appropriate default behavior. However, we should certainly document the new behavior and the auto_pagination option.

If you'd like, you can do something like the following in a model:

import functools
mygrid
= functools.partial(SQLFORM.grid, auto_pagination=False)

and then use mygrid in place of SQLFORM.grid in your code (though I suspect in most cases you would still prefer the default behavior in order to guarantee proper pagination when speed is not an issue).

Thanks Anthony.  Correct.  I do not want it in all cases. After reading up on it I came accross this link (http://chrisdone.com/posts/postgresql-pagination) where the author illustrates just how slow PostgreSQL's "OFFSET" can be which is probably part of my problem in this case.  I will just have to develop alternative methods to work with cases where one queries millions of rows.  One of my tables has more than 500 million records.

Regards
Johann


Anthony

unread,
Jun 1, 2015, 2:22:51 PM6/1/15
to web...@googlegroups.com, johann...@gmail.com
Thanks Anthony.  Correct.  I do not want it in all cases. After reading up on it I came accross this link (http://chrisdone.com/posts/postgresql-pagination) where the author illustrates just how slow PostgreSQL's "OFFSET" can be which is probably part of my problem in this case.  I will just have to develop alternative methods to work with cases where one queries millions of rows.  One of my tables has more than 500 million records.

The difference between your two cases wasn't whether OFFSET was used (both involved pagination) but rather whether the "id" field was used for sorting. Does the difference between the two queries go away if you remove the OFFSET?

Anthony

Derek

unread,
Jun 1, 2015, 8:16:59 PM6/1/15
to web...@googlegroups.com
Yeah, I had to explain that to a few of my coworkers once. If it's faster to give you your data unordered, and you didn't specify an order, you're going to get it in any kind of order. Like if another query before yours returns the exact same data but it's reverse ordered... well, if you do that same where clause without an order by clause, chances are you'll get your data ordered same way as the one before it because your data is in the cache already. Or I imagine it's a huge table, can't possibly fit into cache, disk is being read page by agonizing page, and your query comes in, right in the middle of a page. Well, you're going to get your data from that page and then it will have to loop back and get the rest... 

yea, lots of scenarios but the end point is the same, if you don't specify an order, you can't expect any order, or any repeatable order.

Johann Spies

unread,
Jun 2, 2015, 4:26:35 AM6/2/15
to web...@googlegroups.com
On 1 June 2015 at 20:22, Anthony <abas...@gmail.com> wrote:
Thanks Anthony.  Correct.  I do not want it in all cases. After reading up on it I came accross this link (http://chrisdone.com/posts/postgresql-pagination) where the author illustrates just how slow PostgreSQL's "OFFSET" can be which is probably part of my problem in this case.  I will just have to develop alternative methods to work with cases where one queries millions of rows.  One of my tables has more than 500 million records.

The difference between your two cases wasn't whether OFFSET was used (both involved pagination) but rather whether the "id" field was used for sorting. Does the difference between the two queries go away if you remove the OFFSET?


Apologies for my unclear "in this case".  I actually meant the situation of working with such large datasets.  No in the case of the original question I was not moving away of OFFSET 0.

Thanks for your attention.

Regards
Johann 
Reply all
Reply to author
Forward
0 new messages