Account Options

  1. Sign in
Google Groups Home
« Groups Home
Efficient paging using __key__ instead of a dedicated unique property
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  14 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
ryan  
View profile  
 More options Dec 12 2008, 7:27 pm
From: ryan <ryanb+appeng...@google.com>
Date: Fri, 12 Dec 2008 16:27:32 -0800 (PST)
Local: Fri, Dec 12 2008 7:27 pm
Subject: Efficient paging using __key__ instead of a dedicated unique property
Hi all! There's been a lot of good discussion here about doing paging
efficiently, as opposed to just using offset. We've been discussing
efficient paging internally too. We hope to eventually provide
something built in, but in the short term we have other priorities.
Still, we've come up with some interesting ideas, and I'd like to
share one of them here.

Most of the efficient paging techniques discussed here have been based
on sorting and filtering on a unique property. However, now that we
have __key__ filtering, http://code.google.com/appengine/docs/datastore/queriesandindexes.htm...
, we have a built-in unique field that we can filter and sort on. We
can use that to support efficient paging in any query, on any schema,
purely in user land, without requiring extra properties or other
schema changes. (It will sometimes require a new composite index, but
that's less invasive than requiring a new property.)

The executive summary is, when you first run the query, if it doesn't
already have a sort order on __key__, append one. Then, when you run
the query, save the last entity as a bookmark. To pick up from there
later, set up the original query, add filters on all of the sort order
properties so that it starts immediately after the bookmark entity.

The full design is below. We may provide some kind of built-in paging
support in the future, but it won't be this. Given that, I'd love to
see someone pick this up and implement it as an open source project!

Bookmarks
==
A bookmark is the last result entity that was returned to the app for
a given query. That bookmark entity is serialized and base64-encoded
so that apps can pass it around in URL query parameters, cookies, etc.
To keep bookmarks reasonably sized, raw properties and properties not
involved in the query should be stripped from the bookmark entity
before serializing and encoding it.

Queries
==
If you want to be able to resume a query at a bookmark, you need to
know that when you first run the query. That's because you need to
append an ascending __key__ sort order to the query if it's not
already there.

To resume a query at a bookmark, first, decode the bookmark, validate
it, and check that it has all of the properties needed by the query.
Then, use the algorithm below to generate and run a series of derived
queries that transparently resume the query at the bookmarked entity.

Yes, queries plural. If your query has a sort order, you'll need to
run two queries in serial to resume from a bookmark, due to the single
inequality filter property limit. More accurately, if your query has n
sort orders, you'll need to run n + 1 queries.

Also, as mentioned above, making a query bookmarkable sometimes
require extra composite indices that the corresponding non-
bookmarkable query wouldn't require. This is due to the extra
=__key__= sort order.

Resuming from a bookmark
==
Here's Python-like pseudocode for resuming a query at a bookmark. It
generates a series of queries derived from the original query that,
when run in order, return the original query's results starting at the
bookmark. (The indentation is meaningful, so you might want to turn on
a fixed-width font in your viewer here.)

  given a query and a bookmark entity:
    assert that all of the properties in the query are also in the
bookmark entity

    # massage the query. this may add extra sort orders, but won't
remove the
    # existing ones, so it doesn't change the query's invariants.
    if the query has an inequality filter but no sort order:
      add an ASC sort order on the inequailty filter property

    if the query doesn't have a sort order on __key__:
      append an ASC sort order on __key__

    # prepare the original query as a template for the derived queries
    if the query has inequality filters:
      remove them and record them

    for prop in the query's sort orders:
      add a new filter to the query: "prop = [bookmark entity's value
for prop]"

    remove the query's sort orders, except for __key__, and record
them

    # generate the derived queries
    make an empty FIFO queue for the derived queues
    for prop in reversed(original query's sort orders):
      make a deep copy of the original query
      on that deep copy:
        remove the = filter on prop

        replace it with a new filter:
          if original sort order on prop was ASC:
            operator = ">"
          else:
            operator = "<"
          the new filter is "prop [operator] [bookmark entity's value
for prop]"

        also, if there were other inequality filters on prop, add them
back too
        also add back the sort order on prop, at the beginning of the
sort orders
        enqueue this query

To run the query starting from a bookmark, unwind the queue, ie pop
the derived queries off one at a time, run them, and return their
results. Then each query runs out of results, pop the next, and so on.

Also, note that ancestor is unrelated. If it's specified on the
original query, it should be included in the bookmarkable starting
query and all of the derived queries, but it doesn't otherwise affect
the algorithm otherwise.

Examples
==
(Again, turn on a fixed-width font in your viewer here.)

| *Original starting query*             | *Bookmarkable starting
query*                                | *Derived queries for starting
from bookmark entity B*              |

| SELECT * FROM Foo                     | SELECT * FROM Foo ORDER BY
=__key__= ASC                     | SELECT * FROM Foo WHERE =__key__=

> B ORDER BY =__key__= ASC       |

| WHERE x = 0                           | WHERE x = 0 ORDER BY
=__key__= ASC                           | WHERE x = 0 AND =__key__= >
B ORDER BY =__key__= ASC               |

| WHERE x > 0                           | WHERE x > 0 ORDER BY x ASC,
=__key__= ASC                    | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC             |
|
|                                                              | WHERE
x > B.x ORDER BY x ASC, =__key__= ASC                        |

| WHERE x = 0 AND y > 0                 | WHERE x = 0 AND y > 0 ORDER
BY y ASC, =__key__= ASC          | WHERE x = 0 AND y = B.y AND
=__key__= > B ORDER BY =__key__= ASC   |
|
|                                                              | WHERE
x = 0 AND y > B.y ORDER BY y ASC, =__key__= ASC              |

| WHERE x > 0 AND x < 9                 | WHERE x > 0 AND x < 9 ORDER
BY x ASC, =__key__= ASC          | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC             |
|
|                                                              | WHERE
x > B.x AND x < 9 ORDER BY x ASC, =__key__= ASC              |

| WHERE =__key__= > A AND =__key__= < Z | WHERE =__key__= > A AND
=__key__= < Z ORDER BY =__key__= ASC | WHERE =__key__= > B AND
=__key__= < Z ORDER BY =__key__= ASC       |

| ORDER BY x ASC                        | ORDER BY x ASC, =__key__=
ASC                                | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC             |
|
|                                                              | WHERE
x > B.x ORDER BY x ASC, =__key__= ASC                        |

| ORDER BY x DESC                       | ORDER BY x DESC, =__key__=
ASC                               | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC             |
|
|                                                              | WHERE
x < B.x ORDER BY x DESC, =__key__= ASC                       |

| ORDER BY =__key__= ASC                | ORDER BY =__key__=
ASC                                       | WHERE =__key__= > B ORDER
BY =__key__= ASC                         |

| ORDER BY =__key__= DESC               | ORDER BY =__key__=
DESC                                      | WHERE =__key__= < B ORDER
BY =__key__= DESC                        |

| ORDER BY x ASC, y DESC                | ORDER BY x ASC, y DESC,
=__key__= ASC                        | WHERE x = B.x AND y = B.y AND
=__key__= > B ORDER BY =__key__= ASC |
|
|                                                              | WHERE
x = B.x AND y < B.y ORDER BY y DESC, =__key__= ASC           |
|
|                                                              | WHERE
x > B.x ORDER BY x ASC, y DESC, =__key__= ASC                |

| ORDER BY x ASC, =__key__= DESC        | ORDER BY x ASC, =__key__=
DESC                               | WHERE x = B.x AND =__key__= < B
ORDER BY =__key__= DESC            |
|
|                                                              | WHERE
x > B.x ORDER BY x ASC, =__key__= DESC                       |

| WHERE x = 0 ORDER BY y DESC           | WHERE x = 0 ORDER BY y DESC,
=__key__= ASC                   | WHERE x = 0 AND y = B.y AND
=__key__= > B ORDER BY =__key__= ASC   |
|
|                                                              | WHERE
x = 0 AND y < B.y ORDER BY y DESC, =__key__= ASC             |

| WHERE x > 0 AND x < 9 ORDER BY x DESC | WHERE x > 0 AND x < 9 ORDER
BY x DESC, =__key__= ASC         | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC             |
|
|                                                              | WHERE
x < B AND x > 0 ORDER BY x DESC, =__key__= ASC               |


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
ryan  
View profile  
 More options Dec 12 2008, 7:32 pm
From: ryan <ryanb+appeng...@google.com>
Date: Fri, 12 Dec 2008 16:32:45 -0800 (PST)
Local: Fri, Dec 12 2008 7:32 pm
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
ugh. i don't know why i thought the long lines in the examples table
would survive. i've uploaded the post as a text file, with the long
lines intact:

http://groups.google.com/group/google-appengine/web/efficient_paging_...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alexander Kojevnikov  
View profile  
 More options Dec 12 2008, 8:18 pm
From: Alexander Kojevnikov <alexan...@kojevnikov.com>
Date: Fri, 12 Dec 2008 17:18:48 -0800 (PST)
Local: Fri, Dec 12 2008 8:18 pm
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
> Still, we've come up with some interesting ideas, and I'd like to
> share one of them here.

Brilliant, thanks Ryan!

Is there a chance that this will be integrated into the Datastore
Viewer?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
ryan  
View profile  
 More options Dec 12 2008, 8:39 pm
From: ryan <ryanb+appeng...@google.com>
Date: Fri, 12 Dec 2008 17:39:51 -0800 (PST)
Local: Fri, Dec 12 2008 8:39 pm
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
On Dec 12, 5:18 pm, Alexander Kojevnikov <alexan...@kojevnikov.com>
wrote:

> Is there a chance that this will be integrated into the Datastore
> Viewer?

unfortunately, no, that's unlikely. if/when we do implement efficient
paging, we'll probably use something else. this is nontrivial to
implement, so i don't expect we'd do both.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Thomas Johansson  
View profile  
 More options Dec 17 2008, 1:33 am
From: Thomas Johansson <prenc...@gmail.com>
Date: Tue, 16 Dec 2008 22:33:11 -0800 (PST)
Local: Wed, Dec 17 2008 1:33 am
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
Has anybody taken on the challenge of implementing a library for this?
I'd love to see one, but it's too big a task for me to tackle
currently.

One question springs to mind, however: What about user friendly /
readable / hackable urls? How would you go about implementing e.g. ?
page=10 or ?offset=42 with the entity bookmarks, in a way that doesn't
require a bunch of processing (e.g. fragile) to maintain?

ryan: We really *really* need some kind of pagination built-in though,
it has to be said. It is more or less the number one constraint for
everybody I've talked to, and something we all need in one way or
another. The other constraints can be worked around with relative
ease, but pagination is smack in your face for even the simplest
stuff; And as you clearly demonstrate with the above, not exactly
trivial to implement efficiently, not to mention in a scalable manner.

Thanks,
Thomas


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rodrigo Moraes  
View profile  
 More options Dec 17 2008, 5:43 am
From: "Rodrigo Moraes" <rodrigo.mor...@gmail.com>
Date: Wed, 17 Dec 2008 08:43:00 -0200
Subject: Re: [google-appengine] Re: Efficient paging using __key__ instead of a dedicated unique property

On Wed, Dec 17, 2008 at 4:33 AM, Thomas Johansson wrote:
> ryan: We really *really* need some kind of pagination built-in though,
> it has to be said. It is more or less the number one constraint for
> everybody I've talked to, and something we all need in one way or
> another. The other constraints can be worked around with relative
> ease, but pagination is smack in your face for even the simplest
> stuff; And as you clearly demonstrate with the above, not exactly
> trivial to implement efficiently, not to mention in a scalable manner.

I totally agree with this. #1 constraint for me.

-- rodrigo


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
ryan  
View profile  
 More options Dec 17 2008, 1:40 pm
From: ryan <ryanb+appeng...@google.com>
Date: Wed, 17 Dec 2008 10:40:01 -0800 (PST)
Local: Wed, Dec 17 2008 1:40 pm
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
On Dec 16, 10:33 pm, Thomas Johansson <prenc...@gmail.com> wrote:

> One question springs to mind, however: What about user friendly /
> readable / hackable urls? How would you go about implementing e.g. ?
> page=10 or ?offset=42 with the entity bookmarks, in a way that doesn't
> require a bunch of processing (e.g. fragile) to maintain?

put the bookmark in a cookie instead of a url parameter? or use POST
instead of GET? otherwise, there's probably no way around having a
really large query parameter for the bookmark in your URL. :/ i don't
do much web development, though, so definitely don't take my word as
gospel.

> ryan: We really *really* need some kind of pagination built-in though,

understood. we want it too! we have many more ideas and things we want
to do than resources to do them, but we're working as fast as we can.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rodrigo Moraes  
View profile  
 More options Jan 28 2009, 3:57 pm
From: Rodrigo Moraes <rodrigo.mor...@gmail.com>
Date: Wed, 28 Jan 2009 12:57:03 -0800 (PST)
Local: Wed, Jan 28 2009 3:57 pm
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
a first try is here:

    http://paste.pocoo.org/show/101774/

incomplete, as i'm stuck in the FIFO part. anyway, here is the basic
idea for the API:

    # get a bookmark - from GET or cookie.
    bookmark = self.request.get('bookmark')

    # build a query normally
    query = db.Query(Foo).filter('x >', 0).filter('x <', 9).order('-
x')

    # instead of fetch directly, we call a 'bookmark' method, to tell
the query that we want to
    # bookmark the result. we pass a bookmark if one was available, or
None, or nothing.
    res = query.bookmark(bookmark).fetch(10)

    # now we can get a bookmark from the query, to pass to query
strings or cookies
    bookmark = query.bookmarked

When a bookmark is not passed to the bookmark() method, it builds the
bookmarkable query normally. Otherwise, it also (tries to) create the
derived queries.

I'm stuck in the final part of the implementation. If anyone can take
a look and mess around with it, it will be nice. Right now I feel that
my head exploded.

:-)

-- rodrigo


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rodrigo Moraes  
View profile  
 More options Jan 29 2009, 12:39 am
From: Rodrigo Moraes <rodrigo.mor...@gmail.com>
Date: Thu, 29 Jan 2009 03:39:48 -0200
Local: Thurs, Jan 29 2009 12:39 am
Subject: Re: [google-appengine] Re: Efficient paging using __key__ instead of a dedicated unique property
Hey,
I updated the initial version. Now derived queries are generated
properly -- well, most of them work like Ryan described. Some don't.
For example:

Original:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC
WHERE x > B.x ORDER BY x ASC, y DESC, __key__ ASC

This is what i get:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE y < B.x AND x = B.y ORDER BY y DESC, __key__ ASC
WHERE x > B.x AND y = B.y ORDER BY x ASC, __key__ ASC

Maybe the original result is wrong, or maybe mine is? Could you please
make a quick review of the results in the original document?

Here is the code:

    http://bitbucket.org/moraes/appengine/src/tip/pager.py

Fetching derived queries will be the next step, after the bookmark /
derived queries are correctly set.

thank you,
-- rodrigo


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rodrigo Moraes  
View profile  
 More options Jan 29 2009, 12:42 am
From: Rodrigo Moraes <rodrigo.mor...@gmail.com>
Date: Thu, 29 Jan 2009 03:42:38 -0200
Local: Thurs, Jan 29 2009 12:42 am
Subject: Re: [google-appengine] Re: Efficient paging using __key__ instead of a dedicated unique property
ooops, sorry, let me correct the result i got. only the third query
doesn't match the one in ryan's doc:

Original:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC
WHERE x > B.x ORDER BY x ASC, y DESC, __key__ ASC

This is what i get:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC
WHERE x > B.x AND y = B.y ORDER BY x ASC, __key__ ASC

-- rodrigo


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
ryan  
View profile  
 More options Jan 29 2009, 12:39 pm
From: ryan <ryanb+appeng...@google.com>
Date: Thu, 29 Jan 2009 09:39:38 -0800 (PST)
Local: Thurs, Jan 29 2009 12:39 pm
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
wow, awesome. thanks for implementing this, rodrigo! (and sorry for
exploding your head. :P)

On Jan 28, 9:42 pm, Rodrigo Moraes <rodrigo.mor...@gmail.com> wrote:

> ooops, sorry, let me correct the result i got. only the third query
> doesn't match the one in ryan's doc:

> Original:
> -----------
> WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
> WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC
> WHERE x > B.x ORDER BY x ASC, y DESC, __key__ ASC

> This is what i get:
> -----------
> WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
> WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC
> WHERE x > B.x AND y = B.y ORDER BY x ASC, __key__ ASC

this is for an original query of 'ORDER BY x ASC, y DESC', right?

i believe my third derived query, ie without the y = B.y filter, is
right. one way to think about it is that the original query only has
sort orders, not filters, so it should return *every* entity that has
an x property and a y property. say the bookmark is B(x = 1, y = 1).
with your set of derived queries, no entity with x > 1 and y > 1 will
ever be returned, since every query has either an x = 1 filter or a y
= 1 filter. however, if you remove the y = B.y filter from the last
derived query (and add the y DESC sort order), it will include all of
entities with x > 1 and y > 1.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rodrigo Moraes  
View profile  
 More options Jan 29 2009, 2:19 pm
From: Rodrigo Moraes <rodrigo.mor...@gmail.com>
Date: Thu, 29 Jan 2009 17:19:43 -0200
Local: Thurs, Jan 29 2009 2:19 pm
Subject: Re: [google-appengine] Re: Efficient paging using __key__ instead of a dedicated unique property

On Thu, Jan 29, 2009 at 3:39 PM, ryan wrote:
> i believe my third derived query, ie without the y = B.y filter, is
> right. one way to think about it is that the original query only has
> sort orders, not filters, so it should return *every* entity that has
> an x property and a y property. say the bookmark is B(x = 1, y = 1).
> with your set of derived queries, no entity with x > 1 and y > 1 will
> ever be returned, since every query has either an x = 1 filter or a y
> = 1 filter. however, if you remove the y = B.y filter from the last
> derived query (and add the y DESC sort order), it will include all of
> entities with x > 1 and y > 1.

hi ryan,
hmmm. i see. it seems that this is a special case, and my result was
different because i followed your logic very literally. i'll adapt to
make it work without filters.

i started to make a second try on it, extending datastore.Query
instead. the result is cleaner (but still incomplete):

     http://bitbucket.org/moraes/appengine/src/tip/bookmark.py

i'll post when it is working.

thanks!
-- rodrigo


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rodrigo Moraes  
View profile  
 More options Jan 30 2009, 8:44 am
From: Rodrigo Moraes <rodrigo.mor...@gmail.com>
Date: Fri, 30 Jan 2009 11:44:16 -0200
Local: Fri, Jan 30 2009 8:44 am
Subject: Re: [google-appengine] Re: Efficient paging using __key__ instead of a dedicated unique property
hey there,
just an update: a working version of the query class with built in
pagination is available at:

    http://bitbucket.org/moraes/appengine/src/tip/bookmark.py

let me know what you think.

-- rodrigo


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Simo Salminen  
View profile  
 More options Mar 17 2009, 3:32 am
From: Simo Salminen <ssalm...@gmail.com>
Date: Tue, 17 Mar 2009 00:32:33 -0700 (PDT)
Local: Tues, Mar 17 2009 3:32 am
Subject: Re: Efficient paging using __key__ instead of a dedicated unique property
Hi.

Just for the curiosity, I did a simple pagination algo which supports
'previous' button. This is not as fancy as rodrigos, because this is
not generalized for all types of queries. Also it is bit repetitive
because I wanted to keep it easy to understand.

Here is the code:
http://pastebin.com/f66e901b1

It assumes following data model:
 class Issue(db.Model):
    whatever = db.StringProperty()
    modified = db.DateTimeProperty()

It does sorting based on 'modified' field. The bookmark is (modified|
__key__|direction), where direction is either 'prev' or 'next'. Note
that because it needs to figure out if there are previous and next
pages, it has to do total of 3 queries per page view. I could not
figure out better way to do this.

On Jan 30, 9:44 pm, Rodrigo Moraes <rodrigo.mor...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »