Efficient paging using __key__ instead of a dedicated unique property

213 views
Skip to first unread message

ryan

unread,
Dec 12, 2008, 7:27:32 PM12/12/08
to Google App Engine
Hi all! There's been a lot of good discussion here about doing paging
efficiently, as opposed to just using offset. We've been discussing
efficient paging internally too. We hope to eventually provide
something built in, but in the short term we have other priorities.
Still, we've come up with some interesting ideas, and I'd like to
share one of them here.

Most of the efficient paging techniques discussed here have been based
on sorting and filtering on a unique property. However, now that we
have __key__ filtering, http://code.google.com/appengine/docs/datastore/queriesandindexes.html#Queries_on_Keys
, we have a built-in unique field that we can filter and sort on. We
can use that to support efficient paging in any query, on any schema,
purely in user land, without requiring extra properties or other
schema changes. (It will sometimes require a new composite index, but
that's less invasive than requiring a new property.)

The executive summary is, when you first run the query, if it doesn't
already have a sort order on __key__, append one. Then, when you run
the query, save the last entity as a bookmark. To pick up from there
later, set up the original query, add filters on all of the sort order
properties so that it starts immediately after the bookmark entity.

The full design is below. We may provide some kind of built-in paging
support in the future, but it won't be this. Given that, I'd love to
see someone pick this up and implement it as an open source project!


Bookmarks
==
A bookmark is the last result entity that was returned to the app for
a given query. That bookmark entity is serialized and base64-encoded
so that apps can pass it around in URL query parameters, cookies, etc.
To keep bookmarks reasonably sized, raw properties and properties not
involved in the query should be stripped from the bookmark entity
before serializing and encoding it.


Queries
==
If you want to be able to resume a query at a bookmark, you need to
know that when you first run the query. That's because you need to
append an ascending __key__ sort order to the query if it's not
already there.

To resume a query at a bookmark, first, decode the bookmark, validate
it, and check that it has all of the properties needed by the query.
Then, use the algorithm below to generate and run a series of derived
queries that transparently resume the query at the bookmarked entity.

Yes, queries plural. If your query has a sort order, you'll need to
run two queries in serial to resume from a bookmark, due to the single
inequality filter property limit. More accurately, if your query has n
sort orders, you'll need to run n + 1 queries.

Also, as mentioned above, making a query bookmarkable sometimes
require extra composite indices that the corresponding non-
bookmarkable query wouldn't require. This is due to the extra
=__key__= sort order.


Resuming from a bookmark
==
Here's Python-like pseudocode for resuming a query at a bookmark. It
generates a series of queries derived from the original query that,
when run in order, return the original query's results starting at the
bookmark. (The indentation is meaningful, so you might want to turn on
a fixed-width font in your viewer here.)

given a query and a bookmark entity:
assert that all of the properties in the query are also in the
bookmark entity

# massage the query. this may add extra sort orders, but won't
remove the
# existing ones, so it doesn't change the query's invariants.
if the query has an inequality filter but no sort order:
add an ASC sort order on the inequailty filter property

if the query doesn't have a sort order on __key__:
append an ASC sort order on __key__

# prepare the original query as a template for the derived queries
if the query has inequality filters:
remove them and record them

for prop in the query's sort orders:
add a new filter to the query: "prop = [bookmark entity's value
for prop]"

remove the query's sort orders, except for __key__, and record
them

# generate the derived queries
make an empty FIFO queue for the derived queues
for prop in reversed(original query's sort orders):
make a deep copy of the original query
on that deep copy:
remove the = filter on prop

replace it with a new filter:
if original sort order on prop was ASC:
operator = ">"
else:
operator = "<"
the new filter is "prop [operator] [bookmark entity's value
for prop]"

also, if there were other inequality filters on prop, add them
back too
also add back the sort order on prop, at the beginning of the
sort orders
enqueue this query

To run the query starting from a bookmark, unwind the queue, ie pop
the derived queries off one at a time, run them, and return their
results. Then each query runs out of results, pop the next, and so on.

Also, note that ancestor is unrelated. If it's specified on the
original query, it should be included in the bookmarkable starting
query and all of the derived queries, but it doesn't otherwise affect
the algorithm otherwise.


Examples
==
(Again, turn on a fixed-width font in your viewer here.)

| *Original starting query* | *Bookmarkable starting
query* | *Derived queries for starting
from bookmark entity B* |

| SELECT * FROM Foo | SELECT * FROM Foo ORDER BY
=__key__= ASC | SELECT * FROM Foo WHERE =__key__=
> B ORDER BY =__key__= ASC |

| WHERE x = 0 | WHERE x = 0 ORDER BY
=__key__= ASC | WHERE x = 0 AND =__key__= >
B ORDER BY =__key__= ASC |

| WHERE x > 0 | WHERE x > 0 ORDER BY x ASC,
=__key__= ASC | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC |
|
| | WHERE
x > B.x ORDER BY x ASC, =__key__= ASC |

| WHERE x = 0 AND y > 0 | WHERE x = 0 AND y > 0 ORDER
BY y ASC, =__key__= ASC | WHERE x = 0 AND y = B.y AND
=__key__= > B ORDER BY =__key__= ASC |
|
| | WHERE
x = 0 AND y > B.y ORDER BY y ASC, =__key__= ASC |

| WHERE x > 0 AND x < 9 | WHERE x > 0 AND x < 9 ORDER
BY x ASC, =__key__= ASC | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC |
|
| | WHERE
x > B.x AND x < 9 ORDER BY x ASC, =__key__= ASC |

| WHERE =__key__= > A AND =__key__= < Z | WHERE =__key__= > A AND
=__key__= < Z ORDER BY =__key__= ASC | WHERE =__key__= > B AND
=__key__= < Z ORDER BY =__key__= ASC |

| ORDER BY x ASC | ORDER BY x ASC, =__key__=
ASC | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC |
|
| | WHERE
x > B.x ORDER BY x ASC, =__key__= ASC |

| ORDER BY x DESC | ORDER BY x DESC, =__key__=
ASC | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC |
|
| | WHERE
x < B.x ORDER BY x DESC, =__key__= ASC |

| ORDER BY =__key__= ASC | ORDER BY =__key__=
ASC | WHERE =__key__= > B ORDER
BY =__key__= ASC |

| ORDER BY =__key__= DESC | ORDER BY =__key__=
DESC | WHERE =__key__= < B ORDER
BY =__key__= DESC |

| ORDER BY x ASC, y DESC | ORDER BY x ASC, y DESC,
=__key__= ASC | WHERE x = B.x AND y = B.y AND
=__key__= > B ORDER BY =__key__= ASC |
|
| | WHERE
x = B.x AND y < B.y ORDER BY y DESC, =__key__= ASC |
|
| | WHERE
x > B.x ORDER BY x ASC, y DESC, =__key__= ASC |

| ORDER BY x ASC, =__key__= DESC | ORDER BY x ASC, =__key__=
DESC | WHERE x = B.x AND =__key__= < B
ORDER BY =__key__= DESC |
|
| | WHERE
x > B.x ORDER BY x ASC, =__key__= DESC |

| WHERE x = 0 ORDER BY y DESC | WHERE x = 0 ORDER BY y DESC,
=__key__= ASC | WHERE x = 0 AND y = B.y AND
=__key__= > B ORDER BY =__key__= ASC |
|
| | WHERE
x = 0 AND y < B.y ORDER BY y DESC, =__key__= ASC |

| WHERE x > 0 AND x < 9 ORDER BY x DESC | WHERE x > 0 AND x < 9 ORDER
BY x DESC, =__key__= ASC | WHERE x = B.x AND =__key__= > B
ORDER BY =__key__= ASC |
|
| | WHERE
x < B AND x > 0 ORDER BY x DESC, =__key__= ASC |

ryan

unread,
Dec 12, 2008, 7:32:45 PM12/12/08
to Google App Engine
ugh. i don't know why i thought the long lines in the examples table
would survive. i've uploaded the post as a text file, with the long
lines intact:

http://groups.google.com/group/google-appengine/web/efficient_paging_using_key_instead_of_a_dedicated_unique_property.txt

Alexander Kojevnikov

unread,
Dec 12, 2008, 8:18:48 PM12/12/08
to Google App Engine
> Still, we've come up with some interesting ideas, and I'd like to
> share one of them here.
>
Brilliant, thanks Ryan!

Is there a chance that this will be integrated into the Datastore
Viewer?

ryan

unread,
Dec 12, 2008, 8:39:51 PM12/12/08
to Google App Engine
On Dec 12, 5:18 pm, Alexander Kojevnikov <alexan...@kojevnikov.com>
wrote:
>
> Is there a chance that this will be integrated into the Datastore
> Viewer?

unfortunately, no, that's unlikely. if/when we do implement efficient
paging, we'll probably use something else. this is nontrivial to
implement, so i don't expect we'd do both.

Thomas Johansson

unread,
Dec 17, 2008, 1:33:11 AM12/17/08
to Google App Engine
Has anybody taken on the challenge of implementing a library for this?
I'd love to see one, but it's too big a task for me to tackle
currently.

One question springs to mind, however: What about user friendly /
readable / hackable urls? How would you go about implementing e.g. ?
page=10 or ?offset=42 with the entity bookmarks, in a way that doesn't
require a bunch of processing (e.g. fragile) to maintain?

ryan: We really *really* need some kind of pagination built-in though,
it has to be said. It is more or less the number one constraint for
everybody I've talked to, and something we all need in one way or
another. The other constraints can be worked around with relative
ease, but pagination is smack in your face for even the simplest
stuff; And as you clearly demonstrate with the above, not exactly
trivial to implement efficiently, not to mention in a scalable manner.

Thanks,
Thomas

Rodrigo Moraes

unread,
Dec 17, 2008, 5:43:00 AM12/17/08
to google-a...@googlegroups.com
On Wed, Dec 17, 2008 at 4:33 AM, Thomas Johansson wrote:
> ryan: We really *really* need some kind of pagination built-in though,
> it has to be said. It is more or less the number one constraint for
> everybody I've talked to, and something we all need in one way or
> another. The other constraints can be worked around with relative
> ease, but pagination is smack in your face for even the simplest
> stuff; And as you clearly demonstrate with the above, not exactly
> trivial to implement efficiently, not to mention in a scalable manner.

I totally agree with this. #1 constraint for me.

-- rodrigo

ryan

unread,
Dec 17, 2008, 1:40:01 PM12/17/08
to Google App Engine
On Dec 16, 10:33 pm, Thomas Johansson <prenc...@gmail.com> wrote:

> One question springs to mind, however: What about user friendly /
> readable / hackable urls? How would you go about implementing e.g. ?
> page=10 or ?offset=42 with the entity bookmarks, in a way that doesn't
> require a bunch of processing (e.g. fragile) to maintain?

put the bookmark in a cookie instead of a url parameter? or use POST
instead of GET? otherwise, there's probably no way around having a
really large query parameter for the bookmark in your URL. :/ i don't
do much web development, though, so definitely don't take my word as
gospel.

> ryan: We really *really* need some kind of pagination built-in though,

understood. we want it too! we have many more ideas and things we want
to do than resources to do them, but we're working as fast as we can.

Rodrigo Moraes

unread,
Jan 28, 2009, 3:57:03 PM1/28/09
to Google App Engine
a first try is here:

http://paste.pocoo.org/show/101774/

incomplete, as i'm stuck in the FIFO part. anyway, here is the basic
idea for the API:

# get a bookmark - from GET or cookie.
bookmark = self.request.get('bookmark')

# build a query normally
query = db.Query(Foo).filter('x >', 0).filter('x <', 9).order('-
x')

# instead of fetch directly, we call a 'bookmark' method, to tell
the query that we want to
# bookmark the result. we pass a bookmark if one was available, or
None, or nothing.
res = query.bookmark(bookmark).fetch(10)

# now we can get a bookmark from the query, to pass to query
strings or cookies
bookmark = query.bookmarked

When a bookmark is not passed to the bookmark() method, it builds the
bookmarkable query normally. Otherwise, it also (tries to) create the
derived queries.

I'm stuck in the final part of the implementation. If anyone can take
a look and mess around with it, it will be nice. Right now I feel that
my head exploded.

:-)

-- rodrigo

Rodrigo Moraes

unread,
Jan 29, 2009, 12:39:48 AM1/29/09
to Google App Engine
Hey,
I updated the initial version. Now derived queries are generated
properly -- well, most of them work like Ryan described. Some don't.
For example:

Original:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC
WHERE x > B.x ORDER BY x ASC, y DESC, __key__ ASC

This is what i get:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE y < B.x AND x = B.y ORDER BY y DESC, __key__ ASC
WHERE x > B.x AND y = B.y ORDER BY x ASC, __key__ ASC

Maybe the original result is wrong, or maybe mine is? Could you please
make a quick review of the results in the original document?

Here is the code:

http://bitbucket.org/moraes/appengine/src/tip/pager.py

Fetching derived queries will be the next step, after the bookmark /
derived queries are correctly set.

thank you,
-- rodrigo

Rodrigo Moraes

unread,
Jan 29, 2009, 12:42:38 AM1/29/09
to Google App Engine
ooops, sorry, let me correct the result i got. only the third query
doesn't match the one in ryan's doc:

Original:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC
WHERE x > B.x ORDER BY x ASC, y DESC, __key__ ASC

This is what i get:
-----------
WHERE x = B.x AND y = B.y AND __key__ > B ORDER BY __key__ ASC
WHERE x = B.x AND y < B.y ORDER BY y DESC, __key__ ASC

WHERE x > B.x AND y = B.y ORDER BY x ASC, __key__ ASC

-- rodrigo

ryan

unread,
Jan 29, 2009, 12:39:38 PM1/29/09
to Google App Engine
wow, awesome. thanks for implementing this, rodrigo! (and sorry for
exploding your head. :P)
this is for an original query of 'ORDER BY x ASC, y DESC', right?

i believe my third derived query, ie without the y = B.y filter, is
right. one way to think about it is that the original query only has
sort orders, not filters, so it should return *every* entity that has
an x property and a y property. say the bookmark is B(x = 1, y = 1).
with your set of derived queries, no entity with x > 1 and y > 1 will
ever be returned, since every query has either an x = 1 filter or a y
= 1 filter. however, if you remove the y = B.y filter from the last
derived query (and add the y DESC sort order), it will include all of
entities with x > 1 and y > 1.

Rodrigo Moraes

unread,
Jan 29, 2009, 2:19:43 PM1/29/09
to google-a...@googlegroups.com
On Thu, Jan 29, 2009 at 3:39 PM, ryan wrote:
> i believe my third derived query, ie without the y = B.y filter, is
> right. one way to think about it is that the original query only has
> sort orders, not filters, so it should return *every* entity that has
> an x property and a y property. say the bookmark is B(x = 1, y = 1).
> with your set of derived queries, no entity with x > 1 and y > 1 will
> ever be returned, since every query has either an x = 1 filter or a y
> = 1 filter. however, if you remove the y = B.y filter from the last
> derived query (and add the y DESC sort order), it will include all of
> entities with x > 1 and y > 1.

hi ryan,
hmmm. i see. it seems that this is a special case, and my result was
different because i followed your logic very literally. i'll adapt to
make it work without filters.

i started to make a second try on it, extending datastore.Query
instead. the result is cleaner (but still incomplete):

http://bitbucket.org/moraes/appengine/src/tip/bookmark.py

i'll post when it is working.

thanks!
-- rodrigo

Rodrigo Moraes

unread,
Jan 30, 2009, 8:44:16 AM1/30/09
to google-a...@googlegroups.com
hey there,
just an update: a working version of the query class with built in
pagination is available at:

http://bitbucket.org/moraes/appengine/src/tip/bookmark.py

let me know what you think.

-- rodrigo

Simo Salminen

unread,
Mar 17, 2009, 3:32:33 AM3/17/09
to Google App Engine
Hi.

Just for the curiosity, I did a simple pagination algo which supports
'previous' button. This is not as fancy as rodrigos, because this is
not generalized for all types of queries. Also it is bit repetitive
because I wanted to keep it easy to understand.

Here is the code:
http://pastebin.com/f66e901b1

It assumes following data model:
class Issue(db.Model):
whatever = db.StringProperty()
modified = db.DateTimeProperty()

It does sorting based on 'modified' field. The bookmark is (modified|
__key__|direction), where direction is either 'prev' or 'next'. Note
that because it needs to figure out if there are previous and next
pages, it has to do total of 3 queries per page view. I could not
figure out better way to do this.
Reply all
Reply to author
Forward
0 new messages