Exploding Indexes - Update?

39 views
Skip to first unread message

DevShop

unread,
Sep 20, 2010, 10:49:05 AM9/20/10
to Google App Engine
At the Google IO Conference in May, Alfred Fuller talked about his
solution for the exploding indexes problem.

I cannot deploy my app due to an exploding index, just seeding the
database with initial data triggers the problem.

Any updates as to when the solution will be deployed by the App Engine
team?

Ikai Lan (Google)

unread,
Sep 20, 2010, 10:59:58 AM9/20/10
to google-a...@googlegroups.com
In general, we do not release ETAs for features. Alfred's enhancements are coming in multiple small feature releases over the coming months.

It'd be more practical to post what's causing an exploding index, as there are probably ways to design around it and still meet your needs.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine




--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.


DevShop

unread,
Sep 20, 2010, 11:56:39 AM9/20/10
to Google App Engine
Ikai, thanks for the prompt response.

Here's our situation: records on entity 'Report' are tagged with
multiple keywords ('tags') like: 'music', 'event', 'downtown',
'zz_top', 'concert', 'g_arena', 'september', '2010'. People might
search for events downtown in September like this:

city.domain.com/tags/event/september/2010/downtown

So we are using a 'tags = db.ListProperty(basestring, required=True,
validator=are_valid_tags)' in our table with queries like:

reports = db.Query(Report).filter('tags =', subdomain)
for tag in tags:
reports = reports.filter('tags =', tag)
reports = reports.order('-approval_time').fetch(int(25))

Advice and/or insight would be much appreciated!


On Sep 20, 7:59 am, "Ikai Lan (Google)" <ikai.l+gro...@google.com>
wrote:
> In general, we do not release ETAs for features. Alfred's enhancements are
> coming in multiple small feature releases over the coming months.
>
> It'd be more practical to post what's causing an exploding index, as there
> are probably ways to design around it and still meet your needs.
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> Blogger:http://googleappengine.blogspot.com
> Reddit:http://www.reddit.com/r/appengine
> Twitter:http://twitter.com/app_engine
>
> On Mon, Sep 20, 2010 at 10:49 AM, DevShop <don.bart...@gmail.com> wrote:
> > At the Google IO Conference in May,  Alfred Fuller talked about his
> > solution for the exploding indexes problem.
>
> > I cannot deploy my app due to an exploding index, just seeding the
> > database with initial data triggers the problem.
>
> > Any updates as to when the solution will be deployed by the App Engine
> > team?
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to google-a...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > google-appengi...@googlegroups.com<google-appengine%2Bunsu...@googlegroups.com>
> > .

Ikai Lan (Google)

unread,
Sep 20, 2010, 12:35:14 PM9/20/10
to google-a...@googlegroups.com
Do you have 2 list properties? A single list property won't cause exploding indexes. Can you post the whole property?


--
Ikai Lan 
Developer Programs Engineer, Google App Engine



To unsubscribe from this group, send email to google-appengi...@googlegroups.com.

DevShop

unread,
Sep 20, 2010, 1:08:35 PM9/20/10
to Google App Engine
> Do you have 2 list properties? A single list property won't cause exploding
> indexes. Can you post the whole property?

Thanks for your help!

Do you mean "Can you post the whole class?" ? Here it is, the list
property that explodes is the first property:

class Report(db.Model):

# List of tags for this report, i.e. ['seattle', 'rockroll_rep',
'rock', 'zztop', 'qwest_field'].
tags = db.ListProperty(basestring, required=True,
validator=are_valid_tags)

status = db.StringProperty(required=True, choices = ['PENDING',
'APPROVED', 'DECLINED', 'FLAGGED'], default = 'PENDING')

# If status is PENDING, the reason is detailed here
pending_reason = db.TextProperty(default='')

# If status is FLAGGED, the reason is detailed here
flagged_reason = db.TextProperty(default='')

# The report, first line is the headline.
text = db.TextProperty(default='')

# URL with more info about this report.
url = db.StringProperty(default=None, validator=is_valid_url)

# Editor that approved this report. Could be None if report submitted
from an 'APPROVED' reporter.
editor = db.ReferenceProperty(Reporter,
collection_name='approved_reports', default=None)

# Date/time of insertion of this report into the system.
creation_time = db.DateTimeProperty(auto_now_add=True)

# Date/time of approval of this report, might be the same as
creation_time if it was automatically approved.
approval_time = db.DateTimeProperty(auto_now_add=True)


On Sep 20, 9:35 am, "Ikai Lan (Google)" <ikai.l+gro...@google.com>
wrote:
> Do you have 2 list properties? A single list property won't cause exploding
> indexes. Can you post the whole property?
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> Blogger:http://googleappengine.blogspot.com
> Reddit:http://www.reddit.com/r/appengine
> Twitter:http://twitter.com/app_engine
>
> On Mon, Sep 20, 2010 at 11:56 AM, DevShop <don.bart...@gmail.com> wrote:
> > Ikai, thanks for the prompt response.
>
> > Here's our situation: records on entity 'Report' are tagged with
> > multiple keywords ('tags') like: 'music', 'event', 'downtown',
> > 'zz_top', 'concert', 'g_arena', 'september', '2010'. People might
> > search for events downtown in September like this:
>
> > city.domain.com/tags/event/september/2010/downtown
>
> > So we are using a 'tags = db.ListProperty(basestring, required=True,
> > validator=are_valid_tags)' in our table with queries like:
>
> > reports = db.Query(Report).filter('tags =', subdomain)
> > for tag in tags:
> >  reports = reports.filter('tags =', tag)
> > reports = reports.order('-approval_time').fetch(int(25))
>
> > Advice and/or insight would be much appreciated!
>
> > On Sep 20, 7:59 am, "Ikai Lan (Google)" <ikai.l+gro...@google.com<ikai.l%2Bgro...@google.com>
> > <google-appengine%2Bunsu...@googlegroups.com<google-appengine%252Buns...@googlegroups.com>

Alfred Fuller

unread,
Sep 20, 2010, 2:52:14 PM9/20/10
to google-a...@googlegroups.com
Hi,

This is the quintessential exploding index problem that I am fixing. The problem is that we don't currently support zigzag merge join and sorting at the same time. You have several options to address this:
  • Remove the sort order
    • Slightly worse results until you add it back when I add support for zigzag + sort
  • Encode the sort order in the key_name
    • since zigzag results are sorted in key order you can prefix the key_name with a value that will replicate that order (key_names are sorted by lexicographically). This will only work if all the entities in your query either have no parent or the same parent.
  • Restrict the # of tag= in your query
    • If you try to sort today you have a upper bound of 5 tags per query (which in turn restricts you to 5 tags per entity). You can get more tags/entity if you further restrict the # of tags in the query.

 - Alfred

To unsubscribe from this group, send email to google-appengi...@googlegroups.com.

DevShop

unread,
Sep 20, 2010, 3:57:56 PM9/20/10
to Google App Engine
Wow, thanks Alfred, this is good info.

> - Remove the sort order
> - Restrict the # of tag= in your query

OR (choose one) or AND (we have to do both)?

If I just have to remove the sort order from the query and manipulate
the keys value then I can live with that. But if I have to also
restrict the number of tags to 5 then I guess I'll have to wait until
you complete your work since our spec requires up to 10 tags.

Error message when uploading data through the remote API:

google.appengine.api.datastore_errors.BadRequestError: Too many
indexed
properties for entity: app: "sampleapp",path < Element { type:
"Report", id: 6001 }>

This index put it over the limit: entity_type: "Report",ancestor:
false,
Property { name: "tags", direction: ASCENDING},Property { name:
"tags",
direction: ASCENDING},Property { name: "tags", direction:
ASCENDING},
Property { name: "tags", direction: ASCENDING},Property { name:
"approval_time", direction: DESCENDING}


On Sep 20, 11:52 am, Alfred Fuller <arfuller+appeng...@google.com>
wrote:
> Hi,
>
> This is the quintessential exploding index problem that I am fixing. The
> problem is that we don't currently support zigzag merge join and sorting at
> the same time. You have several options to address this:
>
>    - Remove the sort order
>       - Slightly worse results until you add it back when I add support for
>       zigzag + sort
>       - Encode the sort order in the key_name
>       - since zigzag results are sorted in key order you can prefix the
>       key_name with a value that will replicate that order (key_names
> are sorted
>       by lexicographically). This will only work if all the entities
> in your query
>       either have no parent or the same parent.
>    - Restrict the # of tag= in your query
>       - If you try to sort today you have a upper bound of 5 tags per query
>       (which in turn restricts you to 5 tags per entity). You can get more
>       tags/entity if you further restrict the # of tags in the query.
>
>  - Alfred
>
> > On Sep 20, 9:35 am, "Ikai Lan (Google)" <ikai.l+gro...@google.com<ikai.l%2Bgro...@google.com>
> > <ikai.l%2Bgro...@google.com <ikai.l%252Bgro...@google.com>>
> > > > <google-appengine%2Bunsu...@googlegroups.com<google-appengine%252Buns...@googlegroups.com>
> > <google-appengine%252Buns...@googlegroups.com<google-appengine%25252Bun...@googlegroups.com>

Alfred Fuller

unread,
Sep 23, 2010, 2:12:11 PM9/23/10
to google-a...@googlegroups.com
On Mon, Sep 20, 2010 at 12:57 PM, DevShop <don.b...@gmail.com> wrote:
Wow, thanks Alfred, this is good info.

> - Remove the sort order
> - Restrict the # of tag= in your query

OR (choose one) or AND (we have to do both)?


It's OR

Make sure you vacuum the exploding indexes you already have before trying this.

If you decided to live with exploding indexes here is the break down of how to restrict the number of tags= in your query (replace 'attribute' with 'tag' and 'date' with '-approval_time' for you particular case):

Built-indexes = 1 + 2 * N => max 2499 attributes (capped at 1000 for a single list property)
+ Index(attribute, date)  = N => max 1666 attributes (capped at 1000 for a single list property)
+ Index(attribute, attribute, date) = N ^ 2 => max 69 attributes
+ Index(attribute, attribute, attribute, date) = N ^ 3  => max 16 attributes
+ Index(attribute, attribute, attribute, attribute, date) + N ^ 4 => max 8 attributes
+ Index(attribute, attribute, attribute, attribute, attribute, date) + N ^ 5 => max 5 attributes
 
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.

andreas schmid

unread,
Sep 27, 2010, 3:02:51 PM9/27/10
to google-a...@googlegroups.com
hi guys,

im having the same problem with multiple tags and an order in the query.
so you are saying that without order i wont have exploding indexes even if i have 20 keywords im filtering on?
when do you think will you finish your work?

thank you!

> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.

Matija

unread,
Dec 30, 2010, 8:54:17 AM12/30/10
to google-a...@googlegroups.com
Any news about zigzag merge join with sorting and our after Christmas present :D

Matija.

Alfred Fuller

unread,
Jan 12, 2011, 12:39:13 AM1/12/11
to google-a...@googlegroups.com
see http://groups.google.com/group/google-appengine/browse_thread/thread/ab525fcc84939783/b08b303cdfc95d5c

Thanks,

Alfred

On Thu, Dec 30, 2010 at 5:54 AM, Matija <matija....@gmail.com> wrote:
Any news about zigzag merge join with sorting and our after Christmas present :D

Matija.

--
Reply all
Reply to author
Forward
0 new messages