Why do custom indexes require single-property indexes?

36 views
Skip to first unread message

Jeff Schnitzer

unread,
Mar 18, 2010, 12:27:17 PM3/18/10
to Google App Engine
For those of you reading this thread on appengine-java, I apologize.
However, I realized that this is really a general appengine question
and might get a better response from someone "in the know" here.
Here's the appengine-java thread:
http://groups.google.com/group/google-appengine-java/browse_thread/thread/efed35cabf60f6ee

----------

I'm puzzled by the behavior of custom indexes. I have a simple test
case below, a simple equality filter on one property combined with a
descending sort on another property. If I set the properties with
setUnindexedProperty(), the query fails to find the result. If I set
the properties with setProperty(), it does.

Why? I have a custom index, therefore the query should not need or
touch the single-property indexes on these fields, right?

With this requirement, adding a single custom index means the
datastore must now update (at least) five indexes on every put() - the
single-value ASC and DESC indexes of both properties as well as my
custom index. That's gotta hurt.

Here's a test case using the low-level API:

/** */
@Test
public void lowLevelTest() throws Exception
{
DatastoreService service = DatastoreServiceFactory.getDatastoreService();

Entity ent = new Entity("Thing");
ent.setUnindexedProperty("foo", "fooValue");
ent.setUnindexedProperty("bar", 123L);
// switching to this works
//ent.setProperty("foo", "fooValue");
//ent.setProperty("bar", 123L);
service.put(ent);

Query query = new Query("Thing");
query.addFilter("foo", FilterOperator.EQUAL, "fooValue");
query.addSort("bar", SortDirection.DESCENDING);

PreparedQuery pq = service.prepare(query);
int count = 0;
for (Entity fetched: pq.asIterable())
{
count++;
}

assert count == 1;
}

The last assertion fails. The query doesn't find any results. The
automatic datastore index seems to be fine:

<!-- Indices written at Mon, 15 Mar 2010 21:49:01 PDT -->
<datastore-indexes>
<!-- Used 1 time in query history -->
<datastore-index kind="Thing" ancestor="false" source="auto">
<property name="foo" direction="asc"/>
<property name="bar" direction="desc"/>
</datastore-index>
</datastore-indexes>

What's up? Is this just a bug in the dev mode, or is there a real
requirement that all properties must have single-value indexes in
order to be part of a custom index?

Thanks,
Jeff

Nick Johnson (Google)

unread,
Mar 18, 2010, 12:32:09 PM3/18/10
to google-a...@googlegroups.com
Hi Jeff,

Unindexed properties are excluded from all indexing, not just from indexing by the built-in indexes.

-Nick Johnson


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.




--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Jeff Schnitzer

unread,
Mar 18, 2010, 12:32:49 PM3/18/10
to Google App Engine
From one of my followup posts:

Every scrap of documentation I've found says that GAE queries only
follow a single index (the one exception being zigzag merges, which
don't apply here). This means that to answer my query.filter(foo,
"fooValue1").sort("-bar"), there must be an index that contains the
foo and bar data sorted appropriately, no? Ie:

/Thing/foo:fooValue1/bar:bar9/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar7/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar7/[thekeyvalue]

To satisfy this query, GAE should start following this custom index
and that's pretty much it. There's no reason for it to touch the
single-property indexes (foo ASC, foo DESC, bar ASC, and bar DESC).

...and in my test, if I remove the custom index from
datastore-indexes.xml, it doesn't work. But also if I use
setUnindexedProperty, it doesn't work.

It's like setUnindexedProperty is being interpreted as "not only don't
set single-property indexes, but also don't include the property in
any custom indexes". This is counterintuitive - if I wanted the index
not to be built, I can just remove the index.

Jeff

Nick Johnson (Google)

unread,
Mar 18, 2010, 12:36:36 PM3/18/10
to google-a...@googlegroups.com
On Thu, Mar 18, 2010 at 4:32 PM, Jeff Schnitzer <je...@infohazard.org> wrote:
From one of my followup posts:

Every scrap of documentation I've found says that GAE queries only
follow a single index (the one exception being zigzag merges, which
don't apply here).  This means that to answer my query.filter(foo,
"fooValue1").sort("-bar"), there must be an index that contains the
foo and bar data sorted appropriately, no?  Ie:

/Thing/foo:fooValue1/bar:bar9/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar7/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar7/[thekeyvalue]

To satisfy this query, GAE should start following this custom index
and that's pretty much it.  There's no reason for it to touch the
single-property indexes (foo ASC, foo DESC, bar ASC, and bar DESC).

...and in my test, if I remove the custom index from
datastore-indexes.xml, it doesn't work.  But also if I use
setUnindexedProperty, it doesn't work.

It's like setUnindexedProperty is being interpreted as "not only don't
set single-property indexes, but also don't include the property in
any custom indexes".  This is counterintuitive - if I wanted the index
not to be built, I can just remove the index.

That's correct. Entities are evaluated individually, not as a group, and any unindexed properties are ignored for the purposes of all indexing, not just for built-in indexes.

-Nick Johnson
 
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Jeff Schnitzer

unread,
Mar 18, 2010, 12:49:11 PM3/18/10
to google-a...@googlegroups.com
On Thu, Mar 18, 2010 at 9:32 AM, Nick Johnson (Google)
<nick.j...@google.com> wrote:
> Hi Jeff,
> Unindexed properties are excluded from all indexing, not just from indexing
> by the built-in indexes.
> -Nick Johnson

Ah ha. The next question then is - why?

This current arrangement has some particularly nasty consequences:

1) If you want a custom index across 3 properties, you are now forced
to maintain seven indexes total - the three single-property ASC
indexes, the three single-property DESC indexes, and the actual custom
index.

2) If you want to create a custom index across a property that does
not currently have a single-property index, you must manually go
through and reprocess you entire dataset. The automatic index
building isn't automatic.

On the other hand, there doesn't seem to be much benefit to the
current approach. It does provide the ability to make rdbms-like
partial indexes in custom indexes as well as single-property indexes,
but it seems like any minor gain is going to be overshadowed by the
fact that you're now maintaining quite a large number of additional
indexes.

Furthermore, the only way to access this partial index functionality
is with the low-level API. At the higher level APIs, you can only
flag a property as unindexed or not. If the user wants a property to
be excluded from a custom index... they can easily remove it from the
datastore-indexes.xml/yaml. So this default doesn't buy you anything
and actually costs you a lot of flexibility and extra index overhead.

Unless I'm missing something?

Jeff

Nick Johnson (Google)

unread,
Mar 18, 2010, 12:55:23 PM3/18/10
to google-a...@googlegroups.com
Hi Jeff,

On Thu, Mar 18, 2010 at 4:49 PM, Jeff Schnitzer <je...@infohazard.org> wrote:
On Thu, Mar 18, 2010 at 9:32 AM, Nick Johnson (Google)
<nick.j...@google.com> wrote:
> Hi Jeff,
> Unindexed properties are excluded from all indexing, not just from indexing
> by the built-in indexes.
> -Nick Johnson

Ah ha.  The next question then is - why?

This current arrangement has some particularly nasty consequences:

1) If you want a custom index across 3 properties, you are now forced
to maintain seven indexes total - the three single-property ASC
indexes, the three single-property DESC indexes, and the actual custom
index.

None of the built in indexes require 'maintenance'. There's also only a single desc, and a single asc index for all properties.
 

2) If you want to create a custom index across a property that does
not currently have a single-property index, you must manually go
through and reprocess you entire dataset.  The automatic index
building isn't automatic.

If you mean that you want to index a field that was previously listed as an unindexed field then yes, you do.

-Nick Johnson


On the other hand, there doesn't seem to be much benefit to the
current approach.  It does provide the ability to make rdbms-like
partial indexes in custom indexes as well as single-property indexes,
but it seems like any minor gain is going to be overshadowed by the
fact that you're now maintaining quite a large number of additional
indexes.

Furthermore, the only way to access this partial index functionality
is with the low-level API.  At the higher level APIs, you can only
flag a property as unindexed or not.  If the user wants a property to
be excluded from a custom index... they can easily remove it from the
datastore-indexes.xml/yaml.  So this default doesn't buy you anything
and actually costs you a lot of flexibility and extra index overhead.

Unless I'm missing something?

Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Jeff Schnitzer

unread,
Mar 18, 2010, 1:36:24 PM3/18/10
to google-a...@googlegroups.com
On Thu, Mar 18, 2010 at 9:55 AM, Nick Johnson (Google)
<nick.j...@google.com> wrote:
> On Thu, Mar 18, 2010 at 4:49 PM, Jeff Schnitzer <je...@infohazard.org> wrote:
>>
>> This current arrangement has some particularly nasty consequences:
>>
>> 1) If you want a custom index across 3 properties, you are now forced
>> to maintain seven indexes total - the three single-property ASC
>> indexes, the three single-property DESC indexes, and the actual custom
>> index.
>
> None of the built in indexes require 'maintenance'. There's also only a
> single desc, and a single asc index for all properties.

Sorry, perhaps my terminology is faltering a bit here - by "maintain"
I mean "write the values of". While there are only two index tables,
our hypothetical put() requires three writes to each table, plus
another write to the custom index table. This seems rather painful.

>> 2) If you want to create a custom index across a property that does
>> not currently have a single-property index, you must manually go
>> through and reprocess you entire dataset.  The automatic index
>> building isn't automatic.
>
> If you mean that you want to index a field that was previously listed as an
> unindexed field then yes, you do.

I understand that this is how it works, and that it was deliberately
chosen. My question is, why does it work this way?

I would like you to change this behavior: If you call
setUnindexedProperty(), leave the property in custom indexes. Or at
least add a setProperty() method that only creates custom indexes, not
single-property indexes.

The reason for this is that it more accurately reflects how people
actually use the datastore:

1) In a write-heavy application, indexes are expensive to maintain.
Forcing single-property indexes for every custom index is particularly
onerous since custom indexes may contain large numbers of properties.

2) People add custom indexes to enable new features; requiring that
single-property indexes exist means a lot of tedious data reprocessing
whenever you want to add such a feature. You can't rely on automatic
indexing anymore.

In contrast, the current scheme of setUnindexedProperty() skipping
custom indexes doesn't really buy us developers any advantage.

Thanks,
Jeff

Ugorji Nwoke

unread,
Mar 18, 2010, 3:19:37 PM3/18/10
to Google App Engine
Cross-posting my comments from the earlier thread.

The restriction that every index'able property must be indexed at
put-time when put into the datastore (using setProperty as opposed
to setUnindexedProperty), has 2 MAJOR disadvantages with far-reaching
repercussions:
- you cannot index entities after they have been put (even with a
custom index). You will have to re-write the entity to index it.
- Each put is 4X more expensive in latency/clock-time (potentially),
CPU-cost and storage. This seems un-necessary, especially if a single
custom index will suffice all your querying needs.

I was really hoping to use custom indexes to buy major performance
savings (in latency, CPU-cost and storage), and also be able to re-
index every entity after the fact. But this restriction is heavy
(preventing direct solution to a problem everyone may end up facing),
and very expensive (costing us big-time).

A way to disable automatic single-property indexes for certain
properties or entities will help to solve this issue.

On Mar 18, 10:36 am, Jeff Schnitzer <j...@infohazard.org> wrote:
> On Thu, Mar 18, 2010 at 9:55 AM, Nick Johnson (Google)
>

> <nick.john...@google.com> wrote:

Harshad RJ

unread,
Jan 6, 2012, 2:28:22 AM1/6/12
to google-a...@googlegroups.com
It's like setUnindexedProperty is being interpreted as "not only don't
set single-property indexes, but also don't include the property in
any custom indexes".

That's correct [...] any unindexed properties are ignored for the purposes of all indexing, not just for built-in indexes.

Is there a chance of this changing with an API upgrade or is it some design constraint in the data-store that will make it near impossible to change in the near future?

(Apologies to those who don't like old threads being bumped.)
Reply all
Reply to author
Forward
0 new messages