----------
I'm puzzled by the behavior of custom indexes. I have a simple test
case below, a simple equality filter on one property combined with a
descending sort on another property. If I set the properties with
setUnindexedProperty(), the query fails to find the result. If I set
the properties with setProperty(), it does.
Why? I have a custom index, therefore the query should not need or
touch the single-property indexes on these fields, right?
With this requirement, adding a single custom index means the
datastore must now update (at least) five indexes on every put() - the
single-value ASC and DESC indexes of both properties as well as my
custom index. That's gotta hurt.
Here's a test case using the low-level API:
/** */
@Test
public void lowLevelTest() throws Exception
{
DatastoreService service = DatastoreServiceFactory.getDatastoreService();
Entity ent = new Entity("Thing");
ent.setUnindexedProperty("foo", "fooValue");
ent.setUnindexedProperty("bar", 123L);
// switching to this works
//ent.setProperty("foo", "fooValue");
//ent.setProperty("bar", 123L);
service.put(ent);
Query query = new Query("Thing");
query.addFilter("foo", FilterOperator.EQUAL, "fooValue");
query.addSort("bar", SortDirection.DESCENDING);
PreparedQuery pq = service.prepare(query);
int count = 0;
for (Entity fetched: pq.asIterable())
{
count++;
}
assert count == 1;
}
The last assertion fails. The query doesn't find any results. The
automatic datastore index seems to be fine:
<!-- Indices written at Mon, 15 Mar 2010 21:49:01 PDT -->
<datastore-indexes>
<!-- Used 1 time in query history -->
<datastore-index kind="Thing" ancestor="false" source="auto">
<property name="foo" direction="asc"/>
<property name="bar" direction="desc"/>
</datastore-index>
</datastore-indexes>
What's up? Is this just a bug in the dev mode, or is there a real
requirement that all properties must have single-value indexes in
order to be part of a custom index?
Thanks,
Jeff
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
Every scrap of documentation I've found says that GAE queries only
follow a single index (the one exception being zigzag merges, which
don't apply here). This means that to answer my query.filter(foo,
"fooValue1").sort("-bar"), there must be an index that contains the
foo and bar data sorted appropriately, no? Ie:
/Thing/foo:fooValue1/bar:bar9/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar7/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar7/[thekeyvalue]
To satisfy this query, GAE should start following this custom index
and that's pretty much it. There's no reason for it to touch the
single-property indexes (foo ASC, foo DESC, bar ASC, and bar DESC).
...and in my test, if I remove the custom index from
datastore-indexes.xml, it doesn't work. But also if I use
setUnindexedProperty, it doesn't work.
It's like setUnindexedProperty is being interpreted as "not only don't
set single-property indexes, but also don't include the property in
any custom indexes". This is counterintuitive - if I wanted the index
not to be built, I can just remove the index.
Jeff
From one of my followup posts:
Every scrap of documentation I've found says that GAE queries only
follow a single index (the one exception being zigzag merges, which
don't apply here). This means that to answer my query.filter(foo,
"fooValue1").sort("-bar"), there must be an index that contains the
foo and bar data sorted appropriately, no? Ie:
/Thing/foo:fooValue1/bar:bar9/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue1/bar:bar7/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar8/[thekeyvalue]
/Thing/foo:fooValue2/bar:bar7/[thekeyvalue]
To satisfy this query, GAE should start following this custom index
and that's pretty much it. There's no reason for it to touch the
single-property indexes (foo ASC, foo DESC, bar ASC, and bar DESC).
...and in my test, if I remove the custom index from
datastore-indexes.xml, it doesn't work. But also if I use
setUnindexedProperty, it doesn't work.
It's like setUnindexedProperty is being interpreted as "not only don't
set single-property indexes, but also don't include the property in
any custom indexes". This is counterintuitive - if I wanted the index
not to be built, I can just remove the index.
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
Ah ha. The next question then is - why?
This current arrangement has some particularly nasty consequences:
1) If you want a custom index across 3 properties, you are now forced
to maintain seven indexes total - the three single-property ASC
indexes, the three single-property DESC indexes, and the actual custom
index.
2) If you want to create a custom index across a property that does
not currently have a single-property index, you must manually go
through and reprocess you entire dataset. The automatic index
building isn't automatic.
On the other hand, there doesn't seem to be much benefit to the
current approach. It does provide the ability to make rdbms-like
partial indexes in custom indexes as well as single-property indexes,
but it seems like any minor gain is going to be overshadowed by the
fact that you're now maintaining quite a large number of additional
indexes.
Furthermore, the only way to access this partial index functionality
is with the low-level API. At the higher level APIs, you can only
flag a property as unindexed or not. If the user wants a property to
be excluded from a custom index... they can easily remove it from the
datastore-indexes.xml/yaml. So this default doesn't buy you anything
and actually costs you a lot of flexibility and extra index overhead.
Unless I'm missing something?
Jeff
On Thu, Mar 18, 2010 at 9:32 AM, Nick Johnson (Google)Ah ha. The next question then is - why?
<nick.j...@google.com> wrote:
> Hi Jeff,
> Unindexed properties are excluded from all indexing, not just from indexing
> by the built-in indexes.
> -Nick Johnson
This current arrangement has some particularly nasty consequences:
1) If you want a custom index across 3 properties, you are now forced
to maintain seven indexes total - the three single-property ASC
indexes, the three single-property DESC indexes, and the actual custom
index.
2) If you want to create a custom index across a property that does
not currently have a single-property index, you must manually go
through and reprocess you entire dataset. The automatic index
building isn't automatic.
On the other hand, there doesn't seem to be much benefit to the
current approach. It does provide the ability to make rdbms-like
partial indexes in custom indexes as well as single-property indexes,
but it seems like any minor gain is going to be overshadowed by the
fact that you're now maintaining quite a large number of additional
indexes.
Furthermore, the only way to access this partial index functionality
is with the low-level API. At the higher level APIs, you can only
flag a property as unindexed or not. If the user wants a property to
be excluded from a custom index... they can easily remove it from the
datastore-indexes.xml/yaml. So this default doesn't buy you anything
and actually costs you a lot of flexibility and extra index overhead.
Unless I'm missing something?
Jeff
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
Sorry, perhaps my terminology is faltering a bit here - by "maintain"
I mean "write the values of". While there are only two index tables,
our hypothetical put() requires three writes to each table, plus
another write to the custom index table. This seems rather painful.
>> 2) If you want to create a custom index across a property that does
>> not currently have a single-property index, you must manually go
>> through and reprocess you entire dataset. The automatic index
>> building isn't automatic.
>
> If you mean that you want to index a field that was previously listed as an
> unindexed field then yes, you do.
I understand that this is how it works, and that it was deliberately
chosen. My question is, why does it work this way?
I would like you to change this behavior: If you call
setUnindexedProperty(), leave the property in custom indexes. Or at
least add a setProperty() method that only creates custom indexes, not
single-property indexes.
The reason for this is that it more accurately reflects how people
actually use the datastore:
1) In a write-heavy application, indexes are expensive to maintain.
Forcing single-property indexes for every custom index is particularly
onerous since custom indexes may contain large numbers of properties.
2) People add custom indexes to enable new features; requiring that
single-property indexes exist means a lot of tedious data reprocessing
whenever you want to add such a feature. You can't rely on automatic
indexing anymore.
In contrast, the current scheme of setUnindexedProperty() skipping
custom indexes doesn't really buy us developers any advantage.
Thanks,
Jeff
The restriction that every index'able property must be indexed at
put-time when put into the datastore (using setProperty as opposed
to setUnindexedProperty), has 2 MAJOR disadvantages with far-reaching
repercussions:
- you cannot index entities after they have been put (even with a
custom index). You will have to re-write the entity to index it.
- Each put is 4X more expensive in latency/clock-time (potentially),
CPU-cost and storage. This seems un-necessary, especially if a single
custom index will suffice all your querying needs.
I was really hoping to use custom indexes to buy major performance
savings (in latency, CPU-cost and storage), and also be able to re-
index every entity after the fact. But this restriction is heavy
(preventing direct solution to a problem everyone may end up facing),
and very expensive (costing us big-time).
A way to disable automatic single-property indexes for certain
properties or entities will help to solve this issue.
On Mar 18, 10:36 am, Jeff Schnitzer <j...@infohazard.org> wrote:
> On Thu, Mar 18, 2010 at 9:55 AM, Nick Johnson (Google)
>
> <nick.john...@google.com> wrote:
It's like setUnindexedProperty is being interpreted as "not only don't
set single-property indexes, but also don't include the property in
any custom indexes".
That's correct [...] any unindexed properties are ignored for the purposes of all indexing, not just for built-in indexes.