Query with 'IN' and Associated Index Explosion

5 views
Skip to first unread message

Devel63

unread,
Feb 28, 2009, 7:55:47 PM2/28/09
to Google App Engine
I understand that if you issue a query using "IN", the datastore
actually issues as many sub-queries as there are values listed in the
IN set, up to a max of 30. It's not the greatest, but I get it.

However, when I do this, the dev_server creates a bunch of index
definitions, so that I have one with 1 property if there's 1 value in
the IN set, 15 index properties if there are 15 values in the IN set,
and so on.

This doesn't seem to be scalable, especially if there are other
variations of property conditions in the mix. Should I just do the
looping outside the query, and ask for an equality with each value in
the IN set? Are these index definitions really indicative of what
will need to happen on the server?

Andy Freeman

unread,
Mar 1, 2009, 5:13:19 PM3/1/09
to Google App Engine
> However, when I do this, the dev_server creates a bunch of index
> definitions, so that I have one with 1 property if there's 1 value in
> the IN set, 15 index properties if there are 15 values in the IN set,
> and so on.

Are all of these index properties necessary in production? Will a
production query with an IN set with 4 members fail if the test set
used to generate index.yaml has IN sets with 1, 3, and 5 members?

ryan

unread,
Mar 4, 2009, 2:29:18 AM3/4/09
to Google App Engine
On Mar 1, 2:13 pm, Andy Freeman <ana...@earthlink.net> wrote:
> > However, when I do this, the dev_server creates a bunch of index
> > definitions, so that I have one with 1 property if there's 1 value in
> > the IN set, 15 index properties if there are 15 values in the IN set,
> > and so on.
>
> Are all of these index properties necessary in production?  Will a
> production query with an IN set with 4 members fail if the test set
> used to generate index.yaml has IN sets with 1, 3, and 5 members?

i have to admit, i don't entirely understand the problem. indices are
only needed with different combinations of property names, not filter
values. IN queries fan out to multiple queries on the exact same
combination of property names; only the filter values change. changing
the number of filter values shouldn't change the indices needed by the
subqueries. for example, SELECT * FROM Foo WHERE x in [0, 1] fans out
to:

SELECT * FROM Foo WHERE x = 0
SELECT * FROM Foo WHERE x = 1

while SELECT * FROM Foo WHERE x in [0, 1, 2, 3] fans out to

SELECT * FROM Foo WHERE x = 0
SELECT * FROM Foo WHERE x = 1
SELECT * FROM Foo WHERE x = 2
SELECT * FROM Foo WHERE x = 3

all of the subqueries use the same index. here, it's a built-in single
property index, but it could be a composite index if there was an
additional filter or sort order.

given that, i suspect there's more going on here, or at least a
substantially more complicated query. Devel63, any chance you could
post your exact query and filter values, ie the values for the IN set?

Devel63

unread,
Mar 4, 2009, 12:11:17 PM3/4/09
to Google App Engine
It is indeed a more complex query, and the IN values are keys. At one
point, I had tried to make them key values in a named reference
collection, but while the data store never complained, it never
returned any results.

Can you confirm that it is impossible to make an entity query that
filters by normal prop vals and by a list of keys in a collection?
e.g. Entity.gql('WHERE prop1 = :val1 AND named_collection
IN :key_list', val1=val, key_list=keys)

Separately, spurred on by your comments, I cleared the datastore and
started again. Now, the index explosion is not occurring. I guess
that some interim query experiment requested such indexes, and
stopping the datastore and deleting the indexes out of the index.yaml
file was not sufficient to clear them, so they kept reappearing in
index.yaml.

ryan

unread,
Mar 4, 2009, 3:46:57 PM3/4/09
to Google App Engine
On Mar 4, 9:11 am, Devel63 <DanStic...@gmail.com> wrote:
> It is indeed a more complex query, and the IN values are keys.  At one
> point, I had tried to make them key values in a named reference
> collection, but while the data store never complained, it never
> returned any results.

understood. happily, for the purposes of this conversation, the type
of the IN values actually doesn't matter. IN won't generate an index
per list length regardless of the IN values' type.

> Can you confirm that it is impossible to make an entity query that
> filters by normal prop vals and by a list of keys in a collection?
> e.g. Entity.gql('WHERE prop1 = :val1 AND named_collection
> IN :key_list', val1=val, key_list=keys)

no, that's definitely possible. from http://shell.appspot.com/ :

>>> class Foo(db.Expando):
pass
>>> Foo.gql('WHERE x = 0 AND y IN (1, 2)').fetch(3)
[<__main__.Foo object at 0xf5ca9957dc3a9c30>, <__main__.Foo object at
0xf5ca9957dc3a9fb0>, <__main__.Foo object at 0xf5ca9957dc3a9cb0>]

again, this is true regardless of the type of the filter values.

> Separately, spurred on by your comments, I cleared the datastore and
> started again.  Now, the index explosion is not occurring.  I guess
> that some interim query experiment requested such indexes, and
> stopping the datastore and deleting the indexes out of the index.yaml
> file was not sufficient to clear them, so they kept reappearing in
> index.yaml.

hmm. the dev_appserver stores both the datastore and the query
history, in separate files. i wonder if the query history file somehow
didn't get deleted. (it defaults to /tmp/
dev_appserver.datastore.history.) more in the docs:

http://code.google.com/appengine/docs/python/tools/devserver.html#Command_Line_Arguments

Devel63

unread,
Mar 4, 2009, 6:34:14 PM3/4/09
to Google App Engine
This is very promising! Are you saying that this reference collection
query should work (note the named collection)?

class Foo(db.Model):
prop1 = db.IntegerProperty()

class Fee(db.Model):
my_foo = db.ReferenceProperty(Foo, collection_name='my_collection')
prop2 = db.IntegerProperty()

foos = Foo.all().filter('prop1 =', 32)
results = Fee.gql('WHERE prop2 = 18 AND collection_name IN :1', foos)

That would be awesome, and I'd just have to try a little harder to get
things to work. Of course in my real example, I'm using db.Searchable
instead of Model, and using search, but I'd be happy just to get the
above to work.


On Mar 4, 12:46 pm, ryan <ryanb+appeng...@google.com> wrote:
> On Mar 4, 9:11 am, Devel63 <DanStic...@gmail.com> wrote:
>
> > It is indeed a more complex query, and the IN values are keys.  At one
> > point, I had tried to make them key values in a named reference
> > collection, but while the data store never complained, it never
> > returned any results.
>
> understood. happily, for the purposes of this conversation, the type
> of the IN values actually doesn't matter. IN won't generate an index
> per list length regardless of the IN values' type.
>
> > Can you confirm that it is impossible to make an entity query that
> > filters by normal prop vals and by a list of keys in a collection?
> > e.g. Entity.gql('WHERE prop1 = :val1 AND named_collection
> > IN :key_list', val1=val, key_list=keys)
>
> no, that's definitely possible. fromhttp://shell.appspot.com/:
>
> >>> class Foo(db.Expando):
>   pass
> >>> Foo.gql('WHERE x = 0 AND y IN (1, 2)').fetch(3)
>
> [<__main__.Foo object at 0xf5ca9957dc3a9c30>, <__main__.Foo object at
> 0xf5ca9957dc3a9fb0>, <__main__.Foo object at 0xf5ca9957dc3a9cb0>]
>
> again, this is true regardless of the type of the filter values.
>
> > Separately, spurred on by your comments, I cleared the datastore and
> > started again.  Now, the index explosion is not occurring.  I guess
> > that some interim query experiment requested such indexes, and
> > stopping the datastore and deleting the indexes out of the index.yaml
> > file was not sufficient to clear them, so they kept reappearing in
> > index.yaml.
>
> hmm. the dev_appserver stores both the datastore and the query
> history, in separate files. i wonder if the query history file somehow
> didn't get deleted. (it defaults to /tmp/
> dev_appserver.datastore.history.) more in the docs:
>
> http://code.google.com/appengine/docs/python/tools/devserver.html#Com...
Reply all
Reply to author
Forward
0 new messages