Prefix search with GQL

121 views
Skip to first unread message

dan

unread,
Jan 2, 2010, 4:41:21 PM1/2/10
to gae-search
I'm looking into using gae-search premium.

The docs say to use a "startswith" indexer for prefix search. There is
another way in GQL: an entity whose X property is ['apples',
'apricots', 'bananas'] will match X >= 'apr' and X < 'apr\ufffd'
because one value starts with "apr". The entity matches if at least
one property matches *all* of the inequality filters in the query, so
it works, although not in development (http://code.google.com/p/
googleappengine/issues/detail?id=798).

If I look in the gae-search code to patch in the GQL way (instead of
the startswith indexer), is it easy, or brutally hard?

Dan

twan...@googlemail.com

unread,
Jan 3, 2010, 11:07:30 AM1/3/10
to gae-search

First of all as this solution works well for maybe one search term but
will result in an exploding index when searching with multiple search
terms because of the inequality filters used (see
http://code.google.com/intl/de-DE/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes).
Thus we implemented a custom startswith indexer and advise to use this
indexer. So it basically is not a good idea to use the method
described above as long as you know that your search will use multiple
search terms.
I hope this will answers your question.

Thomas

Dan Frankowski

unread,
Jan 3, 2010, 12:27:03 PM1/3/10
to gae-search
Thomas,

Thanks for your reply.

Could you explain what the query looks like that causes the index to explode, and for which data? One possible strategy is to take the union of "query for word 1", "query for word 2", etc. I don't think that explodes. I'm sure you have a more sophisticated thing in mind.

Also, the 'average' length of Google query is 4 words (http://www.beussery.com/blog/index.php/2008/02/google-average-number-of-words-per-query-have-increased). That is perhaps not so bad if the fields being indexed have a low number of words (e.g., product models, name and city of a person, etc.).

Finally, I don't see exactly how the startswith avoids the explosion. I'm sure you're right, I just don't see it yet.

Dan



--

You received this message because you are subscribed to the Google Groups "gae-search" group.
To post to this group, send email to gae-s...@googlegroups.com.
To unsubscribe from this group, send email to gae-search+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/gae-search?hl=en.



twan...@googlemail.com

unread,
Jan 3, 2010, 3:49:53 PM1/3/10
to gae-search

On Jan 3, 6:27 pm, Dan Frankowski <dfran...@gmail.com> wrote:
> Thomas,
>
> Thanks for your reply.
>
> Could you explain what the query looks like that causes the index to
> explode, and for which data? One possible strategy is to take the union of
> "query for word 1", "query for word 2", etc. I don't think that explodes.
> I'm sure you have a more sophisticated thing in mind.

See the example below to understand which queries will cause such an
explotion
You can take the union of those queries but you have no guarantee that
let's say the first query and the second query or in general that any
of these queries will have search results in common. Thus the union
can produce no results at all (an empty result list).

> Also, the 'average' length of Google query is 4 words (http://www.beussery.com/blog/index.php/2008/02/google-average-number-...).


> That is perhaps not so bad if the fields being indexed have a low number of
> words (e.g., product models, name and city of a person, etc.).

Let's assume your model has 7 words (maybe first name, last name, ...)
which you query using inequality filters with an average of 4 search
terms in order to implement startswith. You need to store each
possible permutation of the models words for such queries in an index.
This will result in 7^4 + 7^2 +7^3+7^4 = 2 800 index entries for one
entity (each summand corresponds to a querie using 4, 3, 2 or 1 search
term and you would like to allow your users to search for 2 or 3 words
if you allow them to search for 4 words). Already increasing the
length of the search terms to 5 will result in an exploding index
because now you would need to have 7^5 + 7^4 + 7^2 + 7^3 + 7 = 19 607
index entries for one entity and the maximum is 5000 index entries per
entity. Even increasing the number of words to only 9 (and i think of
9 words as a low number of words) will result in an exploding index
too.

> Finally, I don't see exactly how the startswith avoids the explosion. I'm
> sure you're right, I just don't see it yet.

Our startswith index avoids an exploding index because it does not
need a composite index at all. We do not use inequality filters in
order to implement startswith thus we do not suffer from this problem.

I hope this helps you.

Bye,
Thomas

> Dan
>
> On Sun, Jan 3, 2010 at 10:07 AM, twansc...@googlemail.com <


>
> twansc...@googlemail.com> wrote:
>
> > On Jan 2, 10:41 pm, dan <dfran...@gmail.com> wrote:
> > > I'm looking into using gae-search premium.
>
> > > The docs say to use a "startswith" indexer for prefix search. There is
> > > another way in GQL: an entity whose X property is ['apples',
> > > 'apricots', 'bananas'] will match X >= 'apr' and X < 'apr\ufffd'
> > > because one value starts with "apr". The entity matches if at least
> > > one property matches *all* of the inequality filters in the query, so
> > > it works, although not in development (http://code.google.com/p/
> > > googleappengine/issues/detail?id=798).
>
> > > If I look in the gae-search code to patch in the GQL way (instead of
> > > the startswith indexer), is it easy, or brutally hard?
>
> > First of all as this solution works well for maybe one search term but
> > will result in an exploding index when searching with multiple search
> > terms because of the inequality filters used (see
>

> >http://code.google.com/intl/de-DE/appengine/docs/python/datastore/que...


> > ).
> > Thus we implemented a custom startswith indexer and advise to use this
> > indexer. So it basically is not a good idea to use the method
> > described above as long as you know that your search will use multiple
> > search terms.
> > I hope this will answers your question.
>
> > Thomas
>
> > --
>
> > You received this message because you are subscribed to the Google Groups
> > "gae-search" group.
> > To post to this group, send email to gae-s...@googlegroups.com.
> > To unsubscribe from this group, send email to

> > gae-search+...@googlegroups.com<gae-search%2Bunsu...@googlegroups.com>

Reply all
Reply to author
Forward
0 new messages