Executing pattern matching queries

1,926 views
Skip to first unread message

Amit Tripathy

unread,
Mar 24, 2010, 4:25:43 PM3/24/10
to objectify...@googlegroups.com
Hi All,

Is it possible to execute pattern matching query using objectify.

For instance lets say I have an entity like

Class Dish

@Id
String name
String description

and I want to list out all Dish where name like 'pas%'

is it possible...

Thanks,
Amit

Christian Goudreau

unread,
Mar 24, 2010, 4:29:54 PM3/24/10
to objectify...@googlegroups.com
I don't think it's possible. I've switched to twig-persist for that reason.

But you can always do the pattern matching yourself with the full list of object. I juste don't think that full text search is compatible with AppEngine datastore, but I can be wrong ! If someone can help us, I'll be happy to hear some real explanation :)

Christian

To unsubscribe from this group, send email to objectify-appengine+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.

Jeff Schnitzer

unread,
Mar 24, 2010, 4:49:28 PM3/24/10
to objectify...@googlegroups.com
It turns out you can do this kind of query pretty easily as long as
the % is at the end of the string. [name like 'pas%'] is just a range
query:

Iterable<Dish> dishes = ofy.query(Dish.class).filter("name >=",
"pas").filter("name <", "pat");

Note that you can index [name like '%pas'] by storing the reversed
string as a separate field and using a range scan as above.

On the other hand, [name like 'pas%ies'] cannot be done using without
a full table scan on any system, not just appengine. Even Lucene
won't help unless you're willing to index every combination of letters
in your input data.

Christian, what is it that you couldn't do with Objectify that made
you want to use Twig? Not that there's anything wrong with that, but
we're always interested in hearing why someone would switch.

Jeff

Jeff Schnitzer

unread,
Mar 24, 2010, 4:55:41 PM3/24/10
to objectify...@googlegroups.com
On Wed, Mar 24, 2010 at 1:49 PM, Jeff Schnitzer <je...@infohazard.org> wrote:
>
> On the other hand, [name like 'pas%ies'] cannot be done using without
> a full table scan on any system, not just appengine.  Even Lucene
> won't help unless you're willing to index every combination of letters
> in your input data.

I should correct myself slightly here: [name like 'pas%ies'] doesn't
require a *full* table scan, you can use the first part (or the last
part) to limit the scan.

Jeff

Matt Quail

unread,
Mar 24, 2010, 6:41:24 PM3/24/10
to objectify...@googlegroups.com
> Iterable<Dish> dishes = ofy.query(Dish.class).filter("name >=",
> "pas").filter("name <", "pat");

I'd suggest one small stylistic tweak to Jeff's query, and append
\uFFFD instead of incrementing "pas" to "pat", eg

Iterable<Dish> dishes = ofy.query(Dish.class).filter("name >=",

"pas").filter("name <", "pas\uFFFD");

FFFD is the largest possible unicode character[1]. Any name that
starts with "pas" will be lexicographically greater-than-or-equal-to
"pas" and strictly less than "pas\uFFFFD". Jeff's example works,
because it will also be strictly less than "pat", but "pas"+"\uFFFD"
is easy to compute than increment("pas").

Actually, I suppose increment("pas") is not that hard to compute.

=Matt

[1] There are code points greater than FFFD, but iirc because of
either UTF-16 encoding and/or surrogate-pair usage, appending FFFD
always does the trick.

Amit Tripathy

unread,
Mar 24, 2010, 7:16:15 PM3/24/10
to objectify...@googlegroups.com, objectify...@googlegroups.com
This works.

Thanks Matt.

Sent from my iPhone

Christian Goudreau

unread,
Mar 24, 2010, 8:50:52 PM3/24/10
to objectify...@googlegroups.com
Actually it was because I searched an answer for something like this and found a post where it said that it wasn't possible. But I probably misunderstood.

Anyway, I should've stayed quiet and wait for some concrete answers lol

Christian

To unsubscribe from this group, send email to objectify-appengine+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.


To unsubscribe from this group, send email to objectify-appengine+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.

To unsubscribe from this group, send email to objectify-appengine+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.

Pelle Poluha

unread,
Apr 1, 2010, 6:08:31 PM4/1/10
to objectify-appengine
But what if you want to seach for an object that contains a string, ie
thq query "select * from reason where description like '%basic%'". Is
it possible? I guess it's a very common use case.

And thanks for your work with objectify. Found it yesterday and it was
exactly what I was looking for.

/Pelle

On 24 mar, 22:49, Jeff Schnitzer <j...@infohazard.org> wrote:
> It turns out you can do this kind ofquerypretty easily as long as


> the % is at the end of the string.  [namelike'pas%'] is just a rangequery:
>
> Iterable<Dish> dishes = ofy.query(Dish.class).filter("name >=",
> "pas").filter("name <", "pat");
>

> Note that you can index [namelike'%pas'] by storing the reversed


> string as a separate field and using a range scan as above.
>

> On the other hand, [namelike'pas%ies'] cannot be done using without


> a full table scan on any system, not just appengine.  Even Lucene
> won't help unless you're willing to index every combination of letters
> in your input data.
>
> Christian, what is it that you couldn't do with Objectify that made
> you want to use Twig?  Not that there's anything wrong with that, but
> we're always interested in hearing why someone would switch.
>
> Jeff
>
> On Wed, Mar 24, 2010 at 1:29 PM, Christian Goudreau
>

> <goudreau.christ...@gmail.com> wrote:
> > I don't think it's possible. I've switched to twig-persist for that reason.
> > But you can always do the pattern matching yourself with the full list of
> > object. I juste don't think that full text search is compatible with
> > AppEngine datastore, but I can be wrong ! If someone can help us, I'll be
> > happy to hear some real explanation :)
> > Christian

> > On Wed, Mar 24, 2010 at 4:25 PM, Amit Tripathy <ar.tripa...@gmail.com>
> > wrote:
>
> >> Hi All,
>
> >> Is it possible to execute pattern matchingqueryusing objectify.


>
> >> For instance lets say I have an entitylike
>
> >> Class Dish
>
> >> @Id
> >> String name
> >> String description
>

> >> and I want to list out all Dish where namelike'pas%'

Jeff Schnitzer

unread,
Apr 1, 2010, 6:20:46 PM4/1/10
to objectify...@googlegroups.com
Sadly, true fulltext search is not something offered by appengine.
It's not impossible to layer on top of appengine, but it's a
nontrivial problem.

The dumb solution (that will work for small data sets) is to simply
iterate through *all* the data and do String.indexOf on the relevant
fields. This is basically what LIKE '%basic%' does in most databases.

The smart solution is to build a fulltext index using something like
Lucene. Basically, Lucene goes through your strings, chops them into
words, "stems" the words (cuts off the 's' at the end of plurals,
etc), and builds an index from that word to all the places the word is
used. This isn't quite the same as doing a LIKE '%basic%' because the
fulltext index will only catch the word basic, it won't catch words
like 'msbasic'. The stemmer will probably catch words like
'basically' though.

Search around, maybe someone has built a fulltext indexer that runs on GAE.

Jeff

Pelle Poluha

unread,
Apr 2, 2010, 4:09:55 AM4/2/10
to objectify-appengine
Ok, my first option was to let the user search the "description"
field. But if a full text search is hard to accomplish, or costs a lot
of cpu time, then I'd rather let the user create a list of keywords
attached to the description. These could then be searched with the
equivalent of a "like 'basic%'". So the entity would look something
like this:

class @Unindexed Reason {
Long @Id id;
String description;
List<String> @Indexed keyword;
}

But how would the filter expression look like that searches all of the
keywords? Does the filter("keyword >=", "basic") work with a list?

/Pelle

On 2 Apr, 00:20, Jeff Schnitzer <j...@infohazard.org> wrote:
> Sadly, true fulltext search is not something offered by appengine.
> It's not impossible to layer on top of appengine, but it's a
> nontrivial problem.
>
> The dumb solution (that will work for small data sets) is to simply
> iterate through *all* the data and do String.indexOf on the relevant
> fields.  This is basically what LIKE '%basic%' does in most databases.
>
> The smart solution is to build a fulltext index using something like
> Lucene.  Basically, Lucene goes through your strings, chops them into
> words, "stems" the words (cuts off the 's' at the end of plurals,
> etc), and builds an index from that word to all the places the word is
> used.  This isn't quite the same as doing a LIKE '%basic%' because the
> fulltext index will only catch the word basic, it won't catch words
> like 'msbasic'.  The stemmer will probably catch words like
> 'basically' though.
>
> Search around, maybe someone has built a fulltext indexer that runs on GAE.
>

Jeff Schnitzer

unread,
Apr 2, 2010, 1:20:20 PM4/2/10
to objectify...@googlegroups.com
Filter does work with list properties, but you should test this to
verify that it works:

Iterable<Reason> reasons = ofy.query(Reason.class).filter("keyword
>=", "basic").filter("keyword <=", "basic\uFFFD");

I *think* the datastore will interpret this as look for keywords that
start with 'basic', however it might interpret this as look for
Reasons that have any keyword that is greater than 'basic' and any
keyword less than 'basic\uFFFD'. Obviously that's not what you want.

If it turns out that the datastore works the second way, you'll want
to make a separate Keyword entity:

class Keyword {
@Id Long id;
@Parent Key<Reason> reason;
String value;
}

Then you can do this:

Iterable<Reason> reasons = ofy.query(Keyword.class).filter("keyword
>=", "basic").filter("keyword <=", "basic\uFFFD").fetchParents();

The one caveat here is that fetchParents() will fetch *all* the values
in one go, so you can't use a cursor and the task queue to iterate
through very large datasets.

Jeff

Reply all
Reply to author
Forward
0 new messages