Query for containing some text


Deepak Singh

Apr 27, 2012, 6:02:16 PM
to objectify...@googlegroups.com

Hi,

I have a DTO as follows:

public class TextDTO {
    private String title;
    private String city;
    private String state;

    // getters and setters
}

It's an Objectify entity.

Now I need a query whose filter matches entities where a particular text is contained in any of the 3 fields, something like:

objectify.query(TextDTO.class).filter("title contains", abc) or filter("city contains", abc) or filter("state contains", abc);

How can I make this kind of query?
If it is not supported, is there an alternative?

Thanks 
Deepak Singh

Jeff Schnitzer

Apr 27, 2012, 6:09:57 PM
to objectify...@googlegroups.com
GAE has a full-text search feature in the trusted tester phase.
Alternatively (and this is what I do), create an indexed synthetic
list property which holds a union of all the words you want to be able
to match on. If you want to get fancy, you can use Lucene to break up
words and phrases into pieces. Here's some code that others may find
helpful:

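import java.io.IOException;
import java.io.StringReader;
import java.util.Set;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

import com.google.common.collect.Sets;

/** Uses the Lucene tokenizer to create a set of lowercase token words */
private static Set<String> tokenize(String input) {
    try {
        // StandardTokenizer splits the text into words; LowerCaseFilter lowercases them
        TokenStream tok = new StandardTokenizer(Version.LUCENE_35, new StringReader(input));
        tok = new LowerCaseFilter(Version.LUCENE_35, tok);

        Set<String> result = Sets.newHashSet();  // Guava
        CharTermAttribute termAttr = tok.addAttribute(CharTermAttribute.class);

        tok.reset();  // prepare the stream before consuming tokens
        while (tok.incrementToken()) {
            result.add(termAttr.toString());
        }
        tok.end();
        tok.close();

        return result;
    } catch (IOException ex) {
        // Rethrow unchecked (the original wrapped it in a custom BetterIOException)
        throw new RuntimeException(ex);
    }
}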
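The `tokenize` helper above only produces the token set. To show how that set becomes the "indexed synthetic list property", here is a stdlib-only sketch that does not need the App Engine SDK or Lucene. Assumptions: `TextEntity`, `searchTokens` and `SimpleTokenizer` are hypothetical names; a regex split stands in for the Lucene tokenizer; and `matches` only imitates how an equality filter on a list property behaves (an entity matches if any list element equals the value). In a real Objectify entity the list would presumably be an indexed field (e.g. `@Index`) filled in an `@OnSave` method.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical entity: searchTokens plays the role of the indexed list property.
class TextEntity {
    String title;
    String city;
    String state;
    List<String> searchTokens = new ArrayList<>();

    TextEntity(String title, String city, String state) {
        this.title = title;
        this.city = city;
        this.state = state;
        rebuildTokens();
    }

    /** Union of the lowercase words from all three fields (what an @OnSave hook would do). */
    void rebuildTokens() {
        Set<String> union = new LinkedHashSet<>();
        for (String field : new String[] {title, city, state}) {
            union.addAll(SimpleTokenizer.tokenize(field));
        }
        searchTokens = new ArrayList<>(union);
    }

    /** Imitates an equality filter on the list property: true if any element equals word. */
    boolean matches(String word) {
        return searchTokens.contains(word.toLowerCase(Locale.ROOT));
    }
}

// Stand-in for the Lucene tokenizer: lowercase, then split on anything not a letter or digit.
class SimpleTokenizer {
    static Set<String> tokenize(String input) {
        Set<String> result = new LinkedHashSet<>();
        if (input == null) {
            return result;
        }
        for (String part : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }
}
```

Because the three fields are folded into one list, a single equality filter covers "title or city or state contains this word", which is what the original question asked for. Note that it matches whole words only: `matches("jose")` succeeds for a city of "San Jose", but `matches("hot")` does not match "Hotels".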

As an aside, there's something wrong with your system if you have an
entity with 'DTO' in the classname. DTO = data transfer object,
called that because it isn't an entity.

Jeff

Deepak Singh

Apr 27, 2012, 6:15:38 PM
to objectify...@googlegroups.com
It's not a DTO, in fact; it is an Objectify entity.
OK.
Can I use a simple contains query like
objectify.query(Text.class).filter("title contains", abc).list()? Just for a single column.

Thanks
Deepak

Jeff Schnitzer

Apr 27, 2012, 6:25:50 PM
to objectify...@googlegroups.com
There is no operator "contains" in the query system.

Please read this:
https://developers.google.com/appengine/docs/java/datastore/queries

Jeff

Petr Voldán

Dec 16, 2013, 12:24:12 PM
to objectify...@googlegroups.com, je...@infohazard.org
Hi,
I am trying to understand how to save string tokens into the datastore and query them, based on your example. If I understand correctly, for every String you save a corresponding set of tokens. I guess that this set of tokens needs to be indexed. So can it explode the index? I think every word token requires a write to the index. Am I correct?

Anyway, for string queries I currently use:
filter("normalizedString >=", requestString)
.filter("normalizedString <", requestString + "\uFFFD")

I guess that querying the set of tokens is more efficient; am I right? Which solution is better?
Thank you
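The `>=` / `<` pair above is a range scan over the sorted index, so it matches values that start with `requestString`, not values that contain it anywhere. A stdlib-only sketch makes the difference visible; it assumes a `TreeSet` is a fair stand-in for the sorted single-property index, and the sample `normalizedString` values are made up. `PrefixRangeDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixRangeDemo {
    /**
     * Imitates filter("normalizedString >=", prefix).filter("normalizedString <", prefix + "\uFFFD")
     * over a sorted index: returns every value in the half-open range [prefix, prefix + "\uFFFD").
     */
    static List<String> prefixQuery(NavigableSet<String> sortedIndex, String prefix) {
        return new ArrayList<>(sortedIndex.subSet(prefix, true, prefix + "\uFFFD", false));
    }

    /** Builds the stand-in sorted index from sample values. */
    static NavigableSet<String> index(String... values) {
        return new TreeSet<>(Arrays.asList(values));
    }
}
```

With the values "camera", "camera m260", "cam", "digital camera" and "canon", a query for "camera" returns only "camera" and "camera m260"; "digital camera" is missed because the word is not at the start. So the range trick gives prefix (autocomplete-style) matching on one field, while a token list gives whole-word matching anywhere in the text.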

Nicholas Okunew

Dec 16, 2013, 5:34:12 PM
to objectify...@googlegroups.com
You may want to consider using the SearchService for this sort of stuff, it works really well and handles querying across the set much better than datastore query hacks. We generally just store the document id and entity id as the same value, so once you do a search on the text index, you can look up the entity to get a consistent representation.
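A stdlib-only sketch of the id-sharing pattern described above. The two maps are hypothetical stand-ins: `searchIndex` plays the role of the Search API index (document id to indexed text) and `datastore` plays the role of the entity store (entity id to entity, reduced to a title string here). The real Search API and Objectify calls are not shown, and the substring check is only a placeholder for a real text search.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class SharedIdLookup {
    // Stand-in for the search index: document id -> indexed text.
    final Map<String, String> searchIndex = new LinkedHashMap<>();
    // Stand-in for the datastore: entity id -> entity (just a title here).
    final Map<String, String> datastore = new LinkedHashMap<>();

    /** Saves the entity and its search document under the same id. */
    void save(String id, String title) {
        datastore.put(id, title);
        searchIndex.put(id, title.toLowerCase(Locale.ROOT));
    }

    /** "Searches" the index, then loads each hit from the datastore using the same id. */
    List<String> search(String word) {
        List<String> results = new ArrayList<>();
        String needle = word.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> doc : searchIndex.entrySet()) {
            if (doc.getValue().contains(needle)) {
                results.add(datastore.get(doc.getKey())); // document id == entity id
            }
        }
        return results;
    }
}
```

The point of sharing the id is that the search side never has to store the full entity: it returns ids, and the datastore remains the single source of truth for the entity's current state.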



--
You received this message because you are subscribed to the Google Groups "objectify-appengine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to objectify-appen...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Petr Voldán

Dec 19, 2013, 12:47:38 AM
to objectify...@googlegroups.com
Thank you for the advice. I've also studied the Search API, but I am worried about the price. I don't have practical experience, but the Search API seems more expensive than the datastore. For this reason I'd like to know how the index is used in Jeff's tokenization solution, i.e. whether every stored token/word results in a write to the index (probably yes). Anyway, I agree that the Search API is probably the correct way to handle this situation.
Thank you

Nicholas Okunew

Dec 19, 2013, 2:29:11 AM
to objectify...@googlegroups.com
The costs may be a little hard to compare without benchmarking your actual usage.

The Search API is charged per GB indexed, GB stored, and searches performed.

The datastore is charged per read and write operation.



The comparable difference here is probably this:
- Datastore reads/small ops will be many for each query (because Objectify will pull back many entities).
- The Search API is charged per query (i.e. the result count doesn't affect the cost, I believe).

So you could calculate roughly how many reads/smalls per query you'd need to run to match one search query.
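The rough comparison suggested above can be written as a tiny break-even calculation. Prices are deliberately left as parameters: the numbers in the usage note are placeholders, not real App Engine rates, so substitute the current price list. `BreakEven` is a hypothetical name, and storage costs are not included.

```java
class BreakEven {
    /**
     * Number of datastore read/small operations per query at which a datastore-based
     * query costs the same as one Search API query.
     */
    static double breakEvenOps(double searchCostPerQuery, double datastoreCostPerOp) {
        if (datastoreCostPerOp <= 0) {
            throw new IllegalArgumentException("datastoreCostPerOp must be positive");
        }
        return searchCostPerQuery / datastoreCostPerOp;
    }

    /** True if a datastore query needing opsPerQuery operations is cheaper than one search. */
    static boolean datastoreIsCheaper(int opsPerQuery, double searchCostPerQuery, double datastoreCostPerOp) {
        return opsPerQuery * datastoreCostPerOp < searchCostPerQuery;
    }
}
```

With placeholder prices of 0.5 per search and 0.125 per datastore op, `breakEvenOps(0.5, 0.125)` is 4.0: a query needing 3 ops favours the datastore, while one needing 5 favours the Search API.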

Then just account for data storage. If you need to link search results up with datastore reads, then it's all additional cost, pretty much.
In that case you just need to trade off the cost of the search service vs the improved search results and querying flexibility. In my experience you'll pretty much always hit problems with indexes when you want to do any kind of search/filter functionality so you may have to bite the bullet anyway.

Best bet - benchmark on a free instance and compare the usage and calculate the relative cost.





Jeff Schnitzer

Dec 23, 2013, 5:55:37 PM
to objectify...@googlegroups.com
My solution (tokenize and index in the datastore) does not result in
an "exploding" index as it is usually defined; the cost is linear with
the number of tokens. Whether or not this will be better or worse than
the SearchService depends on the size and shape of your data. If you
have large data volumes, benchmarking is a good idea - but do some
napkin work to make sure you aren't going to spend a ton of time to
save a few pennies.
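To make the "linear, not exploding" point concrete: a single indexed list property with n distinct tokens contributes one entry per token to each built-in index on that property, whereas a composite index that combines two list properties needs one entry per combination of their values. The sketch only counts entries under those assumptions (the number of built-in indexes per property is a parameter, e.g. 2 if you assume one ascending and one descending index); it does not model the datastore's actual billing formula. `IndexEntryCounts` is a hypothetical name.

```java
class IndexEntryCounts {
    /**
     * Entries for one entity's single indexed list property: one per distinct token,
     * per built-in index on that property. Grows linearly with the token count.
     */
    static long listPropertyEntries(int distinctTokens, int builtInIndexesPerProperty) {
        return (long) distinctTokens * builtInIndexesPerProperty;
    }

    /**
     * Entries for a composite index over two list properties: one per
     * (value of A, value of B) pair. This multiplicative growth is the
     * "exploding index" case.
     */
    static long compositeEntries(int valuesInA, int valuesInB) {
        return (long) valuesInA * valuesInB;
    }
}
```

Going from 50 to 100 tokens doubles the first count (100 to 200 entries with 2 built-in indexes), while going from 50 to 100 values in both lists quadruples the second (2,500 to 10,000).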

I like indexing in the datastore because it's simple, free from
synchronization issues, has a known performance profile, and is cheap at
the data quantities that I apply this solution to. YMMV.

Jeff

Jeremy Leeder

Apr 2, 2014, 5:44:14 PM
to objectify...@googlegroups.com, je...@infohazard.org
I followed your example here, Jeff, since my search only involves 3-4 words per object. But do you have an example of scoring the results?

So if someone searches for "m260 camera", right now I get all cameras plus the m260 cameras, but ideally I want the results sorted by score. Right now I'm trying to avoid the Google Search API to save on costs.
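One simple way to get that ordering without the Search API is to rank candidates in memory by how many query tokens they share. This is a hedged sketch, not part of the tokenized-list approach discussed in this thread, and far cruder than real relevance scoring. It assumes the candidate entities and their token sets have already been fetched (for example with one equality query per query token, merged); `OverlapScorer` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OverlapScorer {
    /** Number of query tokens present in the candidate's token set. */
    static int score(Set<String> queryTokens, Set<String> candidateTokens) {
        int hits = 0;
        for (String t : queryTokens) {
            if (candidateTokens.contains(t)) {
                hits++;
            }
        }
        return hits;
    }

    /** Candidate names in descending score order; candidates scoring 0 are dropped. */
    static List<String> rank(Set<String> queryTokens, Map<String, Set<String>> candidates) {
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Set<String>> c : candidates.entrySet()) {
            if (score(queryTokens, c.getValue()) > 0) {
                ranked.add(c.getKey());
            }
        }
        // List.sort is stable, so ties keep the map's iteration order.
        ranked.sort(Comparator.comparingInt(
                (String name) -> score(queryTokens, candidates.get(name))).reversed());
        return ranked;
    }
}
```

For the query tokens {m260, camera}, an entity tokenized as {m260, camera} scores 2 and is listed before one tokenized as {canon, camera}, which scores 1; an entity with neither token is dropped.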

Jeff Schnitzer

Apr 2, 2014, 9:51:41 PM
to Jeremy Leeder, objectify...@googlegroups.com
My approach is not so clever. For me, "m260 camera" is an AND
operation not an OR operation, and it filters all results out that
don't have both those terms.
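The AND behaviour described here can be sketched as: every query token must appear in the entity's token list. In Objectify this would presumably be expressed as one equality filter per token on the same list property (e.g. `.filter("searchTokens", "m260").filter("searchTokens", "camera")`, where `searchTokens` is a hypothetical field name); the stdlib code below only simulates that matching rule in memory. `AndTokenFilter` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AndTokenFilter {
    /** True only if every query token is in the entity's token set. */
    static boolean matchesAll(Set<String> queryTokens, Set<String> entityTokens) {
        return entityTokens.containsAll(queryTokens);
    }

    /** Names of the entities that contain all of the query tokens, in map order. */
    static List<String> filter(Set<String> queryTokens, Map<String, Set<String>> entities) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : entities.entrySet()) {
            if (matchesAll(queryTokens, e.getValue())) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

With this rule, "m260 camera" keeps only entities containing both tokens, whereas "camera" alone keeps every camera; that is why the results are filtered rather than scored.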

If you want scoring, you will almost certainly be better off with the
Lucene-based system.

Jeff

Jon Stevens

Apr 2, 2014, 10:42:12 PM
to objectify...@googlegroups.com, Jeremy Leeder
I don't know what your search requirements are, but it may be less expensive to just run something like Elasticsearch or Solr on a GCE instance. That said, the cost of App Engine search is pretty low for a moderate amount of use.

jon

