Indexing Large Chunks of Text for Search

Andrew Parker

Apr 8, 2008, 3:41:11 PM
to Google App Engine
Let's say I wanted to build a blog CMS on App Engine to compete with
WordPress. How would I implement search across large chunks of text?

Large text is stored as a db.Text, and according to the documentation
for db.Text, these objects are not indexed. Would I then have to hack
together my own index on top of my db.Text object? Feels like
reinventing the wheel... not very DRY.

Thoughts?

Andrew

ma...@google.com

Apr 8, 2008, 6:32:42 PM
to Google App Engine
Hi Andrew,
Currently we have a SearchableModel (google.appengine.ext.search)
subclass of db.Model that you can use to implement some basic search
functionality in your datastore. The docstring gives a good overview
of what is offered with this:

"""Full text indexing and search, implemented in pure python.

Defines a SearchableModel subclass of db.Model that supports full text
indexing and search, based on the datastore's existing indexes.

Don't expect too much. First, there's no ranking, which is a killer
drawback. There's also no exact phrase match, substring match, boolean
operators, stemming, or other common full text search features. Finally,
support for stop words (common words that are not indexed) is currently
limited to English.

To be indexed, entities must be created and saved as SearchableModel
instances, e.g.:

  class Article(search.SearchableModel):
    text = db.TextProperty()
    ...

  article = Article(text=...)
  article.save()

To search the full text index, use the SearchableModel.all() method to
get an instance of SearchableModel.Query, which subclasses db.Query. Use
its search() method to provide a search query, in addition to any other
filters or sort orders, e.g.:

  query = article.all().search('a search query').filter(...).order(...)
  for result in query:
    ...

The full text index is stored in a property named
__searchable_text_index. If you want to use search() in a query with an
ancestor, filters, or sort orders, you'll need to create an index in
index.yaml with the __searchable_text_index property. For example:

  - kind: Article
    properties:
    - name: __searchable_text_index
    - name: date
      direction: desc
    ...

Note that using SearchableModel will noticeably increase the latency of
save() operations, since it writes an index row for each indexable word.
This also means that the latency of save() will increase roughly with
the size of the properties in a given entity. Caveat hacker!"""
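
To make the docstring's description a bit more concrete, here is a rough
standalone sketch of the general idea it describes: keep one index entry
per distinct non-stop word, and match an entity only when it contains
every word in the query. This is not the SearchableModel code and uses no
App Engine APIs; TinyIndex, tokenize and the short STOP_WORDS list are
hypothetical names made up for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = frozenset(["a", "an", "and", "the", "of", "to", "in", "is", "on"])


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop words and duplicates."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


class TinyIndex:
    """In-memory stand-in for a per-entity keyword index."""

    def __init__(self):
        self._keywords = {}  # entity id -> set of indexed words

    def save(self, entity_id, text):
        # One entry per distinct indexable word, so the work done at save
        # time grows with the amount of text being indexed.
        self._keywords[entity_id] = tokenize(text)
        return len(self._keywords[entity_id])

    def search(self, query):
        # Every query word must be present (implicit AND).
        # No ranking, no phrase match, no stemming.
        wanted = tokenize(query)
        if not wanted:
            return []
        return sorted(
            eid for eid, words in self._keywords.items() if wanted <= words
        )


index = TinyIndex()
index.save("post-1", "Indexing large chunks of text on App Engine")
index.save("post-2", "Large text is stored as a Text property")

print(index.search("large text"))  # both posts contain both words
print(index.search("text large"))  # word order is ignored
print(index.search("chunk"))       # no stemming, so "chunks" is not matched
```

The last query shows the kind of limitation the docstring warns about:
without stemming, "chunk" and "chunks" are simply different words.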

-Marzia

xgdlm

Apr 8, 2008, 11:18:10 PM
to Google App Engine
Hello

> Don't expect too much. First, there's no ranking, which is a killer
> drawback. There's also no exact phrase match, substring match, boolean
> operators, stemming, or other common full text search features. Finally,
> support for stop words (common words that are not indexed) is currently
> limited to English.

Full-text search is one of the most important features of our apps (the
ones we'd love to move to Google App Engine, as they're already written
in Python). At the moment we use the excellent sphinxsearch for full
text. We would expect from Google a powerful full-text engine with
geolocation search, a stemmer, aspell support and more (yes, I know this
is day 0 :p) ... hey, we are on Google :) At the moment, from my point of
view, this is a real drawback ...

xav