Indexing Large Chunks of Text for Search
Andrew Parker  
From: Andrew Parker <andrew.par...@gmail.com>
Date: Tue, 8 Apr 2008 12:41:11 -0700 (PDT)
Subject: Indexing Large Chunks of Text for Search
Let's say I wanted to build a blog CMS on App Engine to compete with
WordPress.  How would I implement search across large chunks of text?

Large text is stored as a db.Text, and according to the documentation
for db.Text, these objects are not indexed.  Would I then have to hack
together my own index on top of my db.Text object?  Feels like
reinventing the wheel... not very DRY.

Thoughts?

Andrew


ma...@google.com  
From: ma...@google.com
Date: Tue, 8 Apr 2008 15:32:42 -0700 (PDT)
Subject: Re: Indexing Large Chunks of Text for Search
Hi Andrew,
 Currently we have a SearchableModel (google.appengine.ext.search)
subclass of db.Model that you can use to implement some basic search
functionality in your datastore.  The docstring gives a good overview
of what is offered with this:

"""Full text indexing and search, implemented in pure python.

Defines a SearchableModel subclass of db.Model that supports full text
indexing and search, based on the datastore's existing indexes.

Don't expect too much. First, there's no ranking, which is a killer drawback.
There's also no exact phrase match, substring match, boolean operators,
stemming, or other common full text search features. Finally, support for stop
words (common words that are not indexed) is currently limited to English.

To be indexed, entities must be created and saved as SearchableModel
instances, e.g.:

  class Article(search.SearchableModel):
    text = db.TextProperty()
    ...

  article = Article(text=...)
  article.save()

To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:

  query = article.all().search('a search query').filter(...).order(...)
  for result in query:
    ...

The full text index is stored in a property named __searchable_text_index. If
you want to use search() in a query with an ancestor, filters, or sort orders,
you'll need to create an index in index.yaml with the __searchable_text_index
property. For example:

  - kind: Article
    properties:
    - name: __searchable_text_index
    - name: date
      direction: desc
    ...

Note that using SearchableModel will noticeably increase the latency of save()
operations, since it writes an index row for each indexable word. This also
means that the latency of save() will increase roughly with the size of the
properties in a given entity. Caveat hacker!"""

-Marzia
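
To make the trade-offs in that docstring concrete, here is a rough, standalone sketch of a per-word keyword index in plain Python. It is not the SearchableModel implementation and does not touch the datastore; KeywordIndex, keywords(), STOP_WORDS and entries_written are hypothetical names invented for illustration. It only shows why save() cost grows with the amount of text (one index entry per indexable word) and why matching is exact-keyword only.

```python
import re
from collections import defaultdict

# Assumption: a tiny English stop-word list, chosen only for this example.
STOP_WORDS = frozenset(["a", "an", "and", "the", "of", "to", "in", "is", "for"])


def keywords(text):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


class KeywordIndex(object):
    """Hypothetical in-memory stand-in for a per-word full text index."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> set of document ids
        self.entries_written = 0           # counts index entries written

    def save(self, doc_id, text):
        # One index entry per distinct indexable word, so the work done
        # here grows roughly with the size of the text being saved.
        for word in keywords(text):
            self._postings[word].add(doc_id)
            self.entries_written += 1

    def search(self, query):
        # Every query keyword must match exactly: no ranking, no
        # stemming, no phrase or substring matching.
        terms = keywords(query)
        if not terms:
            return set()
        matches = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*matches)


index = KeywordIndex()
index.save("post-1", "Indexing large chunks of text for search")
index.save("post-2", "Search the datastore indexes")

print(index.search("text search"))  # only post-1 contains both keywords
print(index.search("index"))        # empty: "indexing"/"indexes" are different words
print(index.entries_written)        # 8 entries for two short posts
```

The second query is the "no stemming" limitation in miniature: a search for "index" misses posts that only contain "indexing" or "indexes".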

xgdlm  
From: xgdlm <grang...@gmail.com>
Date: Tue, 8 Apr 2008 20:18:10 -0700 (PDT)
Subject: Re: Indexing Large Chunks of Text for Search
Hello

> Don't expect too much. First, there's no ranking, which is a killer drawback.
> There's also no exact phrase match, substring match, boolean operators,
> stemming, or other common full text search features. Finally, support for stop
> words (common words that are not indexed) is currently limited to English.

Full-text search is one of the most important features of our apps
(the ones we'd love to move to Google App Engine, as they're already
written in Python). At the moment we use the excellent sphinxsearch
for full text. We would expect from Google a powerful full-text engine
with geolocation search, a stemmer, aspell support and more (yes, I
know this is day 0 :p) ... hey, we are on Google :) At the moment,
from my point of view, this is a real drawback ...

xav

