Seemingly undoable to build a simple app

Koen Bok

unread,

Dec 7, 2008, 4:37:14 PM12/7/08

to Google App Engine

First off, I _really_ like the idea of AppEngine. We're hosting most
of our sites (versionsapp.com, checkoutapp.com) on it and it works
great for that purpose. Impressed as we were, I started trying to
build our first serious webapp on AppEngine; a small webstore. What I
basically wanted to do is building a page where you could browse 50
products at a time, optionally limited by a search or tag, based on
Django. All went well until I wanted to implement search...

I grabbed a few example datasets from users of our Mac POS app, they
mostly have around a 1000 products with a name, brand and description.
I knew substring search was hard to implement, so I skipped that and
used the undocumented ext.search models with my own set of search
words, mostly around 40 per product. Because some searches took quite
long I decided to slice the results by 50 at a time, not impressive
but good enough. And because product descriptions don't change often I
could use memcache to cache each result 'slice'.

To do so, I figured I needed to at least order by name, so the slicing
would make sense. Therefore I had to create an index on name,
searchIndex (set of searchable words). Worked great, until I tried to
type two words. I told me I had to create another index: name,
searchIndex, searchIndex. Seemed logical, but then people may want to
type more then 2 words at the same time so I added 5 indexes to get up
to five words.

From that moment inserting products started to take more then a
second, which I guess was due to the indexing, but even worse, I
started to get errors like BadRequestError: Too many indexed
properties for entity: app. I found out the max indexes for an entity
is capped at 5000, and calculated that if I were to use max 3 words
for a search I could only add 17 searchable words in an items' index
(17*17*17=4913). I'm still not sure that math is 100% correct, but at
least I was able to insert most of my products again, but searching is
just horribly slow.

So now I'm stuck with a _lot_ of limitations for a simple app: only 3
search terms, no substring searching, 50 results at a time and in my
opinion pretty bad performance, especially when inserting. Maybe I'm
doing something horribly wrong, and I don't consider myself the best
db programmer out there but this just seems silly.

Again, this is not to bash AppEngine because I really see the
potential and like the idea. But as it stands now it's just unusable
for us (a small indie dev company that wants their products to scale),
and it's hard for me to imagine that anyone else can live with these
limitations. I guess what I'm saying is: I love this thing to work for
us! We want to use it and pay for it! Seriously, please fix the
searching... you're Google for god sake :-)

Kindest regards,

Koen Bok - madebysofa.com

lock

unread,

Dec 7, 2008, 7:32:02 PM12/7/08

to Google App Engine

I'm pretty new to app engine so there may be some
gotcha's in what I say, but I think my logic is ok.

It seems you have a pretty simple table with a few
fields (name, brand, description) and you want to be
able to search any one of the words in any of the
fields.

I'd suggest building a search term table. Keep your
current process of adding data to your table, your search
table will reference the data it contains.

The search table will be something like
class searchTerms(db.Model):
value = db.StringProperty()
key = db.KeyProperty() # If there's no such thing use the id

To build the search table tokenize the name, brand and
description so that you have a list of separate words.
Add each of these words to the search terms table with the
key that references the item in your data table.

When you want to search via keyword(s) you just query the
value field of the searchTerms table. You will however need
to do a query for each search term.

Hope that helps, cheers

Koen Bok

unread,

Dec 7, 2008, 7:43:07 PM12/7/08

to Google App Engine

Hey Lock,

I thought about your idea before, but it also has huge downsides:

- Inserting a single entity with 40 token words (not uncommon) will
lead to 41 inserts, and inserting is BigTables biggest weakness. Same
goes for deleting.
- It's (I think) not possible with this setup to get the second slice
of results (50-100) from a query with three words, sorted by name. I
would have to do 3 queries, which will return more results than I'm
looking for, see if they are in the other two results (intersect), and
sort them.

Please correct me if I'm wrong...

Kindest regards,

Koen Bok - madebysofa.com

lock

unread,

Dec 7, 2008, 8:10:40 PM12/7/08

to Google App Engine

Hmmm, your right.

You could try culling some of the token words, the simple
ones like 'and', 'a', 'or' to limit the number of puts,
but that will only go so far.

I don't think sorting and checking for intersection will
cost you too much. Its the number of results returned from
the queries that chew up the CPU cycles.
Slicing/pagination doesn't really seem worthwhile if you've
got all the results (which would occur if you used this
approach).

I'm out of ideas, would be interested to know if you do find
a solution though. It sounds like something a lot of apps
would need to do.

Good luck ;-)

yejun

unread,

Dec 7, 2008, 8:57:26 PM12/7/08

to Google App Engine

Why not just do a single word query, then do a brutal force scan on
result?

Bill

unread,

Dec 8, 2008, 5:38:40 AM12/8/08

to Google App Engine

There's a definite need for a full-text search API (particularly
substring matches) that just works for the datastore.

For public-facing information like your products, you might consider
outsourcing the search to something like Google Custom Search:
http://www.google.com/coop/cse/
(There's an add-free option if you want to cough up some money.)

Nick Johnson recently added Google CSE support to a branch of the
Bloog project (http://github.com/Arachnid/bloog/commits/breaking), so
you can see how he did it. The CSE approach can be tested over on his
blog: http://blog.notdot.net/

-Bill

Nick Johnson

unread,

Dec 8, 2008, 8:14:59 AM12/8/08

to Google App Engine

Yes, you're encountering 'exploding indexes'. The number of index
entries needed for a 3-term index is, for example, n*(n-1)*(n-2).

My suggestion would be this:
- Keep indexes for one and two uses of the term list.
- Keep a separate entity type that tracks how many occurrences there
are of each term in the database.
- To search, look up the terms, find the two least common ones, and
query on those. Then filter the remaining results for any other terms.

dalenewman

unread,

Dec 8, 2008, 9:16:53 AM12/8/08

to Google App Engine

I haven't tried this "workaround" yet, but until we have a Search API
available, it might work...

Try uploading your products to Google Base and then using their API to
search your own stuff. We upload all our books to google base, and if
I go to http://base.google.com/ and search for our domain and what I'm
looking for (e.g. betterworld.com google hacks ), it comes up with
lightning Google speed.

Note: I haven't checked the terms and conditions to see if this is
legal or not...

Dale

Reply all

Reply to author

Forward