[google-appengine] data structure to implement "some of your friends also like this" feature.

20 views
Skip to first unread message

Herbert

unread,
Apr 26, 2010, 10:13:15 AM4/26/10
to Google App Engine
hi,

i'm working on a book review site in which users can have friends.
exactly like the facebook's 'like' function, we'll like to tell a user
which of his/her friends also like the book he's looking at. but i
can't figure out how to do that with datastore, if possible at all.

we're using relational indexes for friends and "likes" which works
like a charm, very fast fetch

currently we have:

class User:
friends = db.ListProperty()
likes = db.ListProperty()

class BookIndex: # parent=Book
users = db.ListProperty()

BookIndex.users lets get all the favorite books of any users, works
well. User.likes tells me all the users that have marked a certain
book as favorite, works well too.

The problem is, when a user is looking at book A, I could easily get
all the people that likes A but how am I able to filter out the
friends? I'm satisfied if I have to do some precals and maybe just get
"some of the friends" who likes A.

I don't know, it seems like i'm looking at a problem of intersecting
two lists. A book can be liked by 3000 ppl, a user could have 100
friends, how am I going to find out which of the 100 friends are in
these 3000 ppl?

Any thoughts would be greatly appreciated. Thank you!

(and how did the facbeook guys implement the "like" button?)

Herbert

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Herbert

unread,
Apr 26, 2010, 10:20:33 AM4/26/10
to Google App Engine
Sorry, suddenly got a thought on it.

Can I do two equality check on both User.friends and User.likes ?
would that be cause an "exploding index?"

I.E. User.all().filter("friends =", me).filter("likes =", A).fetch(3)

thanks!

Krishna

unread,
Apr 27, 2010, 5:16:01 AM4/27/10
to Google App Engine
I'm interested in hearing thoughts on this as well.

Herbert: I had a similar problem sometime ago and I solved it by using
a single ListProperty with custom prefixes to distinguish between
fields of different types. So in your example, I would have used a
prefix "f.." to represent a friend and "l.." to represent a like. So
you would filter Users by friendlikes (one list property instead of
two separate ones) == "f..me" && "l..A".

Herbert

unread,
Apr 27, 2010, 6:45:27 AM4/27/10
to Google App Engine
Krishna: i see what you mean. so instead of querying for two lists you
do a prefixed self-join. i quickly test my version that queries two
lists that actually works fine, but i don't have a big test case yet.
coz anything bigger than thousand rows slows down in the dev server
even for simple query anyway, and i haven't had time to try that on
the production server yet. i reall don't need sorting, so i'm happy as
long as i could get "some of the results" safely and quickly

after reading your approach i was thinking, that it shoudlnt make a
difference querying the same list twice or two lists once. recalling
the zig-zag search in the "building scalable apps" talk, since both
versions mentioned above have two equality filters, it seems the query
will perform the same 'zig-zag' search, just on different values.

thanks much!

Nick Johnson (Google)

unread,
Apr 27, 2010, 6:59:40 AM4/27/10
to google-a...@googlegroups.com
Hi Herbert,

A query like this will use the merge join strategy by default. If it's too slow for the merge join strategy, you could have to build a custom index for it, which would indeed be 'exploding'  - each User entity would have len(friends)*len(likes) index entries.

-Nick
--
Nick Johnson, Developer Programs Engineer, App Engine Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Herbert

unread,
Apr 27, 2010, 11:42:22 PM4/27/10
to Google App Engine
Hi Nick

Thank you. One thing I'm still a bit confused about index. I had the
impression that if I've executed all kinds of queries before I deploy,
all required indexes would be included, am I correct?

Does building a custom index improve performance on complex queries
even though it is not required? Or I could just stick to the rules
from the doc, and make sure I cover all queries before I deploy?

Thanks!

On Apr 27, 6:59 pm, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:
> Hi Herbert,
>
> A query like this will use the merge join strategy by default. If it's too
> slow for the merge join strategy, you could have to build a custom index for
> it, which would indeed be 'exploding'  - each User entity would have
> len(friends)*len(likes) index entries.
>
> -Nick
>
>
>
> On Mon, Apr 26, 2010 at 3:20 PM, Herbert <herber...@gmail.com> wrote:
> > Sorry, suddenly got a thought on it.
>
> > Can I do two equality check on both User.friends and User.likes ?
> > would that be cause an "exploding index?"
>
> > I.E. User.all().filter("friends =", me).filter("likes =", A).fetch(3)
>
> > thanks!
> > Herbert
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to google-a...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > google-appengi...@googlegroups.com<google-appengine%2Bunsu...@googlegroups.com>
> > .

Nick Johnson (Google)

unread,
Apr 28, 2010, 5:38:05 AM4/28/10
to google-a...@googlegroups.com
Hi Herbert,

Merge join queries - queries with multiple equality filters but no inequalities or sort orders - are a special case. They can be satisfied using the built in indexes and a merge join strategy, but they can also use a custom index. The production environment will use the index if it's present, and otherwise will do a merge join.

Generally, a merge join works well, but there are situations in which it doesn't - in which case you may need to add an explicit index.

-Nick Johnson

Herbert

unread,
Apr 28, 2010, 12:27:51 PM4/28/10
to Google App Engine
Hi NIck

Thank you so much! So if I have merge join queries and if I can spare
resources on a custom index, there may be room for improvement on read
performance? I guess I could find this out with a larger data set on
the production server and query before / after building a custom
index.

Herbert

On Apr 28, 5:38 pm, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:
> Hi Herbert,
>
> > <google-appengine%2Bunsu...@googlegroups.com<google-appengine%252Buns...@googlegroups.com>
Reply all
Reply to author
Forward
0 new messages