Persisting Lists


Bruno Sandivilli

Jul 26, 2011, 4:19:03 PM
to Google App Engine
Hi, I'm modeling a social network. In a relational database I used to
create a table of relations like (userid, friendid),
but with NoSQL I'm planning to simply add a list of ids to each user,
like User { List<Integer> friends }.

Is this wrong? It works, but what about the performance?

Pascal Voitot Dev

Jul 27, 2011, 11:37:49 AM
to google-a...@googlegroups.com
Hi,

You can look at this post on Stack Overflow for a bit more info! The best-known issues are:
- the limit of 5000 elements per list
- the famous index explosion issue (the latest GAE version says that index explosion won't happen, but I don't know exactly what that means)
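
As a rough illustration of the index explosion issue: when a composite (custom) index covers two list properties of the same entity, the datastore writes one index row per combination of values, so the row count is the product of the list lengths. The sketch below only does that arithmetic in plain Python. The function name and the sample sizes are made up for illustration, and nothing here calls an App Engine API.

```python
from itertools import product


def composite_index_rows(*list_properties):
    """Count the index rows one entity needs in a composite index
    that covers all of the given list properties."""
    return sum(1 for _ in product(*list_properties))


friends = list(range(200))         # hypothetical: 200 friend ids
tags = ["a", "b", "c", "d", "e"]   # hypothetical second list property

rows = composite_index_rows(friends, tags)
print(rows)  # 200 * 5 = 1000 index rows for a single entity
```

A single list property indexed on its own only costs one row per element, which is why a plain list of friend ids is usually fine until a second list joins the same index.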

Pascal


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.


Bruno Sandivilli

Jul 27, 2011, 12:09:27 PM
to google-a...@googlegroups.com
OK, thanks! This is not a problem, since Facebook has this limit too. But for followers I have to implement some workaround for this (since the number of followers can be much bigger). Any ideas? Thanks


Pascal Voitot Dev

Jul 27, 2011, 12:23:33 PM
to google-a...@googlegroups.com
You could store the followers (very light entities) as child entities of the parent user! That way, the followers and the parent user are in the same entity group and can be retrieved in the same transaction! But you can't retrieve more than 1000 entities in a single request, so you need to manage offsets or cursors!

You could also serialize the follower ids into a JSON string, for example, but it shouldn't exceed 1 MB! This is quite raw, as you can't run queries on this field, but it can be useful sometimes!
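
A minimal sketch of the serialization idea, in plain Python with no App Engine calls. The 1 MB constant mirrors the limit mentioned above, and the helper names pack_followers and unpack_followers are hypothetical.

```python
import json

MAX_BLOB_BYTES = 1024 * 1024  # assumed limit, taken from the 1 MB figure above


def pack_followers(follower_ids):
    """Serialize follower ids to a compact JSON string, refusing oversized blobs."""
    blob = json.dumps(follower_ids, separators=(",", ":"))
    if len(blob.encode("utf-8")) >= MAX_BLOB_BYTES:
        raise ValueError("follower list too large for one entity; split it")
    return blob


def unpack_followers(blob):
    """Turn the stored JSON string back into a list of ids."""
    return json.loads(blob)


blob = pack_followers([101, 102, 103])
print(blob)                    # [101,102,103]
print(unpack_followers(blob))  # [101, 102, 103]
```

The blob would be stored in an unindexed text or blob field, so the trade-off is exactly the one described: cheap to read and write, but invisible to queries.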

The last solution I see right now is to reverse the problem: create a table joining users and their followers, and query it for the followers of a given userid.
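
The join-table idea can be sketched with an in-memory list standing in for the datastore kind. The Follow tuple and the sample names are hypothetical; in the datastore each row would be a small entity and each lookup a query with one equality filter.

```python
from collections import namedtuple

# One row per (user, follower) relation, like the classic relational table.
Follow = namedtuple("Follow", ["user_id", "follower_id"])

follows = [
    Follow("alice", "bob"),
    Follow("alice", "carol"),
    Follow("dave", "bob"),
]


def followers_of(user_id, rows):
    """Everyone who follows user_id (filter on the user_id column)."""
    return [r.follower_id for r in rows if r.user_id == user_id]


def followed_by(follower_id, rows):
    """Everyone that follower_id follows (filter on the follower_id column)."""
    return [r.user_id for r in rows if r.follower_id == follower_id]


print(followers_of("alice", follows))  # ['bob', 'carol']
print(followed_by("bob", follows))     # ['alice', 'dave']
```

Because every relation is its own row, no single entity grows with the follower count, at the price of one entity read per follower when listing.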

pascal

Ernesto Oltra

Jul 27, 2011, 1:12:59 PM
to google-a...@googlegroups.com
A little correction: you can fetch more than 1000 results (that limit was removed some time ago), but it's not advisable to fetch more than 200-300 results at a time, more or less, for performance. I strongly recommend watching this video about ListProperty, fan-outs, etc. (Google I/O 2009):

Ernesto Oltra

Jul 27, 2011, 1:17:36 PM
to google-a...@googlegroups.com
And for followers, you could also shard the lists. You can have several entities, each with about 100 followers or so (or 1000, or 2000; I prefer 100 for ease of serializing/deserializing). All of these would have the user as ancestor. When listing, fetch only one entity, deserialize its list (only 100 results) and show some of them. When listing all of them, you can use cursors and some tricks to get the job done (job = paging =) )
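
A small sketch of the sharding step, assuming the follower ids are already in a plain Python list. The function name and the shard size default are illustrative; each resulting chunk would become the list stored in one child entity.

```python
def shard_followers(follower_ids, shard_size=100):
    """Split one long follower list into chunks of at most shard_size ids."""
    return [
        follower_ids[i:i + shard_size]
        for i in range(0, len(follower_ids), shard_size)
    ]


shards = shard_followers(list(range(250)))
print([len(s) for s in shards])  # [100, 100, 50]

# Showing a first page of followers only needs the first shard.
first_page = shards[0][:20]
print(len(first_page))  # 20
```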

Bruno Sandivilli

Jul 27, 2011, 1:31:47 PM
to google-a...@googlegroups.com
OK, thanks! I get it. This solved my problem.
I was wondering: this is fine for listing users and so on, but the hard thing is that when a user posts a message, I will have to send this message to 200,000 users. How would I select these users to append the post to their feeds?
Thanks again.


Ernesto Oltra

Jul 27, 2011, 8:57:21 PM
to google-a...@googlegroups.com
Wow... so many users... Perhaps ask the Google+ guys.. =)

The only idea I have right now is to use task queues (or do it in the same request, depending on latency) to create a «notification» referencing the post and the users. Then list the most recent notifications for the user, and delete the old ones through a cron job. Anyway, I have to say that having the notifications for 200,000 users in my news feed would not be a very pleasant experience; they would change every nanosecond!

Ernesto Oltra

Jul 27, 2011, 9:15:51 PM
to google-a...@googlegroups.com
I thought one thing and said something completely different -.-' I meant having a model with the same key_id/key_name as the post and a list of users (say 4000-4500). When a super-star with 200,000 followers posts something, run a task queue task and save several of these models with the info. Then, for listing, do a keys-only query where the user is in the list of affected users. Then, with the keys, fetch the posts (they have the same key_id/key_name). A cron job will delete old notifications.

Most users will have fewer than 700-800 followers (if it's something like Twitter), so they will consume only one notification model per post. The costs come from indexes (a lot of lists to index), serialization (4000-4500 items), deserialization (we query with keys only, so almost no cost), and deleting old notifications (we use keys there too).
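
To put rough numbers on that, here is a back-of-the-envelope sketch of how many notification entities one post needs. It assumes the 4500-users-per-entity figure from above; the function name is hypothetical.

```python
def notify_entities_needed(num_followers, users_per_entity=4500):
    """Ceiling division: notification entities written for one post."""
    return (num_followers + users_per_entity - 1) // users_per_entity


print(notify_entities_needed(800))     # 1
print(notify_entities_needed(200000))  # 45
```

So a typical user costs one extra write per post, while the 200,000-follower case costs 45 writes, each carrying a large indexed list.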

Anyway, I'm open to new ideas; surely they could greatly improve my system.

Ernesto

Bruno Sandivilli

Jul 28, 2011, 12:07:52 PM
to google-a...@googlegroups.com
Thanks! I will implement this and test it to see the results in terms of performance. I'll try to post the results here for everyone's information.
Thanks again. I'm thinking that Objectify is not a good fit for this, so I will have to rewrite some things.


Pascal Voitot Dev

Jul 28, 2011, 12:17:53 PM
to google-a...@googlegroups.com
good idea also :)


MiuMeet Support

Aug 2, 2011, 12:30:50 PM
to google-a...@googlegroups.com
By the way, have a look at: http://devblog.miumeet.com/2011/08/much-more-efficient-implementation-of.html

It's much more efficient than db.ListProperty(int)

Cheers,
-Andrin

Bruno Sandivilli

Aug 3, 2011, 12:49:31 PM
to google-a...@googlegroups.com
So, if user X adds user Y, will user Y become an ancestor of X? And if user Y adds user W too, will user X then have two ancestors (???)? Any code snippets would be greatly appreciated.
Thanks again for the help, guys.

 

Ernesto Oltra

Aug 3, 2011, 6:52:20 PM
to google-a...@googlegroups.com
Sorry, but I don't use Java. I'll give you the Python version; it's pretty straightforward. As I said before, any improvements to my design or code are welcome =)

from google.appengine.ext import db

class User(db.Model):
  # For ease of use, the user_id is the key name of the entity.
  name = db.StringProperty()
  email = db.StringProperty()
  # ... etc.
  num_of_followers = db.IntegerProperty(default=0)

class FollowerInfo(db.Model):
  # Key name of the followed user (the super-star posting new things).
  # A StringProperty holds the key_name here, but you can store a key instead if you prefer.
  master = db.StringProperty()
  followers = db.StringListProperty()  # or arrays, the other thing MiuMeet proposed
  full = db.BooleanProperty(default=False)  # True when no more followers can be stored here

class Post(db.Model):
  # Field types here are only illustrative.
  content = db.TextProperty()
  author = db.StringProperty()
  # ... etc.

class PostNotify(db.Model):
  # The key_name/key_id of this entity is the key_name/key_id of the post.
  users = db.StringListProperty()


User X follows User Y:

# user is the User entity being followed (Y); follower_name is the key name of X.
# Check whether there is already a FollowerInfo entity that isn't full (fewer than 100 followers or so).
follower = FollowerInfo.all().filter('full =', False).filter('master =', user.key().name()).get()
if not follower:  # create one if we need it
  follower = FollowerInfo(parent=user, master=user.key().name(), followers=[])

# Add the follower, and check whether the entity is now full.
follower.followers.append(follower_name)
if len(follower.followers) >= 100:
  follower.full = True

follower.put()  # save

# Maybe this must be in a transaction?
user.num_of_followers += 1
user.put()


User X doesn't follow User Y anymore:

# user is the User entity being unfollowed (Y); follower_name is the key name of X.
# Find the FollowerInfo entity of Y that contains X.
follower = FollowerInfo.all().filter('master =', user.key().name()).filter('followers =', follower_name).get()
if follower:
  follower.followers.remove(follower_name)

  # If the entity was full, it now has one free slot again.
  if follower.full and len(follower.followers) < 100:
    follower.full = False

  follower.put()  # save

  # Again, maybe this must be in a transaction?
  user.num_of_followers -= 1
  user.put()
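
The follow/unfollow bookkeeping above can be tried out without the datastore. The sketch below is an in-memory stand-in, not the App Engine API: Shard plays the role of a FollowerInfo entity, and the helper names and the tiny shard size in the demo are made up for illustration.

```python
SHARD_SIZE = 100


class Shard(object):
    """In-memory stand-in for one FollowerInfo entity."""

    def __init__(self):
        self.followers = []
        self.full = False


def follow(shards, follower_id, shard_size=SHARD_SIZE):
    """Append to the first shard with room, creating a shard if none has."""
    shard = next((s for s in shards if not s.full), None)
    if shard is None:
        shard = Shard()
        shards.append(shard)
    shard.followers.append(follower_id)
    shard.full = len(shard.followers) >= shard_size
    return shard


def unfollow(shards, follower_id, shard_size=SHARD_SIZE):
    """Remove the follower from whichever shard holds it; reopen that shard."""
    for shard in shards:
        if follower_id in shard.followers:
            shard.followers.remove(follower_id)
            shard.full = len(shard.followers) >= shard_size
            return True
    return False


shards = []
for i in range(5):
    follow(shards, i, shard_size=2)
print([s.followers for s in shards])  # [[0, 1], [2, 3], [4]]

unfollow(shards, 0, shard_size=2)
follow(shards, 99, shard_size=2)
print([s.followers for s in shards])  # [[1, 99], [2, 3], [4]]
```

The last two lines show why the full flag is worth keeping: the slot freed by an unfollow is reused by the next follow instead of opening a new shard.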


List followers:

# user is the User entity whose followers are listed.
# Fetch one of the FollowerInfo entities. The result may be None (no followers) or one entity (1-100 followers). If the user has more FollowerInfo entities (more than 100 followers), only one of them is returned.
follower = FollowerInfo.all().filter('master =', user.key().name()).get()
if follower:
  for follower_name in follower.followers:
    ...  # list each follower here


Post:

# taskqueue comes from: from google.appengine.api import taskqueue
if user.num_of_followers > 100:
  taskqueue.add(...)  # too many followers: add a new task to process it offline
else:
  ...  # process it inline here (same code as in the task)

TASK:

 - Get each of the FollowerInfo entities for the user.
 - Create a PostNotify with the key_name/key_id of the post for every 4500 or so followers.
 - Save all the PostNotify entities.
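
The task steps above can be sketched as a plain-Python generator that walks the follower shards and regroups them into notification-sized batches. The function name and the small batch size in the demo are hypothetical; each yielded batch would become the users list of one PostNotify entity.

```python
def batch_for_notifications(shards, batch_size=4500):
    """Walk the follower shards and yield lists of at most batch_size ids."""
    batch = []
    for followers in shards:
        for follower_id in followers:
            batch.append(follower_id)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # leftover followers that did not fill a whole batch
        yield batch


# Three shards holding 250 followers in total.
shards = [list(range(0, 100)), list(range(100, 200)), list(range(200, 250))]
batches = list(batch_for_notifications(shards, batch_size=120))
print([len(b) for b in batches])  # [120, 120, 10]
```

One detail the thread leaves open: several PostNotify entities for the same post cannot all share an identical key, so in the datastore each batch would need something to keep its key distinct while still leading back to the post.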


List notifications:

# current_user_name is the key name of the user viewing the feed.
# Run a keys-only query to find the first 50 notifications (or however many you want) for this user.
# As you don't have to deserialize the list (keys-only), you never incur that performance cost. The key has all the info you need (the post key_name/key_id, it's exactly the same).
notifications = PostNotify.all(keys_only=True).filter('users =', current_user_name).fetch(50)

# Build the Post keys from the notification keys (same key_name/key_id).
keys = [db.Key.from_path('Post', key.id_or_name()) for key in notifications]

# Now you have all the posts
posts = db.get(keys)








Robert Kluin

Aug 3, 2011, 11:21:09 PM
to google-a...@googlegroups.com
Reference properties are a completely separate idea from entity groups
(ie ancestor / parent - child relationships).

Reference properties simply store a key; that key can be used to fetch
the other entity. They can be changed to point at a different entity
as often as you like.

An entity's entity group (aka ancestor) is determined by its key --
this can not be changed. One advantage of entity groups is that they
allow you to operate on entities within a given group transactionally.
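
A toy illustration of that distinction, using tuples as stand-ins for datastore keys (this is not the App Engine API, and the names are made up). A key is modeled as a path of (kind, id) pairs, and the entity group is simply the root of that path.

```python
def entity_group(key_path):
    """The entity group is identified by the root (first) element of the key path."""
    return key_path[0]


user_y = (("User", "y"),)
follower_info = (("User", "y"), ("FollowerInfo", 1))  # child of user Y
user_x = (("User", "x"),)

print(entity_group(follower_info) == entity_group(user_y))  # True
print(entity_group(user_x) == entity_group(user_y))         # False

# A reference is just a stored key; re-pointing it leaves the entity's own key alone.
profile = {"key": user_x, "following": user_y}
profile["following"] = (("User", "w"),)
print(entity_group(profile["key"]))  # ('User', 'x')
```

So user X following Y and then W changes which keys X stores, but X stays the root of its own entity group throughout.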

Robert
