Advice on modelling a modern newsfeed

Gordon Rankin

unread,

Sep 24, 2016, 5:01:48 AM9/24/16

to Aurelius

I'm new to Titan and have spent the week playing around with it and experimenting with friendship/social relationships.

I'm now convinced that Titan backed by Cassandra is the best fit for creating a modern application with basic social requirements. Namely, a newsfeed.

As such, I am looking for the most efficient model of implementing one. Having scoured the internet, it has become clear that most of the advice is at least 3-4 years old and not specifically applicable to Titan. Furthermore, I am aware that most of the advice out there was given prior to Titan introducing vertex-centric indexes.

Anyway, the two predominant models that I have come across are 'Linked List' and 'Graphity':

"Linked List"

Each user (vertex) has a time ordered list of their posts (vertexes), as described in the Neo4J documentation:

http://neo4j.com/docs/snapshot/cypher-cookbook-newsfeed.html

When querying, we would have to look at a users friends, then look at all of each friends posts, collect them all, order them by date, and finally limit them to the most recent 15. This seems wasteful to me as it requires traversing all off the posts of all of a users friends, just to retrieve the most recent 15 items.

"Graphity"

This is similar to the above but adds a huge number of extra edges to make reading far faster, ultimately creating a directly transversable path through the most recent posts:

http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/

While it looks fantastic for read performance, it is poor for writes, and in my opinion, overly complex for what really should be a simple problem in a Graph database. I would like to avoid this method if possible, i'm not convinced it would work well with an eventually consistent datastore.

Now, as I said, this advice, and these methods are relatively old in terms of the speed of progress of modern graph databases. Furthermore, neither were implemented with Titan and vertex-centric indexes in mind.

Are either of these two approaches still worthwhile? Or since we now have vertex-centric indexes, should I simply use a single 'posted' edge between each post and it's 'user', making the model more natural. Or is there another model I am not aware of?

So what i'm asking is: What it the best way to model a newsfeed, for efficient retrieval, in Titan 1.0+. Especially baring in mind that we will need to add comments/replies, likes, entities (photos/videos/urls) to posts also.

Thanks in advance for any advice.

Gordon Rankin

unread,

Oct 11, 2016, 5:57:39 PM10/11/16

to Aurelius

Shamelss bump. I really could use some opinions on this.

Thanks.

HadoopMarc

unread,

Oct 12, 2016, 7:52:29 AM10/12/16

to Aurelius

Hi Gordon,

Your spec:

When querying, we would have to look at a users friends, then look at all of each friends posts, collect them all, order them by date, and finally limit them to the most recent 15. This seems wasteful to me as it requires traversing all off the posts of all of a users friends, just to retrieve the most recent 15 items.

I think your assumption on collecting them all is wrong for Titan. See 8.1.3 in:

http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html

So, the indexing backend maintains order and you can specify the limit of 15 in your query.

HTH, Marc

Op zaterdag 24 september 2016 11:01:48 UTC+2 schreef Gordon Rankin:

Gordon Rankin

unread,

Oct 14, 2016, 10:42:40 AM10/14/16

to Aurelius

Hi HadoopMarc,

Thanks for your reply.

I see, so would I be better off ignoring the model shown in the Neo4j documentation, and simply linking each post to the user via an edge directly? Would a vertex-centric index then be necessary to ensure we don't have to look through all posts of each user?

Thanks again.

HadoopMarc

unread,

Oct 14, 2016, 3:53:11 PM10/14/16

to Aurelius

Hi Gordon,

I have not used ES as indexing backend for Titan myself, just Titan with its internal index. When I wrote my previous answer I had actually the Neo4j graph model in mind as it comes closest to the structure of the data itself. In the line of the section 8.1.3 I mentioned, your query would look like:

g.V().has('username', 'centralfriend').out('friendof').out('posted').order().by('datetime', decr).limit(15)

where 'username' and 'datetime' need to be indexed and 'datetime' needs to be a MixedIndex.

Indeed, this could be a waste, depending on how well Titan or any other datastore optimizes this query. But at least you could just try it and get some experience with the systems involved. If it does not perform, it is a small step towards a graph structure that is more aligned with your first usecase.

Also remember, given just this usecase, there is no hard requirement to do this with a graph database. It depends more on the other use cases you expect to encounter with this data.

Cheers,   Marc

Op vrijdag 14 oktober 2016 16:42:40 UTC+2 schreef Gordon Rankin:

Gordon Rankin

unread,

Oct 18, 2016, 7:13:41 AM10/18/16

to Aurelius

Thank you. Your advice has been invaluable.

Max De Marzi Jr.

unread,

Oct 28, 2016, 5:33:54 PM10/28/16

to Aurelius

Hey Gordon,

I ran into your post today and it inspired a quick blog post. See https://maxdemarzi.com/2016/10/28/news-feeds/ for different ways to model news feeds.