To be distributed or not to be distributed, that is the question

Marc-Philippe Huget

Apr 17, 2013, 3:52:58 AM

to twitter-...@googlegroups.com

Hello Pankaj and Cassovary users,

Reading a post on this group about FlockDB and Cassovary, I am wondering what the objective for Cassovary is regarding distribution.

Do you intend to keep Cassovary for graphs that fit on a single node, or will Cassovary eventually move up a league and play in the same category as FlockDB, with graph distribution and the many issues that come with it? Thanks in advance for your answer.

Cheers,
mph

Pankaj Gupta

Apr 24, 2013, 4:46:21 PM

to twitter-...@googlegroups.com

Hi Marc,

By distribution, do you mean whether Cassovary works when the graph is partitioned across several machines somehow? Right now, it doesn't, but it is a potential future project.

Pankaj

--
You received this message because you are subscribed to the Google Groups "Cassovary" group.
To unsubscribe from this group and stop receiving emails from it, send an email to twitter-cassov...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Marc-Philippe Huget

Apr 25, 2013, 12:46:53 PM

to twitter-...@googlegroups.com

Hello Pankaj,

Yes, that's it: splitting a huge graph across several machines. In that case, would there be any conflict with FlockDB if Cassovary went distributed?

Distributing Cassovary across several machines while keeping it simple to use, develop, and install is something I would like to consider.

Pankaj, what would the process be? Should I submit this feature to the Cassovary future development list, or fork the project, start building a distributed Cassovary, and see how it goes? I guess that, as PMC, you have a say in the architecture, the development, and what goes into Cassovary.

Let me know how we could proceed; I am eager to develop a distributed version of Cassovary.

Cheers,
Marc-Philippe aka mph

Ajeet Grewal

Apr 25, 2013, 1:20:38 PM

to twitter-...@googlegroups.com

There is not much overlap with FlockDB. FlockDB is meant to be a persistent store for a huge graph that supports simple operations on it.

Cassovary is meant to run more sophisticated algorithms quickly. We don't care about persisting the graph, since Cassovary is not its primary store.

Please feel free to experiment!

--
Regards,
Ajeet

Marc-Philippe Huget

Apr 25, 2013, 2:48:45 PM

to twitter-...@googlegroups.com

Hello Ajeet,

So do you think distributing Cassovary is important, or can it wait?
What kinds of sophisticated algorithms are you targeting with Cassovary?

Cheers,
mph

Ajeet Grewal

Apr 25, 2013, 3:05:21 PM

to twitter-...@googlegroups.com

On Thu, Apr 25, 2013 at 11:48 AM, Marc-Philippe Huget <mph...@gmail.com> wrote:

> Hello Ajeet,
>
> So do you think distributing Cassovary is important, or can it wait?

Distributing the graph while maintaining performance is non-trivial. Do you have an approach in mind? Doing this would be educational, at the least.

> What kinds of sophisticated algorithms are you targeting with Cassovary?

Here is an example: https://github.com/twitter/cassovary/blob/master/src/main/scala/com/twitter/cassovary/algorithms/PageRank.scala
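To give a feel for the kind of whole-graph, iterative computation meant here, the following is a minimal power-iteration PageRank sketch in Python over a small in-memory adjacency list. It is not Cassovary's PageRank.scala; the function name `pagerank`, the damping factor of 0.85, the fixed 20 iterations, and the uniform redistribution of rank from dangling nodes are all illustrative assumptions.

```python
def pagerank(out_edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over an in-memory adjacency list (illustrative sketch).

    out_edges maps every node id to the list of node ids it points to;
    every target must itself be a key. Rank held by dangling nodes
    (no out-edges) is spread evenly over all nodes.
    """
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Each round, every node starts with its "teleport" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        dangling_mass = 0.0
        for v in nodes:
            targets = out_edges[v]
            if targets:
                # Split this node's damped rank evenly over its out-edges.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                dangling_mass += rank[v]
        # Hand rank from dangling nodes back to every node uniformly.
        for v in nodes:
            new_rank[v] += damping * dangling_mass / n
        rank = new_rank
    return rank


graph = {1: [2, 3], 2: [3], 3: [1], 4: [3]}
ranks = pagerank(graph)
for node, score in sorted(ranks.items()):
    print(node, round(score, 4))
```

Every iteration touches every edge, which is why keeping the whole graph in memory on one machine makes such algorithms fast, and why partitioning it across machines is costly.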

--
Regards,
Ajeet

Marc-Philippe Huget

Apr 25, 2013, 3:53:32 PM

to twitter-...@googlegroups.com

Hello Ajeet,

Maintaining algorithm performance under distribution is a hard question, really a research question. Part of an answer could be a distributed hash table approach, maybe using ZooKeeper; another approach could be to use CUDA to parallelise computation on graphs. Again, this question can only be considered seriously once we are able to distribute the whole graph.
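One way to picture the distributed hash table idea is to assign each vertex to a machine by hashing its id, so any machine can find a vertex's adjacency list without a central directory. The sketch below assumes integer vertex ids, a fixed number of machines, and a simple modulo hash; the helpers `owner` and `partition_edges` are hypothetical and not part of Cassovary. It places each edge with its source vertex and counts the edges whose destination lives elsewhere, since those "cut" edges are what would become network traffic during an algorithm such as PageRank.

```python
from collections import defaultdict


def owner(vertex_id, num_machines):
    """Machine responsible for a vertex: a simple modulo hash (assumes int ids)."""
    return hash(vertex_id) % num_machines


def partition_edges(edges, num_machines):
    """Store each directed edge on the machine that owns its source vertex.

    Returns the per-machine adjacency lists and the number of edges whose
    destination vertex is owned by a different machine ("cut" edges).
    """
    shards = [defaultdict(list) for _ in range(num_machines)]
    cut_edges = 0
    for src, dst in edges:
        src_machine = owner(src, num_machines)
        shards[src_machine][src].append(dst)
        if owner(dst, num_machines) != src_machine:
            cut_edges += 1
    return shards, cut_edges


edges = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 3)]
shards, cut = partition_edges(edges, num_machines=2)
print(cut)  # → 3
```

Modulo hashing keeps lookups trivial but ignores graph structure, so on real graphs a large fraction of edges tends to be cut; structure-aware partitioners trade a more complex lookup for less cross-machine traffic.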

Cheers,
mph
