To be distributed or not to be distributed, that is the question

Marc-Philippe Huget

Apr 17, 2013, 3:52:58 AM
to twitter-...@googlegroups.com
Hello Pankaj and Cassovary users,

After reading a post in this group about FlockDB and Cassovary, I am wondering what the objective for Cassovary is regarding distribution.

Will Cassovary stay focused on graphs that fit on a single node, or will it eventually move into the same league as FlockDB, with graph distribution and the many issues that come with it? Thanks in advance for your answer.

Cheers,
mph

Pankaj Gupta

Apr 24, 2013, 4:46:21 PM
to twitter-...@googlegroups.com
Hi Marc,

By distribution, do you mean Cassovary working on a graph that is partitioned across several machines? Right now it doesn't, but that is a potential future project.

Pankaj

Marc-Philippe Huget

Apr 25, 2013, 12:46:53 PM
to twitter-...@googlegroups.com
Hello Pankaj,

Yes, that is it: splitting a huge graph across several machines. In that case, would there be any conflict with FlockDB if Cassovary becomes distributed?

Distributing Cassovary across several machines while keeping it simple to use, develop, and install is something I would like to consider.
Pankaj, what would the process be? Should I submit this feature to the Cassovary future development list, or should I fork the project, start building a distributed Cassovary, and see how it goes? I guess that as PMC you have a say on the architecture, the development, and what goes into Cassovary.

Let me know how we could proceed; I am eager to develop a distributed version of Cassovary.

Cheers,
Marc-Philippe aka mph

Ajeet Grewal

Apr 25, 2013, 1:20:38 PM
to twitter-...@googlegroups.com
There is not much overlap with FlockDB. The use case for FlockDB is to be a persistent store for a huge graph and to support simple operations on it.

The use case for Cassovary is to run more sophisticated algorithms quickly. We don't care about the persistence of the graph, because Cassovary is not the primary store for it.

Please feel free to experiment!
--
Regards,
Ajeet

Marc-Philippe Huget

Apr 25, 2013, 2:48:45 PM
to twitter-...@googlegroups.com
Hello Ajeet,

So do you think distributing Cassovary is important, or can it wait?
What kinds of sophisticated algorithms do you have in mind for Cassovary?

Cheers,
mph

Ajeet Grewal

Apr 25, 2013, 3:05:21 PM
to twitter-...@googlegroups.com
On Thu, Apr 25, 2013 at 11:48 AM, Marc-Philippe Huget <mph...@gmail.com> wrote:
> Hello Ajeet,
>
> So do you think distributing Cassovary is important, or can it wait?

Distributing the graph while maintaining performance is non-trivial. Do you have an approach in mind? Doing this would be educational at the very least.

> What kinds of sophisticated algorithms do you have in mind for Cassovary?

--
Regards,
Ajeet

Marc-Philippe Huget

Apr 25, 2013, 3:53:32 PM
to twitter-...@googlegroups.com
Hello Ajeet,

Maintaining algorithm performance on a distributed graph is a hard question, really a research question. One possible direction is a distributed hash table approach, maybe with ZooKeeper; another is to use CUDA to parallelise computation on graphs. Once again, this is a question that can be considered seriously as soon as we are able to distribute the whole graph.
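
To make the hash-based idea a bit more concrete, here is a minimal sketch in Python. It is not Cassovary code and uses no ZooKeeper or CUDA; the names `owner`, `partition`, and `count_remote_edges` are hypothetical, and the rule of storing each edge on the machine that owns its source node is an assumption made for illustration. It only shows how node ids could be hashed to machines, and how one could count the edges that cross machines, which is roughly where the performance cost comes from.

```python
import zlib
from collections import defaultdict


def owner(node_id: int, num_machines: int) -> int:
    """Map a node id to a machine index using a stable hash (CRC32)."""
    return zlib.crc32(str(node_id).encode("utf-8")) % num_machines


def partition(edges, num_machines):
    """Store each directed edge on the machine that owns its source node."""
    shards = defaultdict(list)
    for src, dst in edges:
        shards[owner(src, num_machines)].append((src, dst))
    return dict(shards)


def count_remote_edges(edges, num_machines):
    """Count edges whose two endpoints are owned by different machines."""
    return sum(
        1
        for src, dst in edges
        if owner(src, num_machines) != owner(dst, num_machines)
    )


# Toy graph, made up for illustration only.
edges = [(1, 2), (2, 3), (3, 1), (4, 1), (4, 5)]
shards = partition(edges, num_machines=3)
remote = count_remote_edges(edges, num_machines=3)
print(shards)
print("edges crossing machines:", remote)
```

Every edge counted by `count_remote_edges` would need a network hop during a traversal, so a plain hash partition is simple to build but says nothing about keeping neighbouring nodes together.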

Cheers,
mph