[GSoC2014] Cassovary graph compression algorithms

Nilesh Chakraborty

Mar 4, 2014, 1:11:03 AM

to twitter-...@googlegroups.com

Hi Pankaj, Aneesh and everyone!

I'm a senior year B.Tech undergraduate majoring in Computer Science. Machine learning and data science excite me like nothing else.

I read the project details for graph compression algorithms for Cassovary and studied the whole Layered Label Propagation (LLP) paper [4]. It's pretty interesting, and from the looks of it, I think it could be implemented in a scalable way using MapReduce. In a previous topic [5], Pankaj mentioned that distributed processing would be good to have in Cassovary, so I think it'd be pretty awesome if I could use Spark to implement something like LLP (it's an iterative algorithm, so Spark should run it much faster than Hadoop).
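For readers who haven't seen the paper, here is a minimal single-machine sketch of the kind of label-propagation pass that LLP iterates. It is an illustration under assumptions, not Cassovary code and not the paper's reference implementation: the function name `apm_label_propagation`, the dict-based adjacency format, the fixed visiting order, and the tie-breaking rule are all made up for this example. Each node adopts the label that maximizes k - gamma * (v - k), where k is the number of its neighbors carrying that label and v is the number of other nodes carrying it. As far as I understand [4], LLP runs passes like this for several values of gamma and combines the resulting clusterings into a node ordering; that part is not shown.

```python
from collections import Counter


def apm_label_propagation(adj, gamma, max_rounds=100):
    """One label-propagation layer with a cluster-size penalty (hypothetical sketch).

    adj: dict mapping each node to a list of its neighbors (undirected graph).
    gamma: penalty weight; 0 gives plain label propagation.
    Returns a dict mapping each node to its final label.
    """
    labels = {v: v for v in adj}        # every node starts in its own cluster
    volume = Counter(labels.values())   # how many nodes hold each label

    for _ in range(max_rounds):
        changed = False
        # Assumption: a fixed visiting order keeps the example deterministic.
        for v in sorted(adj):
            current = labels[v]
            volume[current] -= 1        # score labels as if v were unassigned
            k = Counter(labels[u] for u in adj[v])

            def score(label):
                return k[label] - gamma * (volume[label] - k[label])

            # Keep the current label unless a neighbor's label is strictly better.
            best, best_score = current, score(current)
            for label in sorted(k):
                s = score(label)
                if s > best_score:
                    best, best_score = label, s

            labels[v] = best
            volume[best] += 1
            changed = changed or best != current
        if not changed:
            break
    return labels


# Two triangles joined by a single bridge edge (2, 3).
example_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
example_labels = apm_label_propagation(example_graph, gamma=0.5)
```

On this toy graph the penalty term keeps the two triangles in separate clusters even though they are connected by the bridge edge.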

I worked on peta-scale graph centrality computation using MapReduce during a research internship a year ago: I built a Hadoop implementation for computing PageRank on huge graphs (it's ongoing, with some WIP code at [1]), trying to get much better performance than Pegasus [2]. I'm currently trying to extend those optimizations to similar sparse linear-algebra algorithms involving big matrices and vectors (e.g. Lanczos, SVD) and playing around with Spark to see if it gives me greater performance benefits.

Also, I'm currently working on a project where I'm trying to improve Markov-chain-based rank aggregation (as proposed by Dwork et al. in [3]) by applying a new formulation of Kemeny optimality that involves Shannon entropy, among other things. I can discuss it in more detail if you'd like to have a chat. :)

I started learning Scala while working with Spark; I'm pretty good with Python and Java, but new to Scala at the moment (it's quite easy to pick up though, so no problem there).

Please let me know if you have any questions and I'll be glad to clarify them for you. Could you give me some pointers as to what my next steps should be?

Cheers,

Nilesh

[1] : https://github.com/nilesh-c/graphfu

[2] : http://www.cs.cmu.edu/~ukang/papers/PegasusICDM2009.pdf

[3] : http://www10.org/cdrom/papers/pdf/p577.pdf

[4] : http://vigna.di.unimi.it/ftp/papers/LayeredLabelPropagation.pdf

[5] : https://groups.google.com/forum/#!topic/twitter-cassovary/kZ1dpNkO7M8

Pankaj Gupta

Mar 4, 2014, 4:03:42 PM

to twitter-...@googlegroups.com

Hi Nilesh -- note that Cassovary is not a distributed system, so this particular project does not relate to MapReduce or Spark.
