http://blog.piccolboni.info/2011/09/connected-components-example-rewritten.html
Please do not hesitate with comments or questions.
So no one answered my question from the email entitled "Calculating In-Degree using R MapReduce over Hadoop."
"Can you tell me how to calculate a graph's degree distribution? --- HINT: its this MapReduce job composed with another."
As we saw before, this is how you calculate the in-degree for every vertex in the graph. Lets do it again, but this time, over a larger random graph -- 100 vertices and 10,000 edges.
> graph <- matrix(sample.int(100,10000,replace=TRUE),ncol=2)
> graph[1:5,]
[,1] [,2]
[1,] 97 4
[2,] 87 63
[3,] 9 79
[4,] 92 62
[5,] 22 77
> x <- to.dfs(graph)
> y <- mapreduce(x, map=function(k,v) keyval(v[2],1), reduce=function(k,v) keyval(k, length(v)))
The variable y is a reference to an HDFS file that has the in-degree of every vertex. We can then use this result to produce a degree distribution. First, what is a degree distribution? It is the histogram/density of the various degrees -- i.e. how many nodes have a degree of 1? of 2? of 3? of 4? etc.... This is the MapReduce job that does the trick.
> z <- mapreduce(y, map=function(k,v) keyval(v,1), reduce=function(k,v) keyval(k, length(v)))
MAP: Take all the vertex degrees and emit each degree with the the value 1.
REDUCE: Take all the degrees and their list of 1s and count their list -- that is the distribution for each degree.
Now lets plot this in R. First we load the distribution into memory (which is much smaller than the graph.
> dist <- from.dfs(z)
> length(dist)
[1] 27
> hist(matrix(c(as.numeric(keys(dist)),as.numeric(values(dist))),nrow=2),breaks=10,main="Random Graph Degree Distribution",xlab="in-degree")
Marko,
Love this. It demands a screencast to replicate!
Send from a device with crappy keyboard and autocorrection.
/peter
Here is the algorithm in composite starting from an edge list and yielding an in-degree distribution:
mapreduce(
mapreduce(to.dfs(graph),
map=function(k,v) keyval(v[2],1), reduce=function(k,v) keyval(k, length(v))) ,
map=function(k,v) keyval(v,1), reduce=function(k,v) keyval(k, length(v)))))
Boo yea!,
Marko.
http://markorodriguez.com
Very interesting, but isn't there a way to accomplish this with pure Gremlin?