The variable y is a reference to an HDFS file that holds the in-degree of every vertex. We can use this result to produce a degree distribution. First, what is a degree distribution? It is the histogram (or density) of the degrees, i.e. how many nodes have a degree of 1, of 2, of 3, of 4, and so on. This is the MapReduce job that does the trick.
> z <- mapreduce(y, map=function(k,v) keyval(v,1), reduce=function(k,v) keyval(k, length(v)))
MAP: Take each vertex degree and emit it as a key with the value 1. REDUCE: For each degree, take its list of 1s and count them; that count is the number of vertices with that degree, which is the distribution.
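To see the map and reduce steps without a cluster, here is a minimal sketch in plain base R that mimics them in memory. The degrees vector is made-up example data, and the names pairs, grouped and distribution are hypothetical helpers used only for illustration; none of this is part of rmr.

```r
# Hypothetical in-degrees for six vertices (made-up data, not from the graph above).
degrees <- c(1, 2, 2, 3, 1, 2)

# MAP: emit one (degree, 1) pair per vertex.
pairs <- lapply(degrees, function(d) list(key = d, val = 1))

# SHUFFLE: group the emitted 1s by key (the degree).
keys <- sapply(pairs, function(p) p$key)
vals <- sapply(pairs, function(p) p$val)
grouped <- split(vals, keys)

# REDUCE: count the 1s collected for each degree.
distribution <- sapply(grouped, length)
```

On a vector this small, table(degrees) gives the same counts directly; the MapReduce version is only worth it when the degree list is too large to hold in memory.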
Now let's plot this in R. First we load the distribution into memory (it is much smaller than the graph).