Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Calculating In-Degree using R MapReduce over Hadoop
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  7 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post will appear after it is approved by moderators
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Marko Rodriguez  
View profile  
 More options Feb 2, 3:18 am
From: Marko Rodriguez <okramma...@gmail.com>
Date: Thu, 2 Feb 2012 01:18:39 -0700
Local: Thurs, Feb 2 2012 3:18 am
Subject: Calculating In-Degree using R MapReduce over Hadoop

Hi,

I'm playing with RHadoop which is an R package that lets you do MapReduce jobs over Hadoop.
        https://github.com/RevolutionAnalytics/RHadoop

I have a pseudo Hadoop cluster running on my local machine. I was playing around with it and all of a sudden thought: "Ah! Its easy to calculate vertex in-degree!" Want to see how it is done?

Lets create a random graph with 10 vertices and 100 edges and represent it as as a two-column table (i.e. an edge list).

> graph <- matrix(sample.int(10,100,replace=TRUE),50)
> graph[1:5,]

     [,1] [,2]
[1,]    6    2
[2,]    2    2
[3,]    4   10
[4,]    9   10
[5,]    5    7

The first row says that vertex 6 is connected to vertex 2.

Now lets dump that data into HDFS:

> x <- to.dfs(graph)

This creates the variable x which is a reference to the file in HDFS. Lets look at one key/value pair in that file.

> from.dfs(x)[1]

[[1]]
[[1]]$key
[1] 6
[[1]]$val
[1] 6 2
attr(,"rmr.keyval")
[1] TRUE

We can ignore the key -- what matters is the value is the edge data that says that vertex 6 is connected to vertex 2.

Now lets calculate the in-degree of all vertices in the graph.

> y <- mapreduce(x, map=function(k,v) keyval(v[2],1), reduce=function(k,v) keyval(k, length(v)))

12/02/02 01:02:51 INFO mapred.FileInputFormat: Total input paths to process : 1
12/02/02 01:02:51 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-marko/mapred/local]
12/02/02 01:02:51 INFO streaming.StreamJob: Running job: job_201202012324_0008
12/02/02 01:02:51 INFO streaming.StreamJob: To kill this job, run:
12/02/02 01:02:51 INFO streaming.StreamJob: /Applications/hadoop/hadoop-0.20.203.0/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201202012324_0008
12/02/02 01:02:51 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201202012324_0008
12/02/02 01:02:52 INFO streaming.StreamJob:  map 0%  reduce 0%
12/02/02 01:03:06 INFO streaming.StreamJob:  map 100%  reduce 0%
12/02/02 01:03:18 INFO streaming.StreamJob:  map 100%  reduce 100%
12/02/02 01:03:24 INFO streaming.StreamJob: Job complete: job_201202012324_0008
12/02/02 01:03:24 INFO streaming.StreamJob: Output: /var/folders/W7/W7Tviv8iFv8r508qXHlEs++++TM/-Tmp-//Rtmpan4VFX/file3e1ae6f

The map job says, emit key/value pairs where the key is the head vertex of the edge (e.g. 6) and the value is just the number 1.
The reduce job says, emit key/value pairs where the key is the key (e.g. 6) and the value is the number of 1s in the value (i.e. the in-degree)

Now we can check our results which was written to the file reference in HDFS reference by y.

> from.dfs(y)[3]

[[1]]
[[1]]$key
[1] 6
[[1]]$val
[1] 2

The result says that 6 has an in-degree of 2. Lets check using standard R constructs.

> table(graph[,2])

 1  2  3  4  5  6  7  8  9 10
 3  6  5  4  6  2  9  2  4  9

Yep. 6 is seen 2 times in the second column.

Tada! In-Degree using R and Hadoop. Moreover, its just a one-liner in R:

         mapreduce(x, map=function(k,v) keyval(v[2],1), reduce=function(k,v) keyval(k, length(v)))

Can you tell me how to calculate a graph's degree distribution? --- HINT: its this MapReduce job composed with another.

Hope that was enlightening,
Marko.

http://markorodriguez.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Antonio Piccolboni  
View profile  
 More options Feb 3, 2:40 am
From: Antonio Piccolboni <picco...@gmail.com>
Date: Thu, 2 Feb 2012 23:40:52 -0800 (PST)
Local: Fri, Feb 3 2012 2:40 am
Subject: Re: Calculating In-Degree using R MapReduce over Hadoop
Hi, I am the developer of the rmr package used here and besides being pleased with seeing it put to good use, since there is an interest in doing graph algorithms with it, let me point to you an implementation of connected components on my blog

http://blog.piccolboni.info/2011/09/connected-components-example-rewr...

Please do not hesitate with comments or questions.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "Calculating a Graph's Degree Distribution using R MapReduce over Hadoop" by Marko Rodriguez
Marko Rodriguez  
View profile  
 More options Feb 3, 3:02 am
From: Marko Rodriguez <okramma...@gmail.com>
Date: Fri, 3 Feb 2012 01:02:36 -0700
Local: Fri, Feb 3 2012 3:02 am
Subject: Calculating a Graph's Degree Distribution using R MapReduce over Hadoop

Howdy ho,

So no one answered my question from the email entitled "Calculating In-Degree using R MapReduce over Hadoop."

        "Can you tell me how to calculate a graph's degree distribution? --- HINT: its this MapReduce job composed with another."

As we saw before, this is how you calculate the in-degree for every vertex in the graph. Lets do it again, but this time, over a larger random graph -- 100 vertices and 10,000 edges.

> graph <- matrix(sample.int(100,10000,replace=TRUE),ncol=2)
> graph[1:5,]

     [,1] [,2]
[1,]   97    4
[2,]   87   63
[3,]    9   79
[4,]   92   62
[5,]   22   77

> x <- to.dfs(graph)
> y <- mapreduce(x, map=function(k,v) keyval(v[2],1), reduce=function(k,v) keyval(k, length(v)))

The variable y is a reference to an HDFS file that has the in-degree of every vertex. We can then use this result to produce a degree distribution. First, what is a degree distribution? It is the histogram/density of the various degrees -- i.e. how many nodes have a degree of 1? of 2? of 3? of 4? etc.... This is the MapReduce job that does the trick.

> z <- mapreduce(y, map=function(k,v) keyval(v,1), reduce=function(k,v) keyval(k, length(v)))

        MAP: Take all the vertex degrees and emit each degree with the the value 1.
        REDUCE: Take all the degrees and their list of 1s and count their list -- that is the distribution for each degree.

Now lets plot this in R. First we load the distribution into memory (which is much smaller than the graph.

  Rplot.pdf
140K Download

Here is the algorithm in composite starting from an edge list and yielding an in-degree distribution:

mapreduce(
        mapreduce(to.dfs(graph),
                map=function(k,v) keyval(v[2],1), reduce=function(k,v) keyval(k, length(v))) ,
        map=function(k,v) keyval(v,1), reduce=function(k,v) keyval(k, length(v)))))

Boo yea!,
Marko.

http://markorodriguez.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Neubauer  
View profile  
 More options Feb 3, 4:15 am
From: Peter Neubauer <neubauer.pe...@gmail.com>
Date: Fri, 3 Feb 2012 10:15:52 +0100
Local: Fri, Feb 3 2012 4:15 am
Subject: Re: [TinkerPop] Calculating a Graph's Degree Distribution using R MapReduce over Hadoop

Marko,
Love this. It demands a screencast to replicate!

Send from a device with crappy keyboard and autocorrection.

/peter
On Feb 3, 2012 9:02 AM, "Marko Rodriguez" <okramma...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jack  
View profile  
 More options Feb 3, 5:03 am
From: Jack <winesht...@gmail.com>
Date: Fri, 3 Feb 2012 02:03:01 -0800 (PST)
Local: Fri, Feb 3 2012 5:03 am
Subject: Re: Calculating a Graph's Degree Distribution using R MapReduce over Hadoop
Very interesting, but isn't there a way to accomplish this with pure
Gremlin?
I am struggling for awhile now with a similar problem of how to
calculate node's statistical deviation of edges. I know how to do it
using groovy but having it in pure Gremlin would be nicer, would be
happy to know if its impossible :)

(the statistical deviation function requires a [N:3] matrix containing
for each row [EdgeType, Degree group, Amount of nodes that contain
this degree level for the specific edge type])

On 3 פברואר, 10:02, Marko Rodriguez <okramma...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pierre De Wilde  
View profile  
 More options Feb 3, 5:34 am
From: Pierre De Wilde <pierredewi...@gmail.com>
Date: Fri, 3 Feb 2012 11:34:56 +0100
Local: Fri, Feb 3 2012 5:34 am
Subject: Re: [TinkerPop] Re: Calculating a Graph's Degree Distribution using R MapReduce over Hadoop

Hi,

Very interesting, but isn't there a way to accomplish this with

> pure Gremlin?

Here is the Graph's Degree Distribution in Gremlin:

m = [:].withDefault{0};
g.V.sideEffect{m[it.both.count()]+=1}.iterate();
m.sort{a,b->a.key<=>b.key};

To get In-degree or out-degree distributions, replace it.both.count() by
it.in.count() or it.out.count()

HTH,
Pierre

I am struggling for awhile now with a similar problem of how to


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jack  
View profile  
 More options Feb 3, 10:44 am
From: Jack <winesht...@gmail.com>
Date: Fri, 3 Feb 2012 07:44:38 -0800 (PST)
Local: Fri, Feb 3 2012 10:44 am
Subject: Re: Calculating a Graph's Degree Distribution using R MapReduce over Hadoop
Wow! thanks. finally I also understand what the sideEffect is good
for.

On Feb 3, 12:34 pm, Pierre De Wilde <pierredewi...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »